

# Neo4j GraphRAG Technical Documentation

Advanced technical documentation for the Neo4j-based GraphRAG implementation in the Code Graph Knowledge System.

## Architecture Overview

### Unified Storage Architecture

- **Single Database**: Uses the vector index built into Neo4j 5.x, eliminating the need for a separate vector database
- **Data Consistency**: Text, graph structure, and vectors are stored in the same database
- **Simplified Operations**: Only a single Neo4j instance has to be maintained

### Modern Technology Stack

- **LlamaIndex**: RAG framework that drives the GraphRAG pipeline
- **Neo4j**: Graph database with built-in vector search
- **Ollama**: Local LLM and embedding model services
- **FastAPI**: High-performance async web framework

### Powerful Query Capabilities

- **Hybrid Search**: Simultaneous graph traversal and vector similarity search
- **Multi-mode Queries**: Support for pure graph, pure vector, and hybrid queries
- **Intelligent Retrieval**: Automatic selection of the retrieval strategy for each query

## System Requirements

### Required Services

- **Neo4j 5.x**: A release with vector index support (5.15+ recommended, see below)
- **Ollama**: Local LLM service
- **Python 3.13+**: Runtime environment

### Recommended Configuration

```text
# Neo4j
Neo4j 5.15+ (Community or Enterprise)
Memory: 4GB+
Storage: SSD recommended

# Ollama Models
LLM: llama3.2, mistral, qwen
Embedding: nomic-embed-text, all-minilm
```

## Installation and Configuration

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start Neo4j

```bash
# Using Docker
docker run \
  --name neo4j-code-graph \
  -p 7474:7474 -p 7687:7687 \
  -d \
  -v $HOME/neo4j/data:/data \
  -v $HOME/neo4j/logs:/logs \
  -v $HOME/neo4j/import:/var/lib/neo4j/import \
  -v $HOME/neo4j/plugins:/plugins \
  --env NEO4J_AUTH=neo4j/password \
  neo4j:5.15
```

### 3. Start Ollama

```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Download models
ollama pull llama3.2
ollama pull nomic-embed-text
```

### 4. Configure Environment Variables

```bash
# .env file
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```

## Query Modes

### Hybrid Mode (`hybrid`)

```json
{
  "question": "What are the characteristics of Python?",
  "mode": "hybrid"
}
```

Combines graph traversal and vector search for comprehensive answers.

### Vector-Only Mode (`vector_only`)

```json
{
  "question": "Programming language features",
  "mode": "vector_only"
}
```

Relies on semantic similarity search; suitable for conceptual queries.

### Graph-Only Mode (`graph_only`)

```json
{
  "question": "Python's relationship with other languages",
  "mode": "graph_only"
}
```

Relies on graph structure traversal; suitable for relationship queries.

## Performance Optimization

### Neo4j Optimization

```cypher
// Create vector index
CREATE VECTOR INDEX knowledge_vectors
FOR (n:Document) ON (n.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 768,
  `vector.similarity_function`: 'cosine'
}};

// Create full-text index
CREATE FULLTEXT INDEX document_text
FOR (n:Document) ON EACH [n.text, n.title];
```

`vector.dimensions` must match the output size of the embedding model; 768 corresponds to `nomic-embed-text`.

### Query Optimization

- Use appropriate `top_k` values (`similarity_top_k` is set to 10 in the configuration below)
- Set a reasonable `chunk_size` (1000 in the configuration below)
- Enable query caching
- Monitor query performance

## Configuration Options

### Neo4j Configuration

```python
# Vector index configuration
vector_index_name = "knowledge_vectors"
vector_dimension = 768  # Adjust based on embedding model
```

### LlamaIndex Configuration

```python
# Document processing
chunk_size = 1000
chunk_overlap = 200

# Query configuration
similarity_top_k = 10
response_mode = "tree_summarize"
```

### Ollama Configuration

```python
# LLM model
ollama_model = "llama3.2"
temperature = 0.1

# Embedding model
embedding_model = "nomic-embed-text"
```

## Troubleshooting

### Common Issues

#### Neo4j Connection Failed

```bash
# Check Neo4j status
docker logs neo4j-code-graph

# Verify that the HTTP endpoint responds
curl http://localhost:7474
```
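
The `curl` check above only exercises the HTTP port (7474), while `NEO4J_URI` points at the Bolt port (7687). As a minimal sketch, assuming only the Python standard library, the following script tests whether the Bolt port accepts TCP connections. The helper names `parse_bolt_uri` and `port_is_open` are hypothetical and not part of the project.

```python
import os
import socket
from urllib.parse import urlparse


def parse_bolt_uri(uri: str) -> tuple[str, int]:
    """Split a URI such as bolt://localhost:7687 into (host, port)."""
    parsed = urlparse(uri)
    # Fall back to localhost and the default Bolt port when the URI omits them.
    return parsed.hostname or "localhost", parsed.port or 7687


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # NEO4J_URI is the variable from the .env file; the default mirrors it.
    uri = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
    host, port = parse_bolt_uri(uri)
    state = "reachable" if port_is_open(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

A successful check only shows that something is listening on the port. It does not validate the credentials in `NEO4J_USER` and `NEO4J_PASSWORD`, so an authentication failure would still need to be diagnosed from the container logs.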
#### Ollama Model Not Found

```bash
# List installed models
ollama list

# Download missing model
ollama pull nomic-embed-text
```

#### Vector Index Error

```cypher
// Check index status
SHOW INDEXES;

// Rebuild index
DROP INDEX knowledge_vectors IF EXISTS;
CREATE VECTOR INDEX knowledge_vectors ...
```

## Development Guide

### Adding New Document Types

```python
# Extend document processor
from llama_index.core import Document  # assumption: llama-index >= 0.10


class CustomDocumentProcessor:
    def process(self, content: str) -> Document:
        # Custom processing logic; replace {...} with a metadata dict
        return Document(text=content, metadata={...})
```

### Custom Query Strategies

```python
# Implement custom retriever
from typing import List


class CustomRetriever:
    def retrieve(self, query: str) -> List["Node"]:
        # Custom retrieval logic; "Node" stands for the project's node type
        nodes = []
        return nodes
```

## Architecture Diagram

```
Documents → LlamaIndex → Neo4j (Vector + Graph)
Query → Single Cypher Query → Unified Results
```

## Core Features

### Intelligent Document Processing

- Automatic document chunking
- Entity relationship extraction
- Vector embedding generation
- Graph structure construction

### Efficient Query Engine

- Hybrid retrieval strategies
- Context-aware responses
- Multi-hop graph traversal
- Semantic similarity matching

### Flexible Extensibility

- Support for multiple document formats
- Configurable embedding models
- Custom query strategies
- Plugin architecture

## Testing and Validation

### Run Tests

```bash
python test_neo4j_knowledge.py
```

### Test Coverage

- Service initialization
- Document addition and indexing
- Multi-mode queries
- Vector similarity search
- Graph structure queries
- File upload processing

## License

MIT License - see LICENSE file for details

## Acknowledgments

- [Neo4j](https://neo4j.com/) - Graph database technology
- [LlamaIndex](https://www.llamaindex.ai/) - RAG framework
- [Ollama](https://ollama.ai/) - Local LLM service

---

**Modern GraphRAG, starting from Neo4j!** 🚀
