Code Graph Knowledge System

by royisme

Enterprise knowledge management platform with Neo4j graph database, multi-interface architecture (MCP/Web/REST), and intelligent code analysis for modern software development teams.

Overview

Code Graph Knowledge System is a production-ready platform that transforms code repositories and development documentation into a queryable knowledge graph. Built on Neo4j's graph database technology and powered by large language models, the system provides three distinct interfaces for different use cases: MCP protocol for AI assistants, Web UI for human users, and REST API for programmatic access.

The platform combines vector search, graph traversal, and LLM-driven analysis to deliver code intelligence capabilities including repository analysis, dependency mapping, impact assessment, and automated documentation generation.

Core Capabilities

Multi-Interface Architecture

MCP Protocol (Port 8000) - Model Context Protocol server for AI assistant integration

  • Direct integration with Claude Desktop, Cursor, and other MCP-compatible tools

  • 25+ specialized tools for code analysis and knowledge management

  • Real-time task monitoring via Server-Sent Events

  • Supports stdio and SSE transport modes

Web UI (Port 8080) - Browser-based interface for team collaboration

  • Real-time task monitoring dashboard

  • Repository ingestion and management

  • Metrics visualization with interactive charts

  • Built with React 18, TypeScript, and shadcn/ui components

REST API (Ports 8000, 8080) - HTTP endpoints for system integration

  • Document ingestion and knowledge querying

  • Task management and monitoring

  • Prometheus metrics export

  • OpenAPI/Swagger documentation
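The `/metrics` endpoint serves the standard Prometheus text exposition format. As a rough sketch of what a consumer sees (the metric name below is illustrative, not taken from the system's actual metric set):

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition into {series: value}, skipping
    HELP/TYPE comment lines and splitting each sample on its last space
    (any label string stays as part of the series name)."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        out[name] = float(value)
    return out

sample = "# HELP tasks_total Tasks processed\ntasks_total 42\n"
print(parse_metrics(sample))  # → {'tasks_total': 42.0}
```

In practice you would point a Prometheus scrape job at `http://localhost:8080/metrics` rather than parse the text by hand.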

Knowledge Graph Engine

Code Intelligence - Graph-based code analysis without requiring LLMs

  • Repository structure mapping and dependency tracking

  • Function and class relationship analysis

  • Impact analysis for code changes

  • Context pack generation for AI assistants

  • Support for 15+ programming languages
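Conceptually, impact analysis is a reverse traversal of the dependency graph: starting from a changed symbol, walk caller/dependent edges to collect everything transitively affected. A minimal toy illustration (not the system's implementation, which runs as Cypher queries against Neo4j):

```python
from collections import deque

def impact_set(reverse_deps: dict[str, list[str]], changed: str) -> set[str]:
    """Breadth-first walk over dependent edges: everything reachable
    from `changed` is potentially impacted by the change."""
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in reverse_deps.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Toy graph: parse_token is called by authenticate, which is called by two views
reverse_deps = {
    "parse_token": ["authenticate"],
    "authenticate": ["login_view", "refresh_view"],
}
print(sorted(impact_set(reverse_deps, "parse_token")))
# → ['authenticate', 'login_view', 'refresh_view']
```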

Memory Store - Project knowledge tracking with temporal awareness

  • Fact, decision, pattern, and insight recording

  • Memory evolution with superseding relationships

  • Automatic extraction from conversations, commits, and code

  • Vector search with embedding-based retrieval
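The superseding relationship means a query for current knowledge should follow the chain of revisions to its newest node. A schematic sketch of that lookup (the field names here are illustrative, not the store's actual graph schema):

```python
def current_fact(memories: dict[str, dict], fact_id: str) -> str:
    """Follow superseded_by links until reaching the latest revision."""
    while memories[fact_id].get("superseded_by"):
        fact_id = memories[fact_id]["superseded_by"]
    return memories[fact_id]["text"]

memories = {
    "m1": {"text": "API uses session cookies", "superseded_by": "m2"},
    "m2": {"text": "API uses JWT bearer tokens", "superseded_by": None},
}
print(current_fact(memories, "m1"))  # → API uses JWT bearer tokens
```

Because the superseded node is kept rather than deleted, the store retains the temporal history of how a decision evolved.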

Knowledge RAG - Document processing with hybrid search

  • Multi-format document ingestion (Markdown, PDF, code files)

  • Neo4j native vector indexing

  • Hybrid search combining vector similarity and graph traversal

  • Configurable chunking and embedding strategies
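The chunking strategy follows the usual sliding-window pattern, driven by the `CHUNK_SIZE` and `CHUNK_OVERLAP` settings shown in the Configuration section. A minimal sketch of the idea (simplified; the real pipeline goes through LlamaIndex node parsers):

```python
def chunk_tokens(tokens: list, size: int = 512, overlap: int = 50) -> list[list]:
    """Sliding-window chunking: consecutive chunks share `overlap` tokens
    so no sentence is cut off without context on either side."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = list(range(1000))
chunks = chunk_tokens(tokens)
print(len(chunks), len(chunks[0]))  # → 3 512
```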

SQL Schema Parser - Database schema analysis with business domain classification

  • Multi-dialect support (Oracle, MySQL, PostgreSQL, SQL Server)

  • Configurable business domain templates (Insurance, E-commerce, Banking, Healthcare)

  • Automated relationship detection and documentation generation

  • Integration with knowledge graph for cross-referencing
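At its simplest, relationship detection means extracting foreign-key constraints from the DDL. A toy single-dialect sketch (the actual parser handles multiple dialects and naming-convention heuristics beyond explicit constraints):

```python
import re

DDL = """
CREATE TABLE policy (
    id INT PRIMARY KEY,
    holder_id INT,
    FOREIGN KEY (holder_id) REFERENCES customer(id)
);
"""

# (column, referenced table, referenced column)
FK_RE = re.compile(
    r"FOREIGN KEY\s*\((\w+)\)\s*REFERENCES\s+(\w+)\s*\((\w+)\)", re.I
)
print(FK_RE.findall(DDL))  # → [('holder_id', 'customer', 'id')]
```

Each detected relationship becomes an edge in the knowledge graph, which is what enables cross-referencing between schema entities and code.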

Technology Stack

Backend Infrastructure

  • FastAPI - High-performance async web framework

  • Neo4j 5.x - Graph database with native vector indexing

  • Python 3.13+ - Modern Python with type hints

  • Uvicorn - ASGI server with WebSocket support

AI and ML Integration

  • LlamaIndex - Document processing and retrieval pipeline

  • Multiple LLM providers (Ollama, OpenAI, Gemini, OpenRouter)

  • Flexible embedding models (HuggingFace, Ollama, OpenAI)

  • Model Context Protocol (MCP) for AI assistant integration

Frontend Technology

  • React 18 - Modern UI library with concurrent features

  • TypeScript - Type-safe development

  • TanStack Router - Type-safe routing

  • shadcn/ui - Accessible component library

  • Vite - Fast build tooling

Quick Start

Prerequisites

  • Python 3.13 or higher

  • Neo4j 5.0 or higher

  • Docker (optional, for containerized deployment)

  • Node.js 18+ (for frontend development)

Querying Knowledge

```python
import httpx

# Query the knowledge base
response = httpx.post(
    "http://localhost:8000/api/v1/knowledge/query",
    json={
        "question": "How does the authentication system work?",
        "mode": "hybrid",  # or "graph_only", "vector_only"
        "use_tools": False,
        "top_k": 5,
    },
)

# Search similar documents
response = httpx.post(
    "http://localhost:8000/api/v1/knowledge/search",
    json={"query": "user authentication", "top_k": 10},
)
```

Installation

Clone the repository and install dependencies:

```bash
git clone https://github.com/royisme/codebase-rag.git
cd codebase-rag
pip install -r requirements.txt

# or using uv (recommended)
uv pip install -e .
```

Configure environment variables:

```bash
cp env.example .env
# Edit .env with your Neo4j credentials and LLM provider settings
```

Start Neo4j database:

```bash
docker run --name neo4j-code-graph \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_PLUGINS='["apoc"]' \
  neo4j:5.15
```

Running the System

Complete System (MCP + Web UI + REST API)

```bash
python start.py
```

Access points:

  • MCP SSE Service: http://localhost:8000/sse

  • Web UI: http://localhost:8080

  • REST API Documentation: http://localhost:8080/docs

  • Prometheus Metrics: http://localhost:8080/metrics

MCP Server Only

```bash
python start_mcp.py
```

Docker Deployment

Three deployment modes available:

Minimal Mode - Code Graph only (no LLM required)

```bash
make docker-minimal
```

Standard Mode - Code Graph + Memory Store (embedding model required)

```bash
make docker-standard
```

Full Mode - All features (LLM + embedding required)

```bash
make docker-full
```

Usage Examples

MCP Integration

Configure in Claude Desktop or compatible MCP client:

```json
{
  "mcpServers": {
    "code-graph": {
      "command": "python",
      "args": ["/path/to/start_mcp.py"],
      "cwd": "/path/to/codebase-rag"
    }
  }
}
```

Available MCP tools include:

  • code_graph_ingest_repo - Ingest code repository

  • code_graph_related - Find related code elements

  • code_graph_impact - Analyze change impact

  • query_knowledge - Query knowledge base

  • add_memory - Store project knowledge

  • extract_from_conversation - Extract insights from chat

  • watch_task - Monitor task progress
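Under the hood, MCP tool invocations are JSON-RPC 2.0 `tools/call` requests. A sketch of the message an MCP client sends on the wire (the tool arguments here are hypothetical, chosen only to illustrate the shape):

```python
import json

def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` message as used by MCP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

# Hypothetical argument shape for code_graph_impact
msg = tool_call(1, "code_graph_impact", {"symbol": "auth.parse_token"})
print(msg)
```

In normal use you never write these messages by hand; the MCP client (Claude Desktop, Cursor, etc.) constructs them from the tool schemas the server advertises.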

REST API

Ingest a repository:

```bash
curl -X POST http://localhost:8080/api/v1/repositories/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://github.com/user/repo.git",
    "mode": "incremental",
    "languages": ["python", "typescript"]
  }'
```

Query knowledge base:

```bash
curl -X POST http://localhost:8080/api/v1/knowledge/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How does authentication work in this codebase?",
    "mode": "hybrid",
    "top_k": 5
  }'
```

Monitor tasks:

```bash
curl "http://localhost:8080/api/v1/tasks?status=processing"
```
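For programmatic clients, waiting on ingestion usually means polling the task endpoint until the status leaves `processing`. A sketch with the HTTP call injected as a callable, so the loop itself is testable offline (the status strings are illustrative):

```python
import time
from typing import Callable

def wait_for_task(fetch_status: Callable[[str], str], task_id: str,
                  interval: float = 1.0, timeout: float = 60.0) -> str:
    """Poll until the task leaves a pending/processing state.
    `fetch_status` would wrap GET /api/v1/tasks/{task_id} in real use."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(task_id)
        if status not in ("pending", "processing"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still running after {timeout}s")

# Fake fetcher standing in for the HTTP call
states = iter(["processing", "processing", "completed"])
print(wait_for_task(lambda _id: next(states), "t1", interval=0))  # → completed
```

The MCP interface avoids this polling entirely: `watch_task` streams progress over Server-Sent Events instead.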

Web UI

Navigate to http://localhost:8080 to access:

  • Dashboard - System health and quick actions

  • Tasks - Real-time task monitoring with progress indicators

  • Repositories - Repository management and ingestion

  • Metrics - System performance and usage metrics

Configuration

Key environment variables:

```bash
# Server Ports
MCP_PORT=8000        # MCP SSE service
WEB_UI_PORT=8080     # Web UI and REST API

# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
NEO4J_DATABASE=neo4j

# LLM Provider (ollama, openai, gemini, openrouter)
LLM_PROVIDER=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3.2

# Embedding Provider (ollama, openai, gemini, openrouter)
EMBEDDING_PROVIDER=ollama
OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# Processing Configuration
CHUNK_SIZE=512
CHUNK_OVERLAP=50
TOP_K=5
VECTOR_DIMENSION=384
```

For complete configuration options, see Configuration Guide.

Architecture

Dual-Server Design

The system employs a dual-server architecture optimized for different access patterns:

Port 8000 (Primary) - MCP SSE Service

  • Server-Sent Events endpoint for real-time communication

  • Optimized for AI assistant integration

  • Handles long-running task monitoring

  • WebSocket support for bidirectional communication

Port 8080 (Secondary) - Web UI + REST API

  • React-based monitoring interface

  • RESTful API for external integrations

  • Prometheus metrics endpoint

  • Static file serving for frontend

Both servers share the same backend services and Neo4j database, ensuring consistency across all interfaces.

Component Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    Client Interfaces                    │
├──────────────┬──────────────┬───────────────────────────┤
│  MCP Client  │    Web UI    │         REST API          │
│  (AI Tools)  │  (Browser)   │    (External Systems)     │
└──────┬───────┴──────┬───────┴──────────┬────────────────┘
       │              │                  │
       └──────────────┼──────────────────┘
                      │
       ┌──────────────▼──────────────┐
       │     FastAPI Application     │
       ├──────────────┬──────────────┤
       │   Services   │  Task Queue  │
       └──────┬───────┴──────┬───────┘
              │              │
       ┌──────▼──────┐   ┌───▼────┐
       │    Neo4j    │   │  LLM   │
       │   Database  │   │Provider│
       └─────────────┘   └────────┘
```

Development

Project Structure

```
codebase-rag/
├── src/codebase_rag/
│   ├── api/                  # FastAPI routes
│   ├── core/                 # Application core
│   ├── services/             # Business logic
│   │   ├── code_ingestor.py            # Code repository processing
│   │   ├── graph_service.py            # Graph operations
│   │   ├── memory_store.py             # Project memory management
│   │   ├── neo4j_knowledge_service.py  # Knowledge base
│   │   ├── task_queue.py               # Async task processing
│   │   └── sql/                        # SQL parsing services
│   └── mcp/                  # MCP protocol handlers
├── frontend/                 # React Web UI
│   ├── src/
│   │   ├── components/       # UI components
│   │   ├── routes/           # Page routes
│   │   └── lib/              # API client
│   └── package.json
├── tests/                    # Test suite
├── docs/                     # Documentation
└── scripts/                  # Utility scripts
```

Running Tests

```bash
# Backend tests
pytest tests/ -v

# Frontend tests
cd frontend && npm test

# Integration tests (requires Neo4j)
pytest tests/ -m integration

# Coverage report
pytest tests/ --cov=src --cov-report=html
```

Code Quality

```bash
# Format code
black .
isort .

# Linting
ruff check .
ruff check . --fix

# Type checking
mypy src/
```

Frontend Development

```bash
cd frontend
npm install
npm run dev    # Start dev server at http://localhost:3000
npm run build  # Build for production
npm run lint   # Check for issues
npm test       # Run tests
```

Deployment

Production Deployment

See Docker Deployment Guide for production deployment configurations including:

  • Multi-stage Docker builds

  • Environment-specific configurations

  • Scaling and load balancing

  • Security best practices

  • Monitoring and logging setup

System Requirements

Minimum Configuration

  • CPU: 2 cores

  • RAM: 4 GB

  • Storage: 10 GB

Recommended Configuration

  • CPU: 4+ cores

  • RAM: 8+ GB

  • Storage: 50+ GB SSD

  • Network: 100 Mbps+

Documentation

Complete documentation is available at https://vantagecraft.dev/docs/code-graph

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built with excellent open-source technologies, including Neo4j, FastAPI, LlamaIndex, and React.

