# GraphRAG LlamaIndex

A decoupled GraphRAG implementation optimized for local indexing and lightweight cloud querying.

## Two-Image Architecture

This project is split into two specialized Docker environments to optimize build times and deployment footprint:

1. **`graphrag-indexer` (Image A)**:
   - **Purpose**: Local-only heavy processing (indexing, embedding, graph building).
   - **Contains**: PyTorch, GLiNER, Graspologic, and all ML dependencies.
2. **`graphrag-query` (Image B)**:
   - **Purpose**: Lightweight, cloud-ready query engine (~1-2 GB).
   - **Contains**: DuckDB, the MCP server, and search retrieval. **No PyTorch/ML overhead**.

## Setup

### 1. Clone the repo

`git clone https://github.com/T-NhanNguyen/graphRAG-LlamaIndex.git`

### 2. Copy `.env.example` to `.env`

`cp .env.example .env`

### 3. Build the Images

```bash
# Build both images
docker compose build

# Or build them individually
docker build -t graphrag-indexer -f Dockerfile.indexer .
docker build -t graphrag-query -f Dockerfile.query .
```

### 4. Edit `.env` and set your data directory

> `GRAPHRAG_DATA_DIR=/path/to/your/documents`

You can paste a Windows path directly.

### 4.5 Shell Alias Setup (Optional but Recommended)

To simplify command usage, load the alias file for your shell:

1. Edit `.graphrag-alias.sh` and `.graphrag-alias.ps1` and replace the placeholder path with the path to this repo.
2. Add a line that sources the alias file to the bottom of your `~/.bashrc` (WSL/Bash) or `$PROFILE` (PowerShell).
3. Reload: source `.graphrag-alias.sh` in WSL, and dot-source `.graphrag-alias.ps1` in PowerShell.

**For WSL/Bash:**

```bash
# Append the source command to the end of ~/.bashrc
# (replace the path with your actual WSL project path)
echo "source /graphRAG-LlamaIndex/.graphrag-alias.sh" >> ~/.bashrc

# Reload the shell configuration
source ~/.bashrc
```

**For PowerShell:**

```powershell
# 1. Create the profile file (and its folder) only if it does not exist yet;
#    running New-Item -Force on an existing profile would overwrite it
if (!(Test-Path -Path $PROFILE)) { New-Item -ItemType File -Path $PROFILE -Force }

# 2. Open it for editing
code $PROFILE

# 3. Add this line to the bottom of your $PROFILE (use your own repo path)
. "E:\ai-workspace\projects\graphRAG-LlamaIndex\.graphrag-alias.ps1"

# 4. Restart the terminal
```

Alternatively, load the aliases for the current session only: `. E:\ai-workspace\projects\graphRAG-LlamaIndex\.graphrag-alias.ps1`

This lets you run `graphrag <command>` instead of the full `docker compose run --rm graphrag python graphrag_cli.py <command>`. Because the repo has to stay on disk for these commands to work anyway, sourcing the alias files from it keeps everything in one place.

**Important Note for WSL Users:**

- The alias automatically sets `GRAPHRAG_REGISTRY_DIR` to the `.graphrag` folder in your Windows user profile.
- This makes WSL use the same registry as PowerShell.
- Without it, WSL would create a separate registry in `/home/<username>/.graphrag`, causing a "split-brain" issue where each shell sees a different set of databases.

### 5. Create a database

```bash
docker compose run --rm graphrag python graphrag_cli.py start my-docs \
  --input /app/data/<subfolder>
```

Your `.env` setting `GRAPHRAG_DATA_DIR=E:/ai-workspace/analysis-docs` is mounted in the container as `/app/data`, so just replace `<subfolder>` with any folder that exists inside your `analysis-docs` directory.
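Since `GRAPHRAG_DATA_DIR` is mounted at `/app/data`, the `--input` value is simply the host path with that prefix swapped for `/app/data`. The bash sketch below illustrates the translation. `to_container_input` is a hypothetical helper, not part of this repo, and it assumes the host folder sits inside `GRAPHRAG_DATA_DIR` and matches it in letter case (only slash direction is normalized).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repo): translate a host folder
# under GRAPHRAG_DATA_DIR into the container path to pass as --input.
# Assumes both paths use the same letter case.
to_container_input() {
  # Normalize Windows backslashes to forward slashes
  local data_dir="${1//\\//}"
  local host_path="${2//\\//}"
  # Drop any trailing slash so prefix comparison is consistent
  data_dir="${data_dir%/}"
  host_path="${host_path%/}"

  if [ "$host_path" = "$data_dir" ]; then
    # The data dir itself maps to the mount point
    printf '/app/data\n'
  elif [ "${host_path#"$data_dir"/}" != "$host_path" ]; then
    # Strip the data-dir prefix and re-root the remainder under /app/data
    printf '/app/data/%s\n' "${host_path#"$data_dir"/}"
  else
    echo "error: $host_path is not inside $data_dir" >&2
    return 1
  fi
}
```

For example, `to_container_input "E:/ai-workspace/analysis-docs" "E:/ai-workspace/analysis-docs/research-papers"` prints `/app/data/research-papers`. A folder outside the data directory is rejected, because the container cannot see it through this mount.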
Example layout and matching commands:

```
E:/ai-workspace/analysis-docs/
├── converted_md/
│   └── Documents/      ← Your investment docs
├── research-papers/    ← Another collection
└── quarterly-reports/  ← Another collection

# PowerShell syntax: a trailing ` continues the line (use \ in bash)

# Investment analysis (your current one)
docker compose run --rm graphrag python graphrag_cli.py start investment-analysis `
  --input /app/data/converted_md/Documents

# Research papers
docker compose run --rm graphrag python graphrag_cli.py start research `
  --input /app/data/research-papers

# Quarterly reports
docker compose run --rm graphrag python graphrag_cli.py start quarterly `
  --input /app/data/quarterly-reports
```

### 5.5 Moving a database

To use an existing `.duckdb` file, register it. This adds an entry to `~/.graphrag/registry.json` that points to your existing file:

```bash
docker compose run --rm graphrag python graphrag_cli.py register my-database \
  --db-path /app/.DuckDB/graphrag.duckdb \
  --input /app/data/<located-in-another-subfolder>
```

- **Immediate access**: You can now run `status`, `search`, or `index` using that name (e.g. `graphrag search my-database "..."`).
- **No data loss**: It doesn't move or modify your actual `.duckdb` file; it just "bookmarks" it for the CLI.

```bash
# Register a folder as a database
graphrag start my-database --source /app/input

# Index (uses the graphrag-indexer image)
graphrag index my-database

# Search (uses the graphrag-query image)
graphrag search my-database "How does Bloom Energy work?"

# List all registered databases
graphrag list
```

Guide for Windows users:

- Open the folder in File Explorer: `explorer $env:USERPROFILE\.graphrag`
- View the registry file: `cat $env:USERPROFILE\.graphrag\registry.json`
- See all registered databases: `ls $env:USERPROFILE\.graphrag\databases`

### If you want to physically move it to the new "managed" folder

1. Create a folder for your database in your `GRAPHRAG_DATA_DIR`.
2. Move the `.duckdb` file into that folder and rename it to match the folder name.
3. Register it:

```bash
docker compose run --rm graphrag python graphrag_cli.py register my-project \
  --db-path /app/data/my-project/my-project.duckdb
```

This design should be portable: `workspace_config.py` uses `Path.home()`, which resolves automatically to:

- `C:\Users\<username>` on Windows
- `/home/<username>` on Linux
- `/Users/<username>` on macOS

## Parent Directory & Design Limitations

Because this project runs in Docker for portability, the current setup uses a single hardcoded mount, `/app/input`, so all databases share the same input directory. My advice is to create one parent folder on your PC and organize a separate subfolder for each topic inside it. If you need complete flexibility without predefined slots (for example, inputs spread across multiple drives), look into creating a `docker-compose.override.yml` that adds extra volume mounts.

## Command Cheat-sheet

```
graphrag start <db> [--source <path>]     # Create a database / update a database's source folder
graphrag index <db> [--prune]             # Index documents
graphrag search <db> <query> [--type]     # Query the knowledge graph
graphrag list                             # List all databases
graphrag status <db>                      # Show stats
graphrag delete <db>                      # Remove a database
graphrag register <db> --db-path /root/.graphrag/<index-vault>/<path>  # Import an existing .duckdb
# If the host folder is C:\Users\name\.graphrag,
# replace that part of the path with /root/.graphrag.
```

## S3 Database Backup (Git-like push)

You can push your local knowledge graph databases to an S3 bucket for cloud backup.

1. **Configure S3** in your `.env`:
   - `S3_BUCKET_NAME`: Your bucket name.
   - `S3_DB_VAULT_DIR`: Path to your `.graphrag/index-vault` folder.
2. **Initialize aliases**:
   - `. .\.graphrag-alias.ps1` (PowerShell)
3. **Push to the cloud**:
   - `graphrag-push`: Back up the active database to S3.
   - `graphrag-push <db-name>`: Back up a specific database.

## Troubleshooting

### WSL Search Returns No Results (PowerShell Works)

**Symptom**: Running the same search command in WSL returns empty results, but PowerShell returns data.

**Cause**: Docker Compose resolves `~` differently in each environment:

- PowerShell: `~` → `C:\Users\<username>` ✓
- WSL: `~` → `/home/<username>` (wrong location)

**Solution**: Use the provided alias files, which set the correct registry path automatically, or export it manually:

```bash
export GRAPHRAG_REGISTRY_DIR=/mnt/c/Users/<your-windows-username>/.graphrag
```

### MCP Config Path Format Error

**Symptom**: The MCP server fails to initialize with an error like:

```
Error: docker: open /mnt/e/.../.env: The system cannot find the path specified.
```

**Cause**: Your `mcp_config.json` uses WSL-style paths (`/mnt/e/...`), but Docker Desktop for Windows requires Windows-style paths (`E:/...`). This happens when your AI agent runs in a different environment from the one where the MCP server executes.
| Environment | Path format (example) | Notes                      |
| ----------- | --------------------- | -------------------------- |
| WSL/Linux   | `/mnt/e/project`      | Used by Gemini CLI in WSL  |
| Windows     | `E:/project`          | Required by Docker Desktop |

**Solution**: Use the lightweight `graphrag-query` image and make sure your `mcp_config.json` uses Windows-style paths (e.g. `E:/...` instead of `/mnt/e/...`):

```json
{
  "mcpServers": {
    "graphrag": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-v", "E:/ai-workspace/projects/graphRAG-LlamaIndex:/app",
        "-v", "C:/Users/nhan/.graphrag:/root/.graphrag",
        "--env-file", "E:/ai-workspace/projects/graphRAG-LlamaIndex/.env",
        "graphrag-query"
      ]
    }
  }
}
```

The image name must come after all `docker run` options such as `-v` and `--env-file`; anything placed after `graphrag-query` is passed to the container as its command instead of being applied by Docker.

Also make sure the registry directory is mounted (here `C:/Users/nhan/.graphrag:/root/.graphrag`; substitute your own Windows username) so the container can find your databases.
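If your config was written from WSL, each `/mnt/<drive>/...` path has to be rewritten as `<DRIVE>:/...`. The bash sketch below shows that rewrite. `wsl_to_docker_path` is a hypothetical helper, not part of this repo, and it assumes the standard WSL automount layout where Windows drives appear under `/mnt/<letter>`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): rewrite a WSL mount path such as
# /mnt/e/project into the drive-letter form E:/project that Docker Desktop on
# Windows expects. Assumes drives are auto-mounted under /mnt/<letter>.
# Any path not under /mnt/<single letter> is printed unchanged.
wsl_to_docker_path() {
  local p="$1"
  local re='^/mnt/([A-Za-z])(/.*)?$'
  if [[ "$p" =~ $re ]]; then
    # Capture the regex groups before running anything else
    local letter="${BASH_REMATCH[1]}"
    local rest="${BASH_REMATCH[2]:-/}"   # bare /mnt/e maps to the drive root E:/
    local drive
    drive=$(printf '%s' "$letter" | tr '[:lower:]' '[:upper:]')
    printf '%s:%s\n' "$drive" "$rest"
  else
    printf '%s\n' "$p"
  fi
}
```

For example, `wsl_to_docker_path /mnt/c/Users/nhan/.graphrag` prints `C:/Users/nhan/.graphrag`. Inside WSL you can also use the built-in `wslpath -m <path>`, which prints the Windows path with forward slashes.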
