
GraphRAG MCP Server

by lemondude21

GraphRAG MCP Server & Batch Ingestion System

This repository serves as a production-ready blueprint for a custom-hosted Model Context Protocol (MCP) server and Batch Ingestion Pipeline built in Python using the official mcp SDK (utilizing FastMCP) and connected to a Neo4j AuraDB (Free/Paid Tier) graph database.

It uses a lightweight Large Language Model (gemini-2.5-flash via the official google-genai SDK) to asynchronously extract entities and relationships from documents. Large texts are chunked dynamically before extraction, and the results are written idempotently to Neo4j to build a Knowledge Graph.
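The chunking step can be pictured with a minimal sketch. The README does not state the window size or overlap the pipeline actually uses, so `chunk_text`, `max_chars`, and `overlap` below are hypothetical names with placeholder defaults. The sketch shows the general idea of fixed-size windows that overlap, so that text near a boundary appears in two consecutive chunks and is not cut off from its context during extraction.

```python
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into windows of at most max_chars characters.

    Each window starts `overlap` characters before the previous one ended,
    so the tail of one chunk is repeated at the head of the next.
    (Hypothetical helper; values are placeholders, not the project's settings.)
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    if not text:
        return []

    chunks: list[str] = []
    step = max_chars - overlap  # how far each new window advances
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):  # last window reached the end of the text
            break
        start += step
    return chunks
```

Each chunk would then be sent to the LLM separately, which keeps every request within a predictable size regardless of document length.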


System Architecture

                                  [LOCAL INGESTION PIPELINE]
                                  ┌────────────────────────┐
                                  │   Local Folder Path    │  <- (Job Descriptions, Procedures, etc.)
                                  │(.txt, .md, .pdf, .docx)│
                                  └───────────┬────────────┘
                                              │
                                              ▼
                                  ┌────────────────────────┐
                                  │   batch_ingestion.py   │  <- (Extracts text, chunks, runs Gemini API,
                                  │ (Runs on local machine)│      merges directly to Neo4j cloud)
                                  └───────────┬────────────┘
                                              │
                                              │ (Idempotent Cypher Bolt)
                                              ▼
┌───────────────────────────┐     ┌────────────────────────┐
│ Microsoft Copilot Studio  │     │  Neo4j AuraDB (Cloud)  │
└─────────────┬─────────────┘     └───────────▲────────────┘
              │ (HTTP/SSE Transport)          │
              ▼                               │
┌───────────────────────────┐                 │
│    Azure Container App    │                 │
│   (mcp-graphrag-server)   ├─────────────────┘
└───────────────────────────┘     (Cypher Transactions)


Key Features

  • Local Batch Ingestor: Command-line tool (batch_ingestion.py) that scans a directory and extracts text from PDF (.pdf), Markdown (.md), and plaintext (.txt) documents, merging them directly into Neo4j.

  • Official GraphRAG Integration: Powered by the official neo4j-graphrag Python package, utilizing its SimpleKGPipeline with GeminiLLM and GeminiEmbedder for robust, schema-compliant graph construction.

  • Idempotent Ingestion: Skips files that have already been processed and writes to Neo4j using strict MERGE clauses that set a timestamp property (timestamp: datetime()). Re-ingesting a document updates its attributes without duplicating nodes.

  • Vector Search Index: Automatically provisions a vector index (text_embeddings) on startup to support semantic similarity searches in Neo4j.

  • Interactive Graph Visualizer: A visualization script (visualize.py) that queries Neo4j and uses pyvis to generate an interactive network graph (graph_visualization.html) that can be opened in any web browser.

  • Test Data Bootstrapper: Includes a helper script (download_test_data.py) to easily fetch sample PDF documents (e.g., Aventro Motors manuals and FAQs) from a public source to test ingestion out of the box.

  • Cloud-Ready Transports: Configured to run in Standard Input/Output (stdio) mode for local debugging, or HTTP/SSE mode (e.g. streamable-http) for cloud deployment.

  • Azure Container Apps Support: Includes a production Dockerfile ready for Azure Container Registry (ACR) and Azure Container Apps (ACA) hosting, complete with secure secret configurations.
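The MERGE-based idempotency described under Idempotent Ingestion can be illustrated with a minimal sketch that builds one parameterized Cypher statement per extracted (source, relationship, target) triple. The `Entity` label, the `name` key, and the `build_merge_statement` helper are assumptions for illustration, not the schema that database.py or neo4j-graphrag actually produce. Node values travel as query parameters, but Cypher cannot parameterize relationship types, so the sketch normalizes and whitelists the type before interpolating it.

```python
import re

# Relationship types cannot be passed as Cypher parameters, so only
# upper-case identifiers are allowed into the query text.
_SAFE_REL_TYPE = re.compile(r"[A-Z][A-Z0-9_]*")


def build_merge_statement(source: str, relation: str, target: str) -> tuple[str, dict[str, str]]:
    """Return an idempotent Cypher statement and its parameters for one triple.

    Hypothetical helper: the Entity label and name key are illustrative.
    """
    # "part-of" / "works for" -> "PART_OF" / "WORKS_FOR"
    rel_type = re.sub(r"[\s-]+", "_", relation.strip()).upper()
    if not _SAFE_REL_TYPE.fullmatch(rel_type):
        raise ValueError(f"Unsafe relationship type: {relation!r}")

    query = (
        # MERGE matches an existing node by its key or creates it.
        "MERGE (s:Entity {name: $source}) SET s.timestamp = datetime() "
        "MERGE (t:Entity {name: $target}) SET t.timestamp = datetime() "
        # With both endpoints bound, MERGE reuses an existing relationship of this type.
        f"MERGE (s)-[r:{rel_type}]->(t) SET r.timestamp = datetime()"
    )
    return query, {"source": source, "target": target}
```

With the official neo4j Python driver, the pair would be executed as `session.run(query, params)`. Running the same statement twice matches the existing nodes and relationship instead of creating new ones and only refreshes `timestamp`; adding a uniqueness constraint on `Entity.name` would let Neo4j enforce that key even under concurrent writes.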


File Structure

  • requirements.txt: List of Python dependencies (mcp, neo4j, python-dotenv, google-genai, pydantic, pypdf, pyvis, neo4j-graphrag).

  • .env: Local configuration file containing Neo4j and Gemini API credentials (excluded from git).

  • .env.template: Template file for setting up the environment variables.

  • .gitignore: Git ignore file to prevent pushing credentials, caches, virtual environments, or raw PDF datasets.

  • database.py: Core database wrapper. Its Neo4jDatabase class provides async operations, graph-merging logic, and vector index creation.

  • server.py: FastMCP server that manages the server lifespan and exposes the get_graph_health, ingest_document, and query_knowledge_base tools.

  • batch_ingestion.py: Local batch script that parses a directory of documents and ingests them.

  • download_test_data.py: Downloads test PDF documents (Aventro Motors dataset) to the data/ directory.

  • visualize.py: Extracts nodes and relationships from Neo4j and outputs a visual representation.

  • graph_visualization.html: Interactive graph visualization output generated by visualize.py.

  • lib/: Frontend libraries and bindings used by the visualization output.

  • Dockerfile: Deployment configuration for Azure Container Apps.


Local Setup & Installation

1. Prerequisites

  • Python 3.10+ installed on your system.

  • Docker Desktop installed and running on your local machine.

  • A Neo4j AuraDB instance. Create one for free at Neo4j Aura Console.

  • A Gemini API Key. Get one at Google AI Studio.

2. Set Up Environment

Create your virtual environment, activate it, and install the dependencies:

# Create the environment, then activate it (run only the line for your shell)
python -m venv .venv
.venv\Scripts\Activate.ps1   # Windows PowerShell
source .venv/bin/activate     # macOS / Linux

# Install dependencies
pip install -r requirements.txt

3. Configuration

Copy the template .env.template into a .env file and fill in your details:

NEO4J_URI=neo4j+s://<your-db-id>.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASSWORD=<your-generated-password>
GEMINI_API_KEY=<your-api-key>
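Before running anything, it can help to fail fast on a missing or malformed setting. This is a sketch, not code from the repository: `Settings` and `load_settings` are hypothetical names, and it assumes the variables have already been placed in the process environment (for example by python-dotenv's `load_dotenv()`, which copies `.env` entries into `os.environ`).

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "GEMINI_API_KEY")

# URI schemes accepted by the official Neo4j drivers.
_NEO4J_SCHEMES = ("neo4j://", "neo4j+s://", "neo4j+ssc://", "bolt://", "bolt+s://", "bolt+ssc://")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four values in .env."""
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    gemini_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate required variables and return them as a Settings object."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    uri = env["NEO4J_URI"]
    if not uri.startswith(_NEO4J_SCHEMES):
        raise RuntimeError(f"NEO4J_URI has an unexpected scheme: {uri}")
    return Settings(uri, env["NEO4J_USER"], env["NEO4J_PASSWORD"], env["GEMINI_API_KEY"])
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in tests.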

Download Test Data & Batch Ingestion

To populate your Neo4j Knowledge Graph with test data:

  1. Download Test Data: Run the download helper script to fetch sample PDFs:

    python download_test_data.py

    This will download the sample Aventro Motors manuals/FAQs into a local ./data folder.

  2. Ingest Documents: Run the ingestion tool:

    # Ingest from default 'data' folder
    python batch_ingestion.py
    
    # Or ingest from a specific custom folder path
    python batch_ingestion.py "C:\Users\.."

Because of its idempotent design, the ingestion script can be safely re-run to process new documents or update existing ones without duplicating existing graph nodes.
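The README does not say how batch_ingestion.py decides that a file was already processed. One plausible approach, sketched here under that assumption, is a local JSON manifest of SHA-256 content hashes: a file is queued only when it is new or its bytes have changed. `pending_files`, `mark_done`, and the manifest file are hypothetical; the real script may instead check for an existing Document node in Neo4j. The extension filter mirrors the file types listed under Key Features.

```python
import hashlib
import json
from pathlib import Path

SUPPORTED = {".txt", ".md", ".pdf"}


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def _load_manifest(manifest_path: Path) -> dict[str, str]:
    return json.loads(manifest_path.read_text()) if manifest_path.exists() else {}


def pending_files(folder: Path, manifest_path: Path) -> list[Path]:
    """Supported files under folder that are new or changed since last ingestion."""
    seen = _load_manifest(manifest_path)
    pending = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            if seen.get(str(path)) != file_digest(path):
                pending.append(path)
    return pending


def mark_done(path: Path, manifest_path: Path) -> None:
    """Record the current hash of a successfully ingested file."""
    seen = _load_manifest(manifest_path)
    seen[str(path)] = file_digest(path)
    manifest_path.write_text(json.dumps(seen, indent=2))
```

Calling `mark_done` only after a file's graph writes succeed keeps a crash mid-run from marking a half-ingested document as finished.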


Graph Visualization

Once ingestion is complete, you can generate an interactive visualization of the knowledge graph:

  1. Run the visualizer script:

    python visualize.py
  2. Open the newly generated graph_visualization.html in your browser. Click a node (Document, Chunk, or Entity) to inspect its metadata, and zoom or drag nodes to explore their connections.
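The query and styling inside visualize.py are not shown in the README. As a hedged sketch of the data-shaping step only, the hypothetical `to_vis_graph` below turns (source, kind, relationship, target, kind) rows, such as a Cypher query over Document, Chunk, and Entity nodes might return, into deduplicated node and edge lists in the shape a pyvis `Network` consumes.

```python
from typing import Iterable

# (source_id, source_kind, relationship_type, target_id, target_kind)
Row = tuple[str, str, str, str, str]


def to_vis_graph(rows: Iterable[Row]) -> tuple[list[dict], list[dict]]:
    """Build deduplicated node and edge lists from relationship rows (hypothetical helper)."""
    nodes: dict[str, dict] = {}
    edges: list[dict] = []
    seen_edges: set[tuple[str, str, str]] = set()
    for src, src_kind, rel, dst, dst_kind in rows:
        # The first time a node id appears fixes its label and group.
        for node_id, kind in ((src, src_kind), (dst, dst_kind)):
            nodes.setdefault(node_id, {"id": node_id, "label": node_id, "group": kind})
        key = (src, rel, dst)
        if key not in seen_edges:  # drop repeated relationships
            seen_edges.add(key)
            edges.append({"from": src, "to": dst, "label": rel})
    return list(nodes.values()), edges
```

With pyvis, each node could then be added via `net.add_node(n["id"], label=n["label"], group=n["group"])`, each edge via `net.add_edge(e["from"], e["to"], label=e["label"])`, and the page written with `net.write_html("graph_visualization.html")`; the `group` value lets the browser view color Documents, Chunks, and Entities differently.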


Running and Testing the Server Locally

To run and debug the MCP server locally using the official MCP Inspector console:

mcp dev server.py

This opens a local web UI where you can trigger get_graph_health or input a document's name and content to invoke the ingestion flow.
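Under the hood, the Inspector talks to the server over MCP, which frames requests as JSON-RPC 2.0 messages: `tools/list` enumerates the available tools and `tools/call` invokes one by name with an `arguments` object (a real client first completes the `initialize` handshake). The sketch below only builds those messages; the helper name and the `name`/`content` argument keys for ingest_document are assumptions, since the README does not show the tool's input schema.

```python
import json
from itertools import count
from typing import Any, Optional

# JSON-RPC request ids only need to be unique within a session.
_request_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request (hypothetical helper)."""
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


list_tools = jsonrpc_request("tools/list")
health_check = jsonrpc_request("tools/call", {"name": "get_graph_health", "arguments": {}})
# Hypothetical argument names; confirm them against the schema returned by tools/list.
ingest = jsonrpc_request(
    "tools/call",
    {"name": "ingest_document", "arguments": {"name": "faq.txt", "content": "Example text."}},
)
```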


Deployment to Azure Container Apps (Production)

To deploy your server as an HTTP-based service reachable by Microsoft Copilot Studio:

Step 1: ACR Login and Image Build

  1. Make sure Docker Desktop is running on your machine.

  2. Log in to your Azure subscription and authenticate the local Docker CLI with your Azure Container Registry (ACR):

    az login
    az acr login --name qafcomcpregistry
  3. Build the container image locally:

    docker build -t qafcomcpregistry.azurecr.io/mcp-graphrag-server:latest .
  4. Push the image to ACR:

    docker push qafcomcpregistry.azurecr.io/mcp-graphrag-server:latest

Step 2: Provision & Deploy App

Create the Container App environment, enable external Ingress on port 8080, and register your credentials securely as Container App secrets:

# Create App Environment
az containerapp env create --name MyContainerEnv --resource-group MyResourceGroup --location eastus

# Deploy Container App with Ingress and Environment secrets
az containerapp create \
  --name mcp-graphrag-server \
  --resource-group MyResourceGroup \
  --environment MyContainerEnv \
  --image qafcomcpregistry.azurecr.io/mcp-graphrag-server:latest \
  --target-port 8080 \
  --ingress external \
  --registry-server qafcomcpregistry.azurecr.io \
  --registry-username qafcomcpregistry \
  --registry-password "<your-acr-password>" \
  --secrets gemini-key="<your-gemini-key>" neo4j-uri="neo4j+s://<your-db-id>.databases.neo4j.io" neo4j-pwd="<your-db-password>" \
  --env-vars MCP_TRANSPORT=http NEO4J_USER="neo4j" GEMINI_API_KEY=secretref:gemini-key NEO4J_URI=secretref:neo4j-uri NEO4J_PASSWORD=secretref:neo4j-pwd
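The deploy command sets MCP_TRANSPORT=http and routes ingress to port 8080, but the README does not show how server.py interprets that variable. A minimal sketch, assuming it is mapped onto FastMCP's transport names ("stdio", "sse", "streamable-http"), might look like this. `resolve_runtime` and the `PORT` override are hypothetical. The key point is that inside the container the server must listen on all interfaces and on the same port as `--target-port`; FastMCP's built-in defaults bind to localhost on a different port (8000 in current mcp releases).

```python
import os
from typing import Mapping

# Accepted MCP_TRANSPORT values mapped to FastMCP transport names (assumed mapping).
_TRANSPORTS = {
    "stdio": "stdio",
    "sse": "sse",
    "http": "streamable-http",
    "streamable-http": "streamable-http",
}


def resolve_runtime(env: Mapping[str, str] = os.environ) -> tuple[str, str, int]:
    """Return (transport, host, port) for the server process."""
    raw = env.get("MCP_TRANSPORT", "stdio").strip().lower()
    if raw not in _TRANSPORTS:
        raise ValueError(f"Unsupported MCP_TRANSPORT: {raw!r}")
    transport = _TRANSPORTS[raw]
    # Container Apps ingress cannot reach a server bound only to loopback.
    host = "127.0.0.1" if transport == "stdio" else "0.0.0.0"
    # Must match --target-port in `az containerapp create`; PORT is a hypothetical override.
    port = int(env.get("PORT", "8080"))
    return transport, host, port
```

server.py could then apply `host` and `port` to the FastMCP instance's settings and start it with the resolved transport name; with no variable set, it falls back to stdio for local debugging.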

Step 3: Retrieve the Public URL

Retrieve the app's fully qualified domain name (FQDN), which you will use as the server URL in Copilot Studio:

az containerapp show --name mcp-graphrag-server --resource-group MyResourceGroup --query properties.configuration.ingress.fqdn --output tsv

Connecting to Microsoft Copilot Studio

  1. Open your agent in Copilot Studio and enable Generative Orchestration in Settings.

  2. Navigate to Tools > Add a tool > New tool > Model Context Protocol (MCP).

  3. Configure the connection using:

    • Server URL: https://<your-deployed-containerapp-fqdn> (e.g. https://mcp-graphrag-server.nicesky-09bf43ca.eastus.azurecontainerapps.io)

    • Authentication: Set to None.

  4. Save and Publish. Your Copilot agent will automatically discover the server's tool schemas (including ingest_document) and call the tools dynamically, for example whenever users request document ingestion.
