RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and a Model Context Protocol (MCP) server.

Features

Core Capabilities

  • Document Storage: Upload and store text (.txt) and Markdown (.md) documents

  • Hierarchical Chunking: Structure-aware chunking for markdown that preserves document hierarchy

  • Vector Search: Efficient similarity search using Qdrant vector database

  • Google AI Integration: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)

  • REST API: FastAPI-based REST API with automatic OpenAPI documentation

  • MCP Server: Model Context Protocol server for seamless integration with Claude and other MCP clients

  • OpenAI-Compatible API: Supports OpenAI-compatible chat completions for web UI integration

  • Code Indexing: Index and search source code repositories with semantic understanding

  • Smart Query Routing: Automatic query classification and routing to appropriate retrieval methods

Advanced Features

  • Tag-Based Organization: Organize documents with multiple tags for easy categorization

  • Section-Aware Retrieval: Query specific sections of documentation (e.g., "Installation > Prerequisites")

  • Markdown Structure Preservation: Automatic extraction of heading hierarchy with breadcrumb paths

  • Context-Enhanced Answers: LLM receives section context for more accurate responses

  • Flexible Filtering: Filter documents by tags and/or section paths during queries

  • Document Structure API: Explore table of contents and section organization

  • GitHub Integration: Parse and extract content from GitHub URLs

  • Reference Following: Automatically follow documentation references for comprehensive answers

  • Multi-Mode Retrieval: Choose between standard, enhanced, or smart query modes

  • Rate Limiting: Built-in rate limiting for API endpoints

Project Structure

```
mcp-rag-docs/
├── config/
│   ├── __init__.py
│   └── settings.py              # Configuration and settings
├── rag_server/
│   ├── __init__.py
│   ├── models.py                # Pydantic models for API
│   ├── openai_api.py            # OpenAI-compatible API endpoints
│   ├── openai_models.py         # OpenAI API models
│   ├── rag_system.py            # Core RAG system logic
│   ├── server.py                # FastAPI server
│   └── smart_query.py           # Smart query routing
├── mcp_server/
│   ├── __init__.py
│   └── server.py                # MCP server implementation
├── utils/
│   ├── __init__.py
│   ├── code_indexer.py          # Source code indexing
│   ├── code_index_store.py      # Code index storage
│   ├── document_processor.py    # Document processing
│   ├── embeddings.py            # Google AI embeddings
│   ├── frontmatter_parser.py    # YAML frontmatter parsing
│   ├── github_parser.py         # GitHub URL parsing
│   ├── google_api_client.py     # Google AI API client
│   ├── hierarchical_chunker.py  # Hierarchical document chunking
│   ├── markdown_parser.py       # Markdown parsing
│   ├── query_classifier.py      # Query type classification
│   ├── rate_limit_store.py      # Rate limiting
│   ├── reference_extractor.py   # Extract doc references
│   ├── retrieval_router.py      # Multi-mode retrieval routing
│   ├── source_extractor.py      # Extract source code snippets
│   ├── text_chunker.py          # Text chunking utility
│   └── vector_store.py          # Qdrant vector store wrapper
├── build_code_index.py          # Build code index from repository
├── check_github_urls.py         # Validate GitHub URLs
├── check_status.py              # System status checker
├── example_usage.py             # Example usage scripts
├── ingest_docs.py               # Document ingestion utility
├── main.py                      # Main entry point
├── .env.example                 # Example environment variables
├── docker-compose.yml           # Docker setup for Qdrant
└── pyproject.toml               # Project dependencies
```

Installation

Prerequisites

  • Python 3.13 or higher

  • Google AI Studio API key (create one in Google AI Studio)

Setup

  1. Clone or navigate to the project directory

  2. Install dependencies:

```bash
# Using pip
pip install -e .

# Or using uv (recommended)
uv pip install -e .
```

  3. Configure environment variables:

```bash
# Copy the example env file
cp .env.example .env

# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here
```

  4. Start Qdrant (optional, using Docker):

```bash
docker-compose up -d
```
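
The bundled docker-compose.yml handles this step. For reference, a minimal Qdrant service looks roughly like the following; this is an illustrative sketch, not the project's actual file:

```yaml
# Illustrative sketch only; the project's docker-compose.yml may differ.
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"   # REST API
      - "6334:6334"   # gRPC
    volumes:
      - ./qdrant_data:/qdrant/storage
```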

Usage

Running the FastAPI Server

Start the REST API server:

```bash
python -m rag_server.server
```

The server will start at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.

API Endpoints

Core Endpoints:

  • POST /documents - Upload a document

  • POST /query - Query the RAG system (standard mode)

  • POST /query-enhanced - Query with automatic reference following

  • POST /smart-query - Smart query with automatic routing

  • GET /documents - List all documents

  • DELETE /documents/{doc_id} - Delete a document

  • GET /stats - Get system statistics

  • GET /health - Health check

  • GET /tags - List all available tags

  • GET /documents/{doc_id}/sections - Get document structure

OpenAI-Compatible Endpoints:

  • POST /v1/chat/completions - OpenAI-compatible chat completions

  • GET /v1/models - List available models

Example Usage with curl

```bash
# Upload a document
curl -X POST "http://localhost:8000/documents" \
  -F "file=@example.txt"

# Upload with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# Query the RAG system
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the documents?", "top_k": 5}'

# Smart query with automatic routing
curl -X POST "http://localhost:8000/smart-query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a Dagster asset?"}'

# OpenAI-compatible chat completion
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rag-smart",
    "messages": [{"role": "user", "content": "What is an asset in Dagster?"}],
    "stream": false
  }'

# List documents
curl "http://localhost:8000/documents"

# Get statistics
curl "http://localhost:8000/stats"
```

Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients.

```bash
python -m mcp_server.server
```

MCP Tools Available

  1. query_rag - Query the RAG system with a question

  2. query_rag_enhanced - Query with automatic reference following

  3. smart_query - Smart query with automatic routing and classification

  4. add_document - Add a document to the RAG system

  5. list_documents - List all stored documents

  6. delete_document - Delete a document by ID

  7. get_rag_stats - Get system statistics

  8. get_tags - List all available tags

  9. get_document_structure - Get document table of contents

Using with Claude Desktop

Add to your Claude Desktop configuration (claude_desktop_config.json):

{ "mcpServers": { "rag": { "command": "uv", "args": [ "--directory", "/path/to/mcp-rag-docs", "run", "python", "-m", "mcp_server.server" ] } } }

See QUICK_START.md for a quick setup guide.

Configuration

All configuration is managed through environment variables (defined in .env):

| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google AI Studio API key | (required) |
| `CHUNK_SIZE` | Size of text chunks in characters | `1000` |
| `CHUNK_OVERLAP` | Overlap between chunks | `200` |
| `TOP_K_RESULTS` | Number of chunks to retrieve | `5` |
| `QDRANT_PATH` | Path to Qdrant storage | `./qdrant_storage` |
| `QDRANT_COLLECTION_NAME` | Qdrant collection name | `documents` |
| `FASTAPI_HOST` | FastAPI server host | `0.0.0.0` |
| `FASTAPI_PORT` | FastAPI server port | `8000` |
| `EMBEDDING_MODEL` | Google embedding model | `text-embedding-004` |
| `LLM_MODEL` | Google LLM model | `gemini-1.5-flash` |
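
For reference, a .env populated with the defaults above would look like this; only GOOGLE_API_KEY is required, and any omitted variable falls back to its default:

```
GOOGLE_API_KEY=your_api_key_here
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RESULTS=5
QDRANT_PATH=./qdrant_storage
QDRANT_COLLECTION_NAME=documents
FASTAPI_HOST=0.0.0.0
FASTAPI_PORT=8000
EMBEDDING_MODEL=text-embedding-004
LLM_MODEL=gemini-1.5-flash
```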

Architecture

Document Processing Pipeline

  1. Upload - User uploads a .txt or .md file

  2. Processing - Document is read and metadata extracted (including frontmatter)

  3. Chunking - Text is split using hierarchical chunking for markdown or standard chunking for text

  4. Embedding - Each chunk is converted to a vector using Google AI embeddings

  5. Storage - Vectors and metadata are stored in Qdrant
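
To make steps 4 and 5 concrete, here is a minimal, self-contained sketch built directly on the underlying libraries (google-generativeai and qdrant-client). It is illustrative only: the project wraps this logic in utils/embeddings.py and utils/vector_store.py, and the names below are not taken from that code.

```python
# Minimal sketch of the embed-and-store steps, not the project's actual code.
import os
import uuid

import google.generativeai as genai
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
client = QdrantClient(path="./qdrant_storage")  # embedded local storage

# text-embedding-004 produces 768-dimensional vectors
if not client.collection_exists("documents"):
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )

def store_chunk(chunk: str, doc_id: str, section_path: str) -> None:
    """Embed one chunk and upsert it with its metadata into Qdrant."""
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=chunk,
        task_type="retrieval_document",
    )
    client.upsert(
        collection_name="documents",
        points=[PointStruct(
            id=str(uuid.uuid4()),
            vector=result["embedding"],
            payload={"text": chunk, "doc_id": doc_id, "section_path": section_path},
        )],
    )
```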

Query Pipeline

Standard Query

  1. Query - User submits a question

  2. Embedding - Question is converted to a vector

  3. Retrieval - Similar chunks are retrieved from Qdrant

  4. Generation - Context is provided to Google AI Studio model

  5. Response - Answer is generated and returned with sources
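
Continuing the ingestion sketch above (same caveats, reusing its `client` and `genai` setup), the query side embeds the question, searches Qdrant, and hands the retrieved chunks to Gemini:

```python
# Minimal sketch of the standard query path, not the project's actual code.
def answer(question: str, top_k: int = 5) -> str:
    # 2. Embed the question (note the query-side task type)
    q = genai.embed_content(
        model="models/text-embedding-004",
        content=question,
        task_type="retrieval_query",
    )
    # 3. Retrieve the most similar chunks
    hits = client.search(
        collection_name="documents",
        query_vector=q["embedding"],
        limit=top_k,
    )
    # 4. Generate an answer grounded in the retrieved context
    context = "\n\n".join(h.payload["text"] for h in hits)
    model = genai.GenerativeModel("gemini-1.5-flash")
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return model.generate_content(prompt).text
```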

Smart Query

  1. Classification - Query is classified (documentation, code, conceptual, etc.)

  2. Routing - Automatically selects best retrieval strategy

  3. Multi-Source - May combine documentation search, code search, and direct answers

  4. Synthesis - Generates comprehensive answer from multiple sources
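
As a sketch of the classify-then-route idea (the project's query_classifier.py and retrieval_router.py may work quite differently), a classifier can be as simple as an LLM call that picks a category, with retrieval dispatched on the result:

```python
# Toy classification-and-routing sketch; reuses `genai` and `answer` from
# the earlier sketches. Not the project's actual implementation.
CATEGORIES = ("documentation", "code", "conceptual")

def classify(question: str) -> str:
    model = genai.GenerativeModel("gemini-1.5-flash")
    prompt = (
        f"Classify this question as one of: {', '.join(CATEGORIES)}. "
        f"Reply with the single word only.\n\nQuestion: {question}"
    )
    label = model.generate_content(prompt).text.strip().lower()
    return label if label in CATEGORIES else "documentation"

def smart_answer(question: str) -> str:
    kind = classify(question)
    if kind == "code":
        return answer_from_code_index(question)  # hypothetical helper over the code index
    return answer(question)  # standard RAG path from the sketch above
```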

Code Indexing

The system can index source code repositories:

```bash
# Build code index
python build_code_index.py /path/to/repo

# Query code through the API or MCP server
```

Code is indexed with:

  • Class and function definitions

  • Docstrings and comments

  • File structure and imports

  • Semantic embeddings for natural language queries
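
For a feel of how definition-level indexing can work, here is a minimal sketch built on Python's standard ast module; the project's code_indexer.py is more elaborate, and nothing below is taken from it:

```python
# Minimal sketch: pull classes, functions, and docstrings out of a Python file.
import ast
from pathlib import Path

def extract_definitions(path: str) -> list[dict]:
    tree = ast.parse(Path(path).read_text())
    defs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs.append({
                "name": node.name,
                "kind": type(node).__name__,
                "line": node.lineno,
                "docstring": ast.get_docstring(node) or "",
            })
    return defs

# Each entry can then be embedded (as in the ingestion sketch above) so that
# natural-language queries such as "where is retry logic?" find the right symbol.
```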

Development

Running Tests

```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest

# Run specific test files
pytest test_openai_api.py
pytest test_mcp_integration.py
```

Code Style

The project follows Python best practices with type hints and docstrings.

Troubleshooting

Common Issues

Issue: GOOGLE_API_KEY not found

  • Solution: Ensure you've created a .env file and added your Google API key

Issue: Unsupported file type

  • Solution: Only .txt and .md files are supported. Convert other formats first.

Issue: Collection already exists error

  • Solution: Delete the qdrant_storage/ directory to reset the database

Issue: MCP server not connecting

  • Solution: Check that the path in your MCP config is correct and the .env file is in the project root

Advanced Usage

Tag-Based Organization

Organize your documents with tags for easy categorization and filtering:

```bash
# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# List all available tags
curl "http://localhost:8000/tags"

# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'

# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"
```

Hierarchical Document Structure

For markdown documents, the system automatically preserves heading hierarchy:

```bash
# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"

# Query specific section
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'
```

Section-Aware Queries

The system includes section context when generating answers:

```text
# Example: Markdown document structure
#
#   # Installation
#   ## Prerequisites
#   ### Python Version
#   ## Setup Steps
#
# When you query about "Python version requirements", the system will:
#   1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
#   2. Include the section path in the context sent to the LLM
#   3. Cite sources with full section paths
```
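
A small sketch of how such breadcrumb paths can be computed from raw markdown; this is illustrative only, and the real logic lives in utils/markdown_parser.py and utils/hierarchical_chunker.py:

```python
# Minimal sketch: attach "A > B > C" breadcrumb paths to markdown headings.
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def section_paths(markdown: str) -> list[str]:
    stack: list[str] = []          # one title per heading level
    paths = []
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2).strip()
        del stack[level - 1:]      # pop headings at this level or deeper
        stack.append(title)
        paths.append(" > ".join(stack))
    return paths

print(section_paths("# Installation\n## Prerequisites\n### Python Version\n## Setup Steps"))
# ['Installation', 'Installation > Prerequisites',
#  'Installation > Prerequisites > Python Version', 'Installation > Setup Steps']
```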

Smart Query Modes

The system supports three query modes:

  1. Standard (/query) - Basic vector search and retrieval

  2. Enhanced (/query-enhanced) - Follows documentation references automatically

  3. Smart (/smart-query) - Automatic classification and routing

Use the OpenAI-compatible API to access different modes:

```bash
# Standard mode
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-standard", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Enhanced mode with reference following
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-enhanced", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Smart mode with automatic routing
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-smart", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
```
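
Because the endpoint follows the OpenAI schema, the official OpenAI Python SDK should also work when pointed at the server. A sketch, assuming no API key is enforced locally:

```python
# Use the OpenAI Python SDK against the local RAG server (sketch).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="rag-smart",  # or "rag-standard" / "rag-enhanced"
    messages=[{"role": "user", "content": "What is Dagster?"}],
)
print(resp.choices[0].message.content)
```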

MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients:

query_rag - Query with optional tags and section filtering

{ "question": "How do I deploy?", "tags": ["dagster"], "section_path": "Deployment" }

smart_query - Smart query with automatic routing

{ "question": "What is an asset and how do I use it?" }

add_document - Upload with tags

{ "file_path": "/path/to/doc.md", "tags": ["dagster", "docs"] }

get_tags - List all tags

get_document_structure - Get table of contents

{ "doc_id": "abc123" }

API Reference

Enhanced Endpoints

POST /documents

  • Body: file (multipart), tags (comma-separated string)

  • Response: Document info with tags and chunk count

POST /query

  • Body: {"question": "...", "tags": [...], "section_path": "..."}

  • Response: Answer with section-aware sources

POST /smart-query

  • Body: {"question": "..."}

  • Response: Smart answer with automatic routing and classification

GET /tags

  • Response: {"tags": [...], "total": N}

GET /documents/{doc_id}/sections

  • Response: Document structure with section hierarchy

GET /documents?tags=tag1,tag2

  • Query filtered by tags

  • Response: List of matching documents

POST /v1/chat/completions

  • OpenAI-compatible chat completion endpoint

  • Supports models: rag-standard, rag-enhanced, rag-smart

  • Supports streaming with stream: true

GET /v1/models

  • List available RAG models
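
Streaming (the stream: true option on POST /v1/chat/completions above) can be consumed with the same OpenAI client as in the earlier sketch:

```python
# Stream tokens from the OpenAI-compatible endpoint (sketch; reuses `client`).
stream = client.chat.completions.create(
    model="rag-smart",
    messages=[{"role": "user", "content": "Summarize the indexed documents."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```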

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

  • Google AI Studio for embeddings and LLM capabilities

  • Qdrant for vector database

  • FastAPI for the REST API framework

  • Anthropic MCP for the Model Context Protocol
