RAG Document Server

by jaimeferj

RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and Model Context Protocol (MCP) server.

Features

Core Capabilities

  • Document Storage: Upload and store text (.txt) and Markdown (.md) documents

  • Hierarchical Chunking: Structure-aware chunking for markdown that preserves document hierarchy

  • Vector Search: Efficient similarity search using Qdrant vector database

  • Google AI Integration: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)

  • REST API: FastAPI-based REST API with automatic OpenAPI documentation

  • MCP Server: Model Context Protocol server for seamless integration with Claude and other MCP clients

  • OpenAI-Compatible API: Supports OpenAI-compatible chat completions for web UI integration

  • Code Indexing: Index and search source code repositories with semantic understanding

  • Smart Query Routing: Automatic query classification and routing to appropriate retrieval methods

Advanced Features

  • Tag-Based Organization: Organize documents with multiple tags for easy categorization

  • Section-Aware Retrieval: Query specific sections of documentation (e.g., "Installation > Prerequisites")

  • Markdown Structure Preservation: Automatic extraction of heading hierarchy with breadcrumb paths

  • Context-Enhanced Answers: LLM receives section context for more accurate responses

  • Flexible Filtering: Filter documents by tags and/or section paths during queries

  • Document Structure API: Explore table of contents and section organization

  • GitHub Integration: Parse and extract content from GitHub URLs

  • Reference Following: Automatically follow documentation references for comprehensive answers

  • Multi-Mode Retrieval: Choose between standard, enhanced, or smart query modes

  • Rate Limiting: Built-in rate limiting for API endpoints

Project Structure

mcp-rag-docs/
   config/
      __init__.py
      settings.py                # Configuration and settings
   rag_server/
      __init__.py
      models.py                  # Pydantic models for API
      openai_api.py              # OpenAI-compatible API endpoints
      openai_models.py           # OpenAI API models
      rag_system.py              # Core RAG system logic
      server.py                  # FastAPI server
      smart_query.py             # Smart query routing
   mcp_server/
      __init__.py
      server.py                  # MCP server implementation
   utils/
      __init__.py
      code_indexer.py            # Source code indexing
      code_index_store.py        # Code index storage
      document_processor.py      # Document processing
      embeddings.py              # Google AI embeddings
      frontmatter_parser.py      # YAML frontmatter parsing
      github_parser.py           # GitHub URL parsing
      google_api_client.py       # Google AI API client
      hierarchical_chunker.py    # Hierarchical document chunking
      markdown_parser.py         # Markdown parsing
      query_classifier.py        # Query type classification
      rate_limit_store.py        # Rate limiting
      reference_extractor.py     # Extract doc references
      retrieval_router.py        # Multi-mode retrieval routing
      source_extractor.py        # Extract source code snippets
      text_chunker.py            # Text chunking utility
      vector_store.py            # Qdrant vector store wrapper
   build_code_index.py          # Build code index from repository
   check_github_urls.py         # Validate GitHub URLs
   check_status.py              # System status checker
   example_usage.py             # Example usage scripts
   ingest_docs.py               # Document ingestion utility
   main.py                      # Main entry point
   .env.example                 # Example environment variables
   docker-compose.yml           # Docker setup for Qdrant
   pyproject.toml               # Project dependencies

Installation

Prerequisites

  • Python 3.13 or higher

  • Google AI Studio API key (obtain one from https://aistudio.google.com)

Setup

  1. Clone or navigate to the project directory

  2. Install dependencies

# Using pip
pip install -e .

# Or using uv (recommended)
uv pip install -e .
  3. Configure environment variables

# Copy the example env file
cp .env.example .env

# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here
  4. Start Qdrant (optional, via Docker)

docker-compose up -d

Usage

Running the FastAPI Server

Start the REST API server:

python -m rag_server.server

The server will start at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.

API Endpoints

Core Endpoints:

  • POST /documents - Upload a document

  • POST /query - Query the RAG system (standard mode)

  • POST /query-enhanced - Query with automatic reference following

  • POST /smart-query - Smart query with automatic routing

  • GET /documents - List all documents

  • DELETE /documents/{doc_id} - Delete a document

  • GET /stats - Get system statistics

  • GET /health - Health check

  • GET /tags - List all available tags

  • GET /documents/{doc_id}/sections - Get document structure

OpenAI-Compatible Endpoints:

  • POST /v1/chat/completions - OpenAI-compatible chat completions

  • GET /v1/models - List available models

Example Usage with curl

# Upload a document
curl -X POST "http://localhost:8000/documents" \
  -F "file=@example.txt"

# Upload with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# Query the RAG system
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the documents?", "top_k": 5}'

# Smart query with automatic routing
curl -X POST "http://localhost:8000/smart-query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a Dagster asset?"}'

# OpenAI-compatible chat completion
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rag-smart",
    "messages": [{"role": "user", "content": "What is an asset in Dagster?"}],
    "stream": false
  }'

# List documents
curl "http://localhost:8000/documents"

# Get statistics
curl "http://localhost:8000/stats"

Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients.

python -m mcp_server.server

MCP Tools Available

  1. query_rag - Query the RAG system with a question

  2. query_rag_enhanced - Query with automatic reference following

  3. smart_query - Smart query with automatic routing and classification

  4. add_document - Add a document to the RAG system

  5. list_documents - List all stored documents

  6. delete_document - Delete a document by ID

  7. get_rag_stats - Get system statistics

  8. get_tags - List all available tags

  9. get_document_structure - Get document table of contents

Using with Claude Desktop

Add to your Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "rag": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-rag-docs",
        "run",
        "python",
        "-m",
        "mcp_server.server"
      ]
    }
  }
}

See QUICK_START.md for a quick setup guide.

Configuration

All configuration is managed through environment variables (defined in .env):

| Variable | Description | Default |
|----------|-------------|---------|
| GOOGLE_API_KEY | Google AI Studio API key | (required) |
| CHUNK_SIZE | Size of text chunks in characters | 1000 |
| CHUNK_OVERLAP | Overlap between chunks in characters | 200 |
| TOP_K_RESULTS | Number of chunks to retrieve | 5 |
| QDRANT_PATH | Path to Qdrant storage | ./qdrant_storage |
| QDRANT_COLLECTION_NAME | Qdrant collection name | documents |
| FASTAPI_HOST | FastAPI server host | 0.0.0.0 |
| FASTAPI_PORT | FastAPI server port | 8000 |
| EMBEDDING_MODEL | Google embedding model | text-embedding-004 |
| LLM_MODEL | Google LLM model | gemini-1.5-flash |
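The defaults above can be read with a small settings loader. This is an illustrative stdlib sketch only; the project's config/settings.py may use pydantic or another mechanism:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Illustrative settings object mirroring the table above."""
    google_api_key: str
    chunk_size: int
    chunk_overlap: int
    top_k_results: int
    qdrant_path: str


def load_settings() -> Settings:
    # GOOGLE_API_KEY has no default: fail fast if it is missing.
    api_key = os.environ.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is required (see .env.example)")
    return Settings(
        google_api_key=api_key,
        chunk_size=int(os.environ.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "200")),
        top_k_results=int(os.environ.get("TOP_K_RESULTS", "5")),
        qdrant_path=os.environ.get("QDRANT_PATH", "./qdrant_storage"),
    )
```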

Architecture

Document Processing Pipeline

  1. Upload - User uploads a .txt or .md file

  2. Processing - Document is read and metadata extracted (including frontmatter)

  3. Chunking - Text is split using hierarchical chunking for markdown or standard chunking for text

  4. Embedding - Each chunk is converted to a vector using Google AI embeddings

  5. Storage - Vectors and metadata are stored in Qdrant
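For plain-text files, step 3 amounts to a sliding-window splitter. The real text_chunker.py may differ in detail, but the semantics of CHUNK_SIZE and CHUNK_OVERLAP are the ones from the configuration table (this sketch is illustrative):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one (illustrative sketch)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```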

Query Pipeline

Standard Query

  1. Query - User submits a question

  2. Embedding - Question is converted to a vector

  3. Retrieval - Similar chunks are retrieved from Qdrant

  4. Generation - Context is provided to Google AI Studio model

  5. Response - Answer is generated and returned with sources
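Steps 2-4 boil down to nearest-neighbour search over chunk vectors. Qdrant performs this internally and at scale; the stdlib sketch below only illustrates the ranking idea (function names are hypothetical, not the project's API):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 for zero-norm inputs."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], chunks: list[tuple[list[float], str]],
             top_k: int = 5) -> list[str]:
    """Rank (vector, text) pairs by similarity to the query and keep top_k."""
    ranked = sorted(chunks,
                    key=lambda c: cosine_similarity(query_vec, c[0]),
                    reverse=True)
    return [text for _, text in ranked[:top_k]]
```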

Smart Query

  1. Classification - Query is classified (documentation, code, conceptual, etc.)

  2. Routing - Automatically selects best retrieval strategy

  3. Multi-Source - May combine documentation search, code search, and direct answers

  4. Synthesis - Generates comprehensive answer from multiple sources
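A toy version of the classification-and-routing steps looks like this. The actual query_classifier.py may use an LLM or richer heuristics; the marker lists and strategy names below are made up purely to illustrate the idea:

```python
def classify_query(question: str) -> str:
    """Heuristic query classification (illustrative only)."""
    q = question.lower()
    code_markers = ("function", "class ", "import", "implement", "source code")
    doc_markers = ("how do i", "how to", "install", "configure", "tutorial")
    if any(m in q for m in code_markers):
        return "code"
    if any(m in q for m in doc_markers):
        return "documentation"
    return "conceptual"


def route(question: str) -> str:
    """Map a query class to a retrieval strategy (hypothetical names)."""
    strategy = {
        "code": "code_index_search",
        "documentation": "doc_vector_search",
        "conceptual": "doc_vector_search",
    }
    return strategy[classify_query(question)]
```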

Code Indexing

The system can index source code repositories:

# Build code index
python build_code_index.py /path/to/repo

# Query code through the API or MCP server

Code is indexed with:

  • Class and function definitions

  • Docstrings and comments

  • File structure and imports

  • Semantic embeddings for natural language queries

Development

Running Tests

# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest

# Run specific test files
pytest test_openai_api.py
pytest test_mcp_integration.py

Code Style

The project follows Python best practices with type hints and docstrings.

Troubleshooting

Common Issues

Issue: GOOGLE_API_KEY not found

  • Solution: Ensure you've created a .env file and added your Google API key

Issue: Unsupported file type

  • Solution: Only .txt and .md files are supported. Convert other formats first.

Issue: Collection already exists error

  • Solution: Delete the qdrant_storage/ directory to reset the database

Issue: MCP server not connecting

  • Solution: Check that the path in your MCP config is correct and the .env file is in the project root

Advanced Usage

Tag-Based Organization

Organize your documents with tags for easy categorization and filtering:

# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# List all available tags
curl "http://localhost:8000/tags"

# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'

# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"
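The tag filter above can be sketched in a few lines. This assumes AND-matching (a document must carry all requested tags); the server may instead match any tag, so verify against the API:

```python
def filter_by_tags(documents: list[dict], required_tags: list[str]) -> list[dict]:
    """Keep documents carrying ALL requested tags (assumed semantics)."""
    wanted = set(required_tags)
    return [d for d in documents if wanted <= set(d.get("tags", []))]
```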

Hierarchical Document Structure

For markdown documents, the system automatically preserves heading hierarchy:

# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"

# Query specific section
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'
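The breadcrumb paths used as section_path values can be derived from heading levels roughly as follows. This is a simplified sketch (it does not skip `#` lines inside code fences, which the real markdown_parser.py presumably handles):

```python
def heading_breadcrumbs(markdown: str) -> list[str]:
    """Turn markdown headings into 'A > B > C' breadcrumb paths."""
    stack: list[tuple[int, str]] = []  # (heading level, title)
    paths = []
    for line in markdown.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            title = line[level:].strip()
            # Pop siblings and deeper sections before appending.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, title))
            paths.append(" > ".join(t for _, t in stack))
    return paths
```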

Section-Aware Queries

The system includes section context when generating answers:

# Example: Markdown document structure
# Installation
#   Prerequisites
#     Python Version
#   Setup Steps

# When you query about "Python version requirements"
# The system will:
# 1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
# 2. Include section path in context sent to LLM
# 3. Cite sources with full section paths
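The "include section path in context" step might look like the sketch below. Field names such as section_path and text are assumptions about the chunk payload, not the project's actual schema:

```python
def build_context(chunks: list[dict]) -> str:
    """Assemble retrieved chunks into a prompt context, prefixing each
    with its section breadcrumb so the LLM can cite sections (sketch)."""
    parts = []
    for c in chunks:
        header = f"[{c['section_path']}]" if c.get("section_path") else "[document]"
        parts.append(f"{header}\n{c['text']}")
    return "\n\n".join(parts)
```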

Smart Query Modes

The system supports three query modes:

  1. Standard (/query) - Basic vector search and retrieval

  2. Enhanced (/query-enhanced) - Follows documentation references automatically

  3. Smart (/smart-query) - Automatic classification and routing

Use the OpenAI-compatible API to access different modes:

# Standard mode
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-standard", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Enhanced mode with reference following
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-enhanced", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Smart mode with automatic routing
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-smart", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients:

query_rag - Query with optional tags and section filtering

{
  "question": "How do I deploy?",
  "tags": ["dagster"],
  "section_path": "Deployment"
}

smart_query - Smart query with automatic routing

{
  "question": "What is an asset and how do I use it?"
}

add_document - Upload with tags

{
  "file_path": "/path/to/doc.md",
  "tags": ["dagster", "docs"]
}

get_tags - List all tags

get_document_structure - Get table of contents

{
  "doc_id": "abc123"
}

API Reference

Enhanced Endpoints

POST /documents

  • Body: file (multipart), tags (comma-separated string)

  • Response: Document info with tags and chunk count

POST /query

  • Body: {"question": "...", "tags": [...], "section_path": "..."}

  • Response: Answer with section-aware sources

POST /smart-query

  • Body: {"question": "..."}

  • Response: Smart answer with automatic routing and classification

GET /tags

  • Response: {"tags": [...], "total": N}

GET /documents/{doc_id}/sections

  • Response: Document structure with section hierarchy

GET /documents?tags=tag1,tag2

  • Query filtered by tags

  • Response: List of matching documents

POST /v1/chat/completions

  • OpenAI-compatible chat completion endpoint

  • Supports models: rag-standard, rag-enhanced, rag-smart

  • Supports streaming with stream: true

GET /v1/models

  • List available RAG models

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

  • Google AI Studio for embeddings and LLM capabilities

  • Qdrant for vector database

  • FastAPI for the REST API framework

  • Anthropic MCP for the Model Context Protocol
