
RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and a Model Context Protocol (MCP) server.

Features

Core Capabilities

  • Document Storage: Upload and store text (.txt) and Markdown (.md) documents

  • Hierarchical Chunking: Structure-aware chunking for markdown that preserves document hierarchy

  • Vector Search: Efficient similarity search using Qdrant vector database

  • Google AI Integration: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)

  • REST API: FastAPI-based REST API with automatic OpenAPI documentation

  • MCP Server: Model Context Protocol server for seamless integration with Claude and other MCP clients

Advanced Features

  • Tag-Based Organization: Organize documents with multiple tags for easy categorization

  • Section-Aware Retrieval: Query specific sections of documentation (e.g., "Installation > Prerequisites")

  • Markdown Structure Preservation: Automatic extraction of heading hierarchy with breadcrumb paths

  • Context-Enhanced Answers: LLM receives section context for more accurate responses

  • Flexible Filtering: Filter documents by tags and/or section paths during queries

  • Document Structure API: Explore table of contents and section organization

Project Structure

rag/
├── config/
│   ├── __init__.py
│   └── settings.py              # Configuration and settings
├── rag_server/
│   ├── __init__.py
│   ├── models.py                # Pydantic models for API
│   ├── rag_system.py            # Core RAG system logic
│   └── server.py                # FastAPI server
├── mcp_server/
│   ├── __init__.py
│   └── server.py                # MCP server implementation
├── utils/
│   ├── __init__.py
│   ├── document_processor.py    # Document processing
│   ├── embeddings.py            # Google AI embeddings
│   ├── text_chunker.py          # Text chunking utility
│   └── vector_store.py          # Qdrant vector store wrapper
├── .env.example                 # Example environment variables
├── .gitignore
├── main.py
├── pyproject.toml
└── README.md

Installation

Prerequisites

  • Python 3.13 or higher

  • Google AI Studio API key (get one at https://aistudio.google.com/app/apikey)

Setup

  1. Clone or navigate to the project directory

  2. Install dependencies

# Using pip
pip install -e .

# Or using uv (recommended)
uv pip install -e .

  3. Configure environment variables

# Copy the example env file
cp .env.example .env

# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here

Usage

Running the FastAPI Server

Start the REST API server:

python -m rag_server.server

The server will start at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.

API Endpoints

  • POST /documents - Upload a document

  • POST /query - Query the RAG system

  • GET /documents - List all documents

  • DELETE /documents/{doc_id} - Delete a document

  • GET /stats - Get system statistics

  • GET /health - Health check

Example Usage with curl

# Upload a document
curl -X POST "http://localhost:8000/documents" \
  -F "file=@example.txt"

# Query the RAG system
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the documents?", "top_k": 5}'

# List documents
curl "http://localhost:8000/documents"

# Get statistics
curl "http://localhost:8000/stats"
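
The same calls work from Python. Below is a minimal client sketch using the requests library; it relies only on the endpoints documented above and prints raw JSON, since the exact response schemas are not reproduced here.

# rest_client_sketch.py - minimal, illustrative client for the endpoints above
import requests

BASE_URL = "http://localhost:8000"

# Upload a document (multipart form, mirroring the curl example)
with open("example.txt", "rb") as f:
    upload = requests.post(f"{BASE_URL}/documents", files={"file": f})
upload.raise_for_status()
print("Uploaded:", upload.json())

# Query the RAG system
query = requests.post(
    f"{BASE_URL}/query",
    json={"question": "What is the main topic of the documents?", "top_k": 5},
)
query.raise_for_status()
print("Answer:", query.json())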

Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients.

python -m mcp_server.server

MCP Tools Available

  1. query_rag - Query the RAG system with a question

  2. add_document - Add a document to the RAG system

  3. list_documents - List all stored documents

  4. delete_document - Delete a document by ID

  5. get_rag_stats - Get system statistics

Using with Claude Desktop

Add to your Claude Desktop configuration (claude_desktop_config.json):

{ "mcpServers": { "rag": { "command": "python", "args": ["-m", "mcp_server.server"], "cwd": "/path/to/rag" } } }

Configuration

All configuration is managed through environment variables (defined in .env):

| Variable               | Description                           | Default            |
|------------------------|---------------------------------------|--------------------|
| GOOGLE_API_KEY         | Google AI Studio API key              | (required)         |
| CHUNK_SIZE             | Size of text chunks in characters     | 1000               |
| CHUNK_OVERLAP          | Overlap between chunks, in characters | 200                |
| TOP_K_RESULTS          | Number of chunks to retrieve          | 5                  |
| QDRANT_PATH            | Path to Qdrant storage                | ./qdrant_storage   |
| QDRANT_COLLECTION_NAME | Qdrant collection name                | documents          |
| FASTAPI_HOST           | FastAPI server host                   | 0.0.0.0            |
| FASTAPI_PORT           | FastAPI server port                   | 8000               |
| EMBEDDING_MODEL        | Google embedding model                | text-embedding-004 |
| LLM_MODEL              | Google LLM model                      | gemini-1.5-flash   |
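
For illustration, here is one way these variables could be loaded with pydantic-settings. This is a sketch, not the project's actual config/settings.py, which may be structured differently.

# settings_sketch.py - illustrative only; the real config/settings.py may differ
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")  # values read from .env

    google_api_key: str                          # GOOGLE_API_KEY (required)
    chunk_size: int = 1000                       # CHUNK_SIZE
    chunk_overlap: int = 200                     # CHUNK_OVERLAP
    top_k_results: int = 5                       # TOP_K_RESULTS
    qdrant_path: str = "./qdrant_storage"        # QDRANT_PATH
    qdrant_collection_name: str = "documents"    # QDRANT_COLLECTION_NAME
    fastapi_host: str = "0.0.0.0"                # FASTAPI_HOST
    fastapi_port: int = 8000                     # FASTAPI_PORT
    embedding_model: str = "text-embedding-004"  # EMBEDDING_MODEL
    llm_model: str = "gemini-1.5-flash"          # LLM_MODEL

settings = Settings()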

Architecture

Document Processing Pipeline

  1. Upload - User uploads a .txt or .md file

  2. Processing - Document is read and metadata extracted

  3. Chunking - Text is split into overlapping chunks (see the sketch after this list)

  4. Embedding - Each chunk is converted to a vector using Google AI embeddings

  5. Storage - Vectors and metadata are stored in Qdrant
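
Step 3 overlaps adjacent chunks so text near a boundary appears in both. A minimal sketch of fixed-size chunking with the CHUNK_SIZE and CHUNK_OVERLAP defaults follows; the project's text_chunker.py may differ, and markdown files additionally go through structure-aware chunking.

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size chunking: each chunk starts `chunk_size - overlap`
    characters after the previous one, so neighbours share `overlap` chars."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 2500)
print([len(c) for c in chunks])  # [1000, 1000, 900]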

Query Pipeline

  1. Query - User submits a question

  2. Embedding - Question is converted to a vector

  3. Retrieval - Similar chunks are retrieved from Qdrant

  4. Generation - Retrieved context and the question are sent to the Gemini 1.5 Flash model

  5. Response - Answer is generated and returned with sources
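
The pipeline above can be sketched end to end with the google-generativeai and qdrant-client libraries. This is illustrative, not the project's rag_system.py: the payload field name "text" is an assumption, and Qdrant's embedded local mode is single-process, so this cannot share ./qdrant_storage with a running server.

# query_pipeline_sketch.py - illustrative; payload field names are assumptions
import google.generativeai as genai
from qdrant_client import QdrantClient

genai.configure(api_key="your_api_key_here")
store = QdrantClient(path="./qdrant_storage")

question = "What are the prerequisites?"

# Steps 1-2: embed the question with the same model used for documents
emb = genai.embed_content(model="models/text-embedding-004", content=question)

# Step 3: retrieve the top-k most similar chunks from Qdrant
hits = store.search(
    collection_name="documents",
    query_vector=emb["embedding"],
    limit=5,
)
context = "\n\n".join(hit.payload["text"] for hit in hits)  # "text" is assumed

# Steps 4-5: generate an answer grounded in the retrieved context
model = genai.GenerativeModel("gemini-1.5-flash")
reply = model.generate_content(f"Context:\n{context}\n\nQuestion: {question}")
print(reply.text)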

Development

Running Tests

# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest

Code Style

The project follows Python best practices with type hints and docstrings.

Troubleshooting

Common Issues

Issue: GOOGLE_API_KEY not found

  • Solution: Ensure you've created a .env file and added your Google API key

Issue: Unsupported file type

  • Solution: Only .txt and .md files are supported. Convert other formats first.

Issue: Collection already exists error

  • Solution: Delete the qdrant_storage/ directory to reset the database

Advanced Usage

Tag-Based Organization

Organize your documents with tags for easy categorization and filtering:

# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# List all available tags
curl "http://localhost:8000/tags"

# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'

# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"

Hierarchical Document Structure

For markdown documents, the system automatically preserves heading hierarchy:

# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"

# Query specific section
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'

Section-Aware Queries

The system includes section context when generating answers:

Example markdown document structure:

# Installation
## Prerequisites
### Python Version
## Setup Steps

When you query about "Python version requirements", the system will:

  1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"

  2. Include the section path in the context sent to the LLM

  3. Cite sources with full section paths
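
One way to derive such breadcrumb paths from markdown headings is sketched below. This is illustrative, not necessarily what the project's document_processor.py does.

import re

def heading_breadcrumbs(markdown: str) -> list[str]:
    """Walk markdown headings and emit 'A > B > C' breadcrumb paths."""
    stack: list[str] = []  # one entry per currently open heading level
    paths = []
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            level, title = len(m.group(1)), m.group(2).strip()
            del stack[level - 1:]  # close headings at this level and deeper
            stack.append(title)
            paths.append(" > ".join(stack))
    return paths

doc = "# Installation\n## Prerequisites\n### Python Version\n## Setup Steps\n"
print(heading_breadcrumbs(doc))
# ['Installation', 'Installation > Prerequisites',
#  'Installation > Prerequisites > Python Version', 'Installation > Setup Steps']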

MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients:

query_rag - Query with optional tags and section filtering

{ "question": "How do I deploy?", "tags": ["dagster"], "section_path": "Deployment" }

add_document - Upload with tags

{ "file_path": "/path/to/doc.md", "tags": ["dagster", "docs"] }

get_tags - List all tags

get_document_structure - Get table of contents

{ "doc_id": "abc123" }

API Reference

Enhanced Endpoints

POST /documents

  • Body: file (multipart), tags (comma-separated string)

  • Response: Document info with tags and chunk count

POST /query

  • Body: {"question": "...", "tags": [...], "section_path": "..."}

  • Response: Answer with section-aware sources

GET /tags

  • Response: {"tags": [...], "total": N}

GET /documents/{doc_id}/sections

  • Response: Document structure with section hierarchy

GET /documents?tags=tag1,tag2

  • Query filtered by tags

  • Response: List of matching documents

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

  • Google AI Studio for embeddings and LLM capabilities

  • Qdrant for vector database

  • FastAPI for the REST API framework

  • Anthropic MCP for the Model Context Protocol
