doc-mcp

by tirth1263

Doc-MCP — Documentation RAG System

Transform any GitHub documentation repository into an intelligent, queryable knowledge base — in minutes.

What is Doc-MCP?

Doc-MCP is an open-source Retrieval-Augmented Generation (RAG) system purpose-built for software documentation. Point it at any public GitHub repository, ingest its Markdown files, and ask natural-language questions; each answer cites the source files it draws on. Embeddings and answer generation run on Nebius AI, and the vectors are stored in MongoDB Atlas.

It also exposes its search capabilities as MCP (Model Context Protocol) tools, meaning any MCP-compatible AI assistant (like Claude Desktop) can query your documentation knowledge base directly, without manual copy-paste.

Features

Feature	Description
Semantic Search	Find answers across thousands of docs using natural language — no keyword matching required
AI-Powered Q&A	Get intelligent, contextual responses with exact source file citations
Batch Processing	Ingest entire repositories with real-time progress tracking
Incremental Updates	SHA-based change detection — only re-embeds files that actually changed
Repository Management	Full CRUD: view stats, delete repositories, manage ingested content
MCP Integration	Expose documentation search as tools for any MCP-compatible AI agent
Gradio Web UI	Clean, intuitive browser interface — no CLI knowledge required
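
The Incremental Updates row can be illustrated with a small sketch. It assumes the SHA of each ingested file is stored next to its path and compared against the SHAs GitHub reports on the next run; the function name and dictionary layout are hypothetical, not taken from the project's code.

```python
def files_to_reembed(stored_shas: dict[str, str], current_shas: dict[str, str]) -> list[str]:
    """Return paths that are new or whose SHA differs from the stored one."""
    return sorted(
        path
        for path, sha in current_shas.items()
        if stored_shas.get(path) != sha
    )


# Hypothetical SHAs, shortened for readability.
stored = {"docs/intro.md": "a1", "docs/setup.md": "b2"}
current = {"docs/intro.md": "a1", "docs/setup.md": "c3", "docs/new.md": "d4"}

print(files_to_reembed(stored, current))  # ['docs/new.md', 'docs/setup.md']
```

Unchanged files are skipped entirely, which is what keeps repeat ingestion cheap in embedding calls.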

Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Gradio Web UI                        │
│   (Ingestion Tab | Q&A Tab | Management Tab | MCP Info)     │
└─────────────────┬───────────────────────────────────────────┘
                  │
         ┌────────▼────────┐
         │  GitHub Loader  │  ← Async file fetching with rate-limit handling
         └────────┬────────┘
                  │ Markdown files
         ┌────────▼────────┐
         │  Text Chunker   │  ← Header-aware recursive splitting (CHUNK_SIZE=3072)
         └────────┬────────┘
                  │ Text chunks
         ┌────────▼────────┐
         │   Nebius AI     │  ← BAAI/bge-en-icl embeddings (4096 dims)
         │   Embeddings    │
         └────────┬────────┘
                  │ Vectors
         ┌────────▼────────┐
         │  MongoDB Atlas  │  ← Vector Search index (cosine similarity)
         │  Vector Store   │
         └────────┬────────┘
                  │ Top-K results
         ┌────────▼────────┐
         │   Nebius LLM    │  ← Meta-Llama-3.1-70B-Instruct
         │  (Answer Gen)   │
         └─────────────────┘
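
The Text Chunker stage in the diagram is described as header-aware recursive splitting. The project's actual splitter is not shown here, so the following is only a minimal sketch of that general technique: split at Markdown headers first, then fall back to progressively finer separators for sections that exceed the chunk size. The function names and the separator order are assumptions.

```python
import re


def split_markdown(text: str, chunk_size: int = 3072) -> list[str]:
    """Split at Markdown headers, then recursively split oversized sections."""
    # Zero-width split: break right before each header line (requires Python 3.7+).
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]
    chunks: list[str] = []
    for section in sections:
        chunks.extend(_split_recursive(section, chunk_size, ["\n\n", "\n", " "]))
    return chunks


def _split_recursive(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separator left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        return _split_recursive(text, chunk_size, finer)
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # The piece alone is too large: retry with a finer separator.
            chunks.extend(_split_recursive(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "# Intro\nshort\n\n## Setup\n" + "word " * 50
chunks = split_markdown(sample, chunk_size=40)
print(len(chunks), max(len(c) for c in chunks))
```

Splitting at headers first keeps each chunk inside one section of the document, which tends to make retrieved excerpts easier to cite.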

Quick Start

Prerequisites

Python 3.13+
MongoDB Atlas account with Vector Search enabled
Nebius AI API key (for embeddings + LLM)
GitHub Personal Access Token (optional — increases rate limit from 60 to 5,000 req/hr)

Installation

# Clone the repository
git clone https://github.com/tirth1263/doc-mcp.git
cd doc-mcp

# Create virtual environment
python -m venv .venv
source .venv/bin/activate      # Linux/Mac
# .venv\Scripts\activate       # Windows

# Install dependencies
pip install -r requirements.txt

Configuration

# Copy environment template
cp .env.example .env

Edit .env with your credentials:

# Required
NEBIUS_API_KEY=your_nebius_api_key_here
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/

# Optional
GITHUB_API_KEY=your_github_token_here
CHUNK_SIZE=3072
SIMILARITY_TOP_K=5
GITHUB_CONCURRENT_REQUESTS=10
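
GITHUB_CONCURRENT_REQUESTS caps how many GitHub requests run in parallel. One common way to enforce such a cap in asyncio code is a semaphore, sketched below. The `fetch_all` and `fake_fetch` names are hypothetical, and the fake fetcher stands in for a real HTTP call so the example runs offline.

```python
import asyncio


async def fetch_all(paths, fetch_one, max_concurrent=10):
    """Run fetch_one for every path, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path):
        async with semaphore:
            return await fetch_one(path)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(p) for p in paths))


async def fake_fetch(path):
    # Placeholder for a real HTTP request.
    await asyncio.sleep(0)
    return f"contents of {path}"


results = asyncio.run(
    fetch_all(["README.md", "docs/intro.md"], fake_fetch, max_concurrent=10)
)
print(results)
```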

MongoDB Atlas Setup

1. Create a free cluster at cloud.mongodb.com
2. Enable Vector Search in your cluster
3. Run the database setup script:

python scripts/db_setup.py setup

This automatically creates:

doc_rag — document chunks with embeddings
ingested_repos — repository metadata
Vector search index on the embedding field
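
The setup script is not reproduced in this README. Assuming it follows the Atlas Vector Search index format, the index on the embedding field would be described by a definition along these lines. The 4096 dimensions and cosine similarity come from the architecture diagram; the function name is hypothetical.

```python
def vector_index_definition(path="embedding", dimensions=4096, similarity="cosine"):
    """Build an Atlas Vector Search index definition for one vector field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition()
print(definition)
```

With PyMongo, a definition like this is typically wrapped in a SearchIndexModel of type "vectorSearch" and passed to the collection's create_search_index method; check the documentation for your driver version before relying on the exact signature.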

Launch

python main.py

Visit http://localhost:7860 to access the web interface.

MCP SSE endpoint: http://127.0.0.1:7860/gradio_api/mcp/sse

Usage Guide

1. Ingest Documentation

1. Navigate to the 📥 Documentation Ingestion tab
2. Enter a GitHub repository, either in owner/repo form or as a full URL:
   - langchain-ai/langchain
   - https://github.com/facebook/react
   - owner/repo
3. Click Load Files; the system fetches the full file tree
4. Select which markdown files to include
5. Click Ingest Selected Files and watch the progress bar as files are chunked and embedded
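
Both input forms accepted in the repository field can be normalized to owner/repo before calling the GitHub API. The parser below is a sketch of one way to do that, not the project's own code; `parse_repo` and the accepted variants (trailing slash, `.git` suffix) are assumptions.

```python
import re

# Optional https://github.com/ prefix, then owner/repo, optional .git and trailing slash.
_REPO_PATTERN = re.compile(
    r"^(?:https?://github\.com/)?([\w.-]+)/([\w.-]+?)(?:\.git)?/?$"
)


def parse_repo(value: str) -> str:
    """Normalize a GitHub URL or owner/repo string to 'owner/repo'."""
    match = _REPO_PATTERN.match(value.strip())
    if not match:
        raise ValueError(f"Not a recognizable GitHub repository: {value!r}")
    owner, repo = match.groups()
    return f"{owner}/{repo}"


print(parse_repo("langchain-ai/langchain"))             # langchain-ai/langchain
print(parse_repo("https://github.com/facebook/react"))  # facebook/react
```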

2. Ask Questions

1. Go to the 🤖 AI Documentation Assistant tab
2. Select your ingested repository from the dropdown
3. Type any natural language question
4. Get an AI-generated answer with source file citations

Example questions:

"How do I set up authentication?"
"What are the available configuration options?"
"Show me an example of streaming responses"
"What's the difference between X and Y?"

3. Manage Repositories

Use the 🗂️ Repository Management tab to:

View statistics (file count, chunk count, last ingested date)
Delete repositories to free up storage
Refresh the repository list

MCP Integration

Connect any MCP-compatible AI assistant to query your documentation:

Claude Desktop Configuration

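Add to your claude_desktop_config.json (support for a bare url entry depends on the client version; if the server does not appear, a stdio-to-SSE bridge such as mcp-remote may be needed):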

{
  "mcpServers": {
    "doc-mcp": {
      "url": "http://127.0.0.1:7860/gradio_api/mcp/sse"
    }
  }
}

Available MCP Tools

`search_documentation`

Semantic similarity search across ingested documentation.

{
  "repo": "langchain-ai/langchain",
  "query": "how to use memory in chains",
  "top_k": 5
}
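
On the database side, a search like this maps naturally onto Atlas's `$vectorSearch` aggregation stage. The sketch below builds such a pipeline as plain data. The index name and the `text`, `file_path`, and `repo` field names are assumptions about the stored documents, and the `numCandidates` heuristic is illustrative only.

```python
def build_search_pipeline(query_vector, top_k=5, index_name="vector_index", repo=None):
    """Build an aggregation pipeline for an Atlas $vectorSearch query."""
    stage = {
        "index": index_name,        # assumed index name
        "path": "embedding",        # field holding the stored vectors
        "queryVector": query_vector,
        # Consider more candidates than are returned, to improve recall.
        "numCandidates": max(top_k * 20, 100),
        "limit": top_k,
    }
    if repo is not None:
        # Only valid if "repo" is declared as a filter field in the index.
        stage["filter"] = {"repo": {"$eq": repo}}
    return [
        {"$vectorSearch": stage},
        {
            "$project": {
                "_id": 0,
                "text": 1,
                "file_path": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_search_pipeline([0.1, 0.2, 0.3], top_k=5, repo="langchain-ai/langchain")
print(pipeline[0]["$vectorSearch"]["limit"])  # 5
```

The resulting list can be passed to a collection's aggregate method; the query vector must come from the same embedding model used at ingestion time.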

`ask_documentation`

AI-powered Q&A with source citations.

{
  "repo": "langchain-ai/langchain",
  "question": "What is the difference between LLMChain and ConversationChain?"
}
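
Answers with source citations are usually produced by numbering the retrieved chunks in the prompt and asking the model to cite those numbers. The sketch below shows that pattern; the prompt wording and the `file_path` and `text` keys are assumptions, not the project's actual prompt.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a RAG prompt whose excerpts are numbered for citation."""
    context = "\n\n".join(
        f"[{i}] ({chunk['file_path']})\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered excerpts below. "
        "Cite the excerpts you rely on as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


retrieved = [
    {"file_path": "docs/memory.md", "text": "Memory stores prior turns."},
    {"file_path": "docs/chains.md", "text": "Chains compose calls."},
]
prompt = build_prompt("How do I use memory in chains?", retrieved)
print(prompt)
```

Because each number maps back to a file path, the citations in the model's answer can be resolved to source files after generation.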

`list_available_repos`

List all ingested repositories.

{}

Configuration Reference

Variable	Default	Description
`NEBIUS_API_KEY`	—	Required. Nebius AI API key
`MONGODB_URI`	—	Required. MongoDB Atlas connection string
`GITHUB_API_KEY`	—	Optional. GitHub token for higher rate limits
`CHUNK_SIZE`	`3072`	Maximum characters per text chunk
`SIMILARITY_TOP_K`	`5`	Number of chunks retrieved per query
`GITHUB_CONCURRENT_REQUESTS`	`10`	Parallel GitHub API requests
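
The table above could be read into a typed settings object as follows. This is a sketch rather than the contents of src/config.py; the `Settings` class and `load_settings` function are hypothetical, while the variable names and defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    nebius_api_key: str
    mongodb_uri: str
    github_api_key: Optional[str] = None
    chunk_size: int = 3072
    similarity_top_k: int = 5
    github_concurrent_requests: int = 10


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    missing = [name for name in ("NEBIUS_API_KEY", "MONGODB_URI") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        nebius_api_key=env["NEBIUS_API_KEY"],
        mongodb_uri=env["MONGODB_URI"],
        github_api_key=env.get("GITHUB_API_KEY") or None,
        chunk_size=int(env.get("CHUNK_SIZE", 3072)),
        similarity_top_k=int(env.get("SIMILARITY_TOP_K", 5)),
        github_concurrent_requests=int(env.get("GITHUB_CONCURRENT_REQUESTS", 10)),
    )


settings = load_settings(
    {"NEBIUS_API_KEY": "k", "MONGODB_URI": "mongodb+srv://example/", "CHUNK_SIZE": "1024"}
)
print(settings.chunk_size, settings.similarity_top_k)  # 1024 5
```

Failing fast on the two required variables surfaces a misconfigured .env at startup instead of at the first embedding or database call.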

Project Structure

doc-mcp/
├── app.py                  # Hugging Face Spaces entry point
├── main.py                 # Local development entry point
├── requirements.txt
├── .env.example
├── scripts/
│   └── db_setup.py         # Database initialization & status utility
└── src/
    ├── config.py           # Environment & constants
    ├── github_loader.py    # Async GitHub file fetching
    ├── embeddings.py       # Nebius embeddings + LLM answer generation
    ├── vector_store.py     # MongoDB Atlas vector operations
    ├── mcp_server.py       # MCP tool definitions
    └── ui.py               # Gradio web interface

Troubleshooting

Rate limit errors from GitHub

Add a GITHUB_API_KEY to your .env. Authenticated requests are limited to 5,000 per hour, compared with 60 per hour for unauthenticated requests.
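
For reference, the token is sent as a bearer credential, and GitHub reports the remaining quota in its `X-RateLimit-*` response headers. The helpers below sketch how a client might use both; the function names are hypothetical and this is not the loader's actual code.

```python
from typing import Mapping, Optional


def github_headers(token: Optional[str] = None) -> dict:
    """Build request headers, adding authentication only when a token is set."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def seconds_until_reset(response_headers: Mapping[str, str], now: float) -> float:
    """Return how long to wait before retrying, based on rate-limit headers."""
    if int(response_headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0
    # X-RateLimit-Reset is a Unix timestamp in seconds.
    reset_at = float(response_headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


print(github_headers("example-token")["Authorization"])  # Bearer example-token
print(seconds_until_reset(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"}, now=1000.0
))  # 60.0
```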

No results returned from search

The MongoDB Atlas Vector Search index may still be building (can take 2-5 minutes after first setup). Check status with:
python scripts/db_setup.py status

Memory / OOM errors during ingestion

Reduce CHUNK_SIZE in your .env (e.g., CHUNK_SIZE=1024).

MongoDB connection errors

Verify your IP address is on the allow list in Atlas Network Access
Confirm Vector Search is available on your cluster (the Quick Start above uses a free-tier cluster, so a paid M10+ tier should not be required)
Double-check the connection string format in .env

Embedding API errors

Verify your NEBIUS_API_KEY is valid and has sufficient credits.

Tech Stack

Component	Technology
Web UI	Gradio 5
Embeddings	BAAI/bge-en-icl via Nebius AI
LLM	Meta-Llama-3.1-70B-Instruct via Nebius AI
Vector DB	MongoDB Atlas Vector Search
GitHub API	aiohttp (async)
Protocol	Model Context Protocol (MCP)

Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for details.

Built with Python, Gradio, MongoDB Atlas, and Nebius AI
