doc-mcp
title: Doc-MCP Documentation RAG System
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.34.2"
app_file: app.py
pinned: true
license: mit
short_description: GitHub docs into queryable RAG knowledge bases
Doc-MCP: Documentation RAG System
Transform any GitHub documentation repository into an intelligent, queryable knowledge base in minutes.
Live Demo · Report Bug · Request Feature
What is Doc-MCP?
Doc-MCP is an open-source Retrieval-Augmented Generation (RAG) system purpose-built for software documentation. Point it at any public GitHub repository, and within minutes you can ask natural language questions and receive precise, cited answers, all powered by state-of-the-art vector embeddings and large language models.
It also exposes its search capabilities as MCP (Model Context Protocol) tools, meaning any MCP-compatible AI assistant (like Claude Desktop) can query your documentation knowledge base directly, without manual copy-paste.
Features
Feature | Description |
Semantic Search | Find answers across thousands of docs using natural language; no keyword matching required |
AI-Powered Q&A | Get intelligent, contextual responses with exact source file citations |
Batch Processing | Ingest entire repositories with real-time progress tracking |
Incremental Updates | SHA-based change detection: only re-embeds files that actually changed |
Repository Management | Full CRUD: view stats, delete repositories, manage ingested content |
MCP Integration | Expose documentation search as tools for any MCP-compatible AI agent |
Gradio Web UI | Clean, intuitive browser interface; no CLI knowledge required |
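The SHA-based change detection in the table can be illustrated with a minimal sketch. It assumes the stored and current states are both plain `path -> SHA` mappings and that the SHA is the Git blob hash GitHub reports for each file; `git_blob_sha` and `plan_update` are hypothetical names, not functions from this repository.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob with this content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def plan_update(stored: dict, current: dict) -> tuple:
    """Compare path -> SHA maps and return (paths to re-embed, paths to delete)."""
    # A file needs embedding if it is new or its SHA differs from the stored one.
    to_embed = sorted(p for p, sha in current.items() if stored.get(p) != sha)
    # A file that disappeared upstream should have its chunks removed.
    to_delete = sorted(p for p in stored if p not in current)
    return to_embed, to_delete


# Illustrative data only: one changed file, one new file, one deleted file.
stored = {"README.md": git_blob_sha(b"v1"), "docs/old.md": git_blob_sha(b"old")}
current = {"README.md": git_blob_sha(b"v2"), "docs/new.md": git_blob_sha(b"new")}
to_embed, to_delete = plan_update(stored, current)
print(to_embed)   # ['README.md', 'docs/new.md']
print(to_delete)  # ['docs/old.md']
```

Unchanged files never appear in `to_embed`, which is what keeps a re-ingest from paying for embeddings it already has.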
Architecture
┌─────────────────────────────────────────────────────────────┐
│                        Gradio Web UI                        │
│    (Ingestion Tab | Q&A Tab | Management Tab | MCP Info)    │
└─────────────────┬───────────────────────────────────────────┘
                  │
         ┌────────▼────────┐
         │  GitHub Loader  │ ← Async file fetching with rate-limit handling
         └────────┬────────┘
                  │ Markdown files
         ┌────────▼────────┐
         │  Text Chunker   │ ← Header-aware recursive splitting (CHUNK_SIZE=3072)
         └────────┬────────┘
                  │ Text chunks
         ┌────────▼────────┐
         │    Nebius AI    │ ← BAAI/bge-en-icl embeddings (4096 dims)
         │   Embeddings    │
         └────────┬────────┘
                  │ Vectors
         ┌────────▼────────┐
         │  MongoDB Atlas  │ ← Vector Search index (cosine similarity)
         │  Vector Store   │
         └────────┬────────┘
                  │ Top-K results
         ┌────────▼────────┐
         │   Nebius LLM    │ ← Meta-Llama-3.1-70B-Instruct
         │  (Answer Gen)   │
         └─────────────────┘
Quick Start
Prerequisites
Python 3.13+
MongoDB Atlas account with Vector Search enabled
Nebius AI API key (for embeddings + LLM)
GitHub Personal Access Token (optional; increases the rate limit from 60 to 5,000 req/hr)
Installation
# Clone the repository
git clone https://github.com/tirth1263/doc-mcp.git
cd doc-mcp
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
Configuration
# Copy environment template
cp .env.example .env
Edit .env with your credentials:
# Required
NEBIUS_API_KEY=your_nebius_api_key_here
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
# Optional
GITHUB_API_KEY=your_github_token_here
CHUNK_SIZE=3072
SIMILARITY_TOP_K=5
GITHUB_CONCURRENT_REQUESTS=10
MongoDB Atlas Setup
Create a free cluster at cloud.mongodb.com
Enable Vector Search in your cluster
Run the database setup script:
python scripts/db_setup.py setup
This automatically creates:
doc_rag: document chunks with embeddings
ingested_repos: repository metadata
Vector search index on the embedding field
Launch
python main.py
Visit http://localhost:7860 to access the web interface.
MCP SSE endpoint: http://127.0.0.1:7860/gradio_api/mcp/sse
Usage Guide
1. Ingest Documentation
Navigate to the Documentation Ingestion tab
Enter a GitHub repository URL:
langchain-ai/langchain
https://github.com/facebook/react
owner/repo
Click Load Files: the system fetches the full file tree
Select which markdown files to include
Click Ingest Selected Files: watch the progress bar as files are chunked and embedded
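The repository field in step 2 is shown with both a short `owner/repo` form and a full GitHub URL. One way such input could be normalized before calling the GitHub API is sketched below; `normalize_repo` and its regular expression are illustrative assumptions, not the project's actual parser.

```python
import re

# Optional https://github.com/ prefix, then owner/repo, then optional ".git" or "/".
_REPO_RE = re.compile(
    r"^(?:https?://github\.com/)?(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+?)(?:\.git)?/?$"
)


def normalize_repo(value: str) -> str:
    """Return 'owner/repo' for a short name or a full GitHub URL."""
    match = _REPO_RE.match(value.strip())
    if match is None:
        raise ValueError(f"Not a recognizable GitHub repository: {value!r}")
    return f"{match['owner']}/{match['repo']}"


print(normalize_repo("langchain-ai/langchain"))             # langchain-ai/langchain
print(normalize_repo("https://github.com/facebook/react"))  # facebook/react
```

Rejecting anything that does not match keeps typos from turning into confusing 404 responses later in the ingestion run.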
2. Ask Questions
Go to the AI Documentation Assistant tab
Select your ingested repository from the dropdown
Type any natural language question
Get an AI-generated answer with source file citations
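Behind these steps, retrieval ranks stored chunks by cosine similarity to the question's embedding and passes the top K to the LLM. MongoDB Atlas does this server-side through its vector index; the brute-force sketch below only illustrates the ranking idea, using made-up 3-dimensional vectors instead of real 4096-dimensional embeddings. `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=5):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors for illustration; real embeddings come from the embedding model.
chunks = [
    ("auth guide", [0.9, 0.1, 0.0]),
    ("changelog", [0.0, 1.0, 0.0]),
    ("login example", [0.7, 0.7, 0.0]),
]
for score, text in top_k([1.0, 0.0, 0.0], chunks, k=2):
    print(f"{score:.3f}  {text}")
```

The `k` argument plays the role of `SIMILARITY_TOP_K`: a larger value gives the LLM more context at the cost of a longer prompt.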
Example questions:
"How do I set up authentication?"
"What are the available configuration options?"
"Show me an example of streaming responses"
"What's the difference between X and Y?"
3. Manage Repositories
Use the Repository Management tab to:
View statistics (file count, chunk count, last ingested date)
Delete repositories to free up storage
Refresh the repository list
MCP Integration
Connect any MCP-compatible AI assistant to query your documentation:
Claude Desktop Configuration
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "doc-mcp": {
      "url": "http://127.0.0.1:7860/gradio_api/mcp/sse"
    }
  }
}
Available MCP Tools
search_documentation
Semantic similarity search across ingested documentation.
{
  "repo": "langchain-ai/langchain",
  "query": "how to use memory in chains",
  "top_k": 5
}
ask_documentation
AI-powered Q&A with source citations.
{
  "repo": "langchain-ai/langchain",
  "question": "What is the difference between LLMChain and ConversationChain?"
}
list_available_repos
List all ingested repositories.
{}
Configuration Reference
Variable | Default | Description |
NEBIUS_API_KEY | (none) | Required. Nebius AI API key |
MONGODB_URI | (none) | Required. MongoDB Atlas connection string |
GITHUB_API_KEY | (none) | Optional. GitHub token for higher rate limits |
CHUNK_SIZE | 3072 | Maximum characters per text chunk |
SIMILARITY_TOP_K | 5 | Number of chunks retrieved per query |
GITHUB_CONCURRENT_REQUESTS | 10 | Parallel GitHub API requests |
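As an illustration of how these variables might be read at startup, here is a minimal sketch that applies the defaults from the .env example and fails early when a required value is missing. The `Settings` class and `load_settings` function are hypothetical; the project's real `src/config.py` may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    nebius_api_key: str
    mongodb_uri: str
    github_api_key: Optional[str]
    chunk_size: int
    similarity_top_k: int
    github_concurrent_requests: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying documented defaults."""
    env = os.environ if env is None else env
    missing = [name for name in ("NEBIUS_API_KEY", "MONGODB_URI") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        nebius_api_key=env["NEBIUS_API_KEY"],
        mongodb_uri=env["MONGODB_URI"],
        github_api_key=env.get("GITHUB_API_KEY") or None,  # optional token
        chunk_size=int(env.get("CHUNK_SIZE", "3072")),
        similarity_top_k=int(env.get("SIMILARITY_TOP_K", "5")),
        github_concurrent_requests=int(env.get("GITHUB_CONCURRENT_REQUESTS", "10")),
    )


# Passing a dict instead of os.environ keeps the example self-contained.
settings = load_settings({"NEBIUS_API_KEY": "example", "MONGODB_URI": "mongodb+srv://example"})
print(settings.chunk_size, settings.similarity_top_k, settings.github_concurrent_requests)
```

Failing on a missing key at load time gives a clearer error than a rejected API call halfway through an ingestion run.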
Project Structure
doc-mcp/
├── app.py                  # Hugging Face Spaces entry point
├── main.py                 # Local development entry point
├── requirements.txt
├── .env.example
├── scripts/
│   └── db_setup.py         # Database initialization & status utility
└── src/
    ├── config.py           # Environment & constants
    ├── github_loader.py    # Async GitHub file fetching
    ├── embeddings.py       # Nebius embeddings + LLM answer generation
    ├── vector_store.py     # MongoDB Atlas vector operations
    ├── mcp_server.py       # MCP tool definitions
    └── ui.py               # Gradio web interface
Troubleshooting
Rate limit errors from GitHub
Add a GITHUB_API_KEY to your .env. Authenticated requests get 5,000/hr vs. 60/hr unauthenticated.
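When the quota does run out, the GitHub REST API reports it in the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers, the latter as a UTC epoch timestamp. The sketch below shows one way a client could decide how long to pause; `seconds_until_reset` is a hypothetical helper and not necessarily how this project's loader handles limits.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to pause before the next request; 0.0 if quota remains."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    now = time.time() if now is None else now
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(lowered.get("x-ratelimit-reset", "0"))
    return max(0.0, reset_at - now)


# Example header values are made up for illustration.
exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
print(seconds_until_reset(exhausted, now=1700000000))  # 60.0
```

A caller would sleep for the returned number of seconds and then retry the request.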
No results returned from search
The MongoDB Atlas Vector Search index may still be building (can take 2-5 minutes after first setup). Check status with:
python scripts/db_setup.py status
Memory / OOM errors during ingestion
Reduce CHUNK_SIZE in your .env (e.g., CHUNK_SIZE=1024).
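To see why a smaller value helps, the following simplified sketch splits Markdown on headers first and then cuts any oversized section by paragraph, so no chunk exceeds the limit. It is an assumption-laden stand-in for the project's header-aware recursive splitter, and `split_markdown` is a hypothetical name.

```python
import re


def split_markdown(text: str, chunk_size: int = 3072) -> list:
    """Split on Markdown headers, then cut oversized sections by paragraph."""
    chunks = []
    # Zero-width split: each piece starts at a header line (needs Python 3.7+).
    for section in re.split(r"(?m)^(?=#{1,6} )", text):
        section = section.strip()
        if not section:
            continue
        if len(section) <= chunk_size:
            chunks.append(section)
            continue
        current = ""
        for para in section.split("\n\n"):
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) <= chunk_size:
                current = candidate
                continue
            if current:
                chunks.append(current)
            # A single paragraph longer than chunk_size gets a hard cut.
            while len(para) > chunk_size:
                chunks.append(para[:chunk_size])
                para = para[chunk_size:]
            current = para
        if current:
            chunks.append(current)
    return chunks


doc = "# Setup\n\nInstall it.\n\n## Usage\n\n" + "word " * 40
print(len(split_markdown(doc)))                 # 2
print(len(split_markdown(doc, chunk_size=64)))  # more, smaller chunks
```

Smaller chunks mean each embedding request carries less text, at the cost of more requests and more stored vectors.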
MongoDB connection errors
Verify your IP is whitelisted in Atlas Network Access
Confirm Vector Search is enabled on your cluster tier (M10+)
Double-check the connection string format in .env
Embedding API errors
Verify your NEBIUS_API_KEY is valid and has sufficient credits.
Tech Stack
Component | Technology |
Web UI | Gradio |
Embeddings | BAAI/bge-en-icl via Nebius AI |
LLM | Meta-Llama-3.1-70B-Instruct via Nebius AI |
Vector DB | MongoDB Atlas Vector Search |
GitHub API | aiohttp (async) |
Protocol | Model Context Protocol (MCP) |
Contributing
Contributions are welcome! Please:
Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request
License
Distributed under the MIT License. See LICENSE for details.
Built with Python, Gradio, MongoDB Atlas, and Nebius AI