DocRAG - AI Documentation RAG System
A lightweight, installable Python package that provides RAG (Retrieval Augmented Generation) access to technical documentation through an MCP (Model Context Protocol) server. This enables LLMs to search and retrieve relevant documentation on-demand.
Features
🚀 Single pip-installable package with CLI and MCP server
📚 Project-based documentation collections (BrightSign, Venafi, Qumu, web frameworks)
🔍 Local vector database (LanceDB) with locally generated embeddings
📥 Easy documentation ingestion from local files or scraped sources
🤖 Designed for use with Claude Code via MCP
Installation
Prerequisites
Python 3.10+
pipx (recommended) or pip
git (for updates)
Recommended: Install globally with pipx
# Install globally with pipx in editable mode (keeps dependencies isolated)
pipx install -e /opt/claude-ops/doc-rag
# Verify installation
docrag --help
# Optional: Install Playwright browsers (for scraping)
pipx runpip docrag install playwright
pipx run --spec docrag playwright install chromium
Note: The -e flag installs in "editable" mode, which means changes to the source code take effect immediately, without reinstalling.
Alternative: Install from source (development)
# Clone or navigate to the project directory
cd /opt/claude-ops/doc-rag
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
# Install in development mode
pip install -e ".[dev]"
# Install Playwright browsers (for scraping)
playwright install chromium
Updating DocRAG
Option 1: Using the Update Script (Recommended)
cd /opt/claude-ops/doc-rag
./update.sh
This script will:
Pull latest changes from git
Detect your installation method (pipx or pip)
Reinstall only if necessary (non-editable installs)
Handle editable installs automatically
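The decision the script makes can be sketched in a few lines of Python. This is not the contents of update.sh, which is not shown here; `plan_update` is a hypothetical helper that only mirrors the steps listed above, and it reuses the commands from the manual update option.

```python
def plan_update(install_method: str, editable: bool) -> list[str]:
    """Return the shell commands an update would run, in order.

    Hypothetical helper: it mirrors the documented behaviour of the
    update script, not its actual implementation.
    """
    if install_method not in ("pipx", "pip"):
        raise ValueError(f"unknown install method: {install_method!r}")

    # Every update starts by pulling the latest changes.
    commands = ["git pull origin main"]

    if editable:
        # Editable installs point at the working tree, so pulling is enough.
        return commands

    # Non-editable installs need a reinstall to pick up the new code.
    if install_method == "pipx":
        commands.append("pipx uninstall docrag && pipx install -e .")
    else:
        commands.append("pip install -e . --force-reinstall")
    return commands


for method, editable in [("pipx", True), ("pipx", False), ("pip", False)]:
    print(method, editable, plan_update(method, editable))
```

Returning the commands instead of running them keeps the sketch safe to execute anywhere.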
Option 2: Using Make
cd /opt/claude-ops/doc-rag
make update
Option 3: Manual Update
For editable installs (installed with -e):
cd /opt/claude-ops/doc-rag
git pull origin main
# No reinstall needed - changes are already active!
For regular installs (installed without -e):
cd /opt/claude-ops/doc-rag
git pull origin main
pipx uninstall docrag && pipx install -e .
# or for pip: pip install -e . --force-reinstall
Verifying Updates
# Check git status
cd /opt/claude-ops/doc-rag
git log -1 --oneline
# Test the installation
docrag --version
docrag --help
Quick Start
1. Initialize DocRAG
docrag init
This creates the configuration directory at ~/.docrag/ with the following structure:
~/.docrag/
├── config.json # Global configuration
├── collections/ # Documentation collections
└── vectordb/ # LanceDB storage
2. Add a Documentation Collection
# Add documentation from a local directory
docrag add brightsign --source /path/to/brightsign/docs --description "BrightSign player documentation"
# Or add without source initially
docrag add venafi --description "Venafi TPP API documentation"
3. List Collections
docrag list
4. Search Documentation (CLI Testing)
# Search across all active collections
docrag search "how to initialize the player"
# Search a specific collection
docrag search "authentication methods" --collection venafi --limit 10
5. Start the MCP Server
docrag serve
The server will listen on stdio for connections from Claude Code.
CLI Commands
docrag init
Initialize DocRAG configuration directory.
docrag add <name>
Add a new documentation collection.
Options:
-s, --source PATH - Source directory containing documentation
-d, --description TEXT - Description of the collection
Example:
docrag add qumu --source ~/docs/qumu --description "Qumu video platform docs"
docrag list
List all documentation collections with their status.
docrag update <name> <source>
Update an existing collection with new documents.
Example:
docrag update brightsign ~/docs/brightsign/updated
docrag remove <name>
Remove a documentation collection (with confirmation).
docrag search <query>
Search documentation from the CLI for testing.
Options:
-c, --collection TEXT - Specific collection to search
-l, --limit INTEGER - Number of results (default: 5)
Example:
docrag search "websocket connection" --collection brightsign
docrag serve
Start the MCP server for Claude Code integration.
docrag scrape <url>
Scrape documentation from websites.
Options:
-o, --output PATH - Output directory (required)
--smart, --use-crawl4ai - Use AI-powered Crawl4AI scraper (recommended)
--no-llm - Disable LLM extraction (faster, still better than basic)
--llm-provider TEXT - LLM provider (default: openai/gpt-4o-mini)
--playwright - Use Playwright for dynamic content (basic scraper)
--max-pages INTEGER - Maximum pages to scrape (default: 1000)
Examples:
# Basic scraping
docrag scrape https://docs.example.com --output ./docs
# Smart scraping with AI (recommended)
docrag scrape https://docs.example.com --output ./docs --smart
# Smart scraping without LLM (faster, no API key needed)
docrag scrape https://docs.example.com --output ./docs --smart --no-llm
# Limit pages
docrag scrape https://docs.example.com --output ./docs --max-pages 100
Smart Scraping Features:
✨ AI-powered content extraction
🎯 Automatically removes navigation and boilerplate
📊 Better handling of complex layouts
🧠 Semantic understanding of documentation structure
⚡ Faster and more accurate than basic scraping
To enable smart scraping:
# Install Crawl4AI
pipx inject docrag crawl4ai
# Optional: Set OpenAI API key for LLM-powered extraction
export OPENAI_API_KEY='your-key-here'
Using with Claude Code
1. Configure Claude Code MCP Settings
Add DocRAG to your Claude Code MCP configuration (~/.config/claude-code/mcp_settings.json or similar):
{
"mcpServers": {
"docrag": {
"command": "docrag",
"args": ["serve"],
"env": {}
}
}
}
If using the full path:
{
"mcpServers": {
"docrag": {
"command": "/home/claude-admin/.local/bin/docrag",
"args": ["serve"],
"env": {}
}
}
}
2. Restart Claude Code
After adding the configuration, restart Claude Code to load the MCP server.
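If you would rather not edit the settings file by hand, the entry from step 1 can be merged in with a short script. This is a minimal sketch: `register_docrag` is a hypothetical helper, not part of DocRAG, and the settings path is left to the caller because its location varies between setups. The demo writes to a temporary directory so it does not touch a real configuration.

```python
import json
import tempfile
from pathlib import Path


def register_docrag(settings_path: Path, command: str = "docrag") -> dict:
    """Add or replace the "docrag" entry in an MCP settings file.

    Other servers already present in the file are left untouched.
    """
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))

    # Same shape as the configuration shown in step 1.
    servers = settings.setdefault("mcpServers", {})
    servers["docrag"] = {"command": command, "args": ["serve"], "env": {}}

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Demo against a throwaway file rather than a real settings file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "mcp_settings.json"
    print(json.dumps(register_docrag(demo_path), indent=2))
```

Pass the full path to the executable as `command` if `docrag` is not on the PATH seen by Claude Code.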
3. Use in Claude Code
Once connected, Claude Code can use two tools:
search_docs: Search through indexed documentation collections
Query: "how to handle authentication in BrightSign"
Collection: (optional) "brightsign"
Limit: (optional) 5
list_collections: List all available documentation collections
Claude will automatically use these tools when working on projects that need documentation access.
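Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for search_docs. The argument names `query`, `collection`, and `limit` are assumptions taken from the parameter labels above; the server's actual input schema may differ. `build_search_request` is a hypothetical helper.

```python
import json


def build_search_request(query, collection=None, limit=None, request_id=1):
    """Build a JSON-RPC 2.0 "tools/call" request for the search_docs tool.

    Argument names are assumed from the README, not read from the server.
    """
    arguments = {"query": query}
    # Optional parameters are only sent when the caller sets them.
    if collection is not None:
        arguments["collection"] = collection
    if limit is not None:
        arguments["limit"] = limit
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_docs", "arguments": arguments},
    }


request = build_search_request(
    "how to handle authentication in BrightSign",
    collection="brightsign",
    limit=5,
)
print(json.dumps(request, indent=2))
```

In normal use Claude Code builds and sends this message itself; the sketch is only meant to show what travels over stdio.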
Architecture
Core Components
ConfigManager (config.py) - Manages configuration and collection metadata
EmbeddingGenerator (embeddings.py) - Generates embeddings using sentence-transformers
VectorDB (vectordb.py) - LanceDB wrapper for vector storage and search
DocumentIndexer (indexer.py) - Intelligent document chunking and indexing
DocRAGServer (server.py) - MCP server implementation
CLI (cli.py) - Command-line interface
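To illustrate how these components fit together (embed each chunk, store the vectors, rank by similarity at query time), here is a toy, dependency-free sketch. It is not DocRAG's code: word counts stand in for the sentence-transformers model, an in-memory list stands in for LanceDB, and the names `embed`, `cosine`, and `ToyIndex` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """In-memory stand-in for a vector table: stores (chunk, vector) rows."""

    def __init__(self):
        self.rows = []

    def add(self, chunks):
        # Indexing: embed each chunk once and keep the vector next to it.
        for chunk in chunks:
            self.rows.append((chunk, embed(chunk)))

    def search(self, query, limit=5):
        # Querying: embed the query, score every row, return the best ones.
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), chunk) for chunk, vec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


index = ToyIndex()
index.add([
    "Initialize the player before loading a presentation.",
    "Authentication uses an API token sent in the request header.",
    "WebSocket connections are opened by the device on startup.",
])
for score, chunk in index.search("how to initialize the player", limit=2):
    print(f"{score:.2f}  {chunk}")
```

A real embedding model matches on meaning instead of shared words, and a vector database avoids scoring every row, but the flow of data is the same.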
Technical Stack
MCP Framework: Official Anthropic MCP package
Vector Database: LanceDB (lightweight, file-based, performant)
Embeddings: sentence-transformers with all-MiniLM-L6-v2 model (384 dims, fast, local)
Text Processing: langchain-text-splitters for intelligent chunking
CLI: Click for user-friendly commands
Web Scraping: Playwright + BeautifulSoup4 for scraping
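The chunking step can be pictured as a sliding window over the text. The sketch below is a simplified character-based version using the `chunk_size` and `chunk_overlap` values from DocRAG's configuration (512 and 50). It is an assumption-laden stand-in: the splitters in langchain-text-splitters also try to break on separators such as paragraphs, and this README does not say whether sizes are counted in characters or tokens. `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "".join(chr(ord("a") + i % 26) for i in range(1200))
chunks = chunk_text(sample)
print([len(chunk) for chunk in chunks])
print(chunks[0][-50:] == chunks[1][:50])
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.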
Data Structure
~/.docrag/
├── config.json # Global configuration
│ └── {
│ "active_collections": ["brightsign", "venafi"],
│ "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
│ "chunk_size": 512,
│ "chunk_overlap": 50
│ }
├── collections/
│ ├── brightsign/
│ │ ├── metadata.json # Collection metadata
│ │ └── source_docs/ # Original documents
│ ├── venafi/
│ └── qumu/
└── vectordb/
└── lancedb/ # Vector storage (one table per collection)
Configuration
Global configuration is stored in ~/.docrag/config.json:
{
"active_collections": ["brightsign", "venafi"],
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"chunk_size": 512,
"chunk_overlap": 50
}
Collection metadata is stored in ~/.docrag/collections/<name>/metadata.json:
{
"name": "brightsign",
"source_type": "local",
"source_path": "/path/to/docs",
"created_at": "2025-10-28T10:00:00",
"updated_at": "2025-10-28T10:00:00",
"doc_count": 150,
"description": "BrightSign player documentation"
}
Development
Project Structure
docrag/
├── docrag/
│ ├── __init__.py
│ ├── cli.py # CLI commands
│ ├── server.py # MCP server
│ ├── indexer.py # Document indexing
│ ├── vectordb.py # Vector database
│ ├── embeddings.py # Embeddings
│ ├── config.py # Configuration
│ └── scrapers/ # Web scrapers
│ ├── __init__.py
│ ├── base.py
│ └── generic.py
├── tests/
├── pyproject.toml
├── README.md
└── DOCRAG_MVP_BUILD_GUIDE.md
Running Tests
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
Code Formatting
# Format with black
black docrag/
# Lint with ruff
ruff check docrag/
Troubleshooting
"DocRAG not initialized"
Run docrag init first to create the configuration directory.
"No collections found"
Add a collection with docrag add <name> --source <path>.
"Model download fails"
The first time you run DocRAG, it will download the sentence-transformers model (~100MB). Ensure you have internet connectivity.
"Playwright not installed"
If using scrapers, run playwright install chromium.
Future Enhancements
Web scraper CLI commands
Support for more file types (PDF, HTML, RST)
Incremental indexing (only index changed files)
Collection activation/deactivation
Collection statistics and health checks
Export/import collections
Cloud sync for collections
Advanced search filters
License
MIT
Author
Ryan - Built for homelab and Claude Code integration