RAG Documentation MCP Server
An MCP server implementation that provides tools for retrieving and processing documentation through vector search, enabling AI assistants to augment their responses with relevant documentation context.
Features
Tools
search_documentation
Search through the documentation using vector search
Returns relevant chunks of documentation with source information
list_sources
List all available documentation sources
Provides metadata about each source
extract_urls
Extract URLs from text and check if they're already in the documentation
Useful for preventing duplicate documentation
remove_documentation
Remove documentation from a specific source
Cleans up outdated or irrelevant documentation
list_queue
List all items in the processing queue
Shows status of pending documentation processing
run_queue
Process all items in the queue
Automatically adds new documentation to the vector store
clear_queue
Clear all items from the processing queue
Useful for resetting the system
add_documentation
Add new documentation directly to the system by providing a URL
Automatically fetches, processes, and indexes the content
Supports various web page formats and extracts relevant content
Chunks content intelligently for optimal retrieval
Required parameter:
url (must include protocol, e.g., https://)
add_repository
Index a local code repository for documentation
Configure include/exclude patterns for files and directories
Supports different chunking strategies based on file types
Uses asynchronous processing to avoid MCP timeouts with large repositories
Provides detailed progress logging (heartbeat) to stderr during indexing
Required parameter:
path (absolute path to repository)
list_repositories
List all indexed repositories with their configurations
Shows include/exclude patterns and watch status
update_repository
Re-index a repository with updated configuration
Can modify include/exclude patterns and other settings
Provides detailed progress logging (heartbeat) to stderr during re-indexing
Required parameter:
name (repository name)
remove_repository
Remove a repository from the index
Deletes all associated documents from the vector database
Required parameter:
name (repository name)
watch_repository
Start or stop watching a repository for changes
Automatically updates the index when files change
Required parameters:
name (repository name) and action ("start" or "stop")
get_indexing_status
Get the current status of repository indexing operations
Provides detailed information about ongoing or completed indexing processes
Shows progress percentage, file counts, and timing information
Optional parameter:
name (repository name) - if not provided, returns status for all repositories
Quick Start
The RAG Documentation tool is designed for:
Enhancing AI responses with relevant documentation
Building documentation-aware AI assistants
Creating context-aware tooling for developers
Implementing semantic documentation search
Augmenting existing knowledge bases
Docker Compose Setup
The project includes a docker-compose.yml file for easy containerized deployment. To start the services:
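For example, from the directory containing docker-compose.yml (assuming the Docker Compose v2 CLI):

```sh
docker compose up -d
```

Older installations may use the standalone `docker-compose up -d` binary instead.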
To stop the services:
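Using the same CLI:

```sh
# stop and remove the containers defined in docker-compose.yml
docker compose down
```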
Web Interface
The system includes a web interface that can be accessed after starting the Docker Compose services:
Open your browser and navigate to:
http://localhost:3030
The interface provides:
Real-time queue monitoring
Documentation source management
Search interface for testing queries
System status and health checks
Configuration
Embeddings Configuration
The system uses Ollama as the default embedding provider for local embeddings generation, with OpenAI available as a fallback option. This setup prioritizes local processing while maintaining reliability through cloud-based fallback.
Environment Variables
EMBEDDING_PROVIDER: Choose the primary embedding provider ('ollama' or 'openai', default: 'ollama')
EMBEDDING_MODEL: Specify the model to use (optional)
For OpenAI: defaults to 'text-embedding-3-small'
For Ollama: defaults to 'nomic-embed-text'
OPENAI_API_KEY: Required when using OpenAI as provider
FALLBACK_PROVIDER: Optional backup provider ('ollama' or 'openai')
FALLBACK_MODEL: Optional model for fallback provider
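As a concrete sketch, these variables can be exported in the shell before launching the server; the values mirror the defaults described above, and the API key is a placeholder:

```sh
export EMBEDDING_PROVIDER=ollama
export EMBEDDING_MODEL=nomic-embed-text       # optional; this is the Ollama default
export FALLBACK_PROVIDER=openai
export FALLBACK_MODEL=text-embedding-3-small  # optional; this is the OpenAI default
export OPENAI_API_KEY=your-api-key-here       # required whenever OpenAI is used
```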
Cline Configuration
Add this to your cline_mcp_settings.json:
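A minimal sketch, assuming the server was built locally and is launched with node; the server key (rag-docs), the build path, and the env values are placeholders to adapt to your installation:

```json
{
  "mcpServers": {
    "rag-docs": {
      "command": "node",
      "args": ["/path/to/rag-docs-server/build/index.js"],
      "env": {
        "EMBEDDING_PROVIDER": "ollama",
        "FALLBACK_PROVIDER": "openai",
        "FALLBACK_MODEL": "text-embedding-3-small",
        "OPENAI_API_KEY": "your-api-key-here"
      },
      "disabled": false,
      "autoApprove": [
        "search_documentation",
        "list_sources",
        "extract_urls",
        "remove_documentation",
        "list_queue",
        "run_queue",
        "clear_queue",
        "add_documentation"
      ]
    }
  }
}
```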
Claude Desktop Configuration
Add this to your claude_desktop_config.json:
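The same shape applies here, again with a placeholder server name, launch path, and key; note that tools you want available without prompting must be listed in the autoApprove array (see the troubleshooting section below):

```json
{
  "mcpServers": {
    "rag-docs": {
      "command": "node",
      "args": ["/path/to/rag-docs-server/build/index.js"],
      "env": {
        "EMBEDDING_PROVIDER": "ollama",
        "FALLBACK_PROVIDER": "openai",
        "FALLBACK_MODEL": "text-embedding-3-small",
        "OPENAI_API_KEY": "your-api-key-here"
      },
      "autoApprove": ["search_documentation", "add_documentation"]
    }
  }
}
```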
Default Configuration
The system uses Ollama by default for efficient local embedding generation. For optimal reliability:
Install and run Ollama locally
Configure OpenAI as fallback (recommended):
```json
{
  // Ollama is used by default, no need to specify EMBEDDING_PROVIDER
  "EMBEDDING_MODEL": "nomic-embed-text", // optional
  "FALLBACK_PROVIDER": "openai",
  "FALLBACK_MODEL": "text-embedding-3-small",
  "OPENAI_API_KEY": "your-api-key-here"
}
```
This configuration ensures:
Fast, local embedding generation with Ollama
Automatic fallback to OpenAI if Ollama fails
No external API calls unless necessary
Note: The system will automatically use the appropriate vector dimensions based on the provider:
Ollama (nomic-embed-text): 768 dimensions
OpenAI (text-embedding-3-small): 1536 dimensions
Documentation Management
Direct vs. Queue-Based Documentation Addition
The system provides two complementary approaches for adding documentation:
Direct Addition (add_documentation)
Immediately processes and indexes the documentation from a URL
Best for adding individual documentation sources
Provides immediate feedback on processing success/failure
Example usage:
add_documentation with url: "https://example.com/docs"
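In an MCP client this amounts to a tool call whose only required argument is the url; the envelope shown here is schematic, since each client renders tool calls in its own way:

```json
{
  "tool": "add_documentation",
  "arguments": {
    "url": "https://example.com/docs"
  }
}
```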
Queue-Based Processing
Add URLs to a processing queue (extract_urls with add_to_queue: true)
Process multiple URLs in batch later (run_queue)
Better for large-scale documentation ingestion
Allows for scheduled processing of many documentation sources
Provides resilience through the queue system
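A queue-based run is two schematic tool calls; only add_to_queue is documented above, and the url argument to extract_urls is an assumption for illustration:

```json
{
  "tool": "extract_urls",
  "arguments": {
    "url": "https://example.com/docs",
    "add_to_queue": true
  }
}
```

followed by:

```json
{
  "tool": "run_queue",
  "arguments": {}
}
```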
Choose the approach that best fits your documentation management needs. For small numbers of important documents, direct addition provides immediate results. For large documentation sets or recursive crawling, the queue-based approach offers better scalability.
Local Repository Indexing
The system supports indexing local code repositories, making their content searchable alongside web documentation:
Repository Configuration
Define which files to include/exclude using glob patterns
Configure chunking strategies per file type
Set up automatic change detection with watch mode
File Processing
Files are processed based on their type and language
Code is chunked intelligently to preserve context
Metadata like file path and language are preserved
Asynchronous Processing
Large repositories are processed asynchronously to avoid MCP timeouts
Indexing continues in the background after the initial response
Progress can be monitored using the get_indexing_status tool
Smaller batch sizes (50 chunks per batch) improve responsiveness
Change Detection
Repositories can be watched for changes
Modified files are automatically re-indexed
Deleted files are removed from the index
Example usage:
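A schematic add_repository call; path is the documented required argument, while the remaining fields are illustrative and mirror the repositories.json options described below:

```json
{
  "tool": "add_repository",
  "arguments": {
    "path": "/absolute/path/to/repository",
    "name": "my-project",
    "include": ["**/*.ts", "**/*.md"],
    "exclude": ["**/node_modules/**", "**/.git/**"],
    "watchMode": true
  }
}
```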
After starting the indexing process, you can check its status:
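Passing the repository name scopes the query; omit it to get the status of all repositories:

```json
{
  "tool": "get_indexing_status",
  "arguments": {
    "name": "my-project"
  }
}
```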
This will return detailed information about the indexing progress:
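The exact response fields are not pinned down here; based on the description above (progress percentage, file counts, and timing), expect something shaped roughly like:

```json
{
  "name": "my-project",
  "status": "indexing",
  "progress": "45%",
  "filesProcessed": 120,
  "totalFiles": 265,
  "elapsedTime": "35s"
}
```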
Repository Configuration File
The system supports a repositories.json configuration file that allows you to define repositories to be automatically indexed at startup:
The configuration file is automatically updated when repositories are added, updated, or removed using the repository management tools. You can also manually edit the file to configure repositories before starting the server. The paths within the configuration file, such as the path for each repository and the implicit location of repositories.json itself, are resolved relative to the project root directory where the server is executed.
Configuration Options:
repositories: Array of repository configurations
path: Absolute path to the repository directory

Example:

```json
{
  "repositories": [
    {
      "path": "/absolute/path/to/repository",
      "name": "my-project",
      "include": ["**/*.js", "**/*.ts", "**/*.md"],
      "exclude": ["**/node_modules/**", "**/.git/**"],
      "watchMode": true,
      "watchInterval": 60000,
      "chunkSize": 1000,
      "fileTypeConfig": {
        ".js": { "include": true, "chunkStrategy": "semantic" },
        ".ts": { "include": true, "chunkStrategy": "semantic" },
        ".md": { "include": true, "chunkStrategy": "semantic" }
      }
    }
  ],
  "autoWatch": true
}
```
Troubleshooting
Port Conflicts
If the web interface port (3030 by default) is already in use:
Restart the MCP server
If the issue persists, check for other processes using the port:
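On macOS or Linux, for example:

```sh
# list processes bound to the web interface port (3030 by default)
lsof -i :3030
```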
You can also change the default port in the configuration if needed
Missing Tools in Claude Desktop
If certain tools (like add_documentation) are not appearing in Claude Desktop:
Verify that the tool is properly registered in the server's handler-registry.ts file
Make sure the tool is included in the ListToolsRequestSchema handler response
Check that your Claude Desktop configuration includes the tool in the autoApprove array
Restart the Claude Desktop application and the MCP server
Check the server logs for any errors related to tool registration
The most common cause of missing tools is that they are registered as handlers but not included in the tools array returned by the ListToolsRequestSchema handler.
Timeout Issues with Large Repositories
If you encounter timeout errors when indexing large repositories:
The system now uses asynchronous processing to avoid MCP timeouts
When adding a repository with add_repository, the indexing will continue in the background
Use the get_indexing_status tool to monitor progress
If you still experience issues, try these solutions:
Reduce the scope of indexing with more specific include/exclude patterns
Break up very large repositories into smaller logical units
Increase the batch size in the code if your system has more resources available
Check system resources (memory, CPU) during indexing to identify bottlenecks