mcp-markdown-ragdocs
A Model Context Protocol server that provides semantic search over local Markdown documentation using hybrid retrieval.
What it is
This is an MCP server that indexes local Markdown files and exposes a query_documents tool for hybrid semantic search. The server identifies relevant document sections using semantic search, keyword matching, and graph traversal, enabling efficient discovery without loading entire documentation collections into LLM context.
Why it exists
Technical documentation, personal notes, and project wikis are typically stored as Markdown files. Searching these collections manually or with grep is inefficient. This server provides a conversational interface to query documentation using natural language while automatically keeping the index synchronized with file changes.
Existing RAG solutions require manual database setup, explicit indexing steps, and ongoing maintenance. This server eliminates that friction with automatic file watching, zero-configuration defaults, and built-in index versioning.
Features
Hybrid search combining semantic embeddings (FAISS), keyword search (Whoosh), and graph traversal (NetworkX)
Community-based boosting: Louvain clustering detects document communities; results from the same community receive a score boost
Score-aware dynamic fusion: Adjusts vector/keyword weights based on score variance per query
HyDE (Hypothetical Document Embeddings):
search_with_hypothesis tool for vague queries
Cross-encoder re-ranking for improved precision (optional, ~50ms latency)
Query expansion via concept vocabulary for better recall
Git history search: Semantic search over commit history with metadata and delta context
Multi-project support: Manage isolated indices for multiple projects on one machine with automatic project detection
Server-Sent Events (SSE) streaming for real-time response delivery
CLI query command with rich formatted output
Automatic file watching with debounced incremental indexing
Zero-configuration operation with sensible defaults
Index versioning with automatic rebuild on configuration changes
Pluggable parser architecture: Markdown and plain text (.txt) support out-of-the-box
Rich Markdown parsing: frontmatter, wikilinks, tags, transclusions
Reciprocal Rank Fusion for multi-strategy result merging
Recency bias for recently modified documents
Memory Management System: Persistent AI memory bank with cross-corpus linking, recency boost, and ghost node graph traversal
Local-first architecture with no external dependencies
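As a rough illustration of how Reciprocal Rank Fusion merges the vector and keyword rankings (a minimal sketch, not the server's actual implementation; k=60 is the conventional constant from the RRF literature):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked result lists: each document scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_results = ["a.md", "b.md", "c.md"]
keyword_results = ["b.md", "d.md", "a.md"]
fused = reciprocal_rank_fusion([vector_results, keyword_results])
```

Documents ranked highly by both strategies rise to the top even when neither strategy ranked them first.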
Installation
Requires Python 3.13+.
Quick Start
For VS Code / MCP Clients (Recommended)
Start the stdio-based MCP server for use with VS Code or other MCP clients:
The server will:
Scan for *.md and *.txt files in the current directory
Build vector, keyword, and graph indices
Start file watching for automatic updates
Expose query_documents tool via stdio transport
See MCP Integration below for VS Code configuration.
For HTTP API / Development
Start the HTTP server on default port 8000:
The server will:
Index documents (same as mcp command)
Expose HTTP API at http://127.0.0.1:8000
Provide REST endpoints for queries
See API Endpoints below for HTTP usage.
Basic Usage
Configuration
Create .mcp-markdown-ragdocs/config.toml in your project directory or at ~/.config/mcp-markdown-ragdocs/config.toml:
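A hypothetical configuration might look like this (the key names below are illustrative assumptions, not the server's documented schema; check the project's own docs for the real keys):

```toml
# Hypothetical example -- key names are assumptions
[documents]
path = "./docs"

[server]
host = "127.0.0.1"
port = 8000
```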
The server searches for configuration files in this order:
.mcp-markdown-ragdocs/config.toml in the current directory
.mcp-markdown-ragdocs/config.toml in parent directories (walks up to root)
~/.config/mcp-markdown-ragdocs/config.toml (global fallback)
This supports monorepo workflows where you can place a shared configuration in the repository root.
If no configuration file exists, the server uses these defaults:
Documents path: . (current directory)
Server: 127.0.0.1:8000
Index storage: .index_data/
CLI Commands
Start MCP Server (stdio)
Starts stdio-based MCP server for VS Code and compatible MCP clients. Runs persistently until stopped.
Start HTTP Server
Starts HTTP API server on port 8000 (default). Override with:
Query Documents (CLI)
Query documents directly from command line:
With options:
Configuration Management
Check your configuration:
Force a full index rebuild:
Command | Purpose | Use When |
| Stdio MCP server | Integrating with VS Code or MCP clients |
| HTTP API server | Development, testing, or HTTP-based integrations |
| CLI query | Scripting or quick document searches |
| Validate config | Debugging configuration issues |
| Force reindex (documents, git commits, vocabulary) | Config changes, corrupted indices, or force rebuild |
MCP Integration
VS Code Configuration
Configure the MCP server in VS Code user settings or workspace settings.
File: .vscode/settings.json or ~/.config/Code/User/mcp.json
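A sketch of what the entry might look like (the command name and subcommand are assumptions based on the package name, not confirmed by this README):

```json
{
  "servers": {
    "markdown-ragdocs": {
      "command": "mcp-markdown-ragdocs",
      "args": ["mcp"]
    }
  }
}
```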
With project override:
Claude Desktop Configuration
File: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
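Claude Desktop registers MCP servers under an mcpServers key; the command and arguments below are assumptions based on the package name:

```json
{
  "mcpServers": {
    "markdown-ragdocs": {
      "command": "mcp-markdown-ragdocs",
      "args": ["mcp"]
    }
  }
}
```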
Available Tools
The server exposes three MCP tools:
query_documents: Search indexed documents using hybrid search and return ranked document chunks.
search_git_history: Search git commit history using natural language queries. Returns relevant commits with metadata, message, and diff context.
search_with_hypothesis: Search using a hypothesis about expected documentation content. Embeds the hypothesis directly for semantic search (HyDE technique). Useful for vague queries where describing expected content yields better results than the query itself.
Parameters:
query (required): Natural language query or question
top_n (optional): Maximum results to return (1-100, default: 5)
min_score (optional): Minimum confidence threshold (0.0-1.0, default: 0.3)
similarity_threshold (optional): Semantic deduplication threshold (0.5-1.0, default: 0.85)
show_stats (optional): Show compression statistics (default: false)
Note: Compression is enabled by default (min_score=0.3, max_chunks_per_doc=2, dedup_enabled=true) to reduce token overhead by 40-60%. Results use compact format: [N] file § section (score)\ncontent
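The compact format described above could be rendered along these lines (a sketch only; the server defines the exact formatting):

```python
def format_compact(results):
    """Render (file, section, score, content) tuples in compact form:
    [N] file § section (score)\ncontent"""
    lines = []
    for i, (file, section, score, content) in enumerate(results, start=1):
        lines.append(f"[{i}] {file} \u00a7 {section} ({score:.2f})\n{content}")
    return "\n".join(lines)

out = format_compact([("auth.md", "Setup", 0.91, "Configure OAuth...")])
```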
Usage Pattern:
Call query_documents to identify relevant sections
Review returned chunks to locate specific files and sections
Use file reading tools to access full document context
Example query from MCP client:
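A hypothetical query_documents call as an MCP client might issue it (argument names taken from the parameter list above; the wrapping format is the generic MCP tool-call shape, not something this README specifies):

```json
{
  "name": "query_documents",
  "arguments": {
    "query": "How do I configure authentication?",
    "top_n": 5,
    "min_score": 0.3
  }
}
```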
The server returns ranked document chunks with file paths, header hierarchies, and relevance scores.
search_git_history: Search git commit history using natural language queries.
Parameters:
query (required): Natural language query describing commits to find
top_n (optional): Maximum commits to return (1-100, default: 5)
min_score (optional): Minimum relevance threshold (0.0-1.0, default: 0.0)
file_pattern (optional): Glob pattern to filter by changed files (e.g., src/**/*.py)
author (optional): Filter commits by author name or email
after (optional): Unix timestamp to filter commits after this date
before (optional): Unix timestamp to filter commits before this date
Note: Git history search indexes up to 200 lines of diff per commit. Indexing processes 60 commits/sec on average. Search latency averages 5ms for 10k commits.
Example query:
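A hypothetical search_git_history call (argument names from the parameter list above; the filter values are illustrative):

```json
{
  "name": "search_git_history",
  "arguments": {
    "query": "refactor of the indexing pipeline",
    "top_n": 5,
    "author": "alice",
    "file_pattern": "src/**/*.py"
  }
}
```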
The server returns ranked commits with hash, title, author, timestamp, message, files changed, and truncated diff.
Memory Management
The server supports an AI memory bank for persistent cross-session knowledge storage.
Enable in configuration:
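A hypothetical snippet (the actual section and key names are not shown in this README and are assumptions):

```toml
# Hypothetical -- section/key names are assumptions
[memory]
enabled = true
```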
Available tools:
create_memory, read_memory, update_memory, append_memory, delete_memory: CRUD operations (system auto-generates frontmatter for create_memory)
search_memories: Hybrid search with recency boost and tag/type filtering
search_linked_memories: Find memories linking to a specific document via ghost nodes
get_memory_stats: Memory bank statistics
merge_memories: Consolidate multiple memories into one
See Memory Management for complete documentation.
API Endpoints
Health check:
Server status (document count, queue size, failed files):
Query endpoint (standard):
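The request body presumably mirrors the tool parameters, something like the following (field names are assumptions; the actual endpoint paths are not shown here):

```json
{
  "query": "authentication setup",
  "top_n": 5
}
```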
Query endpoint (streaming SSE):
The streaming endpoint returns Server-Sent Events:
Example response (standard endpoint):
MCP Stdio Format (Compact):
For MCP clients (VS Code, Claude Desktop), results use compact format:
For factual queries (e.g., "getUserById function", "configure auth"), content is truncated to 200 characters: