Markdown RAG
A Retrieval Augmented Generation (RAG) system for markdown documentation with intelligent rate limiting and MCP server integration.
Features
Semantic Search: Vector-based similarity search using Google Gemini or Ollama embeddings
Markdown-Aware Chunking: Intelligent document splitting that preserves semantic boundaries
Rate Limiting: Sophisticated sliding window algorithm with token counting and batch optimization
MCP Server: Model Context Protocol server for AI assistant integration
PostgreSQL Vector Store: Scalable storage using pgvector extension
Incremental Updates: Smart deduplication prevents reprocessing existing documents
Production Ready: Type-safe configuration, comprehensive logging, and error handling
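The markdown-aware chunking idea can be illustrated with a minimal sketch. This is an illustration of the general technique (split at headings first, fall back to paragraphs), not the project's actual splitter; the function name and size limit are invented for the example:

```python
import re

def split_markdown(text: str, max_chars: int = 500) -> list[str]:
    """Split markdown into chunks, breaking at heading boundaries first."""
    # Split on ATX heading lines so each chunk stays within one section.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
        else:
            # Fall back to paragraph boundaries for oversized sections.
            for para in section.split("\n\n"):
                if para.strip():
                    chunks.append(para.strip())
    return chunks

doc = "# Setup\nInstall things.\n\n# Usage\nRun the tool."
print(split_markdown(doc))  # ['# Setup\nInstall things.', '# Usage\nRun the tool.']
```

Splitting at structural boundaries keeps each embedded chunk semantically coherent, which is what "preserves semantic boundaries" means in practice.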
Installation
git clone https://github.com/yourusername/markdown-rag.git
Prerequisites
Python 3.11+
PostgreSQL 12+ with pgvector extension installed
Google Gemini API key (if using Google embeddings)
Ollama (if using local embeddings)
MCP-compatible client (Claude Desktop, Cline, etc.)
Quick Start
1. (Optional) Set Up PostgreSQL
createdb embeddings
If you do not create a database, the tool will create one for you. The pgvector extension will be automatically enabled when you first run the tool.
2. Ingest Documents
cd markdown-rag
# Use Google Gemini
uv run markdown-rag /path/to/docs --command ingest --engine google
# Or use Ollama
uv run markdown-rag /path/to/docs --command ingest --engine ollama
Required environment variables (create .env or export):
POSTGRES_PASSWORD=your_password
GOOGLE_API_KEY=your_gemini_api_key  # Only if using Google engine
3. Configure MCP Client
Add to your MCP client configuration (e.g., claude_desktop_config.json). The client will automatically start the server.
Minimal configuration:
{
"mcpServers": {
"markdown-rag": {
"command": "uv",
"args": [
"run",
"--directory",
"/absolute/path/to/markdown-rag",
"markdown-rag",
"/absolute/path/to/docs",
"--command",
"mcp"
],
"env": {
"POSTGRES_PASSWORD": "your_password",
"GOOGLE_API_KEY": "your_api_key"
}
}
}
}
Full configuration:
{
"mcpServers": {
"markdown-rag": {
"command": "uv",
"args": [
"run",
"--directory",
"/absolute/path/to/markdown-rag",
"markdown-rag",
"/absolute/path/to/docs",
"--command",
"mcp"
],
"env": {
"POSTGRES_USER": "postgres_username",
"POSTGRES_PASSWORD": "your_password",
"DISABLED_TOOLS": "delete_document,update_document",
"CHUNK_OVERLAP": "50",
"GOOGLE_API_KEY": "your_api_key",
"GOOGLE_MODEL": "models/gemini-embedding-001",
"RATE_LIMIT_REQUESTS_PER_DAY": "1000",
"RATE_LIMIT_REQUESTS_PER_MINUTE": "100",
"OLLAMA_HOST": "http://localhost:11434",
"OLLAMA_MODEL": "mxbai-embed-large"
}
}
}
}
4. Query via MCP
The server exposes several tools:
query
Semantic search over documentation
Arguments:
query (string), num_results (integer, optional, default: 4)
list_documents
List all ingested documents
Arguments: none
delete_document
Remove a document from the index
Arguments:
filename (string)
update_document
Re-ingest a specific document
Arguments:
filename (string)
refresh_index
Scan directory and ingest new/modified files
Arguments: none
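The DISABLED_TOOLS gating can be sketched as a filter over a tool registry before registration. This is a hypothetical illustration of the mechanism (the tool names come from the list above; the registry structure and function names are invented for the example):

```python
import os

# Dummy callables standing in for the real tool implementations.
ALL_TOOLS = {
    "query": lambda q: f"searching for {q!r}",
    "list_documents": lambda: [],
    "delete_document": lambda name: f"deleted {name}",
    "update_document": lambda name: f"updated {name}",
    "refresh_index": lambda: "refreshed",
}

def enabled_tools() -> dict:
    """Return the tools not named in the DISABLED_TOOLS env variable."""
    disabled = {
        name.strip()
        for name in os.environ.get("DISABLED_TOOLS", "").split(",")
        if name.strip()
    }
    return {k: v for k, v in ALL_TOOLS.items() if k not in disabled}

os.environ["DISABLED_TOOLS"] = "delete_document,update_document"
print(sorted(enabled_tools()))  # ['list_documents', 'query', 'refresh_index']
```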
To disable tools (e.g., in production), set DISABLED_TOOLS environment variable:
DISABLED_TOOLS=delete_document,update_document,refresh_index
Configuration
Environment Variables
| Variable | Default | Required | Description |
|---|---|---|---|
| POSTGRES_USER | | No | PostgreSQL username |
| POSTGRES_PASSWORD | - | Yes | PostgreSQL password |
| POSTGRES_HOST | | No | PostgreSQL host |
| POSTGRES_PORT | | No | PostgreSQL port |
| POSTGRES_DB | | No | Database name |
| GOOGLE_API_KEY | - | Yes* | Google Gemini API key (*if using Google) |
| GOOGLE_MODEL | | No | Google embedding model |
| OLLAMA_HOST | | No | Ollama host URL |
| OLLAMA_MODEL | | No | Ollama embedding model |
| RATE_LIMIT_REQUESTS_PER_MINUTE | | No | Max API requests per minute |
| RATE_LIMIT_REQUESTS_PER_DAY | | No | Max API requests per day |
| DISABLED_TOOLS | - | No | Comma-separated list of tools to disable |
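Loading these variables might look like the sketch below. The POSTGRES_* variable names follow the pattern used in the configuration examples; the fallback values (postgres, localhost, 5432, embeddings) and the dataclass are assumptions for illustration, not the project's documented defaults:

```python
import os
from dataclasses import dataclass

@dataclass
class PostgresConfig:
    user: str
    password: str
    host: str
    port: int
    database: str

def load_postgres_config() -> PostgresConfig:
    """Build connection settings from the environment.

    The fallback values here are illustrative guesses, not the
    project's documented defaults.
    """
    password = os.environ.get("POSTGRES_PASSWORD")
    if password is None:
        raise RuntimeError("POSTGRES_PASSWORD is required")
    return PostgresConfig(
        user=os.environ.get("POSTGRES_USER", "postgres"),
        password=password,
        host=os.environ.get("POSTGRES_HOST", "localhost"),
        port=int(os.environ.get("POSTGRES_PORT", "5432")),
        database=os.environ.get("POSTGRES_DB", "embeddings"),
    )

os.environ["POSTGRES_PASSWORD"] = "example"
print(load_postgres_config())
```

Failing fast on the one required variable gives a clearer error than a connection failure deep inside the vector store.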
Command Line Options
uv run markdown-rag <directory> [OPTIONS]
Arguments:
<directory>: Path to markdown files directory (required)
Options:
-c, --command {ingest|mcp}: Operation mode (default: mcp)
  ingest: Process and store documents
  mcp: Start MCP server for queries
-e, --engine {google|ollama}: Embedding engine (default: google)
-l, --level {debug|info|warning|error}: Logging level (default: warning)
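A parser with this shape could be built with argparse. This is a sketch mirroring the documented options, not the project's actual source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Construct a CLI parser matching the documented options."""
    parser = argparse.ArgumentParser(prog="markdown-rag")
    parser.add_argument("directory", help="Path to markdown files directory")
    parser.add_argument("-c", "--command", choices=["ingest", "mcp"],
                        default="mcp", help="Operation mode")
    parser.add_argument("-e", "--engine", choices=["google", "ollama"],
                        default="google", help="Embedding engine")
    parser.add_argument("-l", "--level",
                        choices=["debug", "info", "warning", "error"],
                        default="warning", help="Logging level")
    return parser

args = build_parser().parse_args(["./docs", "-c", "ingest", "-e", "ollama"])
print(args.command, args.engine, args.level)  # ingest ollama warning
```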
Examples:
uv run markdown-rag ./docs --command ingest --level info --engine ollama
uv run markdown-rag /var/docs -c ingest -l debug -e google
Architecture
System Components
The following diagram shows how the system components interact:
graph TD
A[MCP Client<br/>Claude, ChatGPT, etc.] --> B[FastMCP Server<br/>Tool: query]
B --> C[MarkdownRAG]
C --> D[Text Splitters]
C --> E[Rate Limited Embeddings]
E --> F[Google Gemini<br/>Embeddings API]
C --> G[PostgreSQL<br/>+ pgvector]
Rate Limiting Strategy
The system implements a dual-window sliding algorithm:
Request Limits: Tracks requests per minute and per day
Token Limits: Counts tokens before API calls
Batch Optimization: Calculates maximum safe batch sizes
Smart Waiting: Minimal delays with automatic retry
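The request-tracking half of the algorithm can be sketched with a single sliding window. This simplified version handles only one window and ignores token counting and batching; the class and method names are invented for the example:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window` seconds."""

    def __init__(self, max_requests: int, window: float = 60.0):
        self.max_requests = max_requests
        self.window = window
        self.timestamps: deque = deque()  # monotonic times of recent requests

    def acquire(self) -> None:
        """Block just long enough to stay under the limit, then record."""
        now = time.monotonic()
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            # Minimal wait: sleep until the oldest request ages out.
            time.sleep(self.window - (now - self.timestamps[0]))
            self.timestamps.popleft()
        self.timestamps.append(time.monotonic())

limiter = SlidingWindowLimiter(max_requests=2, window=0.2)
for _ in range(4):
    limiter.acquire()  # third and fourth calls pause briefly
```

The real system layers a second (daily) window and a token budget on top of this, but the prune-check-wait loop is the core of the "smart waiting" behavior.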
See Architecture Documentation for detailed diagrams.
Development
Setup Development Environment
git clone https://github.com/yourusername/markdown-rag.git
cd markdown-rag
uv sync
Run Linters
uv run ruff check .
uv run mypy .
Code Style
This project follows:
Linting: Ruff with Google docstring convention
Type Checking: mypy with strict settings
Line Length: 79 characters
Import Sorting: Alphabetical with isort
Project Structure
markdown-rag/
├── src/markdown_rag/
│ ├── __init__.py
│ ├── main.py # Entry point and MCP server
│ ├── config.py # Environment and CLI configuration
│ ├── models.py # Pydantic data models
│ ├── rag.py # Core RAG logic
│ ├── embeddings.py # Rate-limited embeddings wrapper
│ └── rate_limiter.py # Rate limiting algorithm
├── docs/
│ ├── api-reference.md # API documentation
│ ├── architecture.md # Architecture documentation
│ ├── mcp-integration.md # MCP server integration guide
│ └── user-guide.md # User guide
├── pyproject.toml # Project configuration
├── .env # Environment variables (not in git)
└── README.md
Troubleshooting
Common Issues
"Failed to start store: connection refused"
PostgreSQL not running or wrong connection settings. Check your connection parameters in environment variables.
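A quick reachability check can narrow this down before digging into settings. This is a hypothetical helper using only the standard library; the default host and port are conventional PostgreSQL values, not necessarily your configuration:

```python
import socket

def postgres_reachable(host: str = "localhost", port: int = 5432,
                       timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the given host/port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not postgres_reachable():
    print("PostgreSQL is not reachable; check that it is running "
          "and that POSTGRES_HOST/POSTGRES_PORT match your server")
```

If the TCP connection succeeds but the tool still fails, the problem is more likely credentials or database name than the server itself.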
"Rate limit exceeded"
Adjust rate limits in environment variables:
RATE_LIMIT_REQUESTS_PER_MINUTE=50
RATE_LIMIT_REQUESTS_PER_DAY=500
"pgvector extension not found"
The pgvector PostgreSQL extension is not installed. Follow the pgvector installation guide for your platform.
"Skipping all files (already in vector store)"
Expected behavior. The system prevents duplicate ingestion.
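A common way to implement this kind of deduplication is to key documents by a content hash. This is an illustrative sketch, not necessarily the project's exact criterion (which could also be path- or mtime-based); all names here are invented:

```python
import hashlib

def content_key(path: str, text: str) -> str:
    """Stable key for a document version: path plus content digest."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{path}:{digest}"

seen: set = set()  # in the real system this would live in the vector store

def should_ingest(path: str, text: str) -> bool:
    """Skip files whose exact content was already ingested."""
    key = content_key(path, text)
    if key in seen:
        return False
    seen.add(key)
    return True

print(should_ingest("guide.md", "# Guide"))     # True
print(should_ingest("guide.md", "# Guide"))     # False (unchanged, skipped)
print(should_ingest("guide.md", "# Guide v2"))  # True (content changed)
```

Under this scheme an unchanged file is skipped on re-ingestion, while any edit produces a new key and triggers reprocessing.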
Logging
uv run markdown-rag ./docs --command ingest --level debug
Security
Best Practices
Never commit .env files - Add .env to .gitignore
Use environment variables for all secrets
Restrict database access - Use firewall rules
Rotate API keys regularly
Use read-only database users for query-only deployments
Secrets Management
All secrets use SecretStr type to prevent accidental logging:
from pydantic import SecretStr
api_key = SecretStr("secret_value")
Contributing
Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make changes and add tests
Run linters (uv run ruff check .)
Run type checks (uv run mypy .)
Commit changes (git commit -m 'feat: add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request
Commit Message Format
Follow conventional commits:
feat: add new feature
fix: resolve bug
docs: update documentation
refactor: improve code structure
test: add tests
chore: update dependencies
TODOs
Management of embeddings store via MCP tool.
Add support for other embeddings models.
Add support for other vector stores.
License
This project is licensed under the MIT License.
Acknowledgments
LangChain - RAG framework
Google Gemini - Embedding model
pgvector - Vector similarity search
FastMCP - MCP server framework
Support
Documentation: docs/architecture.md
Issues: GitHub Issues
Discussions: GitHub Discussions