Markdown RAG

README.md•10.5 KiB

# Markdown RAG A Retrieval Augmented Generation (RAG) system for markdown documentation with intelligent rate limiting and MCP server integration. ## Features - **Semantic Search**: Vector-based similarity search using Google Gemini or Ollama embeddings - **Markdown-Aware Chunking**: Intelligent document splitting that preserves semantic boundaries - **Rate Limiting**: Sophisticated sliding window algorithm with token counting and batch optimization - **MCP Server**: Model Context Protocol server for AI assistant integration - **PostgreSQL Vector Store**: Scalable storage using pgvector extension - **Incremental Updates**: Smart deduplication prevents reprocessing existing documents - **Production Ready**: Type-safe configuration, comprehensive logging, and error handling ## Installation ```bash git clone https://github.com/yourusername/markdown-rag.git ``` ## Prerequisites - Python 3.11+ - PostgreSQL 12+ with [pgvector extension installed](https://github.com/pgvector/pgvector#installation) - Google Gemini API key (if using Google embeddings) - Ollama (if using local embeddings) - MCP-compatible client (Claude Desktop, Cline, etc.) ## Quick Start ### 1. (Optional) Set Up PostgreSQL ```bash createdb embeddings ``` If you do not create a database, the tool will create one for you. The pgvector extension will be automatically enabled when you first run the tool. ### 2. Ingest Documents ```bash cd markdown-rag # Use Google Gemini uv run markdown-rag /path/to/docs --command ingest --engine google # Or use Ollama uv run markdown-rag /path/to/docs --command ingest --engine ollama ``` **Required environment variables** (create `.env` or export): ```env POSTGRES_PASSWORD=your_password GOOGLE_API_KEY=your_gemini_api_key # Only if using Google engine ``` ### 3. Configure MCP Client Add to your MCP client configuration (e.g., `claude_desktop_config.json`). The client will automatically start the server. **Minimal configuration:** ```json { "mcpServers": { "markdown-rag": { "command": "uv", "args": [ "run", "--directory" "/absolute/path/to/markdown-rag", "markdown-rag", "/absolute/path/to/docs", "--command", "mcp" ], "env": { "POSTGRES_PASSWORD": "your_password", "GOOGLE_API_KEY": "your_api_key" } } } } ``` **Full configuration:** ```json { "mcpServers": { "markdown-rag": { "command": "uv", "args": [ "run", "--directory" "/absolute/path/to/markdown-rag", "markdown-rag", "/absolute/path/to/docs", "--command", "mcp" ], "env": { "POSTGRES_USER": "postgres_username", "POSTGRES_PASSWORD": "your_password", "DISABLED_TOOLS": "delete_document,update_document", "CHUNK_OVERLAP": 50, # Google Configuration "GOOGLE_API_KEY": "your_api_key", "GOOGLE_MODEL": "models/gemini-embedding-001", "RATE_LIMIT_REQUESTS_PER_DAY": "1000", "RATE_LIMIT_REQUESTS_PER_MINUTE": "100", # Ollama Configuration "OLLAMA_HOST": "http://localhost:11434", "OLLAMA_MODEL": "mxbai-embed-large", } } } } ``` ### 4. Query via MCP The server exposes several tools: #### query - Semantic search over documentation - Arguments: `query` (string), `num_results` (integer, optional, default: 4) #### list_documents - List all ingested documents - Arguments: none #### delete_document - Remove a document from the index - Arguments: `filename` (string) #### update_document - Re-ingest a specific document - Arguments: `filename` (string) #### refresh_index - Scan directory and ingest new/modified files - Arguments: none To disable tools (e.g., in production), set `DISABLED_TOOLS` environment variable: ```env DISABLED_TOOLS=delete_document,update_document,refresh_index ``` ## Configuration ### Environment Variables | Variable | Default | Required | Description | | -------------------------------- | ------------ | -------- | ---------------------------------------- | | `POSTGRES_USER` | `postgres` | No | PostgreSQL username | | `POSTGRES_PASSWORD` | - | **Yes** | PostgreSQL password | | `POSTGRES_HOST` | `localhost` | No | PostgreSQL host | | `POSTGRES_PORT` | `5432` | No | PostgreSQL port | | `POSTGRES_DB` | `[engine]_embeddings` | No | Database name | | `GOOGLE_API_KEY` | - | **Yes*** | Google Gemini API key (*if using Google) | | `GOOGLE_MODEL` | `models/gemini...` | No | Google embedding model | | `OLLAMA_HOST` | `http://localhost...` | No | Ollama host URL | | `OLLAMA_MODEL` | `mxbai-embed-large` | No | Ollama embedding model | | `RATE_LIMIT_REQUESTS_PER_MINUTE` | `100` | No | Max API requests per minute | | `RATE_LIMIT_REQUESTS_PER_DAY` | `1000` | No | Max API requests per day | | `DISABLED_TOOLS` | - | No | Comma-separated list of tools to disable | ### Command Line Options ```bash uv run markdown-rag <directory> [OPTIONS] ``` **Arguments:** - `<directory>`: Path to markdown files directory (required) **Options:** - `-c, --command {ingest|mcp}`: Operation mode (default: `mcp`) - `ingest`: Process and store documents - `mcp`: Start MCP server for queries - `-e, --engine {google|ollama}`: Embedding engine (default: `google`) - `-l, --level {debug|info|warning|error}`: Logging level (default: `warning`) **Examples:** ```bash uv run markdown-rag ./docs --command ingest --level info --engine ollama uv run markdown-rag /var/docs -c ingest -l debug -e google ``` ## Architecture ### System Components The following diagram shows how the system components interact: ```mermaid graph TD A[MCP Client<br/>Claude, ChatGPT, etc.] --> B[FastMCP Server<br/>Tool: query] B --> C[MarkdownRAG] C --> D[Text Splitters] C --> E[Rate Limited Embeddings] E --> F[Google Gemini<br/>Embeddings API] C --> G[PostgreSQL<br/>+ pgvector] ``` ### Rate Limiting Strategy The system implements a dual-window sliding algorithm: - **Request Limits**: Tracks requests per minute and per day - **Token Limits**: Counts tokens before API calls - **Batch Optimization**: Calculates maximum safe batch sizes - **Smart Waiting**: Minimal delays with automatic retry See [Architecture Documentation](docs/architecture.md) for detailed diagrams. ## Development ### Setup Development Environment ```bash git clone https://github.com/yourusername/markdown-rag.git cd markdown-rag uv sync ``` ### Run Linters ```bash uv run ruff check . uv run mypy . ``` ### Code Style This project follows: - **Linting**: Ruff with Google docstring convention - **Type Checking**: mypy with strict settings - **Line Length**: 79 characters - **Import Sorting**: Alphabetical with isort ### Project Structure ```text markdown-rag/ ├── src/markdown_rag/ │ ├── __init__.py │ ├── main.py # Entry point and MCP server │ ├── config.py # Environment and CLI configuration │ ├── models.py # Pydantic data models │ ├── rag.py # Core RAG logic │ ├── embeddings.py # Rate-limited embeddings wrapper │ └── rate_limiter.py # Rate limiting algorithm ├── docs/ │ ├── api-reference.md # API documentation │ ├── architecture.md # Architecture documentation │ ├── mcp-integration.md # MCP server integration guide │ └── user-guide.md # User guide ├── pyproject.toml # Project configuration ├── .env # Environment variables (not in git) └── README.md ``` ## Troubleshooting ### Common Issues #### "Failed to start store: connection refused" PostgreSQL not running or wrong connection settings. Check your connection parameters in environment variables. #### "Rate limit exceeded" Adjust rate limits in environment variables: ```env RATE_LIMIT_REQUESTS_PER_MINUTE=50 RATE_LIMIT_REQUESTS_PER_DAY=500 ``` #### "pgvector extension not found" The pgvector PostgreSQL extension is not installed. Follow the [pgvector installation guide](https://github.com/pgvector/pgvector#installation) for your platform. #### "Skipping all files (already in vector store)" Expected behavior. The system prevents duplicate ingestion. ### Logging ```bash uv run markdown-rag ./docs --command ingest --level debug ``` ## Security ### Best Practices - **Never commit `.env` files** - Add to `.gitignore` - **Use environment variables** for all secrets - **Restrict database access** - Use firewall rules - **Rotate API keys** regularly - **Use read-only database users** for query-only deployments ### Secrets Management All secrets use `SecretStr` type to prevent accidental logging: ```python from pydantic import SecretStr api_key = SecretStr("secret_value") ``` ## Contributing 1. Fork the repository 1. Create a feature branch (`git checkout -b feature/amazing-feature`) 1. Make changes and add tests 1. Run linters (`uv run ruff check .`) 1. Run type checks (`uv run mypy .`) 1. Commit changes (`git commit -m 'feat: add amazing feature'`) 1. Push to branch (`git push origin feature/amazing-feature`) 1. Open a Pull Request ### Commit Message Format Follow conventional commits: ```text feat: add new feature fix: resolve bug docs: update documentation refactor: improve code structure test: add tests chore: update dependencies ``` ## TODOS - Management of embeddings store via MCP tool. - Add support for other embeddings models. - Add support for other vector stores. ## License This project is licensed under the MIT License. ## Acknowledgments - [LangChain](https://github.com/langchain-ai/langchain) - RAG framework - [Google Gemini](https://ai.google.dev/) - Embedding model - [pgvector](https://github.com/pgvector/pgvector) - Vector similarity search - [FastMCP](https://github.com/anthropics/fastmcp) - MCP server framework ## Support - Documentation: [docs/architecture.md](docs/architecture.md) - Issues: [GitHub Issues](https://github.com/yourusername/markdown-rag/issues) - Discussions: [GitHub Discussions](https://github.com/yourusername/markdown-rag/discussions)

Loading blob content...

Latest Blog Posts

Don't Use Large Strings as Cache Keys
By punkpeye on January 11, 2026.
markdown
node-js
cache
What are Claude Skills?
By punkpeye on January 10, 2026.
mcp
skills
How to Test MCP Streamable HTTP Endpoints Using cURL
By punkpeye on January 2, 2026.
tutorial
bash

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ashrobertsdragon/rag-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

README.md•10.5 KiB