Markdown RAG Documentation

Overview Schema Related Servers Score Discussions

ragdocs-mcp
specs

02-architecture-and-tech-stack.md•5.8 KiB

# 2. Architecture and Technology Stack (Hybrid Search) ## 2.1. High-Level Architecture The architecture is designed around a **hybrid search** model and a **pluggable parsing** system. It is composed of an **Indexing Service** that populates multiple data stores and a **Query Orchestrator** that fuses their results. - **Indexing Service:** Triggered by file changes, this service uses a **Parser Dispatcher** to select the correct parser for a given file type. The parser extracts structured data, which is then used to update three distinct indices: a Vector Index, a Keyword Index, and a Graph Store. - **Query Orchestrator:** Dispatches queries to the relevant searchers and applies a fusion algorithm (RRF) to produce a final, re-ranked list of documents for the LLM. ``` ┌──────────────────┐ File──────▶│ File Watcher │ Changes │ (Watchdog) │ └─────────┬────────┘ ▼ ┌──────────────────────────────────────────────────────────┐ │ Indexing Service │ │┌──────────────────┐ ┌────────────────┐ ┌─────────────────┐ │ ││ Parser Dispatcher│▶│ Markdown Parser│▶│ Update Vector │ │ ││ (by file type) │ │ (Tree-sitter) │ │ Index (FAISS) │ │ │└──────────────────┘ └───────┬────────┘ └──────┬──────────┘ │ │ │ │ │ │ ┌──────────────────────┘ │ │ │ │ │ │ │ └──────────▶┌──────────────────┐ ┌────────▼─────────┐ │ │ │ Update Keyword │ │ Update Graph │ │ │ │ Index (Whoosh) │ │ Store (NetworkX) │ │ │ └──────────────────┘ └──────────────────┘ │ └──────────────────────────────────────────────────────────┘ ... Query Flow remains the same ... ``` ## 2.2. Technology Stack - **Language:** Python 3.11+ / **Core Framework:** [Asyncio](https://docs.python.org/3/library/asyncio.html) - **MCP Server Framework:** [FastAPI](https://fastapi.tiangolo.com/) - **Pluggable Parsers:** The system will feature a pluggable architecture for document parsing. - **Design:** An abstract `DocumentParser` base class will define the interface. Concrete implementations (`MarkdownParser`, `PlainTextParser`, etc.) will handle specific file types. - **Default Parser:** The initial implementation will be a `MarkdownParser` using [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) to extract a rich AST (links, tags, frontmatter, etc.). - **Grammar Management:** To ensure a zero-friction setup, tree-sitter's compiled language grammars will be managed as Python dependencies using a library like `py-tree-sitter-languages`. This automates the download and build process of the required grammars (e.g., for Markdown) during `uv pip install`. - **Hybrid Search Components:** - **Semantic Search:** [LlamaIndex](https://www.llamaindex.ai/) with **Faiss**. - **Keyword Search:** [Whoosh](https://whoosh.readthedocs.io/en/latest/index.html). - **Graph Storage & Analysis:** [NetworkX](https://networkx.org/). - **File System Watching:** [Watchdog](https://github.com/gorakhargosh/watchdog). - **Dependency Management:** [uv](https://github.com/astral-sh/uv). ## 2.3. Data Flow ### 2.3.1. Indexing Flow (Background) 1. **On Startup:** The server runs the full indexing pipeline for all discoverable files based on the configuration. It also checks the index version manifest and triggers a full rebuild if necessary. 2. **File Change Event:** `watchdog` detects a file change and places it on the `asyncio.Queue`. 3. **Debounce & Process:** The Indexing Service debounces and batches events. 4. **For each file:** a. **Parser Dispatch:** The service selects the appropriate parser from a registry based on the file's extension (e.g., `.md` -> `MarkdownParser`). b. **Parsing:** The selected parser processes the file content and returns a structured `Document` object containing text, metadata, links, etc. c. **Index Updates:** The data from the `Document` object is used to update all three indices: the Vector Index (LlamaIndex), the Keyword Index (Whoosh), and the Graph Store (NetworkX). d. The indices and stores are persisted to disk. ### 2.3.2. Query Flow (Foreground) *(This remains the same as the previous version: Request -> Orchestration -> Parallel Search -> Fusion -> Retrieval & Synthesis -> Response)* ## 2.4. Management and Observability To ensure the system is transparent and maintainable, it includes several key features for observability and administration. These are detailed in `specs/07-observability-and-management.md` and include: - **Status Endpoints:** `/health` and `/status` for real-time monitoring. - **Structured Logging:** For clear, machine-readable log output. - **Index Versioning:** To automatically handle index rebuilds when configurations change. - **CLI:** For administrative tasks like forcing a re-index.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/andnp/ragdocs-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

02-architecture-and-tech-stack.md•5.8 KiB