Qdrant RAG MCP Server

documentation-indexer-plan.md•4.94 KiB

# Documentation Indexer Implementation Plan - v0.2.3

## Overview

The Documentation Indexer will provide specialized handling for markdown and other documentation files, enabling proper indexing and search of documentation alongside code.

## Problem Statement

Currently:

- Markdown files (`*.md`) are not indexed at all
- Documentation is crucial for understanding codebases but is ignored
- No way to search README files, CHANGELOG, or docs folders
- Code and Config indexers are not suitable for prose/documentation

## Solution Design

### 1. New DocumentationIndexer Class

Location: `src/indexers/documentation_indexer.py`

```python
class DocumentationIndexer:
    """Specialized indexer for markdown and documentation files."""

    def __init__(self):
        self.supported_extensions = ['.md', '.markdown', '.rst', '.txt']
        self.chunk_size = 2000  # Larger chunks for documentation
        self.chunk_overlap = 400
```

### 2. Chunking Strategy

Unlike code/config indexers, documentation should be chunked by:

1. **Heading-based chunking**:
   - Split at major headings (`##`, `###`)
   - Keep heading hierarchy in metadata
   - Preserve context by including parent headings
2. **Smart paragraph chunking**:
   - Keep related paragraphs together
   - Don't break in middle of lists or code blocks
   - Respect markdown structure

### 3. Metadata Extraction

Extract and store:

- Document title (from `#` heading or filename)
- Heading hierarchy (breadcrumb trail)
- Code block languages
- Internal/external links
- Table of contents structure
- Frontmatter (if present)

### 4. Implementation Steps

#### Phase 1: Basic Implementation (Day 1)

- [ ] Create `documentation_indexer.py`
- [ ] Implement basic markdown parsing
- [ ] Add heading-based chunking
- [ ] Extract basic metadata (title, headings)

#### Phase 2: Enhanced Features (Day 2)

- [ ] Add code block extraction with language detection
- [ ] Implement link extraction and tracking
- [ ] Add frontmatter parsing
- [ ] Handle special markdown elements (tables, lists)

#### Phase 3: Integration (Day 3)

- [ ] Add `index_documentation()` function to main server
- [ ] Create `document_collection` in Qdrant
- [ ] Add `*.md` to default patterns in `index_directory()`
- [ ] Implement `search_docs()` function

#### Phase 4: Testing & Refinement (Day 4)

- [ ] Test with various markdown formats
- [ ] Ensure proper handling of edge cases
- [ ] Add documentation-specific ranking signals
- [ ] Update existing documentation

## Technical Details

### Chunking Algorithm

```python
def chunk_by_sections(self, content: str) -> List[Dict[str, Any]]:
    """
    Split markdown content by sections while preserving context.

    Returns chunks with:
    - content: The text content
    - heading: The immediate heading
    - heading_hierarchy: List of parent headings
    - chunk_type: 'heading', 'paragraph', 'code_block', 'list'
    - metadata: Additional info (links, code languages, etc.)
    """
```

### Metadata Schema

```python
{
    "file_path": "/docs/README.md",
    "doc_type": "markdown",
    "title": "Project Documentation",
    "heading": "Installation Guide",
    "heading_hierarchy": ["Project Documentation", "Getting Started", "Installation Guide"],
    "heading_level": 3,
    "chunk_index": 2,
    "chunk_type": "section",
    "has_code_blocks": True,
    "code_languages": ["python", "bash"],
    "internal_links": ["../api/reference.md", "#configuration"],
    "external_links": ["https://example.com"],
    "word_count": 256,
    "modified_at": "2025-05-29T10:00:00Z"
}
```

### Search Enhancements

1. **Documentation-specific search mode**:

   ```python
   @mcp.tool()
   def search_docs(query: str, doc_type: str = None, n_results: int = 5) -> Dict[str, Any]:
       """Search specifically in documentation files."""
   ```

2. **Ranking adjustments**:
   - Boost exact heading matches
   - Prioritize README and main documentation
   - Consider heading hierarchy in relevance

### Collection Configuration

```python
# Separate collection for documentation
document_collection = "project_documentation"

# Schema includes documentation-specific fields
vector_config = VectorParams(
    size=384,  # Same embedding model
    distance=Distance.COSINE
)
```

## Benefits

1. **Better Documentation Discovery**: Find relevant docs quickly
2. **Context-Aware Search**: Understanding section hierarchy
3. **Code Example Finding**: Search for code snippets in docs
4. **Cross-Reference Support**: Track documentation relationships
5. **Migration Guide Access**: Easy access to upgrade instructions

## Success Metrics

- Successfully index all .md files in a project
- Search returns relevant documentation sections
- Code blocks in docs are searchable
- Heading hierarchy provides useful context
- No performance degradation

## Future Enhancements (v0.2.4+)

1. Support for other formats (.rst, .adoc, .textile)
2. Wiki-style link graph analysis
3. Documentation quality scoring
4. Auto-generated documentation detection
5. API documentation special handling
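
## Appendix: Heading-Based Chunking Sketch

The heading-based chunking described under Chunking Strategy could look roughly like the following. This is a minimal sketch, not the project's implementation: the function name `chunk_by_headings` and its helper names are hypothetical, it splits at every ATX heading level rather than only `##` and `###`, and it ignores `chunk_size` and `chunk_overlap`. It tracks the breadcrumb trail in `heading_hierarchy` and skips heading detection inside fenced code blocks so that a `#` comment in a code sample does not start a new section.

```python
import re
from typing import Any, Dict, List, Tuple

# ATX headings only ("#" through "######"); Setext headings are not handled.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
# Fence markers are built at runtime so this sample contains no literal fence.
FENCE_MARKERS = ("`" * 3, "~" * 3)


def chunk_by_headings(content: str) -> List[Dict[str, Any]]:
    """Split markdown into one chunk per heading, keeping parent headings."""
    chunks: List[Dict[str, Any]] = []
    hierarchy: List[Tuple[int, str]] = []  # (level, title) breadcrumb stack
    body: List[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the accumulated body under the current heading, if non-empty.
        text = "\n".join(body).strip()
        if text:
            chunks.append({
                "content": text,
                "heading": hierarchy[-1][1] if hierarchy else "",
                "heading_hierarchy": [title for _, title in hierarchy],
                "heading_level": hierarchy[-1][0] if hierarchy else 0,
                "chunk_index": len(chunks),
            })
        body.clear()

    for line in content.splitlines():
        # Toggle fence state on any line that opens or closes a code fence.
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level before pushing.
            while hierarchy and hierarchy[-1][0] >= level:
                hierarchy.pop()
            hierarchy.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

Each returned dictionary uses a subset of the field names from the Metadata Schema (`heading`, `heading_hierarchy`, `heading_level`, `chunk_index`), so a fuller implementation could add the remaining fields, such as `code_languages` and link lists, on top of the same loop.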
