Code-Index-MCP

MARKDOWN_PLUGIN_SUMMARY.md•3.99 KiB

# Markdown Plugin Implementation Summary ## Overview A complete Markdown plugin has been implemented for the MCP Server with comprehensive document processing capabilities. ## Directory Structure ``` /app/mcp_server/plugins/markdown_plugin/ ├── __init__.py # Package initialization ├── plugin.py # Main MarkdownPlugin class ├── document_parser.py # Markdown AST parser ├── section_extractor.py # Hierarchical section extraction ├── frontmatter_parser.py # YAML/TOML frontmatter parsing └── chunk_strategies.py # Smart document chunking ``` ## Key Features ### 1. **Markdown Parsing** - Full Markdown AST parsing with support for: - Headings (ATX and Setext styles) - Code blocks (fenced and indented) - Lists (ordered and unordered) - Blockquotes - Tables - Links and images - Inline formatting (bold, italic, code) - Horizontal rules ### 2. **Frontmatter Support** - YAML frontmatter parsing (between `---` delimiters) - TOML frontmatter parsing (between `+++` delimiters) - Metadata extraction and normalization - Support for common fields: title, author, date, tags, categories ### 3. **Document Structure Extraction** - Hierarchical section detection - Section content extraction - Table of contents generation - Section metadata (word count, code blocks, links, images) ### 4. **Smart Chunking Strategies** - Semantic boundary-aware chunking - Section-based chunking with hierarchy preservation - Configurable chunk sizes and overlap - Context preservation for embeddings - Optimization for semantic search ### 5. **Symbol Extraction** - Document symbols (title, sections) - Heading symbols with hierarchy - Code symbol extraction from code blocks - Support for multiple programming languages ## Integration Points ### Base Classes - Inherits from `BaseDocumentPlugin` - Integrates with the document processing framework - Compatible with semantic search infrastructure ### Key Methods - `extract_structure()` - Extract document structure - `extract_metadata()` - Parse frontmatter and metadata - `parse_content()` - Convert to plain text - `chunk_document()` - Create optimized chunks - `indexFile()` - Index document with symbols ## Usage Example ```python from mcp_server.plugins.markdown_plugin import MarkdownPlugin # Initialize plugin plugin = MarkdownPlugin(enable_semantic=True) # Index a Markdown file with open('document.md', 'r') as f: content = f.read() result = plugin.indexFile('document.md', content) # Access symbols for symbol in result['symbols']: print(f"{symbol['kind']}: {symbol['symbol']}") ``` ## Chunking Configuration The plugin supports configurable chunking: - `max_chunk_size`: Maximum chunk size (default: 1000 chars) - `min_chunk_size`: Minimum chunk size (default: 100 chars) - `overlap_size`: Overlap between chunks (default: 50 chars) - `prefer_semantic_boundaries`: Use semantic boundaries (default: True) ## Future Enhancements 1. **Enhanced Code Block Processing** - More language-specific symbol extraction - Syntax validation - Import/dependency tracking 2. **Advanced Metadata** - Custom metadata schemas - Validation rules - Metadata inheritance 3. **Cross-Document Features** - Wiki-style link resolution - Reference tracking - Document graph building 4. **Performance Optimizations** - Parallel chunk processing - Incremental updates - Cache warming strategies ## Testing A comprehensive test suite is included: - Basic functionality tests - Frontmatter parsing tests - Structure extraction tests - Chunking strategy tests - Symbol extraction tests Run tests with: ```bash python test_markdown_plugin.py ``` ## Conclusion The Markdown plugin provides a robust foundation for processing Markdown documents with: - Complete parsing capabilities - Intelligent chunking for semantic search - Rich metadata and structure extraction - Seamless integration with the MCP Server infrastructure The implementation is production-ready and extensible for future enhancements.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ViperJuice/Code-Index-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

MARKDOWN_PLUGIN_SUMMARY.md•3.99 KiB