Code-Index-MCP

PLAINTEXT_PLUGIN_SUMMARY.md•3.92 KiB

# Plain Text Plugin Implementation Summary

## Overview

Created a comprehensive Plain Text plugin for the code-index-mcp project that provides natural language processing capabilities for plain text documents.

## Implementation Details

### Directory Structure

```
/app/mcp_server/plugins/plaintext_plugin/
├── __init__.py              # Package initialization
├── plugin.py                # Main plugin implementation (PlainTextPlugin class)
├── nlp_processor.py         # Natural language processing engine
├── paragraph_detector.py    # Intelligent paragraph detection
├── sentence_splitter.py     # Accurate sentence boundary detection
└── topic_extractor.py       # Topic modeling and keyword extraction
```

### Key Features

1. **Natural Language Processing**
   - Text type detection (technical, narrative, instructional, conversational)
   - Readability scoring
   - Vocabulary richness analysis
   - Summary sentence extraction

2. **Document Structure Analysis**
   - Intelligent heading detection
   - Hierarchical outline generation
   - Section identification
   - Paragraph boundary detection with support for various formats

3. **Advanced Text Processing**
   - Sentence splitting with abbreviation handling
   - URL and email preservation
   - Decimal number and ellipsis handling
   - Multiple line ending style support

4. **Topic and Keyword Extraction**
   - TF-IDF based keyword extraction
   - Co-occurrence based topic modeling
   - Key phrase extraction (n-grams)
   - Technical term identification

5. **Semantic Chunking**
   - Context-aware text chunking
   - Optimized embedding text generation
   - Chunk metadata with topics and sections
   - Paragraph merging for coherent chunks

6. **Search Capabilities**
   - Enhanced full-text search
   - Query expansion with keywords
   - Relevance scoring
   - Contextual snippet generation

7. **Text Type Specific Processing**
   - Technical documents: code snippet and formula extraction
   - Narrative text: summary generation
   - Instructional text: step and tip extraction
   - Conversational text: speaker and question identification

### Supported File Extensions

- `.txt`
- `.text`
- `.log`
- `.readme`
- `.md` (basic support)
- `.markdown` (basic support)
- `.rst` (basic support)

### Integration Points

1. **Inherits from BaseDocumentPlugin**
   - Leverages document processing infrastructure
   - Integrates with semantic search capabilities
   - Uses standard chunking and caching mechanisms

2. **Plugin Registry**
   - Added to `language_registry.py` as 'plaintext'
   - Registered in `plugin_factory.py` for automatic instantiation

### Testing

Created comprehensive test files:

- `test_plaintext_plugin.py` - Basic functionality tests
- `demo_plaintext_plugin.py` - Feature demonstration with multiple document types

### Usage Example

```python
from mcp_server.plugins.plaintext_plugin import PlainTextPlugin

# Initialize plugin
language_config = {
    'name': 'plaintext',
    'code': 'plaintext',
    'extensions': ['.txt', '.text', '.log', '.readme']
}
plugin = PlainTextPlugin(language_config, enable_semantic=True)

# Process a document
metadata = plugin.extract_metadata(content, file_path)
structure = plugin.extract_structure(content, file_path)
chunks = plugin.chunk_document(content, file_path)

# Search
results = plugin.search("query text", {"semantic": False, "limit": 10})
```

### Edge Cases Handled

- Mixed line endings (CRLF, LF, CR)
- Unicode text
- URLs and email addresses in text
- Abbreviations and decimal numbers
- Empty documents
- Very long lines
- Nested lists and code blocks

## Benefits

1. **Improved Search** - Semantic understanding of plain text documents
2. **Better Organization** - Automatic structure extraction from unstructured text
3. **Content Intelligence** - Topic modeling and keyword extraction
4. **Flexible Processing** - Adapts to different text types automatically
5. **Robust Parsing** - Handles various text formats and edge cases
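
To make the line-ending, paragraph, and sentence-splitting behavior listed under the text processing features and edge cases more concrete, here is a minimal sketch of how such handling might look. It is not taken from `paragraph_detector.py` or `sentence_splitter.py`, whose code is not shown in this summary. The function names, the abbreviation list, and the regex heuristic are all assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical abbreviation list; the plugin's real list is not shown here.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc", "vs"}

# Candidate boundary: terminal punctuation, then whitespace, then an
# uppercase letter or digit. Requiring whitespace after the punctuation
# keeps decimals ("3.14") and dotted URLs ("example.com/a.b") intact.
_BOUNDARY = re.compile(r"([.!?]+)\s+(?=[A-Z0-9])")


def normalize_newlines(text: str) -> str:
    """Convert CRLF and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines and collapse whitespace inside each paragraph."""
    blocks = re.split(r"\n\s*\n", normalize_newlines(text))
    return [" ".join(block.split()) for block in blocks if block.strip()]


def split_sentences(paragraph: str) -> List[str]:
    """Split a paragraph into sentences, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(paragraph):
        words = paragraph[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        # A single period right after a known abbreviation is not a boundary.
        if match.group(1) == "." and last_word in ABBREVIATIONS:
            continue
        sentences.append(paragraph[start:match.end(1)].strip())
        start = match.end()
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = (
        "Dr. Smith measured 3.14 volts.\r\n"
        "See https://example.com/a.b for details. It works!\r\r"
        "Second paragraph here."
    )
    for para in split_paragraphs(sample):
        print(split_sentences(para))
```

A rule-based splitter like this is easy to audit and has no dependencies, but it will mis-split on abbreviations missing from the list and on sentences that begin with a lowercase word, so a production implementation would likely need a larger abbreviation table or a trained model.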


