MCP PDF

MCPMIXIN_ARCHITECTURE.md•10.3 KiB

# MCPMixin Architecture Guide ## Overview This document explains how to refactor large FastMCP servers using the **MCPMixin pattern** for better organization, maintainability, and modularity. ## Current vs MCPMixin Architecture ### Current Monolithic Structure ``` server.py (6500+ lines) ├── 24+ tools with @mcp.tool() decorators ├── Security utilities scattered throughout ├── PDF processing helpers mixed in └── Single main() function ``` **Problems:** - Single file responsibility overload - Difficult to test individual components - Hard to add new tool categories - Security logic scattered throughout - No clear separation of concerns ### MCPMixin Modular Structure ``` mcp_pdf/ ├── server.py (main entry point, ~100 lines) ├── security.py (centralized security utilities) ├── mixins/ │ ├── __init__.py │ ├── base.py (MCPMixin base class) │ ├── text_extraction.py (extract_text, ocr_pdf, is_scanned_pdf) │ ├── table_extraction.py (extract_tables with fallbacks) │ ├── document_analysis.py (metadata, structure, health) │ ├── image_processing.py (extract_images, pdf_to_markdown) │ ├── form_management.py (create/fill/extract forms) │ ├── document_assembly.py (merge, split, reorder) │ └── annotations.py (sticky notes, highlights, multimedia) └── tests/ ├── test_mixin_architecture.py ├── test_text_extraction.py ├── test_table_extraction.py └── ... (individual mixin tests) ``` ## Key Benefits of MCPMixin Architecture ### 1. **Modular Design** - Each mixin handles one functional domain - Clear separation of concerns - Easy to understand and maintain individual components ### 2. **Auto-Registration** - Tools automatically discovered and registered - Consistent naming and description patterns - No manual tool registration needed ### 3. **Testability** - Each mixin can be tested independently - Mock dependencies easily - Focused unit tests per domain ### 4. **Scalability** - Add new tool categories by creating new mixins - Compose servers with different mixin combinations - Progressive disclosure of capabilities ### 5. **Security Centralization** - Shared security utilities in single module - Consistent validation across all tools - Centralized error handling and sanitization ### 6. **Configuration Management** - Centralized configuration in server class - Mixin-specific configuration passed during initialization - Environment variable management in one place ## MCPMixin Base Class Features ### Auto-Registration ```python class TextExtractionMixin(MCPMixin): @mcp_tool(name="extract_text", description="Extract text from PDF") async def extract_text(self, pdf_path: str) -> Dict[str, Any]: # Implementation automatically registered as MCP tool pass ``` ### Permission System ```python def get_required_permissions(self) -> List[str]: return ["read_files", "ocr_processing"] ``` ### Component Discovery ```python def get_registered_components(self) -> Dict[str, Any]: return { "mixin": "TextExtraction", "tools": ["extract_text", "ocr_pdf", "is_scanned_pdf"], "resources": [], "prompts": [], "permissions_required": ["read_files", "ocr_processing"] } ``` ## Implementation Examples ### Text Extraction Mixin ```python from .base import MCPMixin, mcp_tool from ..security import validate_pdf_path, sanitize_error_message class TextExtractionMixin(MCPMixin): def get_mixin_name(self) -> str: return "TextExtraction" def get_required_permissions(self) -> List[str]: return ["read_files", "ocr_processing"] @mcp_tool(name="extract_text", description="Extract text with intelligent method selection") async def extract_text(self, pdf_path: str, method: str = "auto") -> Dict[str, Any]: try: validated_path = await validate_pdf_path(pdf_path) # Implementation here... return {"success": True, "text": extracted_text} except Exception as e: return {"success": False, "error": sanitize_error_message(str(e))} ``` ### Server Composition ```python class PDFToolsServer: def __init__(self): self.mcp = FastMCP("pdf-tools") self.mixins = [] # Initialize mixins mixin_classes = [ TextExtractionMixin, TableExtractionMixin, DocumentAnalysisMixin, # ... other mixins ] for mixin_class in mixin_classes: mixin = mixin_class(self.mcp, **self.config) self.mixins.append(mixin) ``` ## Migration Strategy ### Phase 1: Setup Infrastructure 1. Create `mixins/` directory structure 2. Implement `MCPMixin` base class 3. Extract security utilities to `security.py` 4. Set up testing framework ### Phase 2: Extract First Mixin 1. Start with `TextExtractionMixin` 2. Move text extraction tools from server.py 3. Update imports and dependencies 4. Test thoroughly ### Phase 3: Iterative Migration 1. Extract one mixin at a time 2. Test each migration independently 3. Update server.py to use new mixins 4. Maintain backward compatibility ### Phase 4: Cleanup and Optimization 1. Remove original server.py code 2. Optimize mixin interactions 3. Add advanced features (progressive disclosure, etc.) 4. Final testing and documentation ## Testing Strategy ### Unit Testing Per Mixin ```python class TestTextExtractionMixin: def setup_method(self): self.mcp = FastMCP("test") self.mixin = TextExtractionMixin(self.mcp) @pytest.mark.asyncio async def test_extract_text_validation(self): result = await self.mixin.extract_text("") assert not result["success"] ``` ### Integration Testing ```python class TestMixinComposition: def test_no_tool_name_conflicts(self): # Ensure no tools have conflicting names pass def test_comprehensive_coverage(self): # Ensure all original tools are covered pass ``` ### Auto-Discovery Testing ```python def test_mixin_auto_registration(self): mixin = TextExtractionMixin(mcp) components = mixin.get_registered_components() assert "extract_text" in components["tools"] ``` ## Advanced Patterns ### Progressive Tool Disclosure ```python class SecureTextExtractionMixin(TextExtractionMixin): def __init__(self, mcp_server, permissions=None, **kwargs): self.user_permissions = permissions or [] super().__init__(mcp_server, **kwargs) def _should_auto_register_tool(self, name: str, method: Callable) -> bool: # Only register tools user has permission for required_perms = self._get_tool_permissions(name) return all(perm in self.user_permissions for perm in required_perms) ``` ### Dynamic Tool Visibility ```python @mcp_tool(name="advanced_ocr", description="Advanced OCR with ML") async def advanced_ocr(self, pdf_path: str) -> Dict[str, Any]: if not self._check_premium_features(): return {"error": "Premium feature not available"} # Implementation... ``` ### Bulk Operations ```python class BulkProcessingMixin(MCPMixin): @mcp_tool(name="bulk_extract_text", description="Process multiple PDFs") async def bulk_extract_text(self, pdf_paths: List[str]) -> Dict[str, Any]: # Leverage other mixins for bulk operations pass ``` ## Performance Considerations ### Lazy Loading - Mixins only initialize when first used - Heavy dependencies loaded on-demand - Configurable mixin selection ### Memory Management - Clear separation prevents memory leaks - Each mixin manages its own resources - Proper cleanup in error cases ### Startup Time - Fast initialization with auto-registration - Parallel mixin initialization possible - Tool registration is cached ## Security Enhancements ### Centralized Validation ```python # security.py async def validate_pdf_path(pdf_path: str) -> Path: # Single source of truth for PDF validation pass def sanitize_error_message(error_msg: str) -> str: # Consistent error sanitization pass ``` ### Permission-Based Access ```python class SecureMixin(MCPMixin): def get_required_permissions(self) -> List[str]: return ["read_files", "specific_operation"] def _check_permissions(self, required: List[str]) -> bool: return all(perm in self.user_permissions for perm in required) ``` ## Deployment Configurations ### Development Server ```python # All mixins enabled, debug logging server = PDFToolsServer( mixins="all", debug=True, security_mode="relaxed" ) ``` ### Production Server ```python # Selected mixins, strict security server = PDFToolsServer( mixins=["TextExtraction", "TableExtraction"], security_mode="strict", rate_limiting=True ) ``` ### Specialized Deployment ```python # OCR-only server server = PDFToolsServer( mixins=["TextExtraction"], tools=["ocr_pdf", "is_scanned_pdf"], gpu_acceleration=True ) ``` ## Comparison with Current Approach | Aspect | Current FastMCP | MCPMixin Pattern | |--------|----------------|------------------| | **Organization** | Single 6500+ line file | Modular mixins (~200-500 lines each) | | **Testability** | Hard to test individual tools | Easy isolated testing | | **Maintainability** | Difficult to navigate/modify | Clear separation of concerns | | **Extensibility** | Add to monolithic file | Create new mixin | | **Security** | Scattered validation | Centralized security utilities | | **Performance** | All tools loaded always | Lazy loading possible | | **Reusability** | Monolithic server only | Mixins reusable across projects | | **Debugging** | Hard to isolate issues | Clear component boundaries | ## Conclusion The MCPMixin pattern transforms large, monolithic FastMCP servers into maintainable, testable, and scalable architectures. While it requires initial refactoring effort, the long-term benefits in maintainability, testability, and extensibility make it worthwhile for any server with 10+ tools. The pattern is particularly valuable for: - **Complex servers** with multiple tool categories - **Team development** where different developers work on different domains - **Production deployments** requiring security and reliability - **Long-term maintenance** and feature evolution For your MCP PDF server with 24+ tools, the MCPMixin pattern would provide significant improvements in code organization, testing capabilities, and future extensibility.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/rsp2k/mcp-pdf'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

MCPMIXIN_ARCHITECTURE.md•10.3 KiB