# Architecture Overview

Technical documentation of the OpenZIM MCP system architecture and design.

## System Architecture

OpenZIM MCP follows a modular, layered architecture designed for performance, security, and maintainability.

```
┌─────────────────────────────────────────────────────────────┐
│                      MCP Client Layer                       │
│               (Claude, Custom Clients, etc.)                │
└─────────────────────┬───────────────────────────────────────┘
                      │ MCP Protocol
┌─────────────────────▼───────────────────────────────────────┐
│                     OpenZIM MCP Server                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Server    │  │  Security   │  │  Instance Tracker   │  │
│  │    Core     │  │    Layer    │  │  & Health Monitor   │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                    Business Logic Layer                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │    Cache    │  │   Content   │  │   ZIM Operations    │  │
│  │   Manager   │  │  Processor  │  │  & Smart Retrieval  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                      Data Access Layer                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   libzim    │  │ File System │  │    Configuration    │  │
│  │  Interface  │  │   Access    │  │    & Validation     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                        Storage Layer                        │
│          ZIM Files, Cache, Logs, Instance Tracking          │
└─────────────────────────────────────────────────────────────┘
```

## Core Components

### 1. Server Core (`server.py`)

**Responsibilities**:

- MCP protocol implementation
- Request routing and handling
- Tool registration and execution
- Error handling and response formatting

**Key Features**:

- Asynchronous request processing
- Structured logging
- Health monitoring
- Graceful shutdown handling

### 2. Security Layer (`security.py`)

**Responsibilities**:

- Input validation and sanitization
- Path traversal protection
- Access control enforcement
- Security policy implementation

**Security Features**:

- Whitelist-based directory access
- Path normalization and validation
- Input length limits
- File extension validation

### 3. Cache Manager (`cache.py`)

**Responsibilities**:

- LRU cache with TTL support
- Cache key generation and management
- Performance metrics collection
- Memory usage optimization

**Cache Strategy**:

- Content-based caching for search results
- Entry path mapping cache
- Metadata caching
- Configurable size and TTL limits

### 4. Content Processor (`content_processor.py`)

**Responsibilities**:

- HTML to text conversion
- Content formatting and cleanup
- Snippet generation
- Link extraction

**Processing Features**:

- Preserves formatting structure
- Handles various content types
- Configurable content limits
- Smart truncation

### 5. ZIM Operations (`zim_operations.py`)

**Responsibilities**:

- ZIM file access and management
- Search operations
- Entry retrieval
- Metadata extraction

**Smart Features**:

- Automatic path resolution
- Fallback search mechanisms
- Namespace browsing
- Article structure analysis

### 6. Instance Tracker (`instance_tracker.py`)

**Responsibilities**:

- Multi-instance management
- Conflict detection and resolution
- Process monitoring
- Configuration validation

**Enterprise Features**:

- Automatic instance registration
- Stale instance cleanup
- Configuration hash comparison
- Health monitoring integration

### 7. Smart Retrieval System

**Responsibilities**:

- Intelligent entry path resolution
- Path mapping cache management
- Automatic fallback strategies
- Performance optimization

**Advanced Capabilities**:

- Pattern learning and recognition
- Confidence-based caching
- Multiple search strategies
- Transparent operation

## Request Flow

### Typical Request Processing

```
1. MCP Client Request
   ↓
2. Server Core (request validation)
   ↓
3. Security Layer (authorization check)
   ↓
4. Cache Manager (cache lookup)
   ↓ (cache miss)
5. ZIM Operations (data retrieval)
   ↓
6. Content Processor (formatting)
   ↓
7. Cache Manager (cache storage)
   ↓
8. Server Core (response formatting)
   ↓
9. MCP Client Response
```

### Smart Retrieval Flow

```
1. Direct Entry Access Attempt
   ↓ (fails)
2. Search-Based Fallback
   ↓
3. Path Mapping Cache Check
   ↓ (miss)
4. Multiple Search Strategies
   ↓
5. Best Match Selection
   ↓
6. Path Mapping Cache Update
   ↓
7. Content Retrieval
```

## Module Structure

### Core Modules

```
openzim_mcp/
├── __init__.py              # Package initialization and version
├── __main__.py              # CLI entry point
├── main.py                  # Application entry point
├── server.py                # MCP server implementation
├── config.py                # Configuration management
├── security.py              # Security and validation
├── cache.py                 # Caching functionality
├── content_processor.py     # Content processing
├── zim_operations.py        # ZIM file operations
├── instance_tracker.py      # Multi-instance management
├── exceptions.py            # Custom exceptions
└── constants.py             # Application constants
```

### Enhanced Module Responsibilities

#### Core Infrastructure

- **`server.py`**: Enhanced with health monitoring and diagnostics
- **`config.py`**: Expanded configuration with validation and profiles
- **`security.py`**: Advanced security features and input validation

#### Business Logic

- **`zim_operations.py`**: Smart retrieval system integration
- **`cache.py`**: Multi-layer caching with performance metrics
- **`content_processor.py`**: Enhanced content analysis and link extraction

#### Enterprise Features

- **`instance_tracker.py`**: Multi-instance management and conflict resolution
- **Smart Retrieval**: Integrated path resolution and fallback mechanisms
- **Health Monitoring**: Comprehensive system diagnostics and metrics

### Configuration System

```python
# Hierarchical configuration with validation
class OpenZimMcpConfig:
    cache: CacheConfig
    content: ContentConfig
    logging: LoggingConfig
    server: ServerConfig
    security: SecurityConfig
    instance: InstanceConfig
```

### Dependency Injection

```python
# Modular design with dependency injection
class OpenZimMcpServer:
    def __init__(
        self,
        config: OpenZimMcpConfig,
        cache_manager: CacheManager,
        content_processor: ContentProcessor,
        zim_operations: ZimOperations,
        security_validator: SecurityValidator,
        instance_tracker: InstanceTracker,
    ):
        # Component initialization
        ...
```

## Design Patterns

### 1. Strategy Pattern

**Used for**: Content processing strategies

```python
from typing import Dict


class ContentProcessor:
    def __init__(self, strategies: Dict[str, ProcessingStrategy]):
        self.strategies = strategies

    def process(self, content_type: str, content: str) -> str:
        # Fall back to the default strategy for unknown content types
        strategy = self.strategies.get(content_type, self.default_strategy)
        return strategy.process(content)
```

### 2. Factory Pattern

**Used for**: ZIM file handler creation

```python
class ZimHandlerFactory:
    @staticmethod
    def create_handler(zim_file_path: str) -> ZimHandler:
        # Create appropriate handler based on file characteristics
        return ZimHandler(zim_file_path)
```

### 3. Observer Pattern

**Used for**: Health monitoring and metrics

```python
class HealthMonitor:
    def __init__(self):
        self.observers = []

    def notify_health_change(self, health_data: HealthData):
        # Push the new health data to every registered observer
        for observer in self.observers:
            observer.on_health_update(health_data)
```

### 4. Decorator Pattern

**Used for**: Caching and logging

```python
from typing import List


@cache_result(ttl=3600)
@log_performance
def search_zim_file(self, zim_file_path: str, query: str) -> List[SearchResult]:
    # Implementation
    ...
```

## Performance Architecture

### Caching Strategy

```
┌─────────────────────────────────────────────────────────────┐
│                        Cache Layers                         │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  L1 Cache   │  │  L2 Cache   │  │      L3 Cache       │  │
│  │  (Memory)   │  │ (Metadata)  │  │   (Path Mapping)    │  │
│  │             │  │             │  │                     │  │
│  │ • Search    │  │ • ZIM Meta  │  │ • Entry Paths       │  │
│  │ • Content   │  │ • Structure │  │ • Namespace Info    │  │
│  │ • Links     │  │ • Health    │  │ • Suggestions       │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

### Asynchronous Processing

```python
# Non-blocking operations for better performance
async def handle_request(self, request: McpRequest) -> McpResponse:
    # Asynchronous request processing
    result = await self.process_async(request)
    return self.format_response(result)
```

### Resource Management

```python
# Efficient resource cleanup
class ZimFileManager:
    def __init__(self):
        self.open_files = {}
        self.file_locks = {}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Runs on normal exit and on exceptions alike
        self.cleanup_resources()
```

## Security Architecture

### Defense in Depth

```
┌─────────────────────────────────────────────────────────────┐
│                       Security Layers                       │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │    Input    │  │    Path     │  │       Access        │  │
│  │ Validation  │  │ Validation  │  │       Control       │  │
│  │             │  │             │  │                     │  │
│  │ • Sanitize  │  │ • Normalize │  │ • Directory Limits  │  │
│  │ • Length    │  │ • Traversal │  │ • File Extensions   │  │
│  │ • Type      │  │ • Resolve   │  │ • Permission Check  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

### Security Validation Pipeline

```python
def validate_request(self, request: McpRequest) -> ValidationResult:
    # 1. Input validation
    self.validate_input(request.params)

    # 2. Path validation
    self.validate_paths(request.file_paths)

    # 3. Access control
    self.check_access_permissions(request.file_paths)

    # 4. Rate limiting (future)
    self.check_rate_limits(request.client_id)

    return ValidationResult.VALID
```

## Monitoring and Observability

### Health Monitoring

```python
class HealthMonitor:
    def collect_metrics(self) -> HealthMetrics:
        return HealthMetrics(
            cache_performance=self.cache_manager.get_metrics(),
            memory_usage=self.get_memory_usage(),
            request_metrics=self.get_request_metrics(),
            instance_status=self.instance_tracker.get_status(),
        )
```

### Structured Logging

```python
# Consistent logging structure
logger.info(
    "Request processed",
    extra={
        "request_id": request.id,
        "tool_name": request.tool,
        "duration_ms": duration,
        "cache_hit": cache_hit,
        "zim_file": zim_file_path,
    },
)
```

## Multi-Instance Management

### Instance Tracking

```python
import os
from datetime import datetime


class InstanceTracker:
    def register_instance(self) -> InstanceInfo:
        instance = InstanceInfo(
            pid=os.getpid(),
            config_hash=self.config.get_hash(),
            start_time=datetime.now(),
            directories=self.config.allowed_directories,
        )
        self.save_instance_file(instance)
        return instance
```

### Conflict Detection

```python
from typing import List


def detect_conflicts(self) -> List[Conflict]:
    conflicts = []
    active_instances = self.get_active_instances()

    for instance in active_instances:
        if self.has_config_conflict(instance):
            conflicts.append(ConfigConflict(instance))

        if self.has_directory_conflict(instance):
            conflicts.append(DirectoryConflict(instance))

    return conflicts
```

## Testing Architecture

### Test Structure

```
tests/
├── unit/                    # Unit tests with mocks
├── integration/             # Integration tests with real ZIM files
├── security/                # Security and validation tests
├── performance/             # Performance and load tests
├── fixtures/                # Test data and fixtures
└── conftest.py              # Pytest configuration
```

### Test Categories

1. **Unit Tests**: Fast, isolated component testing
2. **Integration Tests**: End-to-end functionality with real ZIM files
3. **Security Tests**: Path traversal and input validation
4. **Performance Tests**: Cache performance and resource usage

## Scalability Considerations

### Horizontal Scaling

- **Multi-instance support**: Conflict detection and resolution
- **Load balancing**: Multiple server instances
- **Shared caching**: Future Redis integration

### Vertical Scaling

- **Memory optimization**: Efficient cache management
- **CPU optimization**: Asynchronous processing
- **I/O optimization**: Smart file access patterns

---

**Want to contribute?** Check the [Contributing Guidelines](https://github.com/cameronrye/openzim-mcp/blob/main/CONTRIBUTING.md) for development setup and coding standards.
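
For contributors working on the Security Layer, the path checks listed in this document (input length limits, path normalization, whitelist-based directory access, file extension validation) can be sketched as a single function. This is a minimal illustration under stated assumptions, not the code in `security.py`: the names `validate_zim_path`, `PathValidationError`, and `MAX_PATH_LENGTH`, and the limit of 4096 characters, are hypothetical.

```python
from pathlib import Path
from typing import Iterable

# Hypothetical input length limit; the real value is not given in this document.
MAX_PATH_LENGTH = 4096


class PathValidationError(ValueError):
    """Raised when a requested path fails one of the security checks."""


def validate_zim_path(requested: str, allowed_dirs: Iterable[str]) -> Path:
    """Return the resolved path if it passes every check, else raise."""
    # Input validation: reject empty, oversized, or NUL-containing input.
    if not requested or len(requested) > MAX_PATH_LENGTH:
        raise PathValidationError("path is empty or too long")
    if "\x00" in requested:
        raise PathValidationError("path contains a NUL byte")

    # Path normalization: resolve() collapses ".." segments and follows
    # symlinks, so the traversal check below sees the real location.
    resolved = Path(requested).expanduser().resolve()

    # File extension validation.
    if resolved.suffix.lower() != ".zim":
        raise PathValidationError("only .zim files may be opened")

    # Access control: the resolved path must sit inside an allowed directory.
    roots = [Path(d).expanduser().resolve() for d in allowed_dirs]
    if not any(root in resolved.parents for root in roots):
        raise PathValidationError("path is outside the allowed directories")

    return resolved
```

Resolving the path before comparing it against the allowed roots is the step that matters most here: comparing raw strings would let a request such as `allowed/../secret.zim` slip past a prefix check.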
