OpenZIM MCP Server


Architecture-Overview.md•16.9 KiB

# Architecture Overview

Technical documentation of the OpenZIM MCP system architecture and design.

## System Architecture

OpenZIM MCP follows a modular, layered architecture designed for performance, security, and maintainability.

```
┌─────────────────────────────────────────────────────────────┐
│                    MCP Client Layer                         │
│              (Claude, Custom Clients, etc.)                 │
└─────────────────────┬───────────────────────────────────────┘
                      │ MCP Protocol
┌─────────────────────▼───────────────────────────────────────┐
│                     OpenZIM MCP Server                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Server    │  │  Security   │  │  Instance Tracker   │  │
│  │    Core     │  │    Layer    │  │  & Health Monitor   │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                    Business Logic Layer                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │    Cache    │  │   Content   │  │   ZIM Operations    │  │
│  │   Manager   │  │  Processor  │  │  & Smart Retrieval  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                      Data Access Layer                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   libzim    │  │ File System │  │   Configuration     │  │
│  │  Interface  │  │   Access    │  │   & Validation      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                        Storage Layer                        │
│          ZIM Files, Cache, Logs, Instance Tracking          │
└─────────────────────────────────────────────────────────────┘
```

## Core Components

### 1. Server Core (`server.py`)

**Responsibilities**:

- MCP protocol implementation
- Request routing and handling
- Tool registration and execution
- Error handling and response formatting

**Key Features**:

- Asynchronous request processing
- Structured logging
- Health monitoring
- Graceful shutdown handling

### 2. Security Layer (`security.py`)

**Responsibilities**:

- Input validation and sanitization
- Path traversal protection
- Access control enforcement
- Security policy implementation

**Security Features**:

- Whitelist-based directory access
- Path normalization and validation
- Input length limits
- File extension validation

### 3. Cache Manager (`cache.py`)

**Responsibilities**:

- LRU cache with TTL support
- Cache key generation and management
- Performance metrics collection
- Memory usage optimization

**Cache Strategy**:

- Content-based caching for search results
- Entry path mapping cache
- Metadata caching
- Configurable size and TTL limits

### 4. Content Processor (`content_processor.py`)

**Responsibilities**:

- HTML to text conversion
- Content formatting and cleanup
- Snippet generation
- Link extraction

**Processing Features**:

- Preserves formatting structure
- Handles various content types
- Configurable content limits
- Smart truncation

### 5. ZIM Operations (`zim_operations.py`)

**Responsibilities**:

- ZIM file access and management
- Search operations
- Entry retrieval
- Metadata extraction

**Smart Features**:

- Automatic path resolution
- Fallback search mechanisms
- Namespace browsing
- Article structure analysis

### 6. Instance Tracker (`instance_tracker.py`)

**Responsibilities**:

- Multi-instance management
- Conflict detection and resolution
- Process monitoring
- Configuration validation

**Enterprise Features**:

- Automatic instance registration
- Stale instance cleanup
- Configuration hash comparison
- Health monitoring integration

### 7. Smart Retrieval System

**Responsibilities**:

- Intelligent entry path resolution
- Path mapping cache management
- Automatic fallback strategies
- Performance optimization

**Advanced Capabilities**:

- Pattern learning and recognition
- Confidence-based caching
- Multiple search strategies
- Transparent operation

## Request Flow

### Typical Request Processing

```
1. MCP Client Request
   ↓
2. Server Core (request validation)
   ↓
3. Security Layer (authorization check)
   ↓
4. Cache Manager (cache lookup)
   ↓ (cache miss)
5. ZIM Operations (data retrieval)
   ↓
6. Content Processor (formatting)
   ↓
7. Cache Manager (cache storage)
   ↓
8. Server Core (response formatting)
   ↓
9. MCP Client Response
```

### Smart Retrieval Flow

```
1. Direct Entry Access Attempt
   ↓ (fails)
2. Search-Based Fallback
   ↓
3. Path Mapping Cache Check
   ↓ (miss)
4. Multiple Search Strategies
   ↓
5. Best Match Selection
   ↓
6. Path Mapping Cache Update
   ↓
7. Content Retrieval
```

## Module Structure

### Core Modules

```
openzim_mcp/
├── __init__.py           # Package initialization and version
├── __main__.py           # CLI entry point
├── main.py               # Application entry point
├── server.py             # MCP server implementation
├── config.py             # Configuration management
├── security.py           # Security and validation
├── cache.py              # Caching functionality
├── content_processor.py  # Content processing
├── zim_operations.py     # ZIM file operations
├── instance_tracker.py   # Multi-instance management
├── exceptions.py         # Custom exceptions
└── constants.py          # Application constants
```

### Enhanced Module Responsibilities

#### Core Infrastructure

- **`server.py`**: Enhanced with health monitoring and diagnostics
- **`config.py`**: Expanded configuration with validation and profiles
- **`security.py`**: Advanced security features and input validation

#### Business Logic

- **`zim_operations.py`**: Smart retrieval system integration
- **`cache.py`**: Multi-layer caching with performance metrics
- **`content_processor.py`**: Enhanced content analysis and link extraction

#### Enterprise Features

- **`instance_tracker.py`**: Multi-instance management and conflict resolution
- **Smart Retrieval**: Integrated path resolution and fallback mechanisms
- **Health Monitoring**: Comprehensive system diagnostics and metrics

### Configuration System

```python
# Hierarchical configuration with validation
class OpenZimMcpConfig:
    cache: CacheConfig
    content: ContentConfig
    logging: LoggingConfig
    server: ServerConfig
    security: SecurityConfig
    instance: InstanceConfig
```

### Dependency Injection

```python
# Modular design with dependency injection
class OpenZimMcpServer:
    def __init__(
        self,
        config: OpenZimMcpConfig,
        cache_manager: CacheManager,
        content_processor: ContentProcessor,
        zim_operations: ZimOperations,
        security_validator: SecurityValidator,
        instance_tracker: InstanceTracker,
    ):
        # Component initialization: store the injected collaborators
        # instead of constructing them here
        self.config = config
        self.cache_manager = cache_manager
        self.content_processor = content_processor
        self.zim_operations = zim_operations
        self.security_validator = security_validator
        self.instance_tracker = instance_tracker
```

## Design Patterns

### 1. Strategy Pattern

**Used for**: Content processing strategies

```python
from typing import Dict


class ContentProcessor:
    def __init__(
        self,
        strategies: Dict[str, ProcessingStrategy],
        default_strategy: ProcessingStrategy,
    ):
        self.strategies = strategies
        self.default_strategy = default_strategy

    def process(self, content_type: str, content: str) -> str:
        # Fall back to the default strategy for unregistered content types
        strategy = self.strategies.get(content_type, self.default_strategy)
        return strategy.process(content)
```

### 2. Factory Pattern

**Used for**: ZIM file handler creation

```python
class ZimHandlerFactory:
    @staticmethod
    def create_handler(zim_file_path: str) -> ZimHandler:
        # Create appropriate handler based on file characteristics
        return ZimHandler(zim_file_path)
```

### 3. Observer Pattern

**Used for**: Health monitoring and metrics

```python
class HealthMonitor:
    def __init__(self):
        self.observers = []

    def notify_health_change(self, health_data: HealthData):
        # Push the new health snapshot to every registered observer
        for observer in self.observers:
            observer.on_health_update(health_data)
```

### 4. Decorator Pattern

**Used for**: Caching and logging

```python
from typing import List


@cache_result(ttl=3600)
@log_performance
def search_zim_file(self, zim_file_path: str, query: str) -> List[SearchResult]:
    # Implementation omitted
    ...
```

## Performance Architecture

### Caching Strategy

```
┌─────────────────────────────────────────────────────────────┐
│                        Cache Layers                         │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  L1 Cache   │  │  L2 Cache   │  │      L3 Cache       │  │
│  │  (Memory)   │  │ (Metadata)  │  │   (Path Mapping)    │  │
│  │             │  │             │  │                     │  │
│  │ • Search    │  │ • ZIM Meta  │  │ • Entry Paths       │  │
│  │ • Content   │  │ • Structure │  │ • Namespace Info    │  │
│  │ • Links     │  │ • Health    │  │ • Suggestions       │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

### Asynchronous Processing

```python
# Non-blocking operations for better performance
async def handle_request(self, request: McpRequest) -> McpResponse:
    # Asynchronous request processing
    result = await self.process_async(request)
    return self.format_response(result)
```

### Resource Management

```python
# Efficient resource cleanup
class ZimFileManager:
    def __init__(self):
        self.open_files = {}
        self.file_locks = {}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Runs on normal exit and when the with-block raises
        self.cleanup_resources()
```

## Security Architecture

### Defense in Depth

```
┌─────────────────────────────────────────────────────────────┐
│                       Security Layers                       │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │    Input    │  │    Path     │  │       Access        │  │
│  │ Validation  │  │ Validation  │  │       Control       │  │
│  │             │  │             │  │                     │  │
│  │ • Sanitize  │  │ • Normalize │  │ • Directory Limits  │  │
│  │ • Length    │  │ • Traversal │  │ • File Extensions   │  │
│  │ • Type      │  │ • Resolve   │  │ • Permission Check  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

### Security Validation Pipeline

```python
def validate_request(self, request: McpRequest) -> ValidationResult:
    # 1. Input validation
    self.validate_input(request.params)

    # 2. Path validation
    self.validate_paths(request.file_paths)

    # 3. Access control
    self.check_access_permissions(request.file_paths)

    # 4. Rate limiting (planned for the future, not yet active)
    # self.check_rate_limits(request.client_id)

    return ValidationResult.VALID
```

## Monitoring and Observability

### Health Monitoring

```python
class HealthMonitor:
    def collect_metrics(self) -> HealthMetrics:
        return HealthMetrics(
            cache_performance=self.cache_manager.get_metrics(),
            memory_usage=self.get_memory_usage(),
            request_metrics=self.get_request_metrics(),
            instance_status=self.instance_tracker.get_status(),
        )
```

### Structured Logging

```python
import logging

logger = logging.getLogger(__name__)

# Consistent logging structure
logger.info(
    "Request processed",
    extra={
        "request_id": request.id,
        "tool_name": request.tool,
        "duration_ms": duration,
        "cache_hit": cache_hit,
        "zim_file": zim_file_path,
    },
)
```

## Multi-Instance Management

### Instance Tracking

```python
import os
from datetime import datetime


class InstanceTracker:
    def register_instance(self) -> InstanceInfo:
        instance = InstanceInfo(
            pid=os.getpid(),
            config_hash=self.config.get_hash(),
            start_time=datetime.now(),
            directories=self.config.allowed_directories,
        )
        self.save_instance_file(instance)
        return instance
```

### Conflict Detection

```python
from typing import List


def detect_conflicts(self) -> List[Conflict]:
    conflicts = []
    active_instances = self.get_active_instances()

    for instance in active_instances:
        if self.has_config_conflict(instance):
            conflicts.append(ConfigConflict(instance))
        if self.has_directory_conflict(instance):
            conflicts.append(DirectoryConflict(instance))

    return conflicts
```

## Testing Architecture

### Test Structure

```
tests/
├── unit/            # Unit tests with mocks
├── integration/     # Integration tests with real ZIM files
├── security/        # Security and validation tests
├── performance/     # Performance and load tests
├── fixtures/        # Test data and fixtures
└── conftest.py      # Pytest configuration
```

### Test Categories

1. **Unit Tests**: Fast, isolated component testing
2. **Integration Tests**: End-to-end functionality with real ZIM files
3. **Security Tests**: Path traversal and input validation
4. **Performance Tests**: Cache performance and resource usage

## Scalability Considerations

### Horizontal Scaling

- **Multi-instance support**: Conflict detection and resolution
- **Load balancing**: Multiple server instances
- **Shared caching**: Planned future Redis integration

### Vertical Scaling

- **Memory optimization**: Efficient cache management
- **CPU optimization**: Asynchronous processing
- **I/O optimization**: Smart file access patterns

---

**Want to contribute?** Check the [Contributing Guidelines](https://github.com/cameronrye/openzim-mcp/blob/main/CONTRIBUTING.md) for development setup and coding standards.
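
The Security Layer's path checks can be illustrated with a short Python sketch. These are path normalization, traversal protection, whitelist-based directory access, and input length and file extension limits. The sketch is not the code in `security.py`. The function name `validate_zim_path`, the `max_length` default, and the `.zim`-only extension rule are assumptions made for illustration. The sketch relies on `Path.resolve()` to collapse `..` segments and follow symlinks before any comparison. It also uses `Path.is_relative_to()` (Python 3.9+), which compares path components. As a result, `/data/zim-evil` is not mistaken for a child of `/data/zim`, as a plain string-prefix check would allow.

```python
from pathlib import Path

# Hypothetical policy values, chosen for illustration only
ALLOWED_EXTENSIONS = {".zim"}


def validate_zim_path(user_path: str, allowed_dirs: list[str], max_length: int = 1000) -> Path:
    """Return the resolved path if it is a .zim file inside an allowed directory."""
    # Input length limit
    if len(user_path) > max_length:
        raise ValueError("path too long")
    # Normalize: expand ~, collapse '..' and follow symlinks (file need not exist)
    candidate = Path(user_path).expanduser().resolve()
    # File extension validation
    if candidate.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file extension: {candidate.suffix!r}")
    # Whitelist-based directory access, compared component by component
    for base in allowed_dirs:
        base_path = Path(base).expanduser().resolve()
        if candidate.is_relative_to(base_path):
            return candidate
    raise PermissionError(f"{candidate} is outside the allowed directories")
```

Resolving before comparing is the important design choice here. Checking the raw string first would let an input such as `allowed/../secret.zim` through.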
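
The Cache Manager's "LRU cache with TTL support" can be sketched with `collections.OrderedDict`, which keeps entries in insertion order and lets the code move an entry to the end on each access. This is a hypothetical illustration, not the contents of `cache.py`. The class name `TTLLRUCache`, its parameters, and the injectable `clock` (added so expiry can be tested without sleeping) are assumptions. Each entry stores an absolute expiry time. Reads drop expired entries and move live ones to the most-recently-used end. Writes evict from the least-recently-used end once `max_size` is exceeded. The `hits` and `misses` counters stand in for the performance metrics the section mentions.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class TTLLRUCache:
    """Hypothetical LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, max_size: int = 100, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # key -> (expires_at, value); order runs from least to most recently used
        self._data: "OrderedDict[Hashable, Tuple[float, Any]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            self.misses += 1
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            # Expired entries are removed lazily and counted as misses
            del self._data[key]
            self.misses += 1
            return None
        # Mark as most recently used
        self._data.move_to_end(key)
        self.hits += 1
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock() + self.ttl_seconds, value)
        self._data.move_to_end(key)
        # Evict least recently used entries beyond capacity
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)
```

A search result could then be cached under a tuple key such as `("search", zim_file_path, query)`. Because `get` returns `None` on a miss, this sketch cannot tell a cached `None` apart from a missing entry. A sentinel object would remove that ambiguity if it matters.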
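
The Smart Retrieval Flow steps can be read as a small algorithm, and the sketch below follows them in order. It rests on several assumptions that do not come from the project:

- `fetch_entry` and `search` are hypothetical callables standing in for libzim entry lookup and search.
- `EntryNotFound` is a hypothetical exception.
- "Best match" simply means a result whose last path segment equals the requested title after normalizing underscores and case, else the top result.

This simplified version also caches every successful resolution rather than using confidence-based caching.

```python
from typing import Callable, Dict, Iterator, List


class EntryNotFound(Exception):
    """Hypothetical error raised when an entry path cannot be resolved."""


def _normalize(path: str) -> str:
    # Compare titles by last path segment, ignoring underscores and case
    return path.rsplit("/", 1)[-1].replace("_", " ").casefold()


class SmartRetriever:
    def __init__(self, fetch_entry: Callable[[str], str],
                 search: Callable[[str], List[str]]) -> None:
        self._fetch_entry = fetch_entry      # exact-path lookup; raises EntryNotFound
        self._search = search                # query -> candidate entry paths
        self._path_map: Dict[str, str] = {}  # requested path -> resolved entry path

    @staticmethod
    def _query_variants(path: str) -> Iterator[str]:
        # Simple search strategies: raw title, underscores as spaces, spaces as underscores
        title = path.rsplit("/", 1)[-1]
        seen = set()
        for query in (title, title.replace("_", " "), title.replace(" ", "_")):
            if query not in seen:
                seen.add(query)
                yield query

    def get(self, path: str) -> str:
        # 1. Direct entry access attempt
        try:
            return self._fetch_entry(path)
        except EntryNotFound:
            pass
        # 2-3. Search-based fallback, checking the path mapping cache first
        if path in self._path_map:
            return self._fetch_entry(self._path_map[path])
        # 4. Multiple search strategies
        wanted = _normalize(path)
        for query in self._query_variants(path):
            candidates = self._search(query)
            if not candidates:
                continue
            # 5. Best match: exact normalized title if present, else the top result
            best = next((c for c in candidates if _normalize(c) == wanted), candidates[0])
            # 6. Path mapping cache update
            self._path_map[path] = best
            # 7. Content retrieval
            return self._fetch_entry(best)
        raise EntryNotFound(path)
```

Consider a fake archive containing `A/Python_(programming_language)`. A request for `Python (programming language)` fails the direct lookup and is resolved through search. Later requests for the same path hit the path mapping cache instead of searching again.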
