# Architecture Overview
Technical documentation of the OpenZIM MCP system architecture and design.
## ๐๏ธ System Architecture
OpenZIM MCP follows a modular, layered architecture designed for performance, security, and maintainability.
```
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ MCP Client Layer โ
โ (Claude, Custom Clients, etc.) โ
โโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ MCP Protocol
โโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ OpenZIM MCP Server โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Server โ โ Security โ โ Instance Tracker โ โ
โ โ Core โ โ Layer โ โ & Health Monitor โ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Business Logic Layer โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Cache โ โ Content โ โ ZIM Operations โ โ
โ โ Manager โ โ Processor โ โ & Smart Retrieval โ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Data Access Layer โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ libzim โ โ File System โ โ Configuration โ โ
โ โ Interface โ โ Access โ โ & Validation โ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Storage Layer โ
โ ZIM Files, Cache, Logs, Instance Tracking โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
```
## ๐งฉ Core Components
### 1. Server Core (`server.py`)
**Responsibilities**:
- MCP protocol implementation
- Request routing and handling
- Tool registration and execution
- Error handling and response formatting
**Key Features**:
- Asynchronous request processing
- Structured logging
- Health monitoring
- Graceful shutdown handling
### 2. Security Layer (`security.py`)
**Responsibilities**:
- Input validation and sanitization
- Path traversal protection
- Access control enforcement
- Security policy implementation
**Security Features**:
- Whitelist-based directory access
- Path normalization and validation
- Input length limits
- File extension validation
### 3. Cache Manager (`cache.py`)
**Responsibilities**:
- LRU cache with TTL support
- Cache key generation and management
- Performance metrics collection
- Memory usage optimization
**Cache Strategy**:
- Content-based caching for search results
- Entry path mapping cache
- Metadata caching
- Configurable size and TTL limits
### 4. Content Processor (`content_processor.py`)
**Responsibilities**:
- HTML to text conversion
- Content formatting and cleanup
- Snippet generation
- Link extraction
**Processing Features**:
- Preserves formatting structure
- Handles various content types
- Configurable content limits
- Smart truncation
### 5. ZIM Operations (`zim_operations.py`)
**Responsibilities**:
- ZIM file access and management
- Search operations
- Entry retrieval
- Metadata extraction
**Smart Features**:
- Automatic path resolution
- Fallback search mechanisms
- Namespace browsing
- Article structure analysis
### 6. Instance Tracker (`instance_tracker.py`)
**Responsibilities**:
- Multi-instance management
- Conflict detection and resolution
- Process monitoring
- Configuration validation
**Enterprise Features**:
- Automatic instance registration
- Stale instance cleanup
- Configuration hash comparison
- Health monitoring integration
### 7. Smart Retrieval System
**Responsibilities**:
- Intelligent entry path resolution
- Path mapping cache management
- Automatic fallback strategies
- Performance optimization
**Advanced Capabilities**:
- Pattern learning and recognition
- Confidence-based caching
- Multiple search strategies
- Transparent operation
## ๐ Request Flow
### Typical Request Processing
```
1. MCP Client Request
โ
2. Server Core (request validation)
โ
3. Security Layer (authorization check)
โ
4. Cache Manager (cache lookup)
โ (cache miss)
5. ZIM Operations (data retrieval)
โ
6. Content Processor (formatting)
โ
7. Cache Manager (cache storage)
โ
8. Server Core (response formatting)
โ
9. MCP Client Response
```
### Smart Retrieval Flow
```
1. Direct Entry Access Attempt
โ (fails)
2. Search-Based Fallback
โ
3. Path Mapping Cache Check
โ (miss)
4. Multiple Search Strategies
โ
5. Best Match Selection
โ
6. Path Mapping Cache Update
โ
7. Content Retrieval
```
## ๐๏ธ Module Structure
### Core Modules
```
openzim_mcp/
โโโ __init__.py # Package initialization and version
โโโ __main__.py # CLI entry point
โโโ main.py # Application entry point
โโโ server.py # MCP server implementation
โโโ config.py # Configuration management
โโโ security.py # Security and validation
โโโ cache.py # Caching functionality
โโโ content_processor.py # Content processing
โโโ zim_operations.py # ZIM file operations
โโโ instance_tracker.py # Multi-instance management
โโโ exceptions.py # Custom exceptions
โโโ constants.py # Application constants
```
### Enhanced Module Responsibilities
#### Core Infrastructure
- **`server.py`**: Enhanced with health monitoring and diagnostics
- **`config.py`**: Expanded configuration with validation and profiles
- **`security.py`**: Advanced security features and input validation
#### Business Logic
- **`zim_operations.py`**: Smart retrieval system integration
- **`cache.py`**: Multi-layer caching with performance metrics
- **`content_processor.py`**: Enhanced content analysis and link extraction
#### Enterprise Features
- **`instance_tracker.py`**: Multi-instance management and conflict resolution
- **Smart Retrieval**: Integrated path resolution and fallback mechanisms
- **Health Monitoring**: Comprehensive system diagnostics and metrics
### Configuration System
```python
# Hierarchical configuration with validation
class OpenZimMcpConfig:
cache: CacheConfig
content: ContentConfig
logging: LoggingConfig
server: ServerConfig
security: SecurityConfig
instance: InstanceConfig
```
### Dependency Injection
```python
# Modular design with dependency injection
class OpenZimMcpServer:
def __init__(
self,
config: OpenZimMcpConfig,
cache_manager: CacheManager,
content_processor: ContentProcessor,
zim_operations: ZimOperations,
security_validator: SecurityValidator,
instance_tracker: InstanceTracker
):
# Component initialization
```
## ๐ง Design Patterns
### 1. Strategy Pattern
**Used for**: Content processing strategies
```python
class ContentProcessor:
def __init__(self, strategies: Dict[str, ProcessingStrategy]):
self.strategies = strategies
def process(self, content_type: str, content: str) -> str:
strategy = self.strategies.get(content_type, self.default_strategy)
return strategy.process(content)
```
### 2. Factory Pattern
**Used for**: ZIM file handler creation
```python
class ZimHandlerFactory:
@staticmethod
def create_handler(zim_file_path: str) -> ZimHandler:
# Create appropriate handler based on file characteristics
return ZimHandler(zim_file_path)
```
### 3. Observer Pattern
**Used for**: Health monitoring and metrics
```python
class HealthMonitor:
def __init__(self):
self.observers = []
def notify_health_change(self, health_data: HealthData):
for observer in self.observers:
observer.on_health_update(health_data)
```
### 4. Decorator Pattern
**Used for**: Caching and logging
```python
@cache_result(ttl=3600)
@log_performance
def search_zim_file(self, zim_file_path: str, query: str) -> List[SearchResult]:
# Implementation
```
## ๐ Performance Architecture
### Caching Strategy
```
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Cache Layers โ
โ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ L1 Cache โ โ L2 Cache โ โ L3 Cache โ โ
โ โ (Memory) โ โ (Metadata) โ โ (Path Mapping) โ โ
โ โ โ โ โ โ โ โ
โ โ โข Search โ โ โข ZIM Meta โ โ โข Entry Paths โ โ
โ โ โข Content โ โ โข Structure โ โ โข Namespace Info โ โ
โ โ โข Links โ โ โข Health โ โ โข Suggestions โ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
```
### Asynchronous Processing
```python
# Non-blocking operations for better performance
async def handle_request(self, request: McpRequest) -> McpResponse:
# Asynchronous request processing
result = await self.process_async(request)
return self.format_response(result)
```
### Resource Management
```python
# Efficient resource cleanup
class ZimFileManager:
def __init__(self):
self.open_files = {}
self.file_locks = {}
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.cleanup_resources()
```
## ๐ Security Architecture
### Defense in Depth
```
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Security Layers โ
โ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Input โ โ Path โ โ Access โ โ
โ โ Validation โ โ Validation โ โ Control โ โ
โ โ โ โ โ โ โ โ
โ โ โข Sanitize โ โ โข Normalize โ โ โข Directory Limits โ โ
โ โ โข Length โ โ โข Traversal โ โ โข File Extensions โ โ
โ โ โข Type โ โ โข Resolve โ โ โข Permission Check โ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
```
### Security Validation Pipeline
```python
def validate_request(self, request: McpRequest) -> ValidationResult:
# 1. Input validation
self.validate_input(request.params)
# 2. Path validation
self.validate_paths(request.file_paths)
# 3. Access control
self.check_access_permissions(request.file_paths)
# 4. Rate limiting (future)
self.check_rate_limits(request.client_id)
return ValidationResult.VALID
```
## ๐ Monitoring and Observability
### Health Monitoring
```python
class HealthMonitor:
def collect_metrics(self) -> HealthMetrics:
return HealthMetrics(
cache_performance=self.cache_manager.get_metrics(),
memory_usage=self.get_memory_usage(),
request_metrics=self.get_request_metrics(),
instance_status=self.instance_tracker.get_status()
)
```
### Structured Logging
```python
# Consistent logging structure
logger.info(
"Request processed",
extra={
"request_id": request.id,
"tool_name": request.tool,
"duration_ms": duration,
"cache_hit": cache_hit,
"zim_file": zim_file_path
}
)
```
## ๐ Multi-Instance Management
### Instance Tracking
```python
class InstanceTracker:
def register_instance(self) -> InstanceInfo:
instance = InstanceInfo(
pid=os.getpid(),
config_hash=self.config.get_hash(),
start_time=datetime.now(),
directories=self.config.allowed_directories
)
self.save_instance_file(instance)
return instance
```
### Conflict Detection
```python
def detect_conflicts(self) -> List[Conflict]:
conflicts = []
active_instances = self.get_active_instances()
for instance in active_instances:
if self.has_config_conflict(instance):
conflicts.append(ConfigConflict(instance))
if self.has_directory_conflict(instance):
conflicts.append(DirectoryConflict(instance))
return conflicts
```
## ๐งช Testing Architecture
### Test Structure
```
tests/
โโโ unit/ # Unit tests with mocks
โโโ integration/ # Integration tests with real ZIM files
โโโ security/ # Security and validation tests
โโโ performance/ # Performance and load tests
โโโ fixtures/ # Test data and fixtures
โโโ conftest.py # Pytest configuration
```
### Test Categories
1. **Unit Tests**: Fast, isolated component testing
2. **Integration Tests**: End-to-end functionality with real ZIM files
3. **Security Tests**: Path traversal and input validation
4. **Performance Tests**: Cache performance and resource usage
## ๐ Scalability Considerations
### Horizontal Scaling
- **Multi-instance support**: Conflict detection and resolution
- **Load balancing**: Multiple server instances
- **Shared caching**: Future Redis integration
### Vertical Scaling
- **Memory optimization**: Efficient cache management
- **CPU optimization**: Asynchronous processing
- **I/O optimization**: Smart file access patterns
## ๐ฎ Future Architecture Enhancements
### Planned Improvements
1. **Microservices**: Split into specialized services
2. **Message Queue**: Asynchronous task processing
3. **Distributed Cache**: Redis/Memcached integration
4. **API Gateway**: Rate limiting and authentication
5. **Container Support**: Docker and Kubernetes deployment
### Extension Points
- **Plugin System**: Custom content processors
- **Custom Backends**: Alternative storage systems
- **Authentication**: User management and access control
- **Analytics**: Usage tracking and optimization
---
**Want to contribute?** Check the [Contributing Guidelines](https://github.com/cameronrye/openzim-mcp/blob/main/CONTRIBUTING.md) for development setup and coding standards.