Provides access to ZIM format knowledge bases created by Kiwix, enabling AI agents to search and retrieve content from offline Wikipedia and other reference materials stored in compressed ZIM archives.
Allows access to Wikibooks content stored in ZIM format archives, enabling AI agents to search and retrieve educational content from offline Wikibooks collections.
Enables AI agents to access Wikipedia content stored in offline ZIM archives, providing tools for searching articles, browsing by namespace, extracting article structure, and retrieving detailed content without requiring internet connectivity.
OpenZIM MCP Server
🧠 Built for LLM Intelligence
OpenZIM MCP transforms static ZIM archives into dynamic knowledge engines for Large Language Models. Unlike basic file readers, it provides the structured access that LLMs need to navigate and understand large knowledge repositories.
🚀 Why LLMs Love OpenZIM MCP:
- Smart Navigation: Browse by namespace (articles, metadata, media) instead of blind searching
- Context-Aware Discovery: Get article structure, relationships, and metadata for deeper understanding
- Intelligent Search: Advanced filtering, auto-complete suggestions, and relevance-ranked results
- Performance Optimized: Cached operations and pagination prevent timeouts on massive archives
- Relationship Mapping: Extract internal/external links to understand content connections
Whether you're building a research assistant, knowledge chatbot, or content analysis system, OpenZIM MCP gives your LLM the structured access patterns it needs to unlock the full potential of offline knowledge archives. No more fumbling through raw text dumps! 🎯
OpenZIM MCP is a modern, secure, and high-performance MCP (Model Context Protocol) server that enables AI models to access and search ZIM format knowledge bases offline.
ZIM (Zeno IMproved) is an open file format developed by the openZIM project, designed specifically for offline storage and access to website content. The format supports high compression rates using Zstandard compression (default since 2021) and enables fast full-text searching, making it ideal for storing entire Wikipedia content and other large reference materials in relatively compact files. The openZIM project is sponsored by Wikimedia CH and supported by the Wikimedia Foundation, ensuring the format's continued development and adoption for offline knowledge access, especially in environments without reliable internet connectivity.
✨ Features
- 🔒 Security First: Comprehensive input validation and path traversal protection
- ⚡ High Performance: Intelligent caching and optimized ZIM file operations
- 🧠 Smart Retrieval: Automatic fallback from direct access to search-based retrieval for reliable entry access
- 🧪 Well Tested: 90%+ test coverage with comprehensive test suite
- 🏗️ Modern Architecture: Modular design with dependency injection
- 📝 Type Safe: Full type annotations throughout the codebase
- 🔧 Configurable: Flexible configuration with validation
- 📊 Observable: Structured logging and health monitoring
🚀 Quick Start
Installation
Development Installation
For contributors and developers:
Prepare ZIM Files
Download ZIM files (e.g., Wikipedia, Wiktionary, etc.) from the Kiwix Library and place them in a directory:
Running the Server
MCP Configuration
Add to your MCP client configuration:
Alternative configuration using Python module:
For development (from source):
🛠️ Development
Running Tests
ZIM Test Data Integration
OpenZIM MCP integrates with the official zim-testing-suite for comprehensive testing with real ZIM files:
The test data includes:
- Basic files: Small ZIM files for essential testing
- Real content: Actual Wikipedia/Wikibooks content for integration testing
- Invalid files: Malformed ZIM files for error handling testing
- Special cases: Embedded content, split files, and edge cases
Test files are automatically organized by category and priority level.
Code Quality
Project Structure
📚 API Reference
Available Tools
list_zim_files - List all ZIM files in allowed directories
No parameters required.
search_zim_file - Search within ZIM file content
Required parameters:
- zim_file_path (string): Path to the ZIM file
- query (string): Search query term
Optional parameters:
- limit (integer, default: 10): Maximum number of results to return
- offset (integer, default: 0): Starting offset for results (for pagination)
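The limit and offset parameters can be combined to page through a large result set. The following is a minimal sketch, not the project's client code: `call_tool` is a hypothetical callable standing in for however your MCP client invokes a tool, and it is assumed to return one list of results per call (the real response shape is not shown in this document).

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical signature: how a tool is invoked depends on your MCP client.
ToolCaller = Callable[[str, Dict[str, Any]], List[Any]]


def iter_search_results(
    call_tool: ToolCaller,
    zim_file_path: str,
    query: str,
    page_size: int = 10,
    max_pages: int = 5,
) -> Iterator[Any]:
    """Yield search results page by page using limit/offset."""
    for page in range(max_pages):
        arguments = {
            "zim_file_path": zim_file_path,
            "query": query,
            "limit": page_size,
            "offset": page * page_size,
        }
        results = call_tool("search_zim_file", arguments)
        yield from results
        # Assumption: a short page means there is nothing left to fetch.
        if len(results) < page_size:
            break
```

Capping the number of pages keeps a broad query from walking an entire archive.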
get_zim_entry - Get detailed content of a specific entry in a ZIM file
Required parameters:
- zim_file_path (string): Path to the ZIM file
- entry_path (string): Entry path, e.g., 'A/Some_Article'
Optional parameters:
- max_content_length (integer, default: 100000, minimum: 1000): Maximum length of returned content
Smart Retrieval Features:
- Automatic Fallback: If direct path access fails, automatically searches for the entry and uses the exact path found
- Path Mapping Cache: Caches successful path mappings for improved performance on repeated access
- Enhanced Error Guidance: Provides clear guidance when entries cannot be found, suggesting alternative approaches
- Transparent Operation: Works seamlessly regardless of path encoding differences (spaces vs underscores, URL encoding, etc.)
get_zim_metadata - Get ZIM file metadata from M namespace entries
Required parameters:
- zim_file_path (string): Path to the ZIM file
Returns: JSON string containing ZIM metadata including entry counts, archive information, and metadata entries like title, description, language, creator, etc.
get_main_page - Get the main page entry from W namespace
Required parameters:
- zim_file_path (string): Path to the ZIM file
Returns: Main page content or information about the main page entry.
list_namespaces - List available namespaces and their entry counts
Required parameters:
- zim_file_path (string): Path to the ZIM file
Returns: JSON string containing namespace information with entry counts, descriptions, and sample entries for each namespace (C, M, W, X, etc.).
browse_namespace - Browse entries in a specific namespace with pagination
Required parameters:
- zim_file_path (string): Path to the ZIM file
- namespace (string): Namespace to browse (C, M, W, X, A, I, etc.)
Optional parameters:
- limit (integer, default: 50, range: 1-200): Maximum number of entries to return
- offset (integer, default: 0): Starting offset for pagination
Returns: JSON string containing namespace entries with titles, content previews, and pagination information.
search_with_filters - Search within ZIM file content with advanced filters
Required parameters:
- zim_file_path (string): Path to the ZIM file
- query (string): Search query term
Optional parameters:
- namespace (string): Optional namespace filter (C, M, W, X, etc.)
- content_type (string): Optional content type filter (text/html, text/plain, etc.)
- limit (integer, default: 10, range: 1-100): Maximum number of results to return
- offset (integer, default: 0): Starting offset for pagination
Returns: Filtered search results with namespace and content type information.
get_search_suggestions - Get search suggestions and auto-complete
Required parameters:
- zim_file_path (string): Path to the ZIM file
- partial_query (string): Partial search query (minimum 2 characters)
Optional parameters:
- limit (integer, default: 10, range: 1-50): Maximum number of suggestions to return
Returns: JSON string containing search suggestions based on article titles and content.
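The limit ranges documented for browse_namespace, search_with_filters, and get_search_suggestions differ, so a client may want to check arguments before calling. A small sketch of such a pre-check follows; the helper names are hypothetical, and how the server itself reacts to out-of-range values is not described here.

```python
from typing import Dict, Tuple

# Ranges copied from the parameter descriptions above.
LIMIT_RANGES: Dict[str, Tuple[int, int]] = {
    "browse_namespace": (1, 200),
    "search_with_filters": (1, 100),
    "get_search_suggestions": (1, 50),
}

MIN_PARTIAL_QUERY_LENGTH = 2  # documented minimum for partial_query


def check_limit(tool: str, limit: int) -> int:
    """Return limit unchanged if it is inside the documented range."""
    low, high = LIMIT_RANGES[tool]
    if not low <= limit <= high:
        raise ValueError(f"{tool}: limit must be between {low} and {high}, got {limit}")
    return limit


def check_partial_query(partial_query: str) -> str:
    """Return the query unchanged if it meets the documented minimum length."""
    if len(partial_query) < MIN_PARTIAL_QUERY_LENGTH:
        raise ValueError("partial_query must be at least 2 characters long")
    return partial_query
```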
get_article_structure - Extract article structure and metadata
Required parameters:
- zim_file_path (string): Path to the ZIM file
- entry_path (string): Entry path, e.g., 'C/Some_Article'
Returns: JSON string containing article structure including headings, sections, metadata, and word count.
extract_article_links - Extract internal and external links from an article
Required parameters:
- zim_file_path (string): Path to the ZIM file
- entry_path (string): Entry path, e.g., 'C/Some_Article'
Returns: JSON string containing categorized links (internal, external, media) with titles and metadata.
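To illustrate what categorizing links into internal, external, and media can look like, here is a standard-library sketch. It is an assumption-laden illustration, not the server's implementation: the classification rules and the `MEDIA_EXTENSIONS` tuple are invented for this example.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urlparse

# Illustrative list only; the server's own media detection is not documented here.
MEDIA_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ogg", ".webm")


class LinkCollector(HTMLParser):
    """Collect links from article HTML and sort them into three buckets."""

    def __init__(self) -> None:
        super().__init__()
        self.links: Dict[str, List[str]] = {"internal": [], "external": [], "media": []}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self._add(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.links["media"].append(attributes["src"])

    def _add(self, href: str) -> None:
        parsed = urlparse(href)
        if parsed.scheme in ("http", "https"):
            self.links["external"].append(href)
        elif parsed.path.lower().endswith(MEDIA_EXTENSIONS):
            self.links["media"].append(href)
        elif parsed.path:
            # Relative paths point at other entries; bare "#fragment" links are skipped.
            self.links["internal"].append(href)


def categorize_links(html: str) -> Dict[str, List[str]]:
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```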
Examples
Listing ZIM files
Response:
Searching ZIM files
Response:
Getting ZIM entries
Response:
Smart Retrieval in Action
Example: Automatic path resolution
Response (showing smart retrieval working):
get_server_health - Get server health and statistics
No parameters required.
Returns:
- Server status and performance metrics
- Cache statistics
- Configuration information
- Instance tracking information
- Conflict detection results
Example Response:
get_server_configuration - Get detailed server configuration
No parameters required.
Returns: Comprehensive server configuration including diagnostics, validation results, and conflict detection.
Example Response:
diagnose_server_state - Comprehensive server diagnostics
No parameters required.
Returns: Detailed diagnostic information including instance conflicts, configuration validation, file accessibility checks, and actionable recommendations.
Example Response:
resolve_server_conflicts - Identify and resolve server conflicts
No parameters required.
Returns: Results of conflict resolution including cleanup actions and recommendations.
Example Response:
Additional Search Examples
Computer-related search:
Response:
Getting detailed content:
Response:
🎯 Advanced Knowledge Retrieval Examples
Getting ZIM metadata:
Response:
Browsing a namespace:
Response:
Filtered search:
Getting article structure:
Response:
Getting search suggestions:
Response:
🔧 Server Management and Diagnostics Examples
Getting server health:
Response:
Diagnosing server state:
Response:
Resolving server conflicts:
Response:
🎯 ZIM Entry Retrieval Best Practices
Smart Retrieval System
OpenZIM MCP implements an intelligent entry retrieval system that automatically handles path encoding inconsistencies common in ZIM files:
How It Works:
- Direct Access First: Attempts to retrieve the entry using the provided path exactly as given
- Automatic Fallback: If direct access fails, automatically searches for the entry using various search terms
- Path Mapping Cache: Caches successful path mappings to improve performance for repeated access
- Enhanced Error Guidance: Provides clear guidance when entries cannot be found
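The steps above can be sketched as a single function. This is a simplified illustration under stated assumptions, not the server's code: `read_entry` and `search_paths` are hypothetical callables standing in for direct archive access and full-text search, and the error wording is invented.

```python
from typing import Callable, Dict, List, Optional


def get_entry_with_fallback(
    entry_path: str,
    read_entry: Callable[[str], Optional[str]],  # hypothetical: returns None if the path is missing
    search_paths: Callable[[str], List[str]],    # hypothetical: returns candidate paths for a term
    path_cache: Dict[str, str],
) -> str:
    # Direct access first, using a cached mapping when one exists.
    resolved = path_cache.get(entry_path, entry_path)
    content = read_entry(resolved)
    if content is not None:
        return content

    # Automatic fallback: search using the title part of the path.
    term = entry_path.split("/", 1)[-1].replace("_", " ")
    for candidate in search_paths(term):
        content = read_entry(candidate)
        if content is not None:
            # Cache the mapping so repeated access skips the search.
            path_cache[entry_path] = candidate
            return content

    # Error guidance when nothing matches.
    raise LookupError(
        f"Entry not found: {entry_path!r}. Try searching for it or browsing its namespace."
    )
```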
Benefits for LLM Users:
- Transparent Operation: No need to understand ZIM path encoding complexities
- Single Tool Call: Eliminates the need for manual search-first methodology
- Reliable Results: Consistent success across different path formats (spaces vs underscores, URL encoding, etc.)
- Performance Optimized: Cached mappings improve repeated access speed
Example Scenarios Handled Automatically:
- A/Test Article → A/Test_Article (space to underscore conversion)
- C/Café → C/Café (URL encoding differences)
- A/Some-Page → A/Some_Page (hyphen to underscore conversion)
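The equivalences listed above can be made concrete by generating path variants. The server's documented fallback is search-based, so the function below is only a hypothetical illustration of the same three conversions; the percent-encoded input in the usage example is likewise an assumption.

```python
from typing import List
from urllib.parse import unquote


def candidate_paths(entry_path: str) -> List[str]:
    """Return the original path followed by its distinct normalized variants."""
    variants = [
        entry_path,
        entry_path.replace(" ", "_"),  # space to underscore
        unquote(entry_path),           # percent-encoded to decoded form
        entry_path.replace("-", "_"),  # hyphen to underscore
    ]
    unique: List[str] = []
    for variant in variants:
        if variant not in unique:
            unique.append(variant)
    return unique


print(candidate_paths("A/Test Article"))  # ['A/Test Article', 'A/Test_Article']
print(candidate_paths("C/Caf%C3%A9"))     # ['C/Caf%C3%A9', 'C/Café']
```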
Usage Recommendations
For Direct Entry Access:
When Entry Not Found: The system will automatically provide guidance:
⚠️ Important Notes and Limitations
Content Length Requirements
- The max_content_length parameter for get_zim_entry must be at least 1000 characters
- Content longer than the specified limit will be truncated with a note showing the total character count
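A short sketch of the truncation behavior described above. The function name and the wording of the appended note are assumptions; only the 1000-character minimum, the 100000 default, and the presence of a total character count come from this document.

```python
MIN_CONTENT_LENGTH = 1000  # documented minimum for max_content_length


def truncate_content(content: str, max_content_length: int = 100000) -> str:
    """Cut content to the limit and append a note with the total length."""
    if max_content_length < MIN_CONTENT_LENGTH:
        raise ValueError(f"max_content_length must be at least {MIN_CONTENT_LENGTH}")
    if len(content) <= max_content_length:
        return content
    # The note text is illustrative, not the server's actual message.
    note = f"\n\n[Content truncated, total of {len(content)} characters]"
    return content[:max_content_length] + note
```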
Search Behavior
- Search results may include articles that contain the search terms in various contexts
- Results are ranked by relevance but may not always be directly related to the primary meaning of the search term
- Search snippets provide a preview of the content but may not show the exact location where the search term appears
File Format Support
- Currently supports ZIM files (Zeno IMproved format)
- Tested with Wikipedia ZIM files (e.g., wikipedia_en_100_2025-08.zim)
- File paths must be properly escaped in JSON (use \\ for Windows paths)
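If tool arguments are built in code, a JSON serializer handles the backslash escaping automatically. A brief illustration with Python's standard json module; the `C:\zim` directory is a made-up example location.

```python
import json

# Raw string: each backslash here is a single character in the actual path.
arguments = {"zim_file_path": r"C:\zim\wikipedia_en_100_2025-08.zim"}

encoded = json.dumps(arguments)
print(encoded)  # {"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim"}

# Decoding restores the original single-backslash path.
assert json.loads(encoded) == arguments
```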
🔄 Multi-Server Instance Management
OpenZIM MCP includes advanced multi-server instance tracking and conflict detection to ensure reliable operation when multiple server instances are running.
Instance Tracking Features
- Automatic Instance Registration: Each server instance is automatically registered with a unique process ID and configuration hash
- Conflict Detection: Detects when multiple servers with different configurations are accessing the same directories
- Stale Instance Cleanup: Automatically identifies and cleans up orphaned instance files from terminated processes
- Configuration Validation: Ensures all server instances use compatible configurations
Conflict Types
- Configuration Mismatch: Multiple servers with different settings accessing the same directories
- Multiple Instances: Multiple servers running simultaneously (may cause confusion)
- Stale Instances: Orphaned instance files from terminated processes
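The three conflict types can be illustrated with a configuration hash and a simple classification pass. This is a sketch under assumptions: the record fields (`pid`, `config_hash`, `directories`), the `is_alive` callable, and the use of SHA-256 are all hypothetical, since the server's instance file format is not shown here.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


def config_hash(config: Dict[str, Any]) -> str:
    """Hash a configuration in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_instances(
    current: Dict[str, Any],
    others: List[Dict[str, Any]],
    is_alive: Callable[[int], bool],  # hypothetical process liveness check
) -> List[Dict[str, Any]]:
    """Label every other registered instance with one conflict type."""
    findings = []
    for other in others:
        if not is_alive(other["pid"]):
            findings.append({"type": "stale_instance", "pid": other["pid"]})
            continue
        shares_directory = bool(set(current["directories"]) & set(other["directories"]))
        if shares_directory and other["config_hash"] != current["config_hash"]:
            kind = "configuration_mismatch"
        else:
            kind = "multiple_instances"
        findings.append({"type": kind, "pid": other["pid"]})
    return findings
```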
Automatic Conflict Warnings
OpenZIM MCP automatically includes conflict warnings in search results and file listings when issues are detected:
Best Practices
- Use diagnose_server_state() regularly to check for conflicts
- Run resolve_server_conflicts() to clean up stale instances
- Ensure all server instances use the same configuration when accessing shared directories
- Monitor server health with get_server_health() for instance tracking information
🔧 Configuration
OpenZIM MCP supports configuration through environment variables with the OPENZIM_MCP_ prefix:
Configuration Options
| Setting | Default | Description |
|---|---|---|
| OPENZIM_MCP_CACHE__ENABLED | true | Enable/disable caching |
| OPENZIM_MCP_CACHE__MAX_SIZE | 100 | Maximum cache entries |
| OPENZIM_MCP_CACHE__TTL_SECONDS | 3600 | Cache TTL in seconds |
| OPENZIM_MCP_CONTENT__MAX_CONTENT_LENGTH | 100000 | Max content length (characters) |
| OPENZIM_MCP_CONTENT__SNIPPET_LENGTH | 1000 | Max snippet length (characters) |
| OPENZIM_MCP_CONTENT__DEFAULT_SEARCH_LIMIT | 10 | Default search result limit |
| OPENZIM_MCP_LOGGING__LEVEL | INFO | Logging level |
| OPENZIM_MCP_LOGGING__FORMAT | %(asctime)s - %(name)s - %(levelname)s - %(message)s | Log message format |
| OPENZIM_MCP_SERVER_NAME | openzim-mcp | Server instance name |
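The variable names suggest that a double underscore separates a settings section from an option within it. Assuming that convention, the sketch below shows how such variables could map onto nested settings. `load_settings` is a hypothetical helper, not the project's loader, and it leaves every value as a string.

```python
from typing import Any, Dict, Mapping

PREFIX = "OPENZIM_MCP_"


def load_settings(environ: Mapping[str, str]) -> Dict[str, Any]:
    """Group prefixed variables into nested dictionaries, splitting on '__'."""
    settings: Dict[str, Any] = {}
    for key, value in environ.items():
        if not key.startswith(PREFIX):
            continue
        parts = key[len(PREFIX):].lower().split("__")
        node = settings
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value  # values stay strings in this sketch
    return settings


example = {
    "OPENZIM_MCP_CACHE__ENABLED": "false",
    "OPENZIM_MCP_CACHE__MAX_SIZE": "200",
    "OPENZIM_MCP_SERVER_NAME": "my-server",
    "PATH": "/usr/bin",
}
print(load_settings(example))
# {'cache': {'enabled': 'false', 'max_size': '200'}, 'server_name': 'my-server'}
```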
🔒 Security Features
- Path Traversal Protection: Secure path validation prevents access outside allowed directories
- Input Sanitization: All user inputs are validated and sanitized
- Resource Management: Proper cleanup of ZIM archive resources
- Error Handling: Sanitized error messages prevent information disclosure
- Type Safety: Full type annotations prevent type-related vulnerabilities
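Path traversal protection usually comes down to resolving the requested path and checking that it still lies inside an allowed directory. A minimal sketch of that technique follows, assuming Python 3.9+ for `Path.is_relative_to`; the function name and error message are hypothetical, not the server's own.

```python
from pathlib import Path
from typing import Iterable


def validate_zim_path(requested: str, allowed_directories: Iterable[str]) -> Path:
    """Return the resolved path if it lies inside an allowed directory."""
    # resolve() collapses ".." segments and follows symlinks before the check.
    resolved = Path(requested).resolve()
    for directory in allowed_directories:
        if resolved.is_relative_to(Path(directory).resolve()):
            return resolved
    # A generic message avoids echoing details of the file system layout.
    raise PermissionError("Access denied: path is outside the allowed directories")
```

Resolving before comparing is the important step: a plain string prefix check would accept a path such as an allowed directory followed by `../` segments.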
🚀 Performance Features
- Intelligent Caching: LRU cache with TTL for frequently accessed content
- Resource Pooling: Efficient ZIM archive management
- Optimized Content Processing: Fast HTML to text conversion
- Lazy Loading: Components initialized only when needed
- Memory Management: Proper cleanup and resource management
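As an illustration of the "LRU cache with TTL" idea, here is a compact sketch built on `collections.OrderedDict`. The class is hypothetical and not the server's cache; its defaults mirror the documented OPENZIM_MCP_CACHE__MAX_SIZE and OPENZIM_MCP_CACHE__TTL_SECONDS values, and the injectable clock exists only to make the behavior easy to test.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class TTLCache:
    """Least-recently-used cache whose entries also expire after a fixed time."""

    def __init__(
        self,
        max_size: int = 100,
        ttl_seconds: float = 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._items: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        item = self._items.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self.ttl_seconds:
            del self._items[key]  # expired
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock(), value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used entry
```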
🧪 Testing
The project includes comprehensive testing with 90%+ coverage using both mock data and real ZIM files:
Test Categories
- Unit Tests: Individual component testing with mocks
- Integration Tests: End-to-end functionality testing with real ZIM files
- Security Tests: Path traversal and input validation testing
- Performance Tests: Cache and resource management testing
- Format Compatibility: Testing with various ZIM file formats and versions
- Error Handling: Testing with invalid and malformed ZIM files
Test Infrastructure
OpenZIM MCP uses a hybrid testing approach:
- Mock-based tests: Fast unit tests using mocked libzim components
- Real ZIM file tests: Integration tests using official zim-testing-suite files
- Automatic test data management: Download and organize test files as needed
Test Data Sources
- Built-in test data: Basic test files included in the repository
- zim-testing-suite integration: Official test files from the OpenZIM project
- Environment variable support: ZIM_TEST_DATA_DIR for custom test data locations
Test Markers
Tests are organized with pytest markers:
- @pytest.mark.requires_zim_data: Tests requiring ZIM test data files
- @pytest.mark.integration: Integration tests
- @pytest.mark.slow: Long-running tests
📈 Monitoring
OpenZIM MCP provides built-in monitoring capabilities:
- Health Checks: Server health and status monitoring
- Cache Metrics: Cache hit rates and performance statistics
- Structured Logging: JSON-formatted logs for easy parsing
- Error Tracking: Comprehensive error logging and tracking
🔄 Versioning
This project uses Semantic Versioning with automated version management through release-please.
Automated Releases
Version bumps and releases are automated based on Conventional Commits:
- feat: New features (minor version bump)
- fix: Bug fixes (patch version bump)
- feat!: or BREAKING CHANGE: Breaking changes (major version bump)
- perf: Performance improvements (patch version bump)
- docs:, style:, refactor:, test:, chore: No version bump
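The mapping above can be expressed as a small function. This is a sketch of the listed rules only; release-please applies its own, more detailed logic, and the function name and regular expression here are assumptions.

```python
import re
from typing import Optional

# Matches headers such as "feat: ...", "fix(scope): ..." and "feat!: ...".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: .+")


def version_bump(commit_message: str) -> Optional[str]:
    """Return 'major', 'minor', 'patch', or None for a commit message."""
    lines = commit_message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return None
    if match.group("bang") or "BREAKING CHANGE:" in commit_message:
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") in ("fix", "perf"):
        return "patch"
    return None  # docs, style, refactor, test, chore, and anything else


print(version_bump("feat: add amazing feature"))  # minor
```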
Release Process
The project uses an improved, consolidated release system with automatic validation:
- Automatic (Recommended): Push conventional commits → Release Please creates PR → Merge PR → Automatic release
- Manual: Use GitHub Actions UI for direct control over releases
- Emergency: Push tags directly for critical fixes
Key Features:
- ✅ Zero-touch releases from main branch
- ✅ Automatic version synchronization validation
- ✅ Comprehensive testing before every release
- ✅ Improved error handling and rollback capabilities
- ✅ Branch protection prevents broken releases
For detailed instructions, see Release Process Guide.
Commit Message Format
Examples:
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests (make check)
- Use conventional commit messages (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Guidelines
- Follow PEP 8 style guidelines
- Add type hints to all functions
- Write tests for new functionality
- Update documentation as needed
- Use conventional commit messages for automatic versioning
- Ensure all tests pass before submitting
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments