OpenZIM MCP Server
Built for LLM Intelligence
OpenZIM MCP transforms static ZIM archives into dynamic knowledge engines for Large Language Models. Unlike basic file readers, this tool provides intelligent, structured access that LLMs need to effectively navigate and understand vast knowledge repositories.
Why LLMs Love OpenZIM MCP:
Smart Navigation: Browse by namespace (articles, metadata, media) instead of blind searching
Context-Aware Discovery: Get article structure, relationships, and metadata for deeper understanding
Intelligent Search: Advanced filtering, auto-complete suggestions, and relevance-ranked results
Performance Optimized: Cached operations and pagination prevent timeouts on massive archives
Relationship Mapping: Extract internal/external links to understand content connections
Whether you're building a research assistant, knowledge chatbot, or content analysis system, OpenZIM MCP gives your LLM the structured access patterns it needs to unlock the full potential of offline knowledge archives. No more fumbling through raw text dumps!
OpenZIM MCP is a modern, secure, and high-performance MCP (Model Context Protocol) server that enables AI models to access and search ZIM format knowledge bases offline.
ZIM (Zeno IMproved) is an open file format developed by the openZIM project, designed specifically for offline storage and access to website content. The format supports high compression rates using Zstandard compression (default since 2021) and enables fast full-text searching, making it ideal for storing entire Wikipedia content and other large reference materials in relatively compact files. The openZIM project is sponsored by Wikimedia CH and supported by the Wikimedia Foundation, ensuring the format's continued development and adoption for offline knowledge access, especially in environments without reliable internet connectivity.
Features
Security First: Comprehensive input validation and path traversal protection
High Performance: Intelligent caching and optimized ZIM file operations
Smart Retrieval: Automatic fallback from direct access to search-based retrieval for reliable entry access
Well Tested: 90%+ test coverage with comprehensive test suite
Modern Architecture: Modular design with dependency injection
Type Safe: Full type annotations throughout the codebase
Configurable: Flexible configuration with validation
Observable: Structured logging and health monitoring
Quick Start
Installation
Development Installation
For contributors and developers:
Prepare ZIM Files
Download ZIM files (e.g., Wikipedia, Wiktionary, etc.) from the Kiwix Library and place them in a directory:
Running the Server
MCP Configuration
Add to your MCP client configuration:
Alternative configuration using Python module:
For development (from source):
Development
Running Tests
ZIM Test Data Integration
OpenZIM MCP integrates with the official zim-testing-suite for comprehensive testing with real ZIM files:
The test data includes:
Basic files: Small ZIM files for essential testing
Real content: Actual Wikipedia/Wikibooks content for integration testing
Invalid files: Malformed ZIM files for error handling testing
Special cases: Embedded content, split files, and edge cases
Test files are automatically organized by category and priority level.
Code Quality
Project Structure
API Reference
Available Tools
list_zim_files - List all ZIM files in allowed directories
No parameters required.
search_zim_file - Search within ZIM file content
Required parameters:
zim_file_path (string): Path to the ZIM file
query (string): Search query term
Optional parameters:
limit (integer, default: 10): Maximum number of results to return
offset (integer, default: 0): Starting offset for results (for pagination)
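As a rough illustration of how a client might invoke this tool, the sketch below builds a JSON-RPC 2.0 `tools/call` request, the general shape MCP uses for tool invocations. The helper name `build_search_request` and the `/data/zim/` directory are hypothetical; only the tool name and its parameters come from the list above.

```python
import json
from itertools import count

# Simple incrementing request IDs (an illustrative choice, not a requirement).
_request_ids = count(1)


def build_search_request(zim_file_path: str, query: str,
                         limit: int = 10, offset: int = 0) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for search_zim_file.

    `build_search_request` is a hypothetical helper; the argument names
    mirror the parameters documented for the tool.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": "search_zim_file",
            "arguments": {
                "zim_file_path": zim_file_path,
                "query": query,
                "limit": limit,
                "offset": offset,
            },
        },
    }


# Second page of five results (the directory is a made-up example path).
request = build_search_request(
    "/data/zim/wikipedia_en_100_2025-08.zim", "biology", limit=5, offset=5
)
print(json.dumps(request, indent=2))
```

Most MCP client libraries build this envelope for you; the sketch only shows which fields carry the documented parameters.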
get_zim_entry - Get detailed content of a specific entry in a ZIM file
Required parameters:
zim_file_path (string): Path to the ZIM file
entry_path (string): Entry path, e.g., 'A/Some_Article'
Optional parameters:
max_content_length (integer, default: 100000, minimum: 1000): Maximum length of returned content
Smart Retrieval Features:
Automatic Fallback: If direct path access fails, automatically searches for the entry and uses the exact path found
Path Mapping Cache: Caches successful path mappings for improved performance on repeated access
Enhanced Error Guidance: Provides clear guidance when entries cannot be found, suggesting alternative approaches
Transparent Operation: Works seamlessly regardless of path encoding differences (spaces vs underscores, URL encoding, etc.)
get_zim_metadata - Get ZIM file metadata from M namespace entries
Required parameters:
zim_file_path (string): Path to the ZIM file
Returns: JSON string containing ZIM metadata including entry counts, archive information, and metadata entries like title, description, language, creator, etc.
get_main_page - Get the main page entry from W namespace
Required parameters:
zim_file_path (string): Path to the ZIM file
Returns: Main page content or information about the main page entry.
list_namespaces - List available namespaces and their entry counts
Required parameters:
zim_file_path (string): Path to the ZIM file
Returns: JSON string containing namespace information with entry counts, descriptions, and sample entries for each namespace (C, M, W, X, etc.).
browse_namespace - Browse entries in a specific namespace with pagination
Required parameters:
zim_file_path (string): Path to the ZIM file
namespace (string): Namespace to browse (C, M, W, X, A, I, etc.)
Optional parameters:
limit (integer, default: 50, range: 1-200): Maximum number of entries to return
offset (integer, default: 0): Starting offset for pagination
Returns: JSON string containing namespace entries with titles, content previews, and pagination information.
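The `limit` and `offset` parameters lend themselves to a simple paging loop. The following is a minimal sketch, assuming the caller supplies a `fetch(limit, offset)` function that wraps the actual `browse_namespace` call and returns a list of entries; `iter_pages` and `fake_fetch` are hypothetical names, and the real response is a JSON string whose exact layout is not shown here.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch: Callable[[int, int], List[str]],
               limit: int = 50) -> Iterator[str]:
    """Yield entries page by page until a short (or empty) page is returned."""
    if not 1 <= limit <= 200:  # documented range for browse_namespace
        raise ValueError("limit must be between 1 and 200")
    offset = 0
    while True:
        page = fetch(limit, offset)
        yield from page
        if len(page) < limit:  # last page reached
            break
        offset += limit


# Stand-in for a real tool call: 120 made-up entry paths.
ENTRIES = [f"C/Article_{i}" for i in range(120)]


def fake_fetch(limit: int, offset: int) -> List[str]:
    return ENTRIES[offset:offset + limit]


print(sum(1 for _ in iter_pages(fake_fetch, limit=50)))  # 120
```

Stopping on a short page avoids needing a total count, at the cost of one extra request when the number of entries is an exact multiple of `limit`.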
search_with_filters - Search within ZIM file content with advanced filters
Required parameters:
zim_file_path (string): Path to the ZIM file
query (string): Search query term
Optional parameters:
namespace (string): Optional namespace filter (C, M, W, X, etc.)
content_type (string): Optional content type filter (text/html, text/plain, etc.)
limit (integer, default: 10, range: 1-100): Maximum number of results to return
offset (integer, default: 0): Starting offset for pagination
Returns: Filtered search results with namespace and content type information.
get_search_suggestions - Get search suggestions and auto-complete
Required parameters:
zim_file_path (string): Path to the ZIM file
partial_query (string): Partial search query (minimum 2 characters)
Optional parameters:
limit (integer, default: 10, range: 1-50): Maximum number of suggestions to return
Returns: JSON string containing search suggestions based on article titles and content.
get_article_structure - Extract article structure and metadata
Required parameters:
zim_file_path (string): Path to the ZIM file
entry_path (string): Entry path, e.g., 'C/Some_Article'
Returns: JSON string containing article structure including headings, sections, metadata, and word count.
extract_article_links - Extract internal and external links from an article
Required parameters:
zim_file_path (string): Path to the ZIM file
entry_path (string): Entry path, e.g., 'C/Some_Article'
Returns: JSON string containing categorized links (internal, external, media) with titles and metadata.
Examples
Listing ZIM files
Response:
Searching ZIM files
Response:
Getting ZIM entries
Response:
Smart Retrieval in Action
Example: Automatic path resolution
Response (showing smart retrieval working):
get_server_health - Get server health and statistics
No parameters required.
Returns:
Server status and performance metrics
Cache statistics
Configuration information
Instance tracking information
Conflict detection results
Example Response:
get_server_configuration - Get detailed server configuration
No parameters required.
Returns: Comprehensive server configuration including diagnostics, validation results, and conflict detection.
Example Response:
diagnose_server_state - Comprehensive server diagnostics
No parameters required.
Returns: Detailed diagnostic information including instance conflicts, configuration validation, file accessibility checks, and actionable recommendations.
Example Response:
resolve_server_conflicts - Identify and resolve server conflicts
No parameters required.
Returns: Results of conflict resolution including cleanup actions and recommendations.
Example Response:
Additional Search Examples
Computer-related search:
Response:
Getting detailed content:
Response:
Advanced Knowledge Retrieval Examples
Getting ZIM metadata:
Response:
Browsing a namespace:
Response:
Filtered search:
Getting article structure:
Response:
Getting search suggestions:
Response:
Server Management and Diagnostics Examples
Getting server health:
Response:
Diagnosing server state:
Response:
Resolving server conflicts:
Response:
ZIM Entry Retrieval Best Practices
Smart Retrieval System
OpenZIM MCP implements an intelligent entry retrieval system that automatically handles path encoding inconsistencies common in ZIM files:
How It Works:
Direct Access First: Attempts to retrieve the entry using the provided path exactly as given
Automatic Fallback: If direct access fails, automatically searches for the entry using various search terms
Path Mapping Cache: Caches successful path mappings to improve performance for repeated access
Enhanced Error Guidance: Provides clear guidance when entries cannot be found
Benefits for LLM Users:
Transparent Operation: No need to understand ZIM path encoding complexities
Single Tool Call: Eliminates the need for manual search-first methodology
Reliable Results: Consistent success across different path formats (spaces vs underscores, URL encoding, etc.)
Performance Optimized: Cached mappings improve repeated access speed
Example Scenarios Handled Automatically:
A/Test Article → A/Test_Article (space to underscore conversion)
C/Café → C/Caf%C3%A9 (URL encoding differences)
A/Some-Page → A/Some_Page (hyphen to underscore conversion)
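To make the direct-access-then-fallback idea concrete, here is a minimal sketch. Note that it swaps in a simpler technique: the server is described as falling back to a search, whereas this example enumerates a few path variants and checks each one. The names `candidate_paths` and `resolve`, and the `exists` callback, are hypothetical and not part of OpenZIM MCP.

```python
from typing import Callable, Dict, List, Optional
from urllib.parse import quote, unquote


def candidate_paths(entry_path: str) -> List[str]:
    """Return the given path first, then common encoding variants of it."""
    namespace, sep, title = entry_path.partition("/")
    decoded = unquote(title)
    underscored = decoded.replace(" ", "_")
    variants = [
        title,                          # exactly as given (direct access first)
        decoded,                        # percent-decoded
        underscored,                    # spaces -> underscores
        decoded.replace("-", "_"),      # hyphens -> underscores
        underscored.replace("-", "_"),  # both conversions
        quote(underscored),             # percent-encoded
    ]
    unique: List[str] = []
    for variant in variants:
        path = f"{namespace}{sep}{variant}"
        if path not in unique:
            unique.append(path)
    return unique


def resolve(entry_path: str, exists: Callable[[str], bool],
            cache: Dict[str, str]) -> Optional[str]:
    """Resolve a requested path to a stored one, caching successful mappings."""
    if entry_path in cache:
        return cache[entry_path]
    for candidate in candidate_paths(entry_path):
        if exists(candidate):
            cache[entry_path] = candidate
            return candidate
    return None  # caller should report guidance, e.g. "try searching first"


stored = {"A/Test_Article", "C/Caf%C3%A9"}  # made-up archive contents
cache: Dict[str, str] = {}
print(resolve("A/Test Article", stored.__contains__, cache))  # A/Test_Article
print(resolve("C/Café", stored.__contains__, cache))          # C/Caf%C3%A9
```

The cache maps the requested path to the path that actually worked, so a repeated request skips the variant loop entirely.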
Usage Recommendations
For Direct Entry Access:
When Entry Not Found: The system will automatically provide guidance:
Important Notes and Limitations
Content Length Requirements
The max_content_length parameter for get_zim_entry must be at least 1000 characters
Content longer than the specified limit will be truncated with a note showing the total character count
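The two rules above can be sketched as follows. This is an illustration only: `truncate_content` is a hypothetical helper, and the wording of the truncation note is invented, since the server's actual note text is not shown here.

```python
def truncate_content(content: str, max_content_length: int = 100_000) -> str:
    """Enforce the documented minimum and truncate overly long content."""
    if max_content_length < 1000:
        raise ValueError("max_content_length must be at least 1000")
    if len(content) <= max_content_length:
        return content
    # Hypothetical note format; it reports the total character count.
    note = f"\n\n[Content truncated; total length: {len(content):,} characters]"
    return content[:max_content_length] + note


result = truncate_content("x" * 1500, max_content_length=1000)
print(result[-55:])
```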
Search Behavior
Search results may include articles that contain the search terms in various contexts
Results are ranked by relevance but may not always be directly related to the primary meaning of the search term
Search snippets provide a preview of the content but may not show the exact location where the search term appears
File Format Support
Currently supports ZIM files (Zeno IMproved format)
Tested with Wikipedia ZIM files (e.g., wikipedia_en_100_2025-08.zim)
File paths must be properly escaped in JSON (use \\ for Windows paths)
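When requests are built programmatically, a JSON serializer handles the backslash escaping for you. A short sketch using Python's standard `json` module (the `C:\zim` directory is a made-up example):

```python
import json

# Raw string: each backslash here is a single literal backslash.
path = r"C:\zim\wikipedia_en_100_2025-08.zim"

payload = json.dumps({"zim_file_path": path})
print(payload)  # {"zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim"}

# Decoding the JSON restores the original single-backslash path.
assert json.loads(payload)["zim_file_path"] == path
```

Manual escaping is only needed when the JSON is written by hand, for example in an MCP client configuration file.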
Multi-Server Instance Management
OpenZIM MCP includes advanced multi-server instance tracking and conflict detection to ensure reliable operation when multiple server instances are running.
Instance Tracking Features
Automatic Instance Registration: Each server instance is automatically registered with a unique process ID and configuration hash
Conflict Detection: Detects when multiple servers with different configurations are accessing the same directories
Stale Instance Cleanup: Automatically identifies and cleans up orphaned instance files from terminated processes
Configuration Validation: Ensures all server instances use compatible configurations
Conflict Types
Configuration Mismatch: Multiple servers with different settings accessing the same directories
Multiple Instances: Multiple servers running simultaneously (may cause confusion)
Stale Instances: Orphaned instance files from terminated processes
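The three conflict types can be illustrated with a small sketch. It assumes each instance is described by a process ID, a configuration hash, and a liveness flag; the `Instance` class, the `hash_config` and `detect_conflicts` helpers, and the label strings are all hypothetical, and the server's real data model may differ.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


def hash_config(config: dict) -> str:
    """Hash a configuration so that key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass(frozen=True)
class Instance:
    pid: int
    config_hash: str
    alive: bool  # False when the registering process has terminated


def detect_conflicts(current: Instance, others: List[Instance]) -> List[str]:
    """Classify every other registered instance relative to the current one."""
    conflicts = []
    for other in others:
        if not other.alive:
            conflicts.append(f"stale_instance:{other.pid}")
        elif other.config_hash != current.config_hash:
            conflicts.append(f"configuration_mismatch:{other.pid}")
        else:
            conflicts.append(f"multiple_instances:{other.pid}")
    return conflicts


me = Instance(100, hash_config({"dirs": ["/data/zim"], "cache": True}), True)
others = [
    Instance(101, hash_config({"cache": True, "dirs": ["/data/zim"]}), True),
    Instance(102, hash_config({"dirs": ["/data/zim"], "cache": False}), True),
    Instance(103, me.config_hash, False),
]
print(detect_conflicts(me, others))
```

Hashing a canonical (sorted-key) serialization means two instances with the same settings compare equal regardless of how their configuration was assembled.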
Automatic Conflict Warnings
OpenZIM MCP automatically includes conflict warnings in search results and file listings when issues are detected:
Best Practices
Use diagnose_server_state() regularly to check for conflicts
Run resolve_server_conflicts() to clean up stale instances
Ensure all server instances use the same configuration when accessing shared directories
Monitor server health with get_server_health() for instance tracking information
Configuration
OpenZIM MCP supports configuration through environment variables with the OPENZIM_MCP_ prefix:
Configuration Options
Setting | Default | Description |
 | | Enable/disable caching |
 | | Maximum cache entries |
 | | Cache TTL in seconds |
 | | Max content length |
 | | Max snippet length |
 | | Default search result limit |
 | | Logging level |
 | | Log message format |
 | | Server instance name |
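As a generic illustration of prefix-based configuration, the sketch below collects every variable that starts with `OPENZIM_MCP_`. The function `load_prefixed_settings` and the variable `OPENZIM_MCP_EXAMPLE_SETTING` are placeholders, not real setting names; consult the project's documentation for the actual names and defaults.

```python
import os
from typing import Dict, Mapping

PREFIX = "OPENZIM_MCP_"


def load_prefixed_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return settings found under the prefix, keyed by lower-cased suffix."""
    return {
        name[len(PREFIX):].lower(): value
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }


# Placeholder variable name, used purely for demonstration.
example_env = {"OPENZIM_MCP_EXAMPLE_SETTING": "42", "PATH": "/usr/bin"}
print(load_prefixed_settings(example_env))  # {'example_setting': '42'}
```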
Security Features
Path Traversal Protection: Secure path validation prevents access outside allowed directories
Input Sanitization: All user inputs are validated and sanitized
Resource Management: Proper cleanup of ZIM archive resources
Error Handling: Sanitized error messages prevent information disclosure
Type Safety: Full type annotations prevent type-related vulnerabilities
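Path traversal protection generally comes down to resolving the requested path and checking that it still lies inside an allowed directory. The following is a minimal sketch of that general technique, not the project's actual validator; `validate_path` and the `/srv/zim` directory are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def validate_path(requested: str, allowed_dirs: Iterable[str]) -> Path:
    """Resolve `requested` and ensure it stays inside an allowed directory."""
    resolved = Path(requested).resolve()  # collapses ".." and follows symlinks
    for base in allowed_dirs:
        base_resolved = Path(base).resolve()
        if resolved == base_resolved or base_resolved in resolved.parents:
            return resolved
    # Sanitized message: does not echo the resolved path back to the caller.
    raise PermissionError("Access denied: path is outside allowed directories")


allowed = ["/srv/zim"]  # made-up allowed directory
print(validate_path("/srv/zim/wikipedia_en_100_2025-08.zim", allowed).name)

try:
    validate_path("/srv/zim/../../etc/passwd", allowed)
except PermissionError as exc:
    print(exc)
```

Comparing resolved paths rather than raw strings is what defeats `..` segments and symlink tricks; a plain `startswith` check on the input string would not.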
Performance Features
Intelligent Caching: LRU cache with TTL for frequently accessed content
Resource Pooling: Efficient ZIM archive management
Optimized Content Processing: Fast HTML to text conversion
Lazy Loading: Components initialized only when needed
Memory Management: Proper cleanup and resource management
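An LRU cache with TTL can be sketched in a few lines with `collections.OrderedDict`. This is an illustrative implementation of the general technique rather than the project's cache; the class name `TTLCache`, its default sizes, and the injectable `clock` parameter are assumptions.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """Least-recently-used cache whose entries also expire after a TTL."""

    def __init__(self, max_size: int = 100, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict the least recently used entry


cache = TTLCache(max_size=2, ttl_seconds=60)
cache.set("A/Page_1", "content 1")
cache.set("A/Page_2", "content 2")
cache.get("A/Page_1")               # touch Page_1 so Page_2 becomes the oldest
cache.set("A/Page_3", "content 3")  # evicts A/Page_2
print(cache.get("A/Page_2"))        # None
```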
Testing
The project includes comprehensive testing with 90%+ coverage using both mock data and real ZIM files:
Test Categories
Unit Tests: Individual component testing with mocks
Integration Tests: End-to-end functionality testing with real ZIM files
Security Tests: Path traversal and input validation testing
Performance Tests: Cache and resource management testing
Format Compatibility: Testing with various ZIM file formats and versions
Error Handling: Testing with invalid and malformed ZIM files
Test Infrastructure
OpenZIM MCP uses a hybrid testing approach:
Mock-based tests: Fast unit tests using mocked libzim components
Real ZIM file tests: Integration tests using official zim-testing-suite files
Automatic test data management: Download and organize test files as needed
Test Data Sources
Built-in test data: Basic test files included in the repository
zim-testing-suite integration: Official test files from the OpenZIM project
Environment variable support: ZIM_TEST_DATA_DIR for custom test data locations
Test Markers
Tests are organized with pytest markers:
@pytest.mark.requires_zim_data: Tests requiring ZIM test data files
@pytest.mark.integration: Integration tests
@pytest.mark.slow: Long-running tests
Monitoring
OpenZIM MCP provides built-in monitoring capabilities:
Health Checks: Server health and status monitoring
Cache Metrics: Cache hit rates and performance statistics
Structured Logging: JSON-formatted logs for easy parsing
Error Tracking: Comprehensive error logging and tracking
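JSON-formatted logging can be approximated with the standard `logging` module. The sketch below is a generic formatter, not the project's own; the field names and the logger name `openzim_mcp.example` are assumptions chosen for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("openzim_mcp.example")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("cache hit rate: %.2f", 0.87)
```

One JSON object per line keeps the output easy to filter with standard log tooling.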
Versioning
This project uses Semantic Versioning with automated version management through release-please.
Automated Releases
Version bumps and releases are automated based on Conventional Commits:
feat: - New features (minor version bump)
fix: - Bug fixes (patch version bump)
feat!: or BREAKING CHANGE: - Breaking changes (major version bump)
perf: - Performance improvements (patch version bump)
docs:, style:, refactor:, test:, chore: - No version bump
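The mapping above can be expressed as a small classifier. This is a simplified sketch of the rules as listed, not release-please's actual logic; `version_bump` is a hypothetical helper, and treating any type followed by `!` as breaking follows the Conventional Commits convention.

```python
import re

# type, optional (scope), optional "!", then ": description"
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: .+")


def version_bump(message: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for a commit message."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return "none"  # not a conventional commit
    if match.group("bang") or "BREAKING CHANGE:" in message:
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") in {"fix", "perf"}:
        return "patch"
    return "none"  # docs, style, refactor, test, chore, ...


for msg in ["feat: add amazing feature", "fix: handle empty query",
            "feat!: drop legacy API", "docs: update README"]:
    print(f"{msg!r} -> {version_bump(msg)}")
```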
Release Process
The project uses an improved, consolidated release system with automatic validation:
Automatic (Recommended): Push conventional commits → Release Please creates PR → Merge PR → Automatic release
Manual: Use GitHub Actions UI for direct control over releases
Emergency: Push tags directly for critical fixes
Key Features:
Zero-touch releases from main branch
Automatic version synchronization validation
Comprehensive testing before every release
Improved error handling and rollback capabilities
Branch protection prevents broken releases
For detailed instructions, see Release Process Guide.
Commit Message Format
Examples:
Contributing
Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run tests (make check)
Use conventional commit messages (git commit -m 'feat: add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request
Development Guidelines
Follow PEP 8 style guidelines
Add type hints to all functions
Write tests for new functionality
Update documentation as needed
Use conventional commit messages for automatic versioning
Ensure all tests pass before submitting
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments