Tika MCP Server
Extracts content and metadata from files using Apache Tika, supporting various file formats such as PDF, DOCX, and images with OCR.
This project provides a Model Context Protocol (MCP) server for extracting content and metadata from files using Apache Tika.
Overview
The Tika MCP server allows AI assistants to extract text and metadata from various file formats (PDF, DOCX, images with OCR, etc.) using Apache Tika. This enables AI assistants to understand and work with the content of files that users upload.
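As a rough illustration of what extraction through Apache Tika involves, the sketch below builds requests against the Tika server's REST endpoints: PUT /tika for plain text and PUT /meta for metadata as JSON. It is a minimal standard-library sketch, not this project's tika_client.py. The function names are hypothetical, and the Content-Type header is an assumption chosen so that Tika detects the file type itself.

```python
import urllib.request

DEFAULT_TIKA_URL = "http://localhost:9998"


def _build_put(path: str, data: bytes, accept: str, tika_url: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the raw file bytes."""
    return urllib.request.Request(
        tika_url.rstrip("/") + path,
        data=data,
        method="PUT",
        headers={
            "Accept": accept,
            # Assumption: a generic type leaves format detection to Tika.
            "Content-Type": "application/octet-stream",
        },
    )


def build_text_request(data: bytes, tika_url: str = DEFAULT_TIKA_URL) -> urllib.request.Request:
    """Request for the extracted text of a document (PUT /tika)."""
    return _build_put("/tika", data, "text/plain", tika_url)


def build_metadata_request(data: bytes, tika_url: str = DEFAULT_TIKA_URL) -> urllib.request.Request:
    """Request for the document's metadata as JSON (PUT /meta)."""
    return _build_put("/meta", data, "application/json", tika_url)


def send(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a Tika server running, send(build_text_request(file_bytes)) would return the extracted text as bytes, and the body returned for build_metadata_request can be parsed with json.loads.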
Features
Extract text content from various file formats
Extract metadata (author, creation date, etc.) from files
Support for PDF, DOCX, images, and many other formats
Simple JSON-RPC API following the Model Context Protocol
Requirements
Python 3.6+
Apache Tika server running (default: http://localhost:9998)
MCP-compatible client
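Before installing, it can help to confirm that the Tika server requirement is met. The sketch below queries the Tika server's /version endpoint and reports whether it answered. The helper names are hypothetical and not part of this project; the default URL is the one listed above.

```python
import urllib.request


def version_url(tika_url: str = "http://localhost:9998") -> str:
    """Return the URL of the Tika server's version endpoint."""
    return tika_url.rstrip("/") + "/version"


def tika_is_reachable(tika_url: str = "http://localhost:9998", timeout: float = 5.0) -> bool:
    """Return True if the Tika server answers GET /version with HTTP 200."""
    try:
        with urllib.request.urlopen(version_url(tika_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False
```

Calling tika_is_reachable() with no arguments checks the default address; pass a different tika_url if the server runs elsewhere.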
Installation
Clone this repository
Install dependencies:
pip install -r requirements.txt
Register the MCP server:
python -m app.register_mcp_server
Usage
The Tika MCP server provides a single tool:
extract_file
Extracts content and metadata from a file using Apache Tika.
Parameters:
file_path: Path to the file to extract content from
tika_url: URL of the running Tika server (default: http://localhost:9998)
Returns:
metadata: Dictionary of metadata extracted from the file
content: Array of content blocks extracted from the file
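To make the tool description concrete, the sketch below builds the kind of JSON-RPC 2.0 message an MCP client sends to invoke a tool: the tools/call method with the tool name and its arguments. It assumes this server accepts the standard MCP tools/call shape. The helper name and the example path examples/report.pdf are hypothetical.

```python
import json
from typing import Any, Dict


def build_extract_file_call(
    file_path: str,
    tika_url: str = "http://localhost:9998",
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request that calls the extract_file tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # standard MCP method for invoking a tool
        "params": {
            "name": "extract_file",
            "arguments": {
                "file_path": file_path,
                "tika_url": tika_url,
            },
        },
    }


# Hypothetical example path; substitute a real file when trying this out.
example_message = json.dumps(build_extract_file_call("examples/report.pdf"), indent=2)
```

How the serialized message reaches the server depends on the transport the client uses, which this sketch leaves out.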
Testing
Several test scripts are provided to verify the functionality:
app/test_tika_simple.py: Tests the Tika client directly
app/test_simple_mcp.py: Tests the MCP server using the JSON-RPC protocol
Project Structure
app/: Main application code
  simple_mcp_server.py: MCP server implementation
  tika_client.py: Client for Apache Tika
  model.py: Data models and business logic
  register_mcp_server.py: Script to register the MCP server
examples/: Example files for testing
requirements.txt: Python dependencies
Setup
Create a virtual environment using either:
uv venv
or
python3 -m venv .venv
Activate the virtual environment and install dependencies:
source .venv/bin/activate
pip install -r requirements.txt
Running the MCP Server
Start the Apache Tika server (if not already running):
docker run -d -p 9998:9998 apache/tika
Register and run the MCP server:
python -m app.register_mcp_server
License
MIT