Agent Helper
Allows AI agents to describe images using the Ollama LLaVA vision model running locally.
A local MCP server that gives AI agents the ability to process files — OCR images, extract text from PDFs and DOCX, and describe images using local vision models. All processing is done entirely on your machine.
Architecture
                      ┌──────────────────────┐
AI Agent (MCP) ──────▶│   MCP Server :5021   │
(opencode, etc.)      │    FastMCP / SSE     │
                      └──────────┬───────────┘
                                 │
                      ┌──────────▼───────────┐
                      │     Orchestrator     │
                      │ Routes files by type │
                      └─┬─────┬─────┬─────┬──┘
                        │     │     │     │
                   ┌────▼─┐┌──▼──┐┌─▼──┐┌─▼────┐
                   │ OCR  ││ PDF ││DOCX││Vision│
                   │Tesser││MuPDF││py- ││Ollama│
                   │act   ││     ││docx││Moon  │
                   └──────┘└─────┘└────┘└──────┘
Browser ─────▶ Management UI :5020
               (FastAPI dashboard)
Features
Feature | Description |
OCR | Extract text from images via Tesseract |
PDF extraction | Text extraction from PDFs via PyMuPDF |
DOCX extraction | Paragraph extraction from Word files |
Vision (optional) | Describe images using Ollama LLaVA and/or Moondream (ONNX) |
API key auth | Bearer token authentication for MCP clients, managed via web UI |
Management dashboard | Web UI at port 5020 for settings, keys, job history, live logs |
Job history | Results cached to disk, viewable in dashboard |
Parallel processing | Files processed concurrently |
Live logs | Stream logs to the dashboard without WebSockets |
Requirements
Python 3.10+
Tesseract OCR (system package):
sudo apt install tesseract-ocr   # Debian/Ubuntu
brew install tesseract           # macOS
Ollama (optional, for vision):
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llava
Quick start
git clone https://github.com/<your-user>/agent_helper.git
cd agent_helper
# One-shot setup:
chmod +x start.sh
./start.sh
# Or manually:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
Open http://127.0.0.1:5020 in your browser.
Systemd service (auto-start on boot)
cp agent-helper.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable agent-helper
systemctl --user start agent-helper
loginctl enable-linger administrator # keep running after logout; replace "administrator" with your username
Ports
Port | Service | Access |
5020 | Management UI (FastAPI) | http://127.0.0.1:5020 |
5021 | MCP Server (SSE) | http://localhost:5021/sse |
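To confirm that both services are listening after startup, a quick TCP probe is enough. This is a minimal sketch using only the standard library; the helper names `port_is_open` and `check_services` are illustrative and not part of the project.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict[str, bool]:
    """Probe the management UI (5020) and MCP server (5021) ports."""
    services = {"management_ui": 5020, "mcp_server": 5021}
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'down'}")
```

A successful connection only shows that something is listening on the port; it does not verify that the listener is Agent Helper.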
Management dashboard
Visit http://127.0.0.1:5020:
MCP Server: Start, stop, restart the MCP server
Vision Backend: Toggle between OCR only / Ollama / Moondream / Both
API Keys: Create and revoke keys for MCP clients
Processing Folders: Browse Processing/ subfolders
Job History: View past processing jobs
Health Panel: Check Tesseract, Ollama, Moondream status
Live Logs: Scrollable log stream
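The live log view is described elsewhere in this README as a 500-line ring buffer that the UI polls, which is how logs can be streamed without WebSockets. A minimal sketch of that pattern follows. The class name and the cursor-based `since` method are assumptions for illustration, not the project's `logger.py`.

```python
import threading
from collections import deque


class RingBufferLog:
    """Keep the most recent `capacity` log lines and let clients poll for new ones."""

    def __init__(self, capacity: int = 500) -> None:
        self._lines: deque[tuple[int, str]] = deque(maxlen=capacity)
        self._next_seq = 0
        self._lock = threading.Lock()

    def append(self, line: str) -> None:
        """Store a line with a monotonically increasing sequence number."""
        with self._lock:
            self._lines.append((self._next_seq, line))
            self._next_seq += 1

    def since(self, cursor: int) -> tuple[list[str], int]:
        """Return lines with sequence >= cursor and the cursor for the next poll."""
        with self._lock:
            new_lines = [text for seq, text in self._lines if seq >= cursor]
            return new_lines, self._next_seq
```

A polling client would start with cursor 0, call `since(cursor)` on a timer, append the returned lines to the view, and keep the returned cursor for the next request. Lines that fall out of the buffer between polls are silently skipped.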
MCP tools (for AI agents)
Connect your AI agent (opencode, Claude Code, etc.) to http://localhost:5021/sse with a bearer token.
process_folder(folder_name)
Process all files in Processing/<folder_name>/.
If the folder doesn't exist, it is created and the agent is told to place files there.
If it exists, all supported files are processed and the extracted text and image descriptions are returned.
process_file(folder_name, filename)
Process a single file within a subfolder.
list_folders()
List all subfolders in Processing/.
list_files(folder_name)
List files in a specific subfolder.
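The folder semantics of these tools can be sketched with `pathlib`. This is an illustration under assumptions, not the server's code: the `root` parameter stands in for the `Processing/` directory, the return shapes are invented, and the actual file processing is replaced by a file listing.

```python
from pathlib import Path


def list_folders(root: Path) -> list[str]:
    """Return the names of all subfolders directly under root."""
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def list_files(root: Path, folder_name: str) -> list[str]:
    """Return the names of files directly inside root/folder_name."""
    folder = root / folder_name
    return sorted(p.name for p in folder.iterdir() if p.is_file())


def process_folder(root: Path, folder_name: str) -> dict:
    """Create the folder if missing; otherwise report the files found in it."""
    folder = root / folder_name
    if not folder.exists():
        folder.mkdir(parents=True)
        return {
            "status": "created",
            "message": f"Place files in {folder} and call process_folder again.",
        }
    # A real implementation would hand each file to the matching processor here.
    return {"status": "processed", "files": list_files(root, folder_name)}
```

The first call for a new folder name returns the "created" status, so an agent can tell the user where to drop files before calling the tool a second time.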
opencode configuration
Add to your opencode.json or ~/.config/opencode/opencode.json:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "agent_helper": {
      "type": "remote",
      "url": "http://localhost:5021/sse",
      "headers": {
        "Authorization": "Bearer <your-api-key>"
      },
      "enabled": true
    }
  }
}
File processing support
File type | Processor | Output |
Images | Tesseract OCR + optional vision | Extracted text + image description |
PDF | PyMuPDF | Extracted text per page |
DOCX | python-docx | Extracted paragraphs |
Text files | Direct read | Raw file content |
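Routing by file type combined with concurrent processing can be sketched as a dictionary of handlers driven by a thread pool. The extension list below is an assumption, since this README does not preserve the project's exact list, and the OCR, PDF, and DOCX handlers are placeholders so the example runs without Tesseract, PyMuPDF, or python-docx installed.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

Processor = Callable[[Path], str]


def read_text(path: Path) -> str:
    """Direct read: return the raw file content."""
    return path.read_text(encoding="utf-8", errors="replace")


def placeholder(kind: str) -> Processor:
    """Stand-in for the OCR, PDF, and DOCX processors, which need external libraries."""
    return lambda path: f"[{kind} output for {path.name}]"


# Assumed extension map; the project's actual list may differ.
ROUTES: dict[str, Processor] = {
    ".png": placeholder("ocr"),
    ".jpg": placeholder("ocr"),
    ".pdf": placeholder("pdf"),
    ".docx": placeholder("docx"),
    ".txt": read_text,
}


def process_one(path: Path) -> tuple[str, str]:
    """Pick a processor by file extension and run it."""
    processor = ROUTES.get(path.suffix.lower())
    if processor is None:
        return path.name, "unsupported file type"
    return path.name, processor(path)


def process_all(paths: list[Path], max_workers: int = 4) -> dict[str, str]:
    """Process files concurrently; results are keyed by file name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_one, paths))
```

Threads suit this workload when most of the time is spent waiting on external tools or I/O. A CPU-bound pure-Python processor would gain more from a process pool.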
Vision backends
Mode | Backend | Notes |
OCR only | Tesseract only | No vision model needed |
Ollama | LLaVA via Ollama | Requires Ollama running locally |
Moondream | Moondream ONNX | Pure Python, no external service |
Both | Ollama → Moondream fallback | Tries Ollama first, falls back to Moondream |
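The fallback behaviour, where Ollama is tried first and Moondream is used if it fails, is a standard try-then-fallback pattern. The sketch below shows the pattern with stand-in backends. The function names are hypothetical and no real Ollama or Moondream calls are made.

```python
from typing import Callable, Optional

Describer = Callable[[bytes], str]


def describe_with_fallback(
    image: bytes, primary: Describer, fallback: Optional[Describer] = None
) -> str:
    """Try the primary backend; on any error, try the fallback if one is given."""
    try:
        return primary(image)
    except Exception as primary_error:
        if fallback is None:
            raise
        try:
            return fallback(image)
        except Exception as fallback_error:
            # Surface both failures so the caller can see why no description exists.
            raise RuntimeError(
                f"primary failed ({primary_error}); fallback failed ({fallback_error})"
            ) from fallback_error


# Stand-in backends for illustration; real ones would call Ollama or Moondream.
def ollama_unavailable(image: bytes) -> str:
    raise ConnectionError("Ollama is not running")


def moondream_stub(image: bytes) -> str:
    return f"stub description of a {len(image)}-byte image"
```

Calling `describe_with_fallback(data, ollama_unavailable, moondream_stub)` returns the stub description, mirroring the case where the Ollama service is down but the local ONNX model is available.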
Project structure
agent_helper/
├── config.py                  # Settings management (persisted to JSON)
├── logger.py                  # Ring buffer logger (500 lines, polled by UI)
├── auth.py                    # API key management (SHA-256 hashed)
├── main.py                    # Entry point
├── mcp_server.py              # FastMCP server on port 5021
├── processor_orchestrator.py  # File routing + parallel processing
├── processors/
│   ├── image.py               # Tesseract OCR
│   ├── vision.py              # Ollama + Moondream (ONNX) vision
│   ├── pdf.py                 # PyMuPDF text extraction
│   └── docx.py                # python-docx parsing
├── management_ui/
│   ├── app.py                 # FastAPI dashboard on port 5020
│   └── templates/
│       └── dashboard.html     # HTMX dark-theme dashboard
├── Processing/                # Watch folder (created on first run)
├── data/                      # Settings & API keys (persisted)
├── logs/                      # Log output
├── requirements.txt
├── start.sh
└── agent-helper.service       # systemd user service
License
MIT