Agent Helper

by wajirasls

A local MCP server that lets AI agents process files: run OCR on images, extract text from PDF and DOCX files, and describe images with local vision models. All processing happens on your machine.

Architecture

                     ┌───────────────────────┐
  AI Agent (MCP) ───▶│  MCP Server :5021     │
  (opencode, etc.)   │  FastMCP / SSE        │
                     └───────────┬───────────┘
                                 │
                     ┌───────────▼───────────┐
                     │  Orchestrator         │
                     │  Routes files by type │
                     └───────────┬───────────┘
         ┌───────────────┬───────┴───────┬───────────────┐
  ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
  │ OCR         │ │ PDF         │ │ DOCX        │ │ Vision      │
  │ Tesseract   │ │ PyMuPDF     │ │ python-docx │ │ Ollama /    │
  │             │ │             │ │             │ │ Moondream   │
  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘

  Browser ─────▶ Management UI :5020
                 (FastAPI dashboard)

Features

Feature	Description
OCR	Extract text from images via Tesseract
PDF extraction	Text extraction from PDFs via PyMuPDF
DOCX extraction	Paragraph extraction from Word files
Vision (optional)	Describe images using Ollama LLaVA and/or Moondream (ONNX)
API key auth	Bearer token authentication for MCP clients, managed via web UI
Management dashboard	Web UI at port 5020 for settings, keys, job history, live logs
Job history	Results cached to disk, viewable in dashboard
Parallel processing	Files processed concurrently
Live logs	Stream logs to the dashboard without WebSockets

Requirements

Python 3.10+

Tesseract OCR (system package):

sudo apt install tesseract-ocr   # Debian/Ubuntu
brew install tesseract           # macOS

Ollama (optional, for vision):

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llava

Quick start

git clone https://github.com/<your-user>/agent_helper.git
cd agent_helper

# One-shot setup:
chmod +x start.sh
./start.sh

# Or manually:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py

Open http://127.0.0.1:5020 in your browser.

Systemd service (auto-start on boot)

mkdir -p ~/.config/systemd/user
cp agent-helper.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable agent-helper
systemctl --user start agent-helper
loginctl enable-linger "$USER"  # keep the service running after logout

Ports

Port	Service	Access
5020	Management UI (FastAPI)	`http://127.0.0.1:5020`
5021	MCP Server (SSE)	`http://localhost:5021/sse` (binds to 0.0.0.0)
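
To verify that both services came up, a plain TCP connection test is usually enough. This is a small sketch using only the standard library; `port_is_open` is a hypothetical helper, and a successful connection only shows that something is listening, not that it is Agent Helper.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in (("Management UI", 5020), ("MCP server", 5021)):
        state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```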

Management dashboard

Open http://127.0.0.1:5020 to reach the following panels:

MCP Server — Start, stop, restart the MCP server
Vision Backend — Toggle between OCR only / Ollama / Moondream / Both
API Keys — Create and revoke keys for MCP clients
Processing Folders — Browse Processing/ subfolders
Job History — View past processing jobs
Health Panel — Check Tesseract, Ollama, Moondream status
Live Logs — Scrollable log stream
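
The live log view is described as polling-based (no WebSockets) and backed by a 500-line ring buffer. One way such a buffer could work is sketched below; this is an illustration, not the project's `logger.py`. The class name `RingBufferHandler` and the cursor-based `since` method are assumptions made for the example.

```python
import logging
from collections import deque


class RingBufferHandler(logging.Handler):
    """Keep only the most recent `capacity` formatted log lines in memory."""

    def __init__(self, capacity: int = 500):
        super().__init__()
        self.lines = deque(maxlen=capacity)  # old lines fall off the left end
        self.total = 0  # number of lines ever emitted, used as a cursor

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))
        self.total += 1

    def since(self, cursor: int):
        """Return (new_lines, new_cursor) for a client that last saw `cursor`."""
        first_kept = self.total - len(self.lines)  # index of the oldest kept line
        start = max(cursor, first_kept) - first_kept
        return list(self.lines)[start:], self.total


if __name__ == "__main__":
    handler = RingBufferHandler(capacity=3)
    log = logging.getLogger("ring_buffer_demo")
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    for i in range(5):
        log.info("line %d", i)
    print(handler.since(0))
```

A polling client would store the returned cursor and send it back on the next request, so each poll transfers only the lines it has not seen yet.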

MCP tools (for AI agents)

Connect your AI agent (opencode, Claude Code, etc.) to http://localhost:5021/sse and send an API key created in the dashboard as a bearer token in the Authorization header.

`process_folder(folder_name)`

Process all files in Processing/<folder_name>/.

If the folder doesn't exist, it is created and the agent is told to place files there.
If it exists, all supported files are processed and the extracted text and descriptions are returned.

`process_file(folder_name, filename)`

Process a single file within a subfolder.

`list_folders()`

List all subfolders in Processing/.

`list_files(folder_name)`

List files in a specific subfolder.
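
The folder behaviour described for these tools can be sketched with `pathlib` alone. This is a simplified illustration under stated assumptions, not the server's code: the return shapes are invented for the example, `PROCESSING_ROOT` is assumed to be relative to the working directory, and `process_folder` here stops at listing files instead of extracting text.

```python
from pathlib import Path

# Assumed location of the watch folder, relative to the working directory.
PROCESSING_ROOT = Path("Processing")


def list_folders(root: Path = PROCESSING_ROOT) -> list[str]:
    """List all subfolders in the processing root."""
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def list_files(folder_name: str, root: Path = PROCESSING_ROOT) -> list[str]:
    """List files in a specific subfolder."""
    folder = root / folder_name
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())


def process_folder(folder_name: str, root: Path = PROCESSING_ROOT) -> dict:
    """Create the folder if missing; otherwise report the files that would be processed."""
    folder = root / folder_name
    if not folder.exists():
        folder.mkdir(parents=True)
        return {"status": "created", "message": f"Place files in {folder} and call again."}
    # A real implementation would hand these files to the processors here.
    return {"status": "ready", "files": list_files(folder_name, root)}
```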

opencode configuration

Add the following to your opencode.json or ~/.config/opencode/opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "agent_helper": {
      "type": "remote",
      "url": "http://localhost:5021/sse",
      "headers": {
        "Authorization": "Bearer <your-api-key>"
      },
      "enabled": true
    }
  }
}

File processing support

Extension	Processor	Output
`.jpg`, `.png`, `.webp`, `.bmp`, `.tiff`	Tesseract OCR + optional vision	Extracted text + image description
`.pdf`	PyMuPDF	Extracted text per page
`.docx`	python-docx	Extracted paragraphs
`.txt`, `.md`, `.csv`, `.json`, `.xml`	Direct read	Raw file content
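
The routing in the table above, combined with concurrent processing, might look roughly like the sketch below. It is not the project's `processor_orchestrator.py`: the function names are hypothetical, matching extensions case-insensitively is an assumption, and the OCR, PDF, and DOCX branches are placeholders so the example runs without Tesseract, PyMuPDF, or python-docx.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Optional

IMAGE_EXTS = {".jpg", ".png", ".webp", ".bmp", ".tiff"}
TEXT_EXTS = {".txt", ".md", ".csv", ".json", ".xml"}


def pick_processor(path: Path) -> Optional[str]:
    """Map a file extension to a processor name, or None if unsupported."""
    ext = path.suffix.lower()  # assumption: matching is case-insensitive
    if ext in IMAGE_EXTS:
        return "image"
    if ext == ".pdf":
        return "pdf"
    if ext == ".docx":
        return "docx"
    if ext in TEXT_EXTS:
        return "text"
    return None


def process_one(path: Path) -> dict:
    """Process a single file; only the direct-read branch is implemented here."""
    kind = pick_processor(path)
    if kind is None:
        return {"file": path.name, "skipped": "unsupported extension"}
    if kind == "text":
        return {"file": path.name, "processor": kind,
                "content": path.read_text(encoding="utf-8")}
    # Placeholder: a real processor would run OCR or parse the document.
    return {"file": path.name, "processor": kind, "content": None}


def process_many(paths, max_workers: int = 4) -> list:
    """Process files concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_one, paths))
```

Threads are used here because OCR and file reads spend most of their time in native code or I/O; a process pool would be the alternative if the per-file work were CPU-bound Python.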

Vision backends

Mode	Backend	Notes
`ocr` (default)	Tesseract only	No vision model needed
`ollama`	LLaVA via Ollama	Requires Ollama running locally
`moondream`	Moondream ONNX	Runs in-process, no external service
`both`	Ollama → Moondream fallback	Tries Ollama first, falls back to Moondream
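
The four modes above reduce to a small dispatch with one fallback path. The sketch below shows that logic with the backends passed in as plain callables, so it makes no claims about how the project actually talks to Ollama or Moondream. `describe_image` and its signature are hypothetical, and treating any exception from Ollama as a reason to fall back is an assumption.

```python
from typing import Callable, Optional

# A describer takes raw image bytes and returns a text description.
Describer = Callable[[bytes], str]


def describe_image(image: bytes, mode: str,
                   ollama: Describer, moondream: Describer) -> Optional[str]:
    """Return an image description for the given mode, or None in OCR-only mode."""
    if mode == "ocr":
        return None  # no vision model is used
    if mode == "ollama":
        return ollama(image)
    if mode == "moondream":
        return moondream(image)
    if mode == "both":
        try:
            return ollama(image)
        except Exception:
            # Assumed fallback rule: any Ollama failure hands over to Moondream.
            return moondream(image)
    raise ValueError(f"unknown vision mode: {mode!r}")
```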

Project structure

agent_helper/
├── config.py                 # Settings management (persisted to JSON)
├── logger.py                 # Ring buffer logger (500 lines, polled by UI)
├── auth.py                   # API key management (SHA-256 hashed)
├── main.py                   # Entry point
├── mcp_server.py             # FastMCP server on port 5021
├── processor_orchestrator.py # File routing + parallel processing
├── processors/
│   ├── image.py              # Tesseract OCR
│   ├── vision.py             # Ollama + Moondream (ONNX) vision
│   ├── pdf.py                # PyMuPDF text extraction
│   └── docx.py               # python-docx parsing
├── management_ui/
│   ├── app.py                # FastAPI dashboard on port 5020
│   └── templates/
│       └── dashboard.html    # HTMX dark-theme dashboard
├── Processing/               # Watch folder (created on first run)
├── data/                     # Settings & API keys (persisted)
├── logs/                     # Log output
├── requirements.txt
├── start.sh
└── agent-helper.service      # systemd user service
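
The tree notes that API keys are stored as SHA-256 hashes. A minimal sketch of that idea follows, assuming keys are random URL-safe tokens and that the server keeps only hex digests; the function names and storage shape are hypothetical and may differ from `auth.py`.

```python
import hashlib
import hmac
import secrets


def create_api_key() -> tuple[str, str]:
    """Return (plaintext_key, sha256_hex_digest); only the digest needs storing."""
    key = secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode("utf-8")).hexdigest()


def verify_bearer(header_value: str, stored_digests: set[str]) -> bool:
    """Check an Authorization header value of the form 'Bearer <key>'."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    digest = hashlib.sha256(token.strip().encode("utf-8")).hexdigest()
    # compare_digest avoids leaking match position through timing.
    return any(hmac.compare_digest(digest, d) for d in stored_digests)
```

Storing only the digest means the plaintext key can be shown once at creation time and never recovered from the data directory afterwards.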

License

MIT
