Visual Document Forensics MCP Server
A production-ready Python MCP server that performs deterministic visual and structural analysis of PDF and DOCX documents.
It is designed to be called by a UiPath agent. The agent does the reasoning and decision-making; this server does only one thing: extract measurable visual evidence that UiPath Analyze File cannot reliably provide.
The server never determines fraud. It reports metrics and measurable anomalies (blur, OCR confidence, effective DPI, image stretch, font usage, structural inconsistencies) and leaves all interpretation to the agent.
Guarantees
All processing is local and deterministic. The server does not use:
LLMs or VLMs
External APIs or API keys
Embeddings, vector databases, or retrieval systems
Any cloud service or network access
The only optional external component is the local Tesseract OCR binary. If it is absent, the server degrades gracefully (OCR metrics are skipped and a warning is emitted) — it never fails because of it.
Architecture
Document
    ↓
MCP Tool Call            src/server/app.py               (FastMCP, stdio transport)
    ↓
Document Loader          src/render/loader.py            (detect PDF/DOCX, open handles)
    ↓
Page Renderer            src/render/pdf_renderer.py      (PyMuPDF, configurable DPI)
    ↓
Tile Generator           src/tiling/tiler.py             (overlapping tiles)
    ↓
Visual Analysis Engine   src/analyzers/*                 (blur, sharpness, contrast,
    ↓                                                     entropy, edge density, noise,
    ↓                                                     OCR confidence, text density)
PDF Structure Engine     src/analyzers/pdf_structure.py  (images, scaling, DPI,
    ↓                                                     fonts, objects; pikepdf check)
Evidence Aggregator      src/tools/analyze.py            (detectors + rollup)
    ↓
JSON Response            src/schemas/models.py           (pydantic-validated contract)
    ↓
UiPath Agent
Project layout
visual-forensics-mcp/
├── src/
│   ├── server/      # FastMCP server (analyze_document tool)
│   ├── tools/       # analysis orchestrator (the pipeline)
│   ├── render/      # document loader + PDF/DOCX rendering
│   ├── tiling/      # overlapping tile generation
│   ├── analyzers/   # deterministic metric computation
│   ├── detectors/   # anomaly detection from metrics
│   ├── schemas/     # pydantic output contract
│   └── utils/       # config, logging, geometry, image ops
├── tests/           # pytest suite (unit + end-to-end)
├── configs/         # default.yaml (all thresholds live here)
├── examples/        # sample docs + example request/response
├── requirements.txt
├── pyproject.toml
└── README.md
Analyzer & detector modules
Analyzers (src/analyzers/) compute the deterministic metrics; detectors (src/detectors/) derive anomalies from those metrics.
Installation
Requires Python 3.11+.
# 1. Create and activate a virtual environment (recommended)
python -m venv .venv
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
# macOS / Linux
source .venv/bin/activate
# 2. Install dependencies
pip install -r requirements.txt
Tesseract OCR (optional but recommended)
OCR-based metrics (ocr_confidence) and the OCR-confidence detector require the
Tesseract binary. The Python wheel (pytesseract) only wraps it.
Windows: install the UB Mannheim build, then either add it to PATH or set ocr.tesseract_cmd in configs/default.yaml (or via request options) to e.g. C:\\Program Files\\Tesseract-OCR\\tesseract.exe.
macOS:
brew install tesseract
Debian/Ubuntu:
sudo apt-get install tesseract-ocr
Without Tesseract, everything else still runs; the response simply includes a
warning and summary.ocr_available = false.
Configuration
All tunables — including every threshold — live in
configs/default.yaml. There are no hardcoded
thresholds anywhere in the analysis or detection code.
Key settings:
render:
  dpi: 400 # default render resolution
tiling:
  tile_size: 512
  tile_overlap: 0.20
features:
  enable_ocr: true
  enable_pdf_analysis: true
Three ways to configure, in increasing precedence:
Edit configs/default.yaml.
Point the server at another file via the VISUAL_FORENSICS_CONFIG environment variable.
Pass an options dict per request (deep-merged over the file), e.g. {"render": {"dpi": 300}, "features": {"enable_ocr": false}}.
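To illustrate what "deep-merged over the file" means for per-request options, here is a minimal sketch. The deep_merge function is hypothetical and the exact merge semantics of the server are an assumption: nested dicts are merged key by key, and any other value in the request replaces the file value.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` merged recursively over `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge their keys instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys simply take the override value.
            merged[key] = copy.deepcopy(value)
    return merged


# Values mirror the key settings shown in configs/default.yaml.
file_config = {
    "render": {"dpi": 400},
    "tiling": {"tile_size": 512, "tile_overlap": 0.20},
    "features": {"enable_ocr": True, "enable_pdf_analysis": True},
}
request_options = {"render": {"dpi": 300}, "features": {"enable_ocr": False}}

effective = deep_merge(file_config, request_options)
print(effective)
```

Under these assumptions the request overrides render.dpi and features.enable_ocr, while tiling and features.enable_pdf_analysis keep their file values.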
Running the MCP server
python -m src.server.app
This starts a FastMCP server on the stdio transport, the standard way MCP clients (including UiPath) launch and communicate with a local server.
A console script is also installed:
visual-forensics-mcp
MCP usage
The server exposes a single primary tool:
analyze_document(document_path: str, options: dict | None = None)
It returns a JSON object with this exact top-level shape:
{
  "document_id": "...",
  "document_type": "...",
  "summary": {},
  "page_results": [],
  "document_findings": [],
  "warnings": [],
  "errors": []
}
Every finding (in page_results[].findings and document_findings) has the
required schema:
{
  "type": "...",
  "page": 0,
  "bbox": [0, 0, 0, 0],
  "metrics": {},
  "confidence": 0.0,
  "explanation": "..."
}
bbox values are in rendered pixel coordinates at the page's render DPI.
Document-level findings (e.g. font_mismatch) use page: 0 and a zero bbox.
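Because bbox values are expressed in rendered pixels, a consumer that wants PDF points (1/72 inch) has to rescale by the page's render DPI. The sketch below shows one way to do that; the helper names are hypothetical, and the [x0, y0, x1, y1] ordering of bbox is an assumption, since the schema only states that it has four values.

```python
def bbox_px_to_points(bbox: list[float], dpi: float) -> list[float]:
    """Convert a bbox from rendered pixels to PDF points (72 points per inch)."""
    factor = 72.0 / dpi
    return [round(value * factor, 2) for value in bbox]


def is_document_level(finding: dict) -> bool:
    """Document-level findings use page 0 and an all-zero bbox."""
    return finding["page"] == 0 and all(value == 0 for value in finding["bbox"])


# A bbox reported on a page rendered at 200 DPI.
points = bbox_px_to_points([888.89, 833.33, 1444.44, 1166.67], dpi=200.0)
print(points)  # [320.0, 300.0, 520.0, 420.0]

print(is_document_level({"page": 0, "bbox": [0, 0, 0, 0]}))  # True
```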
Finding types produced
Type | Trigger (configurable) |
| Tile sharpness far below the page distribution |
| Tile OCR confidence far below page average |
resolution_anomaly | Embedded image effective DPI far below render DPI |
image_stretch | Embedded image scaled non-uniformly |
| Tile noise signature unlike its neighbours |
font_mismatch | Rare fonts / too many font families |
raster_in_vector_anomaly | Raster region inside an otherwise vector page |
| Strongly overlapping / stacked images |
UiPath integration
UiPath agents can call MCP tools directly. Configure this server as a local stdio MCP server in your UiPath agent / MCP client configuration:
{
  "mcpServers": {
    "visual-forensics": {
      "command": "C:/path/to/visual-forensics-mcp/.venv/Scripts/python.exe",
      "args": ["-m", "src.server.app"],
      "cwd": "C:/path/to/visual-forensics-mcp",
      "env": {
        "VISUAL_FORENSICS_CONFIG": "C:/path/to/custom.yaml"
      }
    }
  }
}
The env entry is optional; set VISUAL_FORENSICS_CONFIG only to point the server at a custom config file.
Typical agent flow:
UiPath downloads / locates the document and resolves a local file path.
The agent calls analyze_document with that path (and optional options).
The agent reads summary, page_results[].findings, and document_findings, and applies its own business rules / human-in-the-loop logic to decide what to do next.
Because the server is deterministic and offline, the same document always yields the same evidence — ideal for auditable, repeatable RPA workflows.
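The last step of the agent flow, reading findings and applying business rules, might look like the sketch below on the consuming side. It only relies on the response fields documented above; collect_findings, needs_human_review, and the 0.5 confidence cut-off are hypothetical and stand in for whatever rules the agent actually applies.

```python
from collections import defaultdict


def collect_findings(response: dict) -> dict[str, list[dict]]:
    """Group page-level and document-level findings by their `type`."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for page in response.get("page_results", []):
        for finding in page.get("findings", []):
            grouped[finding["type"]].append(finding)
    for finding in response.get("document_findings", []):
        grouped[finding["type"]].append(finding)
    return dict(grouped)


def needs_human_review(response: dict, min_confidence: float = 0.5) -> bool:
    """Hypothetical rule: escalate on any error or any sufficiently confident finding."""
    if response.get("errors"):
        return True
    return any(
        finding["confidence"] >= min_confidence
        for findings in collect_findings(response).values()
        for finding in findings
    )


# Trimmed-down response containing only the fields this sketch reads.
response = {
    "page_results": [
        {"page": 1, "findings": [{"type": "image_stretch", "confidence": 0.4}]}
    ],
    "document_findings": [{"type": "font_mismatch", "confidence": 0.6}],
    "warnings": [],
    "errors": [],
}

print(sorted(collect_findings(response)))
print(needs_human_review(response))
```

The server itself never makes this decision; the threshold and the escalation rule belong entirely to the agent.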
Example request
{
  "tool": "analyze_document",
  "arguments": {
    "document_path": "C:/docs/invoice.pdf",
    "options": { "render": { "dpi": 200 }, "features": { "enable_ocr": true } }
  }
}
Example response (abridged)
For the bundled examples/sample.pdf (a text page with a stretched, low-resolution
raster insert and one rare-font line):
{
  "document_id": "5f34550e05450828",
  "document_type": "pdf",
  "summary": {
    "page_count": 1,
    "tile_count": 20,
    "finding_count": 4,
    "findings_by_type": {
      "resolution_anomaly": 1,
      "image_stretch": 1,
      "raster_in_vector_anomaly": 1,
      "font_mismatch": 1
    },
    "ocr_available": false,
    "pdf_structure_available": true
  },
  "page_results": [
    {
      "page": 1,
      "width": 1700,
      "height": 2200,
      "dpi": 200.0,
      "is_vector": true,
      "findings": [
        {
          "type": "image_stretch",
          "page": 1,
          "bbox": [888.89, 833.33, 1444.44, 1166.67],
          "metrics": { "scale_x": 5.0, "scale_y": 3.0, "stretch_ratio": 0.4, "xref": 32 },
          "confidence": 0.4,
          "explanation": "Embedded image is scaled non-uniformly (horizontal and vertical scale factors differ), distorting the image."
        }
      ]
    }
  ],
  "document_findings": [
    {
      "type": "font_mismatch",
      "page": 0,
      "bbox": [0, 0, 0, 0],
      "metrics": { "font": "Times-Roman", "span_count": 1, "share": 0.04 },
      "confidence": 0.6,
      "explanation": "A font family is used on only a small fraction of text spans, an unusual change versus the dominant fonts."
    }
  ],
  "warnings": ["Tesseract OCR binary not available; OCR metrics and the OCR-confidence detector are disabled for this run."],
  "errors": []
}
Regenerate the full sample with:
python -m examples.generate_examples
Testing
pytest -q
The suite covers PDF rendering, DOCX loading, tile generation, blur computation, OCR confidence extraction (graceful path), image-scaling detection, font extraction, schema validation, MCP invocation, and a complete end-to-end run for PDF, DOCX, and scanned/image-only PDF.
How metrics are computed (deterministic definitions)
Blur: variance of the Laplacian (lower ⇒ blurrier).
Sharpness: Tenengrad (mean squared Sobel gradient) and mean gradient.
Contrast: RMS (intensity std) and Michelson (max−min)/(max+min).
Entropy: Shannon entropy (bits) of the 256-bin intensity histogram.
Edge density: fraction of Canny edge pixels.
Noise: Immerkaer fast σ estimate and median-residual std.
OCR confidence: mean Tesseract word confidence, normalised to 0–1.
Text density: foreground fraction after adaptive binarisation.
Effective DPI: original_px × 72 / displayed_points per axis.
Scale / stretch: displayed_points / original_px per axis; stretch is |scale_x − scale_y| / max(scale_x, scale_y).
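The effective DPI and scale / stretch definitions can be written out directly. The sketch below is illustrative only: the function names are hypothetical, and the 40 × 40 px image displayed at 200 × 120 points is an assumed input chosen so that the scale factors match the 5.0 and 3.0 shown in the example response.

```python
def effective_dpi(original_px: float, displayed_points: float) -> float:
    """Pixels per inch actually delivered along one axis (72 points = 1 inch)."""
    return original_px * 72.0 / displayed_points


def scale_factor(original_px: float, displayed_points: float) -> float:
    """Displayed points per source pixel along one axis."""
    return displayed_points / original_px


def stretch_ratio(scale_x: float, scale_y: float) -> float:
    """Relative difference between the two axis scales; 0 means uniform scaling."""
    return abs(scale_x - scale_y) / max(scale_x, scale_y)


# Assumed input: a 40 x 40 px image placed in a 200 x 120 point rectangle.
scale_x = scale_factor(40, 200)  # 5.0
scale_y = scale_factor(40, 120)  # 3.0
print(scale_x, scale_y, stretch_ratio(scale_x, scale_y))
print(effective_dpi(40, 200), effective_dpi(40, 120))
```

With these inputs the stretch ratio is 2 / 5 = 0.4, and the effective resolution is far below a 200 DPI render on both axes, which is the kind of deviation the resolution and stretch detectors compare against their configured thresholds.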
Anomalies are flagged using per-page z-scores and/or absolute floors, all read
from configs/default.yaml. Confidence is a monotonic function of the measured
deviation, not a probability of fraud.