Visual Document Forensics MCP Server

by PranayK07


A production-ready Python MCP server that performs deterministic visual and structural analysis of PDF, DOCX, and raster-image evidence. Supported raster formats are PNG, JPEG, TIFF, BMP, WebP, and GIF (including multi-frame files).

It is designed to be called by a UiPath agent. The agent does the reasoning and decision-making; this server does only one thing: extract measurable visual evidence that UiPath Analyze File cannot reliably provide.

The server reports metrics and measurable anomalies (blur, OCR confidence, effective DPI, image stretch, font usage, structural inconsistencies) and leaves all interpretation and decision-making to the downstream agent.

Guarantees

All processing is local and deterministic. The server does not use:

LLMs or VLMs
External APIs or API keys
Embeddings, vector databases, or retrieval systems
Any cloud service or network access

The only optional external component is the local Tesseract OCR binary. If it is absent, the server degrades gracefully: OCR metrics are skipped and a warning is emitted, but the analysis itself does not fail.


Architecture

Document
    ↓
MCP Tool Call            src/server/app.py        (FastMCP, stdio transport)
    ↓
Document Loader          src/render/loader.py     (detect PDF/DOCX/raster by signature)
    ↓
Page Renderer            src/render/pdf_renderer.py (PyMuPDF, configurable DPI)
    ↓
Tile Generator           src/tiling/tiler.py      (overlapping tiles)
    ↓
Visual Analysis Engine   src/analyzers/*          (blur, sharpness, contrast,
    ↓                                              entropy, edge density, noise,
    ↓                                              OCR confidence, text density)
PDF Structure Engine     src/analyzers/pdf_structure.py (images, scaling, DPI,
    ↓                                              fonts, objects; pikepdf check)
Evidence Aggregator      src/tools/analyze.py     (detectors + rollup)
    ↓
Statistics Aggregator    src/report/statistics.py (document + claim distributions)
    ↓
JSON / Markdown          src/schemas + src/report (validated data + inline SVG graphs)
    ↓
UiPath Agent

Project layout

visual-forensics-mcp/
├── src/
│   ├── server/      # FastMCP server (analyze_document tool)
│   ├── tools/       # analysis orchestrator (the pipeline)
│   ├── render/      # document loader, conversion, and page rendering
│   ├── tiling/      # overlapping tile generation
│   ├── analyzers/   # deterministic metric computation
│   ├── detectors/   # anomaly detection from metrics
│   ├── schemas/     # pydantic evidence and statistics contracts
│   ├── report/      # factual Markdown report + PDF visual overlays
│   └── utils/       # config, logging, geometry, image ops
├── tests/           # pytest suite (unit + end-to-end)
├── configs/         # default.yaml (all thresholds live here)
├── examples/        # sample docs + example request/response
├── requirements.txt
├── pyproject.toml
└── README.md

Analyzer & detector modules

| Analyzers (`src/analyzers/`) | Detectors (`src/detectors/`) |
|---|---|
| `blur.py`: Laplacian variance | `blur_detector.py`: blur anomaly |
| `sharpness.py`: Tenengrad / gradient | `dpi_detector.py`: resolution anomaly |
| `contrast.py`: RMS / Michelson | `font_detector.py`: font mismatch |
| `entropy.py`: Shannon entropy | `font_outlier_detector.py`: located non-dominant-font text |
| `edge_density.py`: Canny edge ratio | `stretch_detector.py`: non-uniform scaling |
| `noise.py`: Immerkaer sigma / residual | `ocr_confidence_detector.py`: low OCR confidence |
| `ocr.py`: Tesseract confidence | `compression_detector.py`: compression artifact |
| `font_analysis.py`: font families/sizes/span locations | `structure_detector.py`: raster-in-vector / overlap |
| `image_scaling.py`: scale/effective DPI | |
| `pdf_structure.py`: images/objects/fonts | |

Installation

Requires Python 3.11+.

# 1. Create and activate a virtual environment (recommended)
python -m venv .venv
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
# macOS / Linux
source .venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

Tesseract OCR (optional but recommended)

OCR-based metrics (ocr_confidence) and the OCR-confidence detector require the Tesseract binary. The Python wheel (pytesseract) only wraps it.

Windows: install the UB Mannheim build, then either add it to PATH or set ocr.tesseract_cmd in configs/default.yaml (or via request options) to e.g. C:\\Program Files\\Tesseract-OCR\\tesseract.exe.
macOS: brew install tesseract
Debian/Ubuntu: sudo apt-get install tesseract-ocr

Without Tesseract, everything else still runs; the response simply includes a warning and summary.ocr_available = false.

Configuration

The canonical tunables and detector thresholds live in configs/default.yaml. Defensive code defaults mirror that file so incomplete custom configurations remain usable.

Key settings:

render:
  dpi: 400                 # default render resolution
tiling:
  tile_size: 512
  tile_overlap: 0.20
features:
  enable_ocr: true
  enable_pdf_analysis: true

There are three ways to configure the server, listed in increasing order of precedence:

Edit configs/default.yaml.
Point the server at another file via the VISUAL_FORENSICS_CONFIG environment variable.
Pass an options dict per request (deep-merged over the file), e.g. {"render": {"dpi": 300}, "features": {"enable_ocr": false}}.
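The three layers can be pictured as a recursive dictionary merge. The sketch below assumes typical deep-merge behaviour (nested mappings merged key by key; scalars and lists from the higher-precedence layer replace lower ones). `deep_merge` and `resolve_config_path` are hypothetical helpers, not functions from this repository, and the sketch only chooses which file to load, since the README does not say whether the environment-variable file replaces or is merged over default.yaml.

```python
import copy
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Any

DEFAULT_CONFIG = Path("configs/default.yaml")  # location given in this README


def resolve_config_path(env: Mapping[str, str] = os.environ) -> Path:
    """Pick the config file: VISUAL_FORENSICS_CONFIG if set, else the default."""
    override = env.get("VISUAL_FORENSICS_CONFIG")
    return Path(override) if override else DEFAULT_CONFIG


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` merged recursively over `base`.

    Nested mappings are merged key by key; any other value in `override`
    (numbers, strings, lists) replaces the value in `base`. Inputs are not mutated.
    """
    merged = copy.deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# File settings (the key settings shown above) plus per-request options.
file_config = {
    "render": {"dpi": 400},
    "tiling": {"tile_size": 512, "tile_overlap": 0.20},
    "features": {"enable_ocr": True, "enable_pdf_analysis": True},
}
request_options = {"render": {"dpi": 300}, "features": {"enable_ocr": False}}
effective = deep_merge(file_config, request_options)
```

Here `effective["render"]["dpi"]` becomes 300 and OCR is disabled, while `tiling` and `enable_pdf_analysis` keep their file values.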

Running the MCP server

python -m src.server.app

This starts a FastMCP server on the stdio transport — the standard way MCP clients (including UiPath) launch and communicate with a local server.

A console script is also installed:

visual-forensics-mcp

MCP usage

The server exposes a single primary tool:

analyze_document(document_paths: list[str], options: dict | None = None)

It returns a response with schema_version "2.0", claim-level statistics across all submitted documents, and a results array with one entry per document. Each result also carries its own statistics. Statistics are computed before optional tile-detail suppression, so setting output.include_tile_metrics: false does not remove the aggregate measurements.

{
  "schema_version": "2.0",
  "statistics": {
    "document_count": 2,
    "overall_metrics": {},
    "documents": [],
    "methodology": {}
  },
  "results": [
    {
      "document_id": "...",
      "document_name": "invoice.pdf",
      "document_type": "...",
      "summary": {},
      "statistics": {"metrics": {}},
      "page_results": [],
      "document_findings": [],
      "warnings": [],
      "errors": []
    }
  ]
}

For each numeric metric, the statistics contract includes valid/missing/nonfinite counts, mean, median, exact mode when one is meaningful, min/max/range, sample variance and standard deviation, percentiles, IQR, MAD, coefficient of variation, bias-corrected skewness, unbiased excess kurtosis, Tukey fences/outlier counts, and deterministic histogram bins. Claim statistics include both pooled observations and the distribution of per-document means.

Every finding (in results[].page_results[].findings and results[].document_findings) has the following required fields:

{
  "type": "...",
  "page": 0,
  "bbox": [0, 0, 0, 0],
  "metrics": {},
  "confidence": 0.0,
  "explanation": "..."
}

bbox values are [x0, y0, x1, y1] in rendered pixel coordinates at the page's render DPI. Page numbers are 1-based; document-level findings (e.g. font_mismatch) use page: 0 and a zero bbox.
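Because PDF user space defaults to 72 points per inch, converting between page coordinates and the rendered-pixel bboxes reported in findings is a single scale factor of dpi / 72. A minimal sketch, assuming an unrotated page whose origin is the top-left of the rendered image (page rotation and CropBox offsets are ignored); `points_to_pixels` and `pixels_to_points` are hypothetical helpers.

```python
from collections.abc import Sequence

POINTS_PER_INCH = 72.0  # default PDF user-space unit


def points_to_pixels(bbox_pt: Sequence[float], dpi: float) -> list[float]:
    """Convert an [x0, y0, x1, y1] box in PDF points to rendered pixels."""
    scale = dpi / POINTS_PER_INCH
    return [round(v * scale, 2) for v in bbox_pt]


def pixels_to_points(bbox_px: Sequence[float], dpi: float) -> list[float]:
    """Inverse mapping, e.g. to locate a finding's bbox back on the PDF page."""
    scale = POINTS_PER_INCH / dpi
    return [round(v * scale, 2) for v in bbox_px]
```

For instance, the image_stretch bbox in the abridged sample response, [888.89, 833.33, 1444.44, 1166.67] at 200 DPI, maps back to [320.0, 300.0, 520.0, 420.0] points, and a 612 × 792 pt Letter page renders at 1700 × 2200 px, matching that response's page size.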

Finding types produced

| Type | Trigger (configurable) |
|---|---|
| `blur_anomaly` | Tile sharpness far below the page distribution |
| `ocr_confidence_anomaly` | Tile OCR confidence far below page average |
| `resolution_anomaly` | Embedded image effective DPI far below render DPI |
| `image_stretch` | Embedded image scaled non-uniformly (`scale_x` ≠ `scale_y`) |
| `compression_artifact_anomaly` | Tile noise signature unlike its neighbours |
| `font_mismatch` | Rare fonts / too many font families (document-level) |
| `font_outlier` | Text set in a font other than the document's dominant font, with an exact bounding box, the font name, and a confidence |
| `raster_in_vector_anomaly` | Raster region inside an otherwise vector page |
| `object_overlap_anomaly` | Strongly overlapping / stacked images |

UiPath integration

UiPath agents can call MCP tools directly. Configure this server as a local stdio MCP server in your UiPath agent / MCP client configuration. The env block is optional; use VISUAL_FORENSICS_CONFIG there to point the server at a custom config file:

{
  "mcpServers": {
    "visual-forensics": {
      "command": "C:/path/to/visual-forensics-mcp/.venv/Scripts/python.exe",
      "args": ["-m", "src.server.app"],
      "cwd": "C:/path/to/visual-forensics-mcp",
      "env": {
        "VISUAL_FORENSICS_CONFIG": "C:/path/to/custom.yaml"
      }
    }
  }
}

Typical agent flow:

UiPath downloads / locates the document(s) and resolves local file paths.
The agent calls analyze_document with those paths (and optional options).
The agent reads each result's summary, page_results[].findings, and document_findings, and applies its own business rules / human-in-the-loop logic to decide what to do next.

Because the server is deterministic and offline, the same document always yields the same evidence — ideal for auditable, repeatable RPA workflows.
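Following the typical agent flow above, a caller only needs to walk two places in each result to see every finding. The sketch below is illustrative: `iter_findings`, `findings_for_review`, and `run_warnings` are hypothetical, and the 0.5 cut-off stands in for whatever business rule the UiPath agent actually applies; the server itself assigns no verdict.

```python
from collections.abc import Iterator
from typing import Any


def iter_findings(response: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (document_name, finding) from page-level and document-level findings."""
    for result in response.get("results", []):
        name = result.get("document_name", "")
        for page in result.get("page_results", []):
            for finding in page.get("findings", []):
                yield name, finding
        for finding in result.get("document_findings", []):
            yield name, finding


def findings_for_review(
    response: dict[str, Any], min_confidence: float = 0.5
) -> list[tuple[str, str, int, float]]:
    """Hypothetical business rule: keep findings at or above a confidence cut-off."""
    return [
        (name, f["type"], f["page"], f["confidence"])
        for name, f in iter_findings(response)
        if f["confidence"] >= min_confidence
    ]


def run_warnings(response: dict[str, Any]) -> list[str]:
    """Warnings and errors worth surfacing, e.g. OCR being unavailable."""
    return [
        f'{r.get("document_name", "")}: {msg}'
        for r in response.get("results", [])
        for msg in r.get("warnings", []) + r.get("errors", [])
    ]


# Trimmed from the abridged sample response in this README.
sample = {
    "results": [
        {
            "document_name": "sample.pdf",
            "page_results": [
                {"page": 1, "findings": [
                    {"type": "image_stretch", "page": 1, "confidence": 0.4,
                     "bbox": [888.89, 833.33, 1444.44, 1166.67]},
                ]},
            ],
            "document_findings": [],
            "warnings": ["Tesseract OCR binary not available; OCR metrics and the "
                         "OCR-confidence detector are disabled for this run."],
            "errors": [],
        }
    ]
}
```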

Example request

{
  "tool": "analyze_document",
  "arguments": {
    "document_paths": ["C:/docs/invoice.pdf", "C:/docs/receipt.pdf"],
    "options": { "render": { "dpi": 200 }, "features": { "enable_ocr": true } }
  }
}

Example response (abridged)

For the bundled examples/sample.pdf (a text page with a stretched, low-resolution raster insert and one rare-font line):

{
  "schema_version": "2.0",
  "statistics": {
    "document_count": 1,
    "overall_metrics": {
      "tile.blur_score": {
        "pooled": {
          "finite_count": 14,
          "mean": 3580.016092857142,
          "median": 2721.7989500000003,
          "modes": [],
          "mode_method": "none_all_values_unique"
        }
      }
    }
  },
  "results": [
    {
      "document_id": "5f34550e05450828",
      "document_name": "sample.pdf",
      "document_type": "pdf",
      "summary": {
        "page_count": 1,
        "tile_count": 20,
        "finding_count": 4,
        "findings_by_type": {
          "resolution_anomaly": 1,
          "image_stretch": 1,
          "raster_in_vector_anomaly": 1,
          "font_outlier": 1
        },
        "ocr_available": false,
        "pdf_structure_available": true
      },
      "page_results": [
        {
          "page": 1,
          "width": 1700,
          "height": 2200,
          "dpi": 200.0,
          "is_vector": true,
          "findings": [
            {
              "type": "image_stretch",
              "page": 1,
              "bbox": [888.89, 833.33, 1444.44, 1166.67],
              "metrics": { "scale_x": 5.0, "scale_y": 3.0, "stretch_ratio": 0.4, "xref": 32 },
              "confidence": 0.4,
              "explanation": "Embedded image is scaled non-uniformly (horizontal and vertical scale factors differ), distorting the image."
            }
          ]
        }
      ],
      "document_findings": [],
      "warnings": ["Tesseract OCR binary not available; OCR metrics and the OCR-confidence detector are disabled for this run."],
      "errors": []
    }
  ]
}

Regenerate the full sample with:

python -m examples.generate_examples
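The scale and stretch definitions listed under "How metrics are computed" reproduce the image_stretch metrics in this response. In the sketch below, the 40 × 40 px source size is an assumption chosen for illustration (the response does not report it); the 200 × 120 pt display box is what the finding's bbox corresponds to at 200 DPI. `Placement` and `scale_metrics` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """An embedded image: native pixel size and the size it is drawn at (PDF points)."""
    width_px: int
    height_px: int
    width_pt: float
    height_pt: float


def scale_metrics(p: Placement) -> dict[str, float]:
    scale_x = p.width_pt / p.width_px  # displayed_points / original_px
    scale_y = p.height_pt / p.height_px
    return {
        "scale_x": scale_x,
        "scale_y": scale_y,
        "stretch_ratio": abs(scale_x - scale_y) / max(scale_x, scale_y),
        # original_px × 72 / displayed_points, per axis
        "effective_dpi_x": p.width_px * 72 / p.width_pt,
        "effective_dpi_y": p.height_px * 72 / p.height_pt,
    }


# Hypothetical placement: a 40 × 40 px image drawn into a 200 × 120 pt box.
m = scale_metrics(Placement(40, 40, 200.0, 120.0))
```

This yields scale_x 5.0, scale_y 3.0 and stretch_ratio 0.4, as in the response. Both effective DPI values (14.4 and 24) are far below the 200 DPI render resolution, the kind of gap the resolution_anomaly detector measures.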

Batch annotation (claim-set folders)

Both CLI tools (annotate_report.py and font_agent.py) accept one file, one claim-set folder, or a claims root whose subfolders are claim sets. Each claim set is analyzed as a batch and gets its own output folder so related documents stay together.

Input layout

claims/                          ← pass this path (claims root)
├── claim_001/                   ← one claim set
│   ├── invoice.pdf
│   ├── estimate.pdf
│   └── notes.docx
└── claim_002/
    ├── police_report.pdf
    └── photos_summary.pdf

You can also pass a single claim-set folder (claims/claim_001/) or a lone file.

Supported extensions: .pdf, .docx, .png, .jpg, .jpeg, .tif, .tiff, .bmp, .webp, and .gif. File signatures are checked when loading; unsupported files in a folder are ignored.

Output layout

# Claims root → mirrored claim folders under --out-dir
python annotate_report.py "claims/" --out-dir "results/"
python font_agent.py "claims/" --out-dir "font_results/"

# Single claim set → its evidence bundle goes directly in --out-dir
python annotate_report.py "claims/claim_001/" --out-dir "results/claim_001/"

# Default: writes to a sibling "<input-name>_result/" directory
python annotate_report.py "claims/claim_001/"
# → claims/claim_001_result/

Example after python annotate_report.py claims/ --out-dir results/ (claims root → one output folder per claim set):

results/
├── claim_001/
│   ├── report.md
│   ├── json_results/
│   │   ├── claim_result.json
│   │   ├── invoice - result.json
│   │   ├── estimate - result.json
│   │   └── notes - result.json
│   └── annotated_visuals/
│       ├── invoice - annotated.pdf
│       ├── estimate - annotated.pdf
│       └── notes - annotated.pdf
└── claim_002/
    ├── report.md
    ├── json_results/
    │   ├── claim_result.json
    │   ├── police_report - result.json
    │   └── photos_summary - result.json
    └── annotated_visuals/
        ├── police_report - annotated.pdf
        └── photos_summary - annotated.pdf

report.md is self-contained: overall claim graphs and statistics come first, followed by per-document sections. Its graphs are inline SVG, so each claim bundle has exactly the two subfolders shown above. The report and JSON are measurement-only: they do not assign a score, severity label, intent, or verdict.

font_agent.py uses the same bundle and folder mirroring; its per-document filenames use <stem> - fonts annotated.pdf and <stem> - fonts result.json.

| Flag | Meaning |
|---|---|
| `input_path` (positional) | File, claim-set folder, or claims root |
| `--out-dir` | Root directory for result bundles |
| `--out` | Exact PDF path (single file only) |
| `--result` | Reuse an existing analysis JSON (single file only) |

The MCP tool analyze_document accepts multiple paths and returns the same claim/document statistics in JSON (without annotated PDFs). Use the CLIs when you need the on-disk bundle and visual overlays.
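The bundle naming rules described above are mechanical enough to express as path arithmetic. A sketch under those conventions, with hypothetical helper names; it assumes the font agent's bundle also contains claim_result.json (the README says only that it uses "the same bundle"), and it derives the default output directory for folder inputs only.

```python
from pathlib import PurePath, PurePosixPath


def default_out_dir(input_dir: PurePath) -> PurePath:
    """Sibling "<input-name>_result" folder for a claim-set or claims-root input."""
    return input_dir.parent / f"{input_dir.name}_result"


def bundle_paths(
    out_dir: PurePath, stems: list[str], fonts: bool = False
) -> dict[str, list[PurePath]]:
    """Expected files in one claim bundle; fonts=True uses font_agent.py's names."""
    tag = " - fonts " if fonts else " - "
    json_dir, visuals_dir = out_dir / "json_results", out_dir / "annotated_visuals"
    return {
        "report": [out_dir / "report.md"],
        "json": [json_dir / "claim_result.json"]
        + [json_dir / f"{stem}{tag}result.json" for stem in stems],
        "visuals": [visuals_dir / f"{stem}{tag}annotated.pdf" for stem in stems],
    }


# claim_001 from the output-layout example.
claim = PurePosixPath("results/claim_001")
bundle = bundle_paths(claim, ["invoice", "estimate", "notes"])
```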

Font consistency agent (`font_agent.py`)

A standalone command-line agent that answers one question: where does this document depart from its own font?

python font_agent.py "path/to/document.pdf"
python font_agent.py "path/to/claim_set/" --out-dir "path/to/out/"
python font_agent.py "path/to/claims_root/" --out-dir "path/to/out/"

It determines the document's dominant font (the most common family by text span count) and produces "<stem> - fonts annotated.pdf" in which:

a banner at the top of every page states the dominant font, plus every other font detected with its usage share, detection confidence, and the pages it appears on;
every region whose non-dominant family meets the confidence gate gets a crimson bounding box exactly where it sits on the page, labelled with the font name and measured-deviation confidence.

The console output mirrors the annotation: the dominant font, every other font, and each boxed region with its page, confidence, pixel bbox, and text snippet:

=== Font Report ===
Dominant font : Calibri  (37.2% of text, 16 spans)
Other fonts   :
  - TimesLTPro-Bold           23.3% of text   not boxed   -
  - Alegreya-Regular           2.3% of text   confidence 0.94   pages 1
Boxed regions : 1
  - page 1  font 'Alegreya-Regular'  conf 0.94  bbox [810, 1755, 1205, 1831] px  text: 'NH-2026-001847'
  ...

Only the font detectors run (visual, OCR, and image analysis are switched off), so the agent is fast. The same evidence is available programmatically: the pipeline emits font_outlier findings in page_results[].findings, each carrying bbox, confidence, and metrics.font / metrics.dominant_font.

Confidence is deterministic: dominant_spans / (dominant_spans + font_spans), so a font family with very few spans relative to the dominant family scores near 1.0. Thresholds live under detectors.font_outlier in configs/default.yaml; located findings must meet the configurable min_confidence (0.90 by default). Common weight and style names such as Times New Roman, Times New Roman Bold, and Times New Roman Italic are normalised to one family before their usage distribution is measured.

The font banner is toggled via report.draw.add_font_banner.
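The dominant-font rule and the confidence formula can be sketched as follows. The family normalisation here is a deliberately simple, hypothetical rule (drop a six-letter subset tag such as ABCDEF+ and strip trailing style words); the server's own normalisation is not documented beyond the Times New Roman example, and PostScript names like TimesNewRomanPS-BoldMT would need more rules. The span counts in the usage example are also hypothetical.

```python
import re
from collections import Counter

# Hypothetical style vocabulary; real PDF font names vary far more than this.
STYLE_WORDS = {"bold", "italic", "oblique", "regular", "light", "medium",
               "semibold", "black", "bolditalic"}


def normalise_family(font_name: str) -> str:
    """'Times New Roman Bold Italic' -> 'Times New Roman'; 'ABCDEF+Calibri-Bold' -> 'Calibri'."""
    name = re.sub(r"^[A-Z]{6}\+", "", font_name)  # drop a PDF font-subset tag
    words = [w for w in re.split(r"[\s,-]+", name.strip()) if w]
    while len(words) > 1 and words[-1].lower() in STYLE_WORDS:
        words.pop()
    return " ".join(words)


def font_outliers(
    span_fonts: list[str], min_confidence: float = 0.90
) -> tuple[str, dict[str, tuple[float, bool]]]:
    """Return the dominant family and, for each other family, (confidence, is_boxed)."""
    counts = Counter(normalise_family(f) for f in span_fonts)
    if not counts:
        raise ValueError("no text spans")
    dominant, dominant_spans = counts.most_common(1)[0]
    others = {}
    for family, spans in counts.items():
        if family != dominant:
            confidence = dominant_spans / (dominant_spans + spans)
            others[family] = (confidence, confidence >= min_confidence)
    return dominant, others


# Hypothetical spans: 16 Calibri (2 bold), 10 TimesLTPro-Bold, 1 Alegreya-Regular.
spans = (["Calibri"] * 14 + ["Calibri-Bold"] * 2
         + ["TimesLTPro-Bold"] * 10 + ["Alegreya-Regular"])
dominant, others = font_outliers(spans)
```

With 16 dominant spans, a family with a single span scores 16 / 17 ≈ 0.94 and clears the 0.90 gate, while a family with 10 spans scores 16 / 26 ≈ 0.62 and is reported but not boxed.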

DOCX inputs work too: they are converted to PDF in memory for analysis, and the annotated copy is written from that rendition. Note that PyMuPDF's DOCX conversion may substitute font families, so span-level font locations are exact for PDF inputs but approximate for DOCX (document-level font usage from python-docx still feeds the font_mismatch detector).

The general-purpose annotate_report.py (all finding types) also draws the font banner and the compact font-outlier boxes by default.

Testing

pytest -q

The suite covers PDF/DOCX/raster loading and rendering, multi-frame and EXIF handling, tile generation, analyzer calculations, OCR's graceful unavailable path, image scaling, font normalization and located font findings, descriptive statistics and SVG report generation, exact bundle layout, packaged-config consistency, schema validation, MCP invocation, and complete end-to-end runs.

How metrics are computed (deterministic definitions)

Blur — variance of the Laplacian (lower ⇒ blurrier).
Sharpness — Tenengrad (mean squared Sobel gradient) and mean gradient.
Contrast — RMS (intensity std) and Michelson (max−min)/(max+min).
Entropy — Shannon entropy (bits) of the 256-bin intensity histogram.
Edge density — fraction of Canny edge pixels.
Noise — Immerkaer fast σ estimate and median-residual std.
OCR confidence — mean Tesseract word confidence, normalised to 0–1.
Text density — foreground fraction after adaptive binarisation.
Effective DPI — original_px × 72 / displayed_points per axis.
Scale / stretch — displayed_points / original_px per axis; stretch is |scale_x − scale_y| / max(scale_x, scale_y).
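Several of the tile metrics above can be approximated in a few lines of NumPy. This is a sketch, not the analyzers' code: it uses the 4-neighbour Laplacian and 3 × 3 Sobel kernels over the "valid" region only (implementations that pad borders, as OpenCV does, can give slightly different values), reports Tenengrad but not the mean gradient, omits Canny edge density, OCR confidence, and text density, and the function and key names are hypothetical.

```python
import numpy as np

LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
IMMERKAER = np.array([[1, -2, 1], [-2, 4, -2], [1, -2, 1]], dtype=np.float64)


def filter3x3(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """3x3 correlation over the 'valid' region (no border padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out


def tile_metrics(gray: np.ndarray) -> dict[str, float]:
    """Metrics for one uint8 grayscale tile of at least 3 x 3 pixels."""
    f = gray.astype(np.float64)
    h, w = f.shape
    gx, gy = filter3x3(f, SOBEL_X), filter3x3(f, SOBEL_X.T)
    hist = np.bincount(gray.ravel(), minlength=256)
    p = hist[hist > 0] / gray.size
    lo, hi = float(f.min()), float(f.max())
    return {
        # Variance of the Laplacian: lower means blurrier.
        "blur_score": float(filter3x3(f, LAPLACIAN).var()),
        # Tenengrad: mean squared Sobel gradient magnitude.
        "tenengrad": float(np.mean(gx**2 + gy**2)),
        "contrast_rms": float(f.std()),
        "contrast_michelson": (hi - lo) / (hi + lo) if hi + lo > 0 else 0.0,
        # Shannon entropy (bits) of the 256-bin intensity histogram.
        "entropy_bits": float(-(p * np.log2(p)).sum()),
        # Immerkaer fast noise sigma estimate.
        "noise_sigma": float(
            np.sqrt(np.pi / 2) * np.abs(filter3x3(f, IMMERKAER)).sum()
            / (6 * (w - 2) * (h - 2))
        ),
    }
```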

Anomalies are flagged using per-page z-scores and/or absolute floors, all read from configs/default.yaml. Confidence is a monotonic function of the measured deviation; it is supplied as evidence for downstream interpretation.
