271,551 tools. Last updated 2026-07-08 03:23

"Tools for Extracting Structured Data from PDFs Using OCR" matching MCP tools:

firecrawl_extract (grade A)
firecrawl-mcp-server
Extract specific structured data from web pages using LLM-powered extraction. Use a prompt or schema to get details like prices, names, and descriptions.
MIT
firecrawl_extract (grade A)
Firecrawl MCP Server
Extract structured data from web pages using LLM capabilities. Define specific information to retrieve with custom prompts and JSON schemas for organized output.
MIT
didit_verify_id (grade A)
Didit MCP Server
Verify identity documents by submitting front and optional back images. Get structured OCR data and authenticity checks for fraud prevention.
MIT
crawling_exa (grade B)
Exa MCP Server
Extract full text content, metadata, and structured information from specific web URLs for detailed content analysis and data retrieval.
MIT
browserbase_stagehand_extract (grade C)
Browserbase MCP Server
Extract structured data or text from the current web page, using instructions that specify which information to target and retrieve.
Apache 2.0
parse_document (grade C)
MCP-Upstage-Server
Extract structured content from PDFs, images, and Office files while preserving original formatting and layout using Upstage AI's document digitization API.
MIT

Matching MCP Servers

Structured-sh (official)
Knowledge & Memory Databases
structured-sh
License: A · Quality: - · Maintenance: D
MCP server providing managed persistent memory for AI agents. Read and write structured state across sessions, tools, and restarts at 1000+ requests per second, with no infrastructure to self-host or operate.
Last updated 2026-04-09
2
Apache 2.0
Structured Data Validator &
Data Platforms Developer Tools
agenson-tools
License: A · Quality: - · Maintenance: C
Provides AI agents with data validation, transformation, and normalization capabilities, including JSON schema validation, CSV processing, data normalization, text cleaning, and dataset merging.
Last updated 2026-04-02
127
MIT

Matching MCP Connectors

Mirelia-Structured-Data-Marketplace
A fully autonomous, Agent-to-Agent (A2A) patent data marketplace powered by the Model Context Protocol (MCP) and A2A standards. This server provides highly structured, AI-optimized JSON patent datasets curated for autonomous R&D agents, LLMs, and quants. It currently hosts only AI-ready patents from IPC/CPC Sections G (Physics & Computing) and H (Electricity).
Mirelia-Structured-Data-Marketplace
Autonomous A2A marketplace providing AI-ready, structured USPTO patent JSON datasets. Features IPC/CPC Sections G (Physics/Computing, e.g., G01 Sensors, G06 AI/ML) and H (Electricity, e.g., H01 Semiconductors, H04 5G). Enables instant M2M data delivery via automated on-chain payment verification. Networks: Base (USDC), Polygon (USDC), Oasis (ROSE).

FPD_get_document_content_with_mistral_ocr (grade A)
USPTO Final Petition Decisions MCP Server
Extract text from USPTO petition documents using hybrid extraction: free PyPDF2 for text-based PDFs, Mistral OCR for scanned documents. Analyze legal arguments, issues, and patterns in petition decisions.
MIT
ocr_image (grade A)
macos-vision-mcp
Extract text from local images and PDFs with Apple Vision OCR, returning plain text or structured blocks with bounding boxes. Works offline on macOS.
MIT
obsidian_ocr_pdf (grade A)
enquire-mcp
Extracts text from scanned or image-only PDFs using Tesseract OCR, returning per-page content and confidence scores. Use when standard PDF reading fails due to missing text layer.
MIT
extract_tables (grade A)
@shuji-bonji/pdf-reader-mcp
Extract tables from tagged PDFs by walking the structure tree to pull cell text, then output them as Markdown or JSON for structured data.
MIT
get_note_resources (grade A)
Joplin MCP Server
Retrieve all resources attached to a note, including OCR text from images and PDFs, to reveal content inside attachments.
MIT
obsidian_read_pdf (grade A)
enquire-mcp
Extracts plain text from vault PDFs, returning per-page content, full text, and metadata. Supports page range selection and flags image-only PDFs for OCR.
MIT
parse_contract (grade A)
document-to-json-mcp
Extract parties, key dates, financial terms, and essential clauses from contract PDFs into structured JSON.
MIT
ocr_pdf (grade A)
ocrmypdf-mcp
Run OCR on scanned PDFs to add a searchable text layer using Tesseract, making the text copyable and searchable.
MIT
firecrawl_extract (grade A)
Firecrawl MCP Server
Extract structured data from web pages using LLM capabilities. Define specific information to retrieve with custom prompts and JSON schemas.
MIT
firecrawl_extract (grade A)
Firecrawl MCP Server
Extract structured data from web pages using LLM capabilities. Define specific information to retrieve like product details or pricing through custom prompts and schemas.
MIT
extract_document (grade A)
Sats4AI
Extract text from PDFs and images as structured Markdown. Handles complex layouts, tables, handwriting, and math notation. Pay per page with Bitcoin Lightning.
MIT
recognize_text (grade A)
yandex-vision-ocr-mcp
Extract text from images (JPEG/PNG) or single-page PDFs using Yandex Vision OCR. Supports printed, handwritten, table, and markdown models with language selection.
MIT
pdf_read_pages (grade A)
pdf-mcp
Read text, images, and tables from specific PDF pages with support for page ranges and OCR for scanned content.
MIT
crw_parse_file (grade A)
crw-mcp
Extract text from a local PDF (supplied as base64) and convert it to Markdown. Works only for text-based PDFs; scanned documents return empty output.
AGPL 3.0