206,792 tools. Last updated 2026-06-17 15:41

"High-Quality PDF Extraction to Text with Tokenization and Accurate Processing of Complex Layouts" matching MCP tools:

strale_execute
Strale - 290+ API capabilities for AI agents
Executes a Strale capability by slug and returns the result. Use this when you need to perform any verification, validation, lookup, or data extraction from the 271-capability registry. Call strale_search first to find the right slug and required input fields. Returns a result object with the capability output, quality score (SQS), latency, price charged, and data provenance. Five free capabilities work without an API key (10/day limit). Paid capabilities debit from the wallet — check strale_balance first for high-value calls.
Connector
parse_pdf_to_text
Nordic Financial MCP
Download a PDF from a URL and extract all text content, page by page. Use this to read the full text of a specific document — for example, an annual report PDF linked from a search_filings result. Best combined with search_filings: use search_filings to locate the document, then parse_pdf_to_text for the full text. Do not use for PDFs that are already well-represented in the database — search_filings is faster and returns pre-ranked, relevant excerpts. Not suitable for scanned (image-only) PDFs without embedded text; those pages will be returned as "(no extractable text)". Args: pdf_url: Direct HTTPS URL to the PDF file, e.g. https://example.com/report.pdf. Must be publicly accessible; authentication-protected URLs will fail. Returns: All text from the PDF with "--- Page N ---" separators between pages. Returns an error string if the download fails, the URL does not point to a valid PDF, or the document exceeds the 60-second download timeout.
Connector
read_datasheet
sheetsdata-mcp
Read from a component's datasheet. Two modes: **Section mode** (default): Returns a named section. Start with section='summary' to get an overview and a list of available_sections. Then request specific sections by name. Section names are dynamic — any heading in the actual datasheet works (e.g. 'register_map', 'i2c_interface', 'power_management'). If a section name isn't found, automatically falls back to search mode. **Search mode**: Semantic search within the part's datasheet. Best for targeted questions (register bit fields, I2C config, specific specs). Use when you need to find specific information rather than a whole section. First call for a new part triggers extraction (30s-2min). Subsequent calls are cached. **Datasheet vs Reference Manual**: Manufacturer datasheets cover high-level specs, pinout, absolute maximum ratings, and package info. For microcontrollers (STM32, nRF52, RP2040), register-level programming details (I2C CR1/CR2, DMA config, interrupt bits) are in a separate Reference Manual, not the datasheet. The summary's available_sections will show what's actually present. The part_number must be a specific manufacturer part number (e.g. 'TPS54302', 'STM32F446RCT6') or LCSC number (e.g. 'C2837938'). Do NOT pass bare component values ('100nF', '10K'), descriptions, or reference designators. DATASHEET STATUS VALUES: - 'ready' — extracted and indexed; call read_datasheet, search_datasheets, or analyze_image. - 'extracting' / 'in_progress' / 'queued' / 'pending' — extraction running or scheduled. Poll check_extraction_status every 5-10s until 'ready' or 'failed'. Typical time: 30s-2min. - 'not_extracted' — known part but datasheet hasn't been fetched yet. Trigger it via prefetch_datasheets (cheapest) or by calling read_datasheet (auto-triggers on first read). - 'no_source' — we couldn't find a public datasheet URL for this MPN. First, retry prefetch_datasheets in 10-30s (the URL resolver re-runs and often finds a source on the second pass). If still 'no_source', the agent can upload the PDF manually via request_datasheet_upload + confirm_datasheet_upload (see those tools). Org-uploaded datasheets are private to the org. - 'unsupported' — PDF exists but can't be extracted (scanned image-only, encrypted, or corrupted). Upload a clean text-based PDF via request_datasheet_upload to override. - 'failed' / 'error' — extraction errored. The response includes the error reason. Retry via prefetch_datasheets or escalate to support. - 'rejected' — input wasn't a real MPN (bare value like '100nF', description, or reference designator). Fix the input and re-call. - 'deduplicated' — another part in the family already has this datasheet; same content is returned under the primary MPN.
Connector
search_parts
sheetsdata-mcp
Search for electronic components by part number, description, or keyword. Start here — this is the best entry point for finding components. Queries all configured providers in parallel. Results are merged by MPN with indicative pricing and stock from each source. Each result includes datasheet_status so you know which parts have datasheets available for read_datasheet. Best with specific part numbers or keywords (e.g. 'STM32F103', 'buck converter 3A'). For spec-based discovery in natural language, use search_datasheets instead. When the calling org has a private parts library, matching org-uploaded parts are appended to the results with source='private_library' and any tags the team has applied — including private parts whose MPN, manufacturer, description, type, category, or tag matches the query. DATASHEET STATUS VALUES: - 'ready' — extracted and indexed; call read_datasheet, search_datasheets, or analyze_image. - 'extracting' / 'in_progress' / 'queued' / 'pending' — extraction running or scheduled. Poll check_extraction_status every 5-10s until 'ready' or 'failed'. Typical time: 30s-2min. - 'not_extracted' — known part but datasheet hasn't been fetched yet. Trigger it via prefetch_datasheets (cheapest) or by calling read_datasheet (auto-triggers on first read). - 'no_source' — we couldn't find a public datasheet URL for this MPN. First, retry prefetch_datasheets in 10-30s (the URL resolver re-runs and often finds a source on the second pass). If still 'no_source', the agent can upload the PDF manually via request_datasheet_upload + confirm_datasheet_upload (see those tools). Org-uploaded datasheets are private to the org. - 'unsupported' — PDF exists but can't be extracted (scanned image-only, encrypted, or corrupted). Upload a clean text-based PDF via request_datasheet_upload to override. - 'failed' / 'error' — extraction errored. The response includes the error reason. Retry via prefetch_datasheets or escalate to support. - 'rejected' — input wasn't a real MPN (bare value like '100nF', description, or reference designator). Fix the input and re-call. - 'deduplicated' — another part in the family already has this datasheet; same content is returned under the primary MPN.
Connector
generate_video
Sats4AI - Bitcoin-Powered AI Tools
Generate a video from a text prompt. Uses Kling v3 — cinematic quality, consistent motion, physics-aware rendering. Standard and pro quality modes with optional AI-generated audio track. Async — returns requestId, poll with check_job_status. Pricing: standard 300-400 sats/sec, pro 450-550 sats/sec (audio adds 100 sats/sec). Duration 3-15 seconds. Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='generate_video' and duration, mode, generate_audio params.
Connector
check_extraction_status
sheetsdata-mcp
Check the extraction status of one or more parts. Free. Each entry includes the current extraction step, elapsed seconds, and document ID. Use after prefetch_datasheets or after read_datasheet triggers a new extraction. Recommended polling cadence: every 5-10 seconds. Extraction typically takes 30s-2min for new parts, so polling faster than every 5s wastes calls. Stop polling once status is 'ready', 'failed', 'no_source', or 'unsupported'. DATASHEET STATUS VALUES: - 'ready' — extracted and indexed; call read_datasheet, search_datasheets, or analyze_image. - 'extracting' / 'in_progress' / 'queued' / 'pending' — extraction running or scheduled. Poll check_extraction_status every 5-10s until 'ready' or 'failed'. Typical time: 30s-2min. - 'not_extracted' — known part but datasheet hasn't been fetched yet. Trigger it via prefetch_datasheets (cheapest) or by calling read_datasheet (auto-triggers on first read). - 'no_source' — we couldn't find a public datasheet URL for this MPN. First, retry prefetch_datasheets in 10-30s (the URL resolver re-runs and often finds a source on the second pass). If still 'no_source', the agent can upload the PDF manually via request_datasheet_upload + confirm_datasheet_upload (see those tools). Org-uploaded datasheets are private to the org. - 'unsupported' — PDF exists but can't be extracted (scanned image-only, encrypted, or corrupted). Upload a clean text-based PDF via request_datasheet_upload to override. - 'failed' / 'error' — extraction errored. The response includes the error reason. Retry via prefetch_datasheets or escalate to support. - 'rejected' — input wasn't a real MPN (bare value like '100nF', description, or reference designator). Fix the input and re-call. - 'deduplicated' — another part in the family already has this datasheet; same content is returned under the primary MPN.
Connector

Matching MCP Servers

PDF to Text MCP Server
xxx87
-
license
-
quality
-
maintenance
Converts PDF files to text for use with MCP-compatible applications like Cursor IDE.
Last updated 2025-09-19
PDF Extraction MCP Server
File Systems Text Summarization
xraywu
F
license
C
quality
D
maintenance
An MCP server that provides a tool to extract text content from local PDF files, supporting both standard PDF reading and OCR capabilities with optional page selection.
Last updated 2025-05-31
1
30

Matching MCP Connectors

Complex Portal
EBI Complex Portal MCP.
Quality QR
Create and manage trackable QR codes with scan tracking, analytics, and dynamic URL updates.

render_mermaid
Diagrams MCP
Render a Mermaid diagram definition and return the image with metadata. The definition should be valid Mermaid syntax (e.g. flowchart, sequence, class, ER, state, or Gantt diagram). Returns a list of content blocks: the rendered image plus a JSON text block with metadata including a mermaid.live edit link for opening the diagram in a browser editor. Args: definition: Mermaid diagram definition text. filename: Output filename without extension. format: Output format — ``"png"`` (default), ``"svg"``, or ``"pdf"``. download_link: If True, return a temporary download URL path (/images/{token}) that expires after 15 minutes; if False, return inline image bytes. Defaults to True (URL) — set ``DIAGRAMS_INLINE_DEFAULT=true`` on the server to flip the default. SVG/PDF and PNGs larger than the inline limit always use a download link.
Connector
x711_hive_write
x711 — Universal Agent Gas Station
Contribute knowledge to The Hive — x711's collective agent memory. Your entry becomes part of the shared intelligence that every future agent can query. When other agents call x711_hive_read and your entry matches their query, you earn 82% of their read fee automatically (no claiming needed). High-quality entries earn recurring passive income. Minimum 8 chars, max 8000. Returns: { written: true, id, namespace, earn_note }.
Connector
extract_contract_from_url
transaction-coordinator
Extract structured transaction data from a contract at a URL. Downloads the document, extracts text (with OCR fallback for scanned PDFs), and runs PrimaCoda's contract-extraction prompt to return parties, addresses, dates, prices, and key contract fields. Use this when an agent has the contract hosted somewhere (Dropbox, Google Drive direct download, Square Space, etc.) and wants to skip the upload step. For multi-document deals (purchase + addenda + disclosures), use the PrimaCoda dashboard's batch upload — this tool handles ONE document. Args: pdf_url: Direct download URL for the contract (PDF, DOCX, TXT, or image). Must be reachable from the PrimaCoda server. Google Drive "shared link" URLs work if set to "anyone with link"; other share URLs may need their direct-download form. api_key: Your PrimaCoda MCP API key (starts 'pck_').
Connector
files_read
dialogbrain
Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
Connector
files_read
DialogBrain
Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
Connector
extract_document
Sats4AI - Bitcoin-Powered AI Tools
Extract text from PDFs and images as clean Markdown. Uses Mistral OCR — handles complex layouts, tables, handwriting, multi-column documents, and mathematical notation. Preserves document hierarchy in structured Markdown. 10 sats/page. Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='extract_document' and quantity=pageCount for multi-page PDFs.
Connector
json-extract
The Stall
Extracts and parses JSON from mixed-content text. Handles LLM output with JSON embedded in prose, code fences (```json), trailing commas, single-quoted strings, JS-style comments, and bare object keys (JSON5-style). Returns the parsed data, a cleaned JSON string, extraction method used, and any repair applied. Pure text processing — zero external API calls.
Connector
check_claims
DocImprint
Verify a list of factual claims against document text. Uses a quality AI model. Typical workflow: call extract_text or extract_url first to get the text, then pass it here. Returns: { claims: [{ claim, status: "supported"|"contradicted"|"not_found", evidence: { quote, paragraphs[] }, confidence: "high"|"medium"|"low" }], truncated: boolean }
Connector
prefetch_datasheets
sheetsdata-mcp
Trigger background datasheet extraction for multiple parts at once (up to 20). Non-blocking — returns immediately with the status of each part. Use this to warm up datasheets for a BOM before calling read_datasheet. Example: prefetch_datasheets(['TPS54302', 'ADS1115', 'LP5907']) If a part comes back 'no_source' on the first call, retry prefetch for that MPN once after 10-30s — the URL resolver is retriable and often finds a source on the second pass. If still 'no_source', use request_datasheet_upload + confirm_datasheet_upload to attach your own PDF (org-private). Part numbers must be specific MPNs (e.g. 'STM32F446RCT6', 'TPS54302DDCR') or LCSC numbers (e.g. 'C2837938'). Do NOT pass bare values ('100nF', '10K'), descriptions, BOM reference designators, test points, or board/module names — see the server instructions for the full rule set. When a BOM has values-only rows, use search_parts first to resolve each to an MPN. DATASHEET STATUS VALUES: - 'ready' — extracted and indexed; call read_datasheet, search_datasheets, or analyze_image. - 'extracting' / 'in_progress' / 'queued' / 'pending' — extraction running or scheduled. Poll check_extraction_status every 5-10s until 'ready' or 'failed'. Typical time: 30s-2min. - 'not_extracted' — known part but datasheet hasn't been fetched yet. Trigger it via prefetch_datasheets (cheapest) or by calling read_datasheet (auto-triggers on first read). - 'no_source' — we couldn't find a public datasheet URL for this MPN. First, retry prefetch_datasheets in 10-30s (the URL resolver re-runs and often finds a source on the second pass). If still 'no_source', the agent can upload the PDF manually via request_datasheet_upload + confirm_datasheet_upload (see those tools). Org-uploaded datasheets are private to the org. - 'unsupported' — PDF exists but can't be extracted (scanned image-only, encrypted, or corrupted). Upload a clean text-based PDF via request_datasheet_upload to override. - 'failed' / 'error' — extraction errored. The response includes the error reason. Retry via prefetch_datasheets or escalate to support. - 'rejected' — input wasn't a real MPN (bare value like '100nF', description, or reference designator). Fix the input and re-call. - 'deduplicated' — another part in the family already has this datasheet; same content is returned under the primary MPN.
Connector
request_datasheet_upload
sheetsdata-mcp
Request a signed URL to upload a datasheet PDF for a component whose datasheet we don't have. Use this when search_parts / get_part_details / prefetch_datasheets return datasheet_status='no_source' (and a retry didn't help) or 'unsupported'. Free — the upload fee is only charged on confirm_datasheet_upload after we validate the file. Flow (3 steps): 1. Call request_datasheet_upload with the MPN, the file's SHA-256, and its byte size. You get back an upload_url, upload_method ('PUT'), upload_headers, and an opaque upload_token. 2. Upload the PDF directly to the returned URL with curl: `curl -X PUT -H 'Content-Type: application/pdf' --data-binary @file.pdf "$UPLOAD_URL"` (add any headers from upload_headers). 3. Call confirm_datasheet_upload with the upload_token. Server verifies the bytes, re-hashes, checks for the MPN on the first page, charges the upload fee (50¢), and queues extraction. Returns document_id + status='pending'. Validation rules (checked at confirm time, refunded on failure): - File must be a valid PDF (magic bytes + parseable). - Actual SHA-256 must match expected_sha256. - Actual byte size must match size_bytes (±0). - MPN or its core stem must appear in the first page text (catches wrong-file uploads). Scanned image-only PDFs will fail this check — upload a text-based PDF. - Max 50MB per file. No dev-kit manuals / BOB schematics / app-notes as datasheets — use the matching MPN's actual datasheet. Uploaded datasheets are scoped to your organization (private). They satisfy read_datasheet, search_datasheets, check_design_fit, and analyze_image for your org's tokens only. Tokens expire after 15 minutes. If upload fails or times out, just call request_datasheet_upload again.
Connector
sign_pdf
DocWand
Sign a PDF: opens an interactive widget where the user draws, types or uploads a signature and places it on the document. Optionally pass signature_name to pre-render a handwritten-style signature. ALWAYS use this for PDF signing requests — never sign or modify the PDF yourself; the user reviews and downloads in the widget. All processing happens locally in the user's browser — the file is never uploaded. Podpisz PDF: narysuj, wpisz lub wgraj podpis i umieść go na dokumencie; plik nie opuszcza przeglądarki.
Connector
extract_pdf
paper-mcp
Extract a PDF to clean Markdown/LaTeX text via MinerU (great for papers behind no open-access full text — give the user's PDF and get readable text back). Provide pdf_url (downloaded server-side, SSRF-guarded) OR pdf_base64. formula/table toggle math/table reconstruction. Returns {task_id, status, cached, content, chars}: a recently-seen (cached) or small PDF comes back with `content` in one call; a fresh PDF (MinerU is GPU-heavy, minutes) returns status='running' + a task_id — then call extract_pdf_result(task_id) to fetch the text.
Connector
render_html_to_pdf
pdfzen
Prepare a paid PDF render from arbitrary Handlebars-flavoured HTML. Use only when no starter fits (one-off layouts, custom branding). Prefer render_template_to_pdf when a starter matches. Validates your HTML and returns the exact, ready-to-execute HTTP request to run against pdfzen's render endpoint — POST /v402/render/pdf (x402, $0.006 USDC on Base, no API key) or POST /v1/render/pdf (pdfzen API key). pdfzen renders are executed over HTTP, not streamed in-band over MCP; this tool is the bridge.
Connector
get_analysis_status
hypathesis
Check the processing status of an uploaded paper. Poll this tool after uploading a PDF until status is 'Ready' before calling get_variable_relationships. Args: file_id: The file_id returned by the /upload endpoint. authorization: Optional. API key as 'Bearer hk_...' or 'hk_...'. Returns: { "status": "Processing" | "Ready" | "Empty" | "Ineligible" | "Pending", "edges_count": int, "variables_count": int }
Connector