213,503 tools. Last updated 2026-06-19 18:02

"Tools and resources for PDF and XML document processing" matching MCP tools:

detect_ai_text
OpenWarrant — AI Text Detection
Estimate the PROBABILITY that a document's text was AI-GENERATED (LLM-written prose). USE THIS WHEN someone shares prose — an essay, cover letter, article, review, application, or report (or a link to one) — and asks: did an AI / ChatGPT write this? is this human-written? detect AI text. Provide the document ONE way: `text` (pasted markdown/plain prose), `url` (a public http(s) link to a page or PDF — fetched server-side, the cheapest call), OR `bytes_b64` (a base64 PDF/file, plus `filename` for routing). Returns `{probability, lean, tells, reasoning, applicable}`. HONEST SCOPE: the probability is the model's CONFIDENCE, not a calibrated truth — it can false-flag templated/coached or non-native-English writing. It works on PROSE only: for a form/table/numeric document (payslip, statement) it returns `applicable: false` and abstains, because AI-text detection false-positives badly there — use `verify_document` (the authenticity engine) for those, and `verify_references` to check a doc's citations/claims.
Connector
create_letter
PostAgent — Print and Mail
Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the per-page price in the resulting quote — can be one more than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
Connector
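The create_letter rules above (exactly one document source, a 31457280-byte cap, one extra page for PDF input) can be checked client-side before calling the tool. This is a minimal sketch, not PostAgent's own code: the function names are hypothetical, and it assumes the size limit applies to the decoded bytes rather than to the base64 text.

```python
import base64

MAX_UPLOAD_BYTES = 31_457_280  # upload limit quoted in the create_letter description


def build_create_letter_args(content=None, content_base64=None, url=None):
    """Return tool arguments with exactly one document source set."""
    candidates = {"content": content, "contentBase64": content_base64, "url": url}
    chosen = {key: value for key, value in candidates.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("provide exactly one of content, contentBase64, url")
    if content_base64 is not None:
        # Assumption: the limit applies to the decoded bytes, not the base64 text.
        size = len(base64.b64decode(content_base64, validate=True))
        if size > MAX_UPLOAD_BYTES:
            raise ValueError(f"document is {size} bytes; limit is {MAX_UPLOAD_BYTES}")
    return chosen


def stored_pages_for_pdf(source_pages):
    """PDF and image inputs get a blank first page prepended for the address block."""
    return source_pages + 1
```

For text, html, markdown, and docx input the stored page count depends on how the content reflows below the address block, so the sketch only predicts it for PDF and image sources.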
merge_pdfs
Sats4AI - Bitcoin-Powered AI Tools
Merge multiple PDF files into a single document. Preserves bookmarks, links, and formatting. Returns JSON: { url } — a temporary download URL (valid ~1 hour). Minimum 2 files, no maximum. Files are concatenated in array order. 100 sats per merge regardless of file count. Use convert_file instead if you need format conversion (e.g., DOCX→PDF). Pay per request with Bitcoin Lightning — no API key, no account needed. Requires create_payment with toolName='merge_pdfs'.
Connector
classify_document
OpenWarrant — Document Verification Suite
Classify a FINANCIAL document's type and issuing country. Specialised in financial-services documents: payslip, tax_invoice, bank_statement, salary_certificate, payg_summary, receipt. USE THIS WHEN someone shares a document (or a link to one) and asks: what kind of document is this? is this a payslip / invoice / bank statement? route this document. Also use it as the FIRST step before verify_document, so the right checks run. Provide the document ONE way: `url` (a public http(s) link to a PDF or image — fetched server-side, the cheapest call) OR `bytes_b64` (inline base64, plus `filename` for PDF-vs-image routing). Returns `{document_type, country_code, confidence, is_financial_document, evidence, ...}`. HONEST SCOPE: type classification only — NOT an authenticity or fraud judgment (use verify_document for that). Below the confidence threshold it abstains with 'unknown' rather than guessing; non-financial documents classify as 'other'. The document is never stored.
Connector
parse_pdf_to_text
Nordic Financial MCP
Download a PDF from a URL and extract all text content, page by page. Use this to read the full text of a specific document — for example, an annual report PDF linked from a search_filings result. Best combined with search_filings: use search_filings to locate the document, then parse_pdf_to_text for the full text. Do not use for PDFs that are already well-represented in the database — search_filings is faster and returns pre-ranked, relevant excerpts. Not suitable for scanned (image-only) PDFs without embedded text; those pages will be returned as "(no extractable text)". Args: pdf_url: Direct HTTPS URL to the PDF file, e.g. https://example.com/report.pdf. Must be publicly accessible; authentication-protected URLs will fail. Returns: All text from the PDF with "--- Page N ---" separators between pages. Returns an error string if the download fails, the URL does not point to a valid PDF, or the document exceeds the 60-second download timeout.
Connector
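Because parse_pdf_to_text returns a single string with "--- Page N ---" separators, a caller usually wants to split it back into pages. The sketch below assumes each separator sits on its own line in exactly that form; `split_pages` and the sample string are illustrative, not part of the Nordic Financial MCP server.

```python
import re

# Assumption: each separator is on its own line, e.g. "--- Page 3 ---".
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---[ \t]*$", re.MULTILINE)
NO_TEXT = "(no extractable text)"  # placeholder the tool returns for scanned pages


def split_pages(output):
    """Map page number -> page text, using the page separators as boundaries."""
    pages = {}
    matches = list(PAGE_MARKER.finditer(output))
    for index, match in enumerate(matches):
        end = matches[index + 1].start() if index + 1 < len(matches) else len(output)
        pages[int(match.group(1))] = output[match.end():end].strip()
    return pages


def scanned_pages(pages):
    """Page numbers that came back without embedded text."""
    return [number for number, text in pages.items() if text == NO_TEXT]


sample = "--- Page 1 ---\nRevenue grew.\n--- Page 2 ---\n(no extractable text)\n"
pages = split_pages(sample)
```

With the sample above, `pages[1]` holds the first page's text and `scanned_pages(pages)` lists page 2, which signals that an OCR-capable tool would be needed for that page.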
extract_contract_from_url
transaction-coordinator
Extract structured transaction data from a contract at a URL. Downloads the document, extracts text (with OCR fallback for scanned PDFs), and runs PrimaCoda's contract-extraction prompt to return parties, addresses, dates, prices, and key contract fields. Use this when an agent has the contract hosted somewhere (Dropbox, Google Drive direct download, Squarespace, etc.) and wants to skip the upload step. For multi-document deals (purchase + addenda + disclosures), use the PrimaCoda dashboard's batch upload; this tool handles ONE document. Args: pdf_url: Direct download URL for the contract (PDF, DOCX, TXT, or image). Must be reachable from the PrimaCoda server. Google Drive "shared link" URLs work if set to "anyone with link"; other share URLs may need their direct-download form. api_key: Your PrimaCoda MCP API key (starts 'pck_').
Connector

Matching MCP Servers

PDF Tools
File Systems, Search, Text Summarization
Open-Document-Alliance
license: A, quality: -, maintenance: A
The local PDF workflow for Claude Desktop and MCP hosts: fill, sign, merge, split, extract, and analyze PDFs without sending files to a web app.
Last updated 2026-05-04
5
140
MIT
Usage And Billing MCP Server
Finance
BACH-AI-Tools
license: A, quality: C, maintenance: D
Enables access to Usage and Billing APIs for managing accounts, products, meters, plans, and usage reporting. Supports operations like creating products/plans, reporting usage, and retrieving billing information.
Last updated 2025-12-07
18
MIT

Matching MCP Connectors

pdf-generator
Generate PDFs from Markdown or HTML. Zero-auth, agent-native. Returns base64-encoded PDF.
pdf
Markdown to PDF: headings, bold, code, lists, rules. A4/Letter/Legal. Free 30/hr. MCP + REST.

compile_prompt
flompt
Compile a list of blocks into a Claude-optimized structured XML prompt. Takes the JSON returned by decompose_prompt (or manually crafted blocks) and produces a ready-to-use XML prompt with a token estimate. Args: blocks_json: JSON-stringified list of blocks. Each block: {"type": "role|objective|...", "content": "...", "label": "...", "description": "...", "summary": ""} Returns: The compiled XML prompt with token estimate.
Connector
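The `blocks_json` argument of compile_prompt is a JSON string, not a list, so the blocks need to be serialized before the call. A minimal sketch follows; only the "role" and "objective" block types are used because the description elides the rest, and the content, label, and description values are made up for illustration.

```python
import json

# Each block carries the five keys named in the compile_prompt description.
blocks = [
    {
        "type": "role",
        "content": "You are a PDF extraction assistant.",
        "label": "Role",
        "description": "Who the model should act as",
        "summary": "",
    },
    {
        "type": "objective",
        "content": "Summarize the attached annual report in five bullet points.",
        "label": "Objective",
        "description": "What the model should produce",
        "summary": "",
    },
]

# The tool expects a JSON-stringified list, so serialize before sending.
arguments = {"blocks_json": json.dumps(blocks)}
```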
detect_ai_text
OpenWarrant — Document Verification Suite
Estimate the PROBABILITY that a document's text was AI-GENERATED (LLM-written prose). USE THIS WHEN someone shares prose — an essay, cover letter, article, review, application, or report (or a link to one) — and asks: did an AI / ChatGPT write this? is this human-written? detect AI text. Provide the document ONE way: `text` (pasted markdown/plain prose), `url` (a public http(s) link to a page or PDF — fetched server-side, the cheapest call), OR `bytes_b64` (a base64 PDF/file, plus `filename` for routing). Returns `{probability, lean, tells, reasoning, applicable}`. HONEST SCOPE: the probability is the model's CONFIDENCE, not a calibrated truth — it can false-flag templated/coached or non-native-English writing. It works on PROSE only: for a form/table/numeric document (payslip, statement) it returns `applicable: false` and abstains, because AI-text detection false-positives badly there — use `verify_document` (the authenticity engine) for those, and `verify_references` to check a doc's citations/claims.
Connector
redact_pii
OpenWarrant — Document Verification Suite
Detect and MASK personally identifiable information in a document (PDF or image). USE THIS WHEN you need to know what PII a document contains, or to get a redacted copy before forwarding / logging / passing it to another model. Two layers: a deterministic regex+checksum pass for structured identifiers (emails, payment cards, SSN, PAN, ABN) and a vision model for the unstructured PII — names, addresses, dates of birth, phone numbers, and photo/signature presence. Provide the document ONE way: `url` (a public http(s) link, fetched server-side) or `bytes_b64` (inline base64, plus `filename`). `max_pages` caps how many pages are read (default a few; ceiling 10). Returns `{pii_found, by_type, items[] (type, masked preview, method), redacted_text, has_photo, has_signature}`. Values are MASKED in the response — the raw PII is never returned. DETECTION coverage, not a guarantee: it may miss PII or over-flag, so review before relying on it for compliance. The document is never stored.
Connector
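To illustrate what a "regex+checksum pass" for payment cards can look like, the sketch below pairs a candidate regex with the Luhn checksum, the standard check digit scheme for card numbers. OpenWarrant's actual implementation is not documented here, so the pattern, the function names, and the masking format are all assumptions.

```python
import re

# Assumption: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Luhn check: double every second digit from the right and sum the digits."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def mask_cards(text):
    """Mask all but the last four digits of candidates that pass the checksum."""
    def replace(match):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return "*" * (len(digits) - 4) + digits[-4:]
        return match.group()  # digit runs that fail the checksum are left alone
    return CARD_CANDIDATE.sub(replace, text)
```

The checksum step is what keeps arbitrary long digit runs, such as order numbers, from being flagged as cards; it reduces false positives but does not remove them.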
get_broker_info
Synapze — Financial Intermediary MCP
Broker information, branding, and identity. Returns: company name, logo URL, brand color (#hex), address, postal code, phone, ORIAS number, website, specialties, and DDA compliance status. ALWAYS call this before generating any document (PDF, PPTX, comparison, advisory note) to brand it with the broker's logo, color, name, address, and ORIAS number.
Connector
get_analysis_status
hypathesis
Check the processing status of an uploaded paper. Poll this tool after uploading a PDF until status is 'Ready' before calling get_variable_relationships. Args: file_id: The file_id returned by the /upload endpoint. authorization: Optional. API key as 'Bearer hk_...' or 'hk_...'. Returns: { "status": "Processing" | "Ready" | "Empty" | "Ineligible" | "Pending", "edges_count": int, "variables_count": int }
Connector
extract_fields
OpenWarrant — Document Verification Suite
Extract structured FIELDS from a document (PDF or image) with a vision model. USE THIS WHEN you need specific values OUT of a document (a payslip's gross/net, an invoice's total/ABN, a form's checkboxes, a table's cells) rather than a yes/no about the document. (For "is this genuine?" use verify_document; for "what kind of document is this?" classify_document.)
Say WHAT to pull, four ways:
- `fields`: an ad-hoc list of names like ["gross_pay","abn"], or objects {"name":..., "type":"text|amount|date|boolean", "description":...}. THE general case: ask for exactly the fields your task needs. Use type "boolean" for a checkbox/tickbox.
- `template`: a named preset: "payslip", "tax_invoice", "bank_statement", "receipt".
- NEITHER: AUTO; the document is classified and that type's fields are used.
- auto on an unrecognised type: schema-free; every labelled field is returned.
Provide the document ONE way: `url` (a public http(s) link, fetched server-side, the cheapest call) OR `bytes_b64` (inline base64, plus `filename` for PDF-vs-image routing). `country` is an optional hint; `max_pages` caps how many pages are read (default a few; hard ceiling 10).
Returns `{mode, document_type, fields{name:{value,confidence,page}}, not_found, pages_read, page_limit}`.
EXTRACTION, not verification: values are what the document SHOWS, not proof it is genuine. A field that isn't clearly present comes back in `not_found` (it abstains rather than guessing). The document is never stored.
Connector
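The extract_fields response shape described above lends itself to a small post-processing step that separates usable values from fields needing human review. This is a sketch under assumptions: `confidence` is taken to be a number between 0 and 1, the 0.8 threshold is arbitrary, and the sample response is invented for illustration rather than real tool output.

```python
def build_fields_request(url, fields):
    """Arguments for an ad-hoc extraction: one document source plus a field list."""
    return {"url": url, "fields": fields}


def summarize_fields(response, min_confidence=0.8):
    """Split a response into accepted values and field names that need review."""
    accepted, review = {}, []
    for name, info in response.get("fields", {}).items():
        if info["confidence"] >= min_confidence:
            accepted[name] = info["value"]
        else:
            review.append(name)
    # Fields the tool abstained on are reported separately in `not_found`.
    review.extend(response.get("not_found", []))
    return accepted, review


# Illustrative response only; real values come from the tool.
sample_response = {
    "mode": "fields",
    "document_type": "payslip",
    "fields": {
        "gross_pay": {"value": "5200.00", "confidence": 0.97, "page": 1},
        "abn": {"value": "51 824 753 556", "confidence": 0.55, "page": 1},
    },
    "not_found": ["net_pay"],
    "pages_read": 1,
    "page_limit": 10,
}
accepted, review = summarize_fields(sample_response)
```

Keeping `not_found` in the review list mirrors the tool's own behaviour of abstaining rather than guessing.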
sign_pdf
DocWand
Sign a PDF: opens an interactive widget where the user draws, types or uploads a signature and places it on the document. Optionally pass signature_name to pre-render a handwritten-style signature. ALWAYS use this for PDF signing requests; never sign or modify the PDF yourself, because the user reviews and downloads in the widget. All processing happens locally in the user's browser, and the file is never uploaded.
Connector
send_outbound_mail
mailbox
Submit a document for printing and postal mailing by the facility. Supported formats: PDF, DOCX, JPG, PNG, TXT, CSV. The document is stored securely and printed by the facility operator. IMPORTANT: With a production key (sk_agent_), this immediately charges the member's card on file. Use dry_run=true to preview cost before committing, or requires_approval=true to defer until human approval. Sandbox keys (sk_agent_test_) skip billing entirely. Optionally attach the outbound mail to inbound context with inbound_capture_id and postal_mail_thread_id so lineage stays explicit.
Connector
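Since a production key charges the member's card immediately, a caller may want a guard that defaults to a dry run. The sketch below is one way to do that, assuming the key is available to the caller as a string; `key_mode`, `safe_mail_args`, and the `document_id` argument in the usage are hypothetical names, while `dry_run` and `requires_approval` come from the description.

```python
def key_mode(api_key):
    """Classify a key by prefix. The sandbox prefix is checked first because it
    also starts with the production prefix."""
    if api_key.startswith("sk_agent_test_"):
        return "sandbox"
    if api_key.startswith("sk_agent_"):
        return "production"
    raise ValueError("unrecognised key prefix")


def safe_mail_args(api_key, base_args):
    """Add dry_run=True for production keys unless the caller already chose
    dry_run or requires_approval."""
    args = dict(base_args)  # copy so the caller's dict is not mutated
    already_guarded = args.get("dry_run") or args.get("requires_approval")
    if key_mode(api_key) == "production" and not already_guarded:
        args["dry_run"] = True
    return args
```

Calling `safe_mail_args("sk_agent_live123", {"document_id": "d1"})` adds `dry_run`, so the first call previews the cost; a second call without the guard would then commit the charge.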
regulations_find_comments
federal-regulations-mcp-server
Fetch public comments on a Federal Register document or a Regulations.gov docket — the unique corpus of what citizens and organizations actually submitted. Provide exactly one targeting parameter: docket_id (all comments in a docket, broadest), document_object_id (comments on one document, from regulations_get_docket), fr_document_number (convenience — resolves the FR number to its Regulations.gov document internally), or comment_id (one comment's full detail and attachments). The list endpoint returns no body text or attachment info — call with comment_id to read a comment's body. When a comment's real content is a PDF/DOCX attachment, the body is a stub and attachmentOnly is true; the attachment download URLs are returned. Requires REGULATIONS_GOV_API_KEY (free at https://api.data.gov/signup/).
Connector
ceevee_get_report
CareerProof MCP
Get report status and metadata (without PDF). Returns status (pending/processing/completed/failed), title, type, inputs, and summary. This is the polling tool for ceevee_generate_report — call every 30 seconds, up to 40 times (20 min max). When status='completed', download PDF with ceevee_download_report(report_id). If status='failed', relay error_message. If still processing after 40 polls, stop and give the user the report_id to check later. Free.
Connector
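The polling rules for ceevee_get_report (every 30 seconds, at most 40 polls, three possible outcomes) can be sketched as a small loop. `get_report` is a hypothetical callable that wraps the tool call and returns its result as a dict; the returned action labels are likewise this sketch's own convention.

```python
import time

POLL_INTERVAL_S = 30  # polling interval from the description
MAX_POLLS = 40        # 40 polls x 30 s = 20 minutes


def poll_report(get_report, report_id, sleep=time.sleep):
    """Poll until the report completes, fails, or the poll budget is spent."""
    for attempt in range(MAX_POLLS):
        report = get_report(report_id)
        status = report["status"]
        if status == "completed":
            # Next step per the description: ceevee_download_report(report_id).
            return {"action": "download", "report_id": report_id}
        if status == "failed":
            return {"action": "relay_error", "error_message": report.get("error_message")}
        if attempt < MAX_POLLS - 1:
            sleep(POLL_INTERVAL_S)
    # Still pending or processing after 40 polls: hand the id back to the user.
    return {"action": "check_later", "report_id": report_id}
```

Passing `sleep` as a parameter keeps the loop testable without waiting 20 minutes.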
manage_rag_content
mcp
Manage RAG (Retrieval-Augmented Generation) collections and documents. Collections are named containers for documents that are chunked, embedded, and indexed for semantic search.
Collection actions:
- "create_collection": Create a new collection
- "list_collections": List all collections in an app
- "get_collection": Get details for a specific collection (includes document counts by status)
- "delete_collection": Permanently delete a collection and all its documents/embeddings
Document actions:
- "ingest_document": Add a document (raw text or uploaded file) to be chunked, embedded, and indexed
- "list_documents": List all documents in a collection with their status
- "get_document_status": Check the processing status of a specific document
- "delete_document": Permanently delete a document and its chunks/embeddings
Parameters by action:
- create_collection: { app_id, action: "create_collection", name, description?, access_mode?, chunk_size?, chunk_overlap? }
- list_collections: { app_id, action: "list_collections" }
- get_collection: { app_id, action: "get_collection", name }
- delete_collection: { app_id, action: "delete_collection", name }
- ingest_document: { app_id, collection, action: "ingest_document", text?, storage_object_id?, filename?, metadata? }
- list_documents: { app_id, collection, action: "list_documents" }
- get_document_status: { app_id, collection, action: "get_document_status", document_id }
- delete_document: { app_id, collection, action: "delete_document", document_id }
access_mode options (create_collection):
- "private" (default): Only the app owner can query
- "shared": All authenticated users can query
- "custom": Use RLS policies for fine-grained access
Ingestion modes for ingest_document (provide one):
1. Raw text: provide "text" directly
2. File-based: upload via manage_storage (action: "upload_url") first, then provide "storage_object_id"
Supported file types: PDF, TXT, Markdown, CSV, HTML, DOCX, XLSX, PPTX.
Document statuses: "pending" → "processing" → "ready" (or "failed")
Workflow: create_collection → ingest_document → poll get_document_status until "ready" → query with rag_query.
Warning: "delete_collection" permanently removes the collection, all documents, and embeddings. Cannot be undone.
Warning: "delete_document" permanently removes the document and its embeddings. To replace, delete then re-ingest.
Common errors:
- RESOURCE_NOT_FOUND: App, collection, or document doesn't exist
- VALIDATION_DUPLICATE_NAME: Collection name already exists (create_collection)
- VALIDATION_ERROR: Neither text nor storage_object_id provided (ingest_document)
Connector
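The manage_rag_content workflow (create a collection, ingest a document, poll its status) comes down to building three payloads with the parameter sets listed in the description. The helpers below are a hypothetical client-side sketch: the function names are made up, and rejecting a call that supplies both `text` and `storage_object_id` is an assumption, since the description only documents the error for supplying neither.

```python
def create_collection(app_id, name, **options):
    """Payload for create_collection; only the documented optional keys pass."""
    allowed = {"description", "access_mode", "chunk_size", "chunk_overlap"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    return {"app_id": app_id, "action": "create_collection", "name": name, **options}


def ingest_document(app_id, collection, text=None, storage_object_id=None,
                    filename=None, metadata=None):
    """Payload for ingest_document with exactly one ingestion mode set."""
    # Assumption: supplying both modes is treated as an error in this sketch.
    if (text is None) == (storage_object_id is None):
        raise ValueError("provide exactly one of text or storage_object_id")
    payload = {"app_id": app_id, "collection": collection, "action": "ingest_document"}
    optional = {"text": text, "storage_object_id": storage_object_id,
                "filename": filename, "metadata": metadata}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


def get_document_status(app_id, collection, document_id):
    """Payload for polling one document's processing status."""
    return {"app_id": app_id, "collection": collection,
            "action": "get_document_status", "document_id": document_id}


def is_finished(status):
    """'ready' and 'failed' end the pending -> processing -> ready flow."""
    return status in {"ready", "failed"}
```

A caller would send the first two payloads in order, then repeat the third until `is_finished` returns true, and only query with rag_query when the status is "ready".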