240,130 tools. Last updated 2026-06-27 06:17

"Tools or methods to convert PDFs to Markdown using OCR" matching MCP tools:

google_flights_location_search
Searchapi
Search for airports and cities to get their identifiers for Google Flights tools. Returns:
- IATA airport codes (e.g., 'JFK') for specific airports
- kgmid (e.g., '/m/02_286') for cities, which searches all airports in that city
Use this tool when you have a city name like 'New York' or 'Paris' and need to convert it to codes that the flight tools accept. Note: Common IATA codes like JFK, LAX, SFO, LHR, CDG, NRT can be used directly without this tool.
Connector
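The distinction between the two identifier kinds described above can be sketched as a small pre-check. This is only an illustration: `classify_location` is a hypothetical helper, not part of the Searchapi tool, and a three-letter uppercase string is merely shaped like an IATA code, not guaranteed to be a real one.

```python
import re

# Three uppercase letters: the shape of an IATA airport code such as 'JFK'.
IATA_SHAPE = re.compile(r"^[A-Z]{3}$")


def classify_location(value: str) -> str:
    """Guess which kind of identifier a string is (hypothetical helper)."""
    text = value.strip()
    if text.startswith("/m/"):
        return "kgmid"          # city identifier, e.g. '/m/02_286'
    if IATA_SHAPE.match(text):
        return "iata"           # airport code, usable directly
    return "needs_lookup"       # a plain name such as 'New York'
```

Anything classified as `needs_lookup` would be a candidate for google_flights_location_search before calling the flight tools.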
convert_html_to_pdf
Sats4AI - Bitcoin-Powered AI Tools
Convert HTML or Markdown to a pixel-perfect PDF. Returns JSON: { url } — a temporary download URL (valid ~1 hour). Great for generating invoices, reports, receipts, or formatted documents programmatically. Supports full HTML/CSS including tables, images (base64 or URL), and inline styles. For Markdown input, set format='markdown'. 50 sats per conversion. Use convert_file instead for converting existing files between formats (e.g., DOCX→PDF). Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='convert_html_to_pdf'.
Connector
convert_document
mdmagic-mcp-server
Convert markdown to a professionally formatted document using an MDMagic template. IMPORTANT GUIDANCE:
1. Output format → what user gets:
   - 'docx' → a single Word .docx file
   - 'pdf' → a single .pdf file
   - 'html' → a single .html file
   - 'all' → a ZIP containing all three (DOCX + PDF + HTML)
2. If the user is ambiguous (e.g. 'convert this'), ASK which format they want before calling. Don't assume.
3. Filename: if the user attached a file (e.g. 'mydoc.md'), pass its base name as fileName. Otherwise the API derives one from the markdown's first H1. Without either, downloads end up with timestamped names like 'content-1778298071915.docx', which is bad UX.
4. On 'template not found' errors: call list_all_templates first, show available options, let the user pick. Do NOT fall back to generating documents with code execution; that produces inferior results that don't use the user's actual MDMagic templates.
5. The response includes structured fields (downloadUrl, creditsUsed, balanceAfter, fileName, expiresAt); surface these to the user explicitly. Don't paraphrase. The user wants to know exactly what they spent and what's left.
6. Page sizes: A3, A4, Executive, US_Legal, US_Letter. Default A4. Orientation: Portrait or Landscape, default Portrait.
7. CRITICAL (newlines in `content`): markdown is line-sensitive. Headings (#, ##), tables (| ... |), lists (-, 1.), and code fences (```) ONLY work when each starts on its own line. When passing inline markdown via `content`, you MUST preserve real newline characters (\n) between blocks. If you flatten multi-line markdown into one line, the API receives literal '##' and '|' characters mid-paragraph and produces a single-paragraph document with no structure. Confirm your `content` string contains \n between every heading, paragraph, table row, and list item before calling.
Connector
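The newline requirement for `content` can be checked before calling. The sketch below is a rough heuristic under stated assumptions: `looks_flattened` is a hypothetical helper that only detects heading markers appearing mid-line, and the payload keys `content` and `fileName` are taken from the description above, while the rest of the request shape is not shown because the listing does not specify it.

```python
import json
import re

# A heading marker (#..######) that appears after other text on the same
# line suggests the markdown was flattened into a single line.
_MIDLINE_HEADING = re.compile(r"\S[ \t]+#{1,6}[ \t]+\S")


def looks_flattened(content: str) -> bool:
    """Heuristic only: True when a heading marker sits mid-line."""
    return _MIDLINE_HEADING.search(content) is not None


good = "# Report\n\n## Summary\n\n- first item\n- second item\n"
bad = "# Report ## Summary - first item - second item"

# json.dumps escapes real newlines as \n, so structure survives transport.
payload = json.dumps({"content": good, "fileName": "report"})
```

A check like this will miss flattened tables and lists, so it complements rather than replaces reading the string before sending it.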
web_fetch
dialogbrain
Fetches a single URL and returns its content. Use this when you have a specific URL in mind, for example after web.search returns a link you want to read, or when the user pastes a URL. Modes (extract):
- 'auto' (default): picks the right mode based on response content type.
- 'markdown': for HTML pages; returns cleaned markdown plus the page <title>.
- 'text': for JSON/XML/plaintext APIs; returns the raw decoded body.
- 'file': for images, PDFs, audio, video, archives, or any binary; ingests the bytes into the user's file storage and returns a file_id you can pass to messages.send (to send as an attachment), agents.add_file (to add to agent knowledge), or files.read.
Use web.fetch (not files.upload) when you need the file_id immediately for the next tool call, because files.upload(source_url=…) is async and won't have the file ready in the same turn. Use web.search (not web.fetch) when you don't have a specific URL yet and need to find one.
Connector
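The mode list above implies a mapping from response content type to extract mode. A minimal sketch of that mapping follows; it is an illustration of the described behaviour, not the service's actual 'auto' logic, and `pick_extract_mode` and the exact media types it checks are assumptions.

```python
def pick_extract_mode(content_type: str) -> str:
    """Map a Content-Type header value to an extract mode (illustrative)."""
    # Drop parameters such as '; charset=utf-8' and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type in ("text/html", "application/xhtml+xml"):
        return "markdown"
    if (
        media_type in ("application/json", "application/xml", "text/xml", "text/plain")
        or media_type.endswith("+json")
        or media_type.endswith("+xml")
    ):
        return "text"
    # Images, PDFs, audio, video, archives, or any other binary.
    return "file"
```

For a PDF-to-Markdown workflow this means a PDF URL would land in 'file' mode and yield a file_id rather than text.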

Matching MCP Servers

markdown-to-html
Developer Tools, Autonomous Agents
fashionzzZ
License: A · Quality: B · Maintenance: D
A Model Context Protocol server that converts Markdown content to HTML format.
Last updated 2025-06-03
1
3,770
2
MIT
web-to-markdown-mcp
Web Scraping, Browser Automation
sidney
License: A · Quality: A · Maintenance: B
Fetches a URL and returns the main content as clean Markdown, using plain HTTP when possible and headless Chromium for JavaScript-rendered or bot-protected pages.
Last updated 2026-04-27
1
MIT

Matching MCP Connectors

PDF to Markdown (pdf2md.dev)
Hosted MCP server: convert PDFs to clean, LLM-ready Markdown with tables, formulas and OCR.
Content to Social
Transform any blog post or article URL into ready-to-post social media content for Twitter/X threads, LinkedIn posts, Instagram captions, Facebook posts, and email newsletters. Pay-per-event: $0.07 for all 5 platforms, $0.03 for single platform.

extract_contract_from_url
transaction-coordinator
Extract structured transaction data from a contract at a URL. Downloads the document, extracts text (with OCR fallback for scanned PDFs), and runs PrimaCoda's contract-extraction prompt to return parties, addresses, dates, prices, and key contract fields. Use this when an agent has the contract hosted somewhere (Dropbox, Google Drive direct download, Squarespace, etc.) and wants to skip the upload step. This tool handles ONE document; for multi-document deals (purchase + addenda + disclosures), use the PrimaCoda dashboard's batch upload. Args:
- pdf_url: Direct download URL for the contract (PDF, DOCX, TXT, or image). Must be reachable from the PrimaCoda server. Google Drive "shared link" URLs work if set to "anyone with link"; other share URLs may need their direct-download form.
- api_key: Your PrimaCoda MCP API key (starts 'pck_').
Connector
tabular_to_json
x402 JSON Repair
Convert messy tabular text into clean, typed JSON rows. Auto-detects CSV, TSV, or a Markdown table and returns one JSON object per row plus an inferred column/type summary. Pure deterministic compute: no network or model calls.
What it handles: delimiter sniffing (comma/semicolon/tab/pipe), quoted fields with embedded commas and newlines, BOM, ragged rows (padded/truncated), Markdown separator rows and escaped pipes, header auto-detection, and per-column type inference (integer/number/boolean/null/string).
When to use: you have CSV/TSV/Markdown-table text (often emitted by tools or LLMs) and want structured, typed rows, optionally validated/coerced against a JSON Schema.
When NOT to use: the data is already clean JSON, or it is HTML/xlsx/binary (not supported).
Args:
- input (string, required): raw tabular text.
- format ("auto"|"csv"|"tsv"|"markdown", default "auto"): force a format or auto-detect.
- hasHeader ("auto"|"true"|"false", default "auto"): whether the first row is a header.
- inferTypes (boolean, default true): coerce cells to number/integer/boolean/null; else keep strings.
- schema (object, optional): JSON Schema (draft 2020-12) to validate/coerce each row object against.
Returns structuredContent:
{
  "ok": boolean, // false if the input cannot be parsed as a table
  "format": "csv"|"tsv"|"markdown",
  "columns": [{ "name": string, "type": string }],
  "rows": [{ ... }], // one object per row, keyed by column name
  "rowCount": number,
  "changed": boolean, // true if any normalization/coercion happened
  "errors": string[], // actionable messages when ok is false
  "repairs": string[] // description of each normalization applied
}
Connector
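A caller consuming the structuredContent shape described above would typically branch on `ok` before touching `rows`. The sketch below assumes only the documented field names (`ok`, `rows`, `rowCount`, `errors`); `rows_or_raise` and `TableParseError` are hypothetical names introduced for illustration.

```python
from typing import Any


class TableParseError(ValueError):
    """Raised when a tabular_to_json-style result reports a failure."""


def rows_or_raise(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the parsed rows, or raise with the reported error messages."""
    if not result.get("ok", False):
        message = "; ".join(result.get("errors", [])) or "unparseable table"
        raise TableParseError(message)

    rows = result.get("rows", [])
    # Cross-check the advertised count when it is present.
    row_count = result.get("rowCount")
    if row_count is not None and row_count != len(rows):
        raise TableParseError(f"rowCount {row_count} != {len(rows)} rows")
    return rows
```

Raising on `ok: false` keeps the `errors` messages visible instead of silently passing an empty row list downstream.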
google_scholar
Searchapi
Search Google Scholar for academic papers, citations, and scholarly articles. Returns results with titles, authors, publication info, citation counts, and links to PDFs. Use cites parameter to find papers citing a specific work, or cluster to find all versions of a paper. For US court opinions and case law, use google_scholar_cases instead.
Connector
extract_document
Sats4AI - Bitcoin-Powered AI Tools
Extract text from PDFs and images as clean Markdown. Uses Mistral OCR — handles complex layouts, tables, handwriting, multi-column documents, and mathematical notation. Preserves document hierarchy in structured Markdown. 10 sats/page. Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='extract_document' and quantity=pageCount for multi-page PDFs.
Connector
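The per-page pricing above translates into simple arithmetic. This sketch assumes only what the description states (10 sats per page, and a create_payment call with `toolName` and `quantity`); the helper names are hypothetical and any further create_payment fields are unknown.

```python
SATS_PER_PAGE = 10  # price quoted in the extract_document description


def estimated_cost_sats(page_count: int) -> int:
    """Estimated cost of extracting a document with the given page count."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return SATS_PER_PAGE * page_count


def extract_document_payment_args(page_count: int) -> dict:
    """Arguments for create_payment as described: toolName and quantity."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return {"toolName": "extract_document", "quantity": page_count}
```

For example, a 12-page scanned PDF would be estimated at 120 sats.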
atlas_list_reports
CareerProof MCP
List all generated reports with status and summary info. Returns an array of report objects with id, report_type, status, title, and summary. Use the report id with atlas_get_report for details or atlas_download_report to download completed PDFs. Free.
Connector
create_powersource_docs
Heista
Build a complete creative intelligence profile from internal brand documents — creative briefs, brand guidelines, product specs, customer research, competitive analysis. Takes any mix of file_ids (from a previous upload), document_urls (public PDF/DOCX/TXT/MD links, up to 10), or documents_inline (base64-encoded files with filename), plus an optional context_url for layering live brand context (colors, fonts, current messaging) and optional idempotency_key. Returns a job_id; poll with get_powersource. Output shape is identical to create_powersource_url: identity, offer, selling points, voice, buyer profile, tensions, angles, emotional arcs, ctas, narrative. Use this when the user says "I have a brief", "here's my brand guidelines", "use this document", drops a PDF / DOCX / strategy deck, or when the truth lives in internal materials rather than the public website. The pipeline reads text only — convert PDFs to markdown before submitting via documents_inline when possible. Costs 100 credits. Do NOT use for URL-only scans — use create_powersource_url. For URL + docs combined (highest fidelity, triangulates public messaging against internal strategy), use create_powersource_full.
Connector
iliad_web_research_crawl
AXIS Toolbox — Agentic Commerce Codebase Intelligence
Crawl a domain and scrape multiple pages using Firecrawl. Returns array of scraped pages with markdown content. Best for site mapping, content audits, or bulk research. Requires Authorization: Bearer <api_key>. Pricing: $0.25 standard, $0.12 lite per page crawled (up to 100 pages per request). Use iliad_web_research for single-page scrapes.
Connector
billing.get_portal
Admin Substitute
Get a Stripe Billing Portal URL for the human to manage their subscription — update payment methods, view invoices, change plans, or cancel. Requires an existing Stripe subscription.
Connector
devexpress_docs_get_content
DevExpress Documentation
Get full document content by URL from DevExpress documentation. Use this tool to retrieve the complete markdown content of a specific documentation page. PREREQUISITE: ALWAYS call `devexpress_docs_search` before using this tool to get valid URLs. The URL parameter must be obtained from the results of the `devexpress_docs_search` tool.
Connector
set_token
WebSlop
Restore an authenticated session using a previously saved JWT token. Call this at the start of a new session before any other tools, using a token saved from a prior check_login call. If the token is invalid, fall back to login.
Connector
list_inbound_forwarding_addresses
mailbox
List the renter’s private inbound forwarding aliases on forward.mailbox.bot. These are the unique intake email addresses an operator, assistant, provider, or external agent can forward scans, PDFs, photos, provider notices, notes, and other context-aware documents to so mailbox.bot can build OCR-backed inbound context. Forwarding/emailing attachments here initiates OCR/extraction; this tool discovers the address and does not upload files directly into OCR. The alias is member-scoped, so live and sandbox agent keys for the same member resolve to the same intake address.
Connector
format_table
IA-QA — 130+ QA & Dev Tools for AI Agents
Convert a JSON array of objects into a Markdown table. Automatically detects columns, aligns headers, and fills missing keys with empty cells. Use when an agent needs to present structured data — tool results, model comparisons, test reports — as a readable table in a response or document.
Connector
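The conversion described above (detect columns, fill missing keys with empty cells) can be approximated locally. This is a minimal sketch, not the IA-QA tool's implementation: `json_rows_to_markdown_table` is a hypothetical name, columns are taken in first-seen order, and no width alignment is attempted.

```python
def json_rows_to_markdown_table(rows: list[dict]) -> str:
    """Render a list of objects as a Markdown table (illustrative sketch)."""
    # Collect column names in the order they first appear.
    columns: list[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    if not columns:
        return ""

    def cell(value) -> str:
        # Missing keys become empty cells; pipes are escaped so they
        # do not split the cell.
        return "" if value is None else str(value).replace("|", "\\|")

    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(row.get(c)) for c in columns) + " |")
    return "\n".join(lines)
```

Each table row is emitted on its own line, which is what line-sensitive Markdown renderers require.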
fetch_url
Heista
Drill into a specific URL after search surfaces it. Returns the extracted text content plus metadata. Internal routing: PDFs hit Anthropic Files API for OCR + structured extraction; HTML pages are fetched + text-extracted via readability-style stripping. Use for: verifying a verbatim quote from a Reddit thread, reading a primary source in full (earnings transcript, research paper), drilling into a vendor product page after search surfaced the URL. NOT for: discovering new URLs — use search/search_community/search_research first. This tool takes a known URL only. Optional max_chars 100-50000, default 8000. SSRF-protected: private IPs + localhost blocked.
Connector
iliad_document_parsing
AXIS Toolbox — Agentic Commerce Codebase Intelligence
AXIS-owned document → Markdown extractor. Accepts either `document_url` (https fetch + 50 MiB cap + 60s timeout) or `document_base64` (inline bytes, 50 MiB decoded cap) — exactly one. Optional `mime_type` hint (application/pdf, application/vnd.openxmlformats-officedocument.wordprocessingml.document, text/html, text/markdown, text/plain); we sniff from magic bytes + URL extension when omitted. Format dispatch: PDF → pdfjs-dist text extraction (one block per page with `--- page N ---` separators); DOCX → mammoth → markdown (tables preserved); HTML → tag-strip with heading + list + entity handling (NOT a full HTML→MD converter — bring turndown if you need fancier); plain text + markdown → passthrough. Returns `{markdown, format_detected, byte_size, page_count, table_count, truncated}`. Output capped at 1 MiB markdown with a truncation marker. Engineer mode (X-Agent-Mode: engineer — Document Intelligence, $0.10): adds an `engineer` block with retrieval chunks (heading-aware, overlapping) + extract-to-caller-schema (pass `json_schema` → a grammar-constrained, validated typed object) + image OCR (image/* via document_base64) — typed data, not just markdown. Requires Authorization: Bearer <api_key>.
Connector
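Two details in the description above lend themselves to a short sketch: the "exactly one of `document_url` or `document_base64`" rule with its 50 MiB cap, and the `--- page N ---` separators in PDF output. The argument names come from the description; `build_parse_args` and `split_pages` are hypothetical helpers, and the exact whitespace around the page separator is an assumption.

```python
import base64
import re

MAX_DOCUMENT_BYTES = 50 * 1024 * 1024  # 50 MiB cap from the description


def build_parse_args(document_url=None, document_bytes=None, mime_type=None) -> dict:
    """Build tool arguments, enforcing exactly one document source."""
    if (document_url is None) == (document_bytes is None):
        raise ValueError("pass exactly one of document_url or document_bytes")
    args = {}
    if document_url is not None:
        if not document_url.startswith("https://"):
            raise ValueError("document_url is described as an https fetch")
        args["document_url"] = document_url
    else:
        if len(document_bytes) > MAX_DOCUMENT_BYTES:
            raise ValueError("document exceeds the 50 MiB decoded cap")
        args["document_base64"] = base64.b64encode(document_bytes).decode("ascii")
    if mime_type:
        args["mime_type"] = mime_type
    return args


def split_pages(markdown: str) -> dict[int, str]:
    """Split PDF output on '--- page N ---' separator lines."""
    parts = re.split(r"(?m)^--- page (\d+) ---[ \t]*\n?", markdown)
    # parts = [preamble, '1', text1, '2', text2, ...]
    return {int(n): text.strip() for n, text in zip(parts[1::2], parts[2::2])}
```

Note that the description says PDFs go through pdfjs-dist text extraction, with image OCR reserved for the engineer mode, so scanned PDFs may come back with empty pages in the standard mode.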