305,560 tools. Last updated 2026-07-23 07:10

"Tools for converting Word and PDF documents to Markdown format" matching MCP tools:

convert_html_to_pdf
Sats4AI - Bitcoin-Powered AI Tools
Convert HTML or Markdown to a pixel-perfect PDF. Returns JSON: { url } — a temporary download URL (valid ~1 hour). Great for generating invoices, reports, receipts, or formatted documents programmatically. Supports full HTML/CSS including tables, images (base64 or URL), and inline styles. For Markdown input, set format='markdown'. 50 sats per conversion. Use convert_file instead for converting existing files between formats (e.g., DOCX→PDF). Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='convert_html_to_pdf'.
Connector
get_report
Valuein — SEC EDGAR Fundamentals & Smart-Money Data
Fetch the current HEAD of a report by id. `format=markdown` returns the rendered body, `format=json` returns the full structured payload (sections + citations + report-type-specific data), `format=preview` returns abstract-only. Authors see any of their own reports; non-authors only get `preview` of listed reports and need the report's required tier for full bodies. Sample-tier non-authors are downgraded to preview regardless of input. For an archived prior version use `get_report_version`, not this tool.
Connector
norse_search
norse
Look an Old Norse word up on Wiktionary and return its senses plus full declension/conjugation tables — attested content (including the verbs' mediopassive voice), not invented. Any form of the word works; an inflected query is resolved to its lemma automatically via previously cached paradigms and the result notes the resolution. With search_language='eng' the query is an English word instead: the result lists its per-sense Old Norse equivalents (the translations block) plus their expanded entries. Returns Markdown plus the same result as structuredContent matching the declared outputSchema. Results are cached server-side; first-time queries reach the live upstream politely and calls are rate limited — on a rate-limit error, wait a few seconds and retry. Content is from en.wiktionary.org (CC BY-SA 4.0 — attribute and share alike if republished).
Connector
get_doc
Dock
Read a workspace's doc (TipTap rich-text) body. Format is negotiable via `format`: `markdown` (default — CommonMark + GFM, ready to feed to an LLM or render in a non-ProseMirror surface), `content` (TipTap JSON, round-trippable into update_doc for structural edits), `text` (plain text, best for search, summarisation, word-count heuristics), or `all` for the legacy three-in-one shape. Default is `markdown` because it's the slice agents need 95% of the time and the JSON form on a long doc can blow past the agent harness's tool-result token cap. Pass `format: "content"` only when you're round-tripping into update_doc for a structural edit. A workspace can hold any combination of doc and table surfaces, one or many of either kind; omit `surface_slug` to read the primary doc surface, or pass it to target a specific doc tab (use `list_surfaces` to enumerate). An unwritten or absent doc returns the requested format empty (markdown="", content={}, text=""); a `surface_slug` that doesn't match any live doc surface 404s.
Connector
convert_document
mdmagic-mcp-server
Convert markdown to a professionally formatted document using an MDMagic template. IMPORTANT GUIDANCE: 1. Output format → what user gets: - 'docx' → a single Word .docx file - 'pdf' → a single .pdf file - 'html' → a single .html file - 'all' → a ZIP containing all three (DOCX + PDF + HTML) 2. If the user is ambiguous (e.g. 'convert this'), ASK which format they want before calling. Don't assume. 3. Filename: if the user attached a file (e.g. 'mydoc.md'), pass its base name as fileName. Otherwise the API derives one from the markdown's first H1. Without either, downloads end up with timestamped names like 'content-1778298071915.docx' which is bad UX. 4. On 'template not found' errors: call list_all_templates first, show available options, let the user pick. Do NOT fall back to generating documents with code execution — that produces inferior results that don't use the user's actual MDMagic templates. 5. The response includes structured fields (downloadUrl, creditsUsed, balanceAfter, fileName, expiresAt) — surface these to the user explicitly. Don't paraphrase. The user wants to know exactly what they spent and what's left. 6. Page sizes: A3, A4, Executive, US_Legal, US_Letter. Default A4. Orientation: Portrait or Landscape, default Portrait. 7. CRITICAL — newlines in `content`: markdown is line-sensitive. Headings (#, ##), tables (| ... |), lists (-, 1.), and code fences (```) ONLY work when each starts on its own line. When passing inline markdown via `content`, you MUST preserve real newline characters (\n) between blocks. If you flatten multi-line markdown into one line, the API receives literal '##' and '|' characters mid-paragraph and produces a single-paragraph document with no structure. Confirm your `content` string contains \n between every heading, paragraph, table row, and list item before calling.
Connector
classify_document
OpenWarrant — Document Verification Suite
Classify a FINANCIAL document's type and issuing country. Specialised in financial-services documents: payslip, tax_invoice, bank_statement, salary_certificate, payg_summary, receipt. USE THIS WHEN someone shares a document (or a link to one) and asks: what kind of document is this? is this a payslip / invoice / bank statement? route this document. Also use it as the FIRST step before verify_document, so the right checks run. Provide the document ONE way: `url` (a public http(s) link to a PDF or image — fetched server-side, the cheapest call) OR `bytes_b64` (inline base64, plus `filename` for PDF-vs-image routing). Returns `{document_type, country_code, confidence, is_financial_document, evidence, ...}`. HONEST SCOPE: type classification only — NOT an authenticity or fraud judgment (use verify_document for that). Below the confidence threshold it abstains with 'unknown' rather than guessing; non-financial documents classify as 'other'. The document is never stored.
Connector

Matching MCP Servers

to-markdown
Note Taking Multimedia Processing Text Summarization
xincen0725
F
license
-
quality
B
maintenance
MCP server that converts PDF, video, web, and audio inputs into structured Markdown notes with support for checkpointing, batch processing, and Obsidian integration.
Last updated 2026-07-10
markdown-to-html
Developer Tools Autonomous Agents
fashionzzZ
A
license
B
quality
D
maintenance
A Model Context Protocol server that converts Markdown content to HTML format.
Last updated 2025-06-03
1
2,726
2
MIT

Matching MCP Connectors

PDF to Markdown (pdf2md.dev)
Hosted MCP server: convert PDFs to clean, LLM-ready Markdown with tables, formulas and OCR.
webpage-to-pdf
Convert any public webpage to a PDF. Single narrow tool, not a bloated PDF toolkit.

norse_search
hub
Look an Old Norse word up on Wiktionary and return its senses plus full declension/conjugation tables — attested content (including the verbs' mediopassive voice), not invented. Any form of the word works; an inflected query is resolved to its lemma automatically via previously cached paradigms and the result notes the resolution. With search_language='eng' the query is an English word instead: the result lists its per-sense Old Norse equivalents (the translations block) plus their expanded entries. Returns Markdown plus the same result as structuredContent matching the declared outputSchema. Results are cached server-side; first-time queries reach the live upstream politely and calls are rate limited — on a rate-limit error, wait a few seconds and retry. Content is from en.wiktionary.org (CC BY-SA 4.0 — attribute and share alike if republished).
Connector
create_letter
PostAgent
Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the selected-provider cost behind the resulting quote — can be higher than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
Connector
tabular_to_json
x402 JSON Repair
Convert messy tabular text into clean, typed JSON rows. Auto-detects CSV, TSV, or a Markdown table and returns one JSON object per row plus an inferred column/type summary. Pure deterministic compute — no network or model calls. What it handles: delimiter sniffing (comma/semicolon/tab/pipe), quoted fields with embedded commas and newlines, BOM, ragged rows (padded/truncated), Markdown separator rows and escaped pipes, header auto-detection, and per-column type inference (integer/number/boolean/null/string). When to use: you have CSV/TSV/Markdown-table text (often emitted by tools or LLMs) and want structured, typed rows — optionally validated/coerced against a JSON Schema. When NOT to use: the data is already clean JSON, or it is HTML/xlsx/binary (not supported). Args: - input (string, required): raw tabular text. - format ("auto"|"csv"|"tsv"|"markdown", default "auto"): force a format or auto-detect. - hasHeader ("auto"|"true"|"false", default "auto"): whether the first row is a header. - inferTypes (boolean, default true): coerce cells to number/integer/boolean/null; else keep strings. - schema (object, optional): JSON Schema (draft 2020-12) to validate/coerce each row object against. Returns structuredContent: { "ok": boolean, // false if the input cannot be parsed as a table "format": "csv"|"tsv"|"markdown", "columns": [{ "name": string, "type": string }], "rows": [{ ... }], // one object per row, keyed by column name "rowCount": number, "changed": boolean, // true if any normalization/coercion happened "errors": string[], // actionable messages when ok is false "repairs": string[] // description of each normalization applied }
Connector
ris_get_document
ris-austria-mcp-server
Fetch one RIS document’s full text or its rendition URLs, with explicit binding status and the amtssigniert authentic PDF surfaced wherever it exists. Address the document exactly one of two ways: document_number plus application (both copied verbatim from a ris_search_* or ris_lookup_citation result), or a document_url from a result’s content_urls. format: markdown (default — the HTML rendition converted to markdown), html (raw HTML rendition), xml (the RIS Nutzdaten XML), or urls_only (no fetch — every rendition URL, including the Authentisch PDF). Format availability varies by application and the tool degrades explicitly, never silently: consolidated law, gazettes, case law, drafts, and most sectoral collections carry full text; district and municipal promulgations and court rules (Bvb, GrA, KmGer) publish only the signed authentic PDF; party-transparency decisions and council minutes (Upts, Mrp) are PDF-only; the 1848–1940 imperial gazettes (BgblAlt) are metadata-only — for these a text-format request returns a format_unavailable notice with the usable URL, not an error. Every result carries binding_status; only authentic (amtssigniert) publications are legally binding. This tool returns content, not fresh metadata — the metadata rides the search/lookup step that produced the document number. When the markdown text overflows the byte budget the tool returns a §/Artikel/Anlage section outline (kind: outline) instead of truncating; re-call with sections:[…] naming outline entries to retrieve just those. Raw html/xml renditions, which carry no such headings, return in full.
Connector
interslavic_transliterate
hub
Transliterate ONE Interslavic word between Latin and Cyrillic. This is bidirectional: it auto-detects the input's script and returns the attested spelling in the OTHER script — Latin input yields Cyrillic, Cyrillic input yields Latin (or pass `target` to force it). Unlike the lossy deromanize tools, this is an EXACT dataset lookup (both spellings are stored per entry): it never guesses — a word not in the dataset returns empty candidates.
Connector
files_read
DialogBrain
Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
Connector
convert_document
mdmagic
Convert markdown to a professionally formatted document using an MDMagic template. IMPORTANT GUIDANCE: 1. Output format → what user gets: - 'docx' → a single Word .docx file - 'pdf' → a single .pdf file - 'html' → a single .html file - 'all' → a ZIP containing all three (DOCX + PDF + HTML) 2. If the user is ambiguous (e.g. 'convert this'), ASK which format they want before calling. Don't assume. 3. Filename: if the user attached a file (e.g. 'mydoc.md'), pass its base name as fileName. Otherwise the API derives one from the markdown's first H1. Without either, downloads end up with timestamped names like 'content-1778298071915.docx' which is bad UX. 4. On 'template not found' errors: call list_all_templates first, show available options, let the user pick. Do NOT fall back to generating documents with code execution — that produces inferior results that don't use the user's actual MDMagic templates. 5. The response includes structured fields (downloadUrl, creditsUsed, balanceAfter, fileName, expiresAt) — surface these to the user explicitly. Don't paraphrase. The user wants to know exactly what they spent and what's left. 6. Page sizes: A3, A4, Executive, US_Legal, US_Letter. Default A4. Orientation: Portrait or Landscape, default Portrait. 7. CRITICAL — newlines in `content`: markdown is line-sensitive. Headings (#, ##), tables (| ... |), lists (-, 1.), and code fences (```) ONLY work when each starts on its own line. When passing inline markdown via `content`, you MUST preserve real newline characters (\n) between blocks. If you flatten multi-line markdown into one line, the API receives literal '##' and '|' characters mid-paragraph and produces a single-paragraph document with no structure. Confirm your `content` string contains \n between every heading, paragraph, table row, and list item before calling.
Connector
files_read
dialogbrain
Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
Connector
latin_search
hub
Look a Latin word up on Wiktionary and return its senses plus full declension/conjugation tables — attested content, not invented. Any form of the word works; an inflected query is resolved to its lemma automatically (via previously cached paradigms or the Morpheus analyzer) and the result notes the resolution. With search_language='eng' the query is an English word instead: the result lists its per-sense Latin equivalents (the translations block) plus their expanded entries. Returns Markdown plus the same result as structuredContent matching the declared outputSchema. Results are cached server-side; first-time queries reach the live upstream politely and calls are rate limited — on a rate-limit error, wait a few seconds and retry. Content is from en.wiktionary.org (CC BY-SA 4.0 — attribute and share alike if republished).
Connector
create_letter
PostAgent
Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the selected-provider cost behind the resulting quote — can be higher than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
Connector
create_letter
PostAgent — Print and Mail
Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the selected-provider cost behind the resulting quote — can be higher than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
Connector
firecrawl_parse
firecrawl-mcp
Parse a file using Firecrawl's /v2/parse endpoint. In local/non-cloud MCP mode, this tool reads filePath from the MCP server filesystem and posts multipart data to the configured self-hosted FIRECRAWL_API_URL, preserving the existing direct-read behavior. In hosted CLOUD_SERVICE mode, this tool is a two-call flow because hosted MCP cannot read your local filesystem: 1. Call with filePath, contentType, parse options, and optional declaredSizeBytes. The hosted server mints a short-lived upload URL and returns a safe local curl PUT command plus nextToolCall. 2. Run the returned curl command locally, then call firecrawl_parse again with uploadRef and the desired parse options. The hosted server calls /v2/parse server-side with your session credential. **Best for:** Extracting content from a local document (PDF, Word, Excel, HTML, etc.); pulling structured data out of a file with JSON format; converting binary documents into markdown for downstream reasoning. **Not recommended for:** Remote URLs (use firecrawl_scrape); multiple files at once (call parse multiple times); documents that require interactive actions, screenshots, or change tracking — those aren't supported by the parse endpoint. **Common mistakes:** In hosted mode, do not pass both filePath and uploadRef. Phase 1 uses filePath only to generate upload instructions; phase 2 uses uploadRef only to parse server-side. **Supported file types:** .html, .htm, .xhtml, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls **Unsupported options:** actions, screenshot/branding/changeTracking formats, waitFor > 0, location, mobile, proxy values other than "auto" or "basic". **Privacy:** Set `redactPII: true` to return content with personally identifiable information redacted. **CRITICAL - Format Selection (same rules as firecrawl_scrape):** When the user asks for SPECIFIC data points from a document, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE document content. **Handling PDFs:** Add `"parsers": ["pdf"]` (optionally with `pdfOptions.maxPages`) when parsing a PDF so the PDF engine is invoked explicitly. For very long documents, cap `maxPages` to keep the response within token limits. **Hosted phase 1 example:** ```json { "name": "firecrawl_parse", "arguments": { "filePath": "/absolute/path/to/document.pdf", "contentType": "application/pdf", "formats": ["markdown"], "parsers": ["pdf"], "zeroDataRetention": true } } ``` **Hosted phase 2 example:** ```json { "name": "firecrawl_parse", "arguments": { "uploadRef": "upload-ref-from-phase-1", "formats": ["markdown"], "parsers": ["pdf"], "zeroDataRetention": true } } ``` **Returns:** Phase 1 hosted upload instructions or a parsed document with markdown, html, links, summary, json, or query results depending on the requested formats.
Connector
search
Islam West Africa Collection (IWAC)
Search the Islam West Africa Collection across newspaper articles, Islamic publications, archival documents, academic references, and the authority index (persons/places/organisations/events/subjects). Pass ONE concept or name — e.g. 'Tijaniyya', 'laïcité', 'Sheikh Gumi', 'pèlerinage'. Matching is accent- and case-insensitive; a multi-word query requires every word to appear somewhere in the item, so prefer a single concept per call. Write query strings and concept keywords in French for press/publication/document/index discovery even when the user's report language is not French. Academic references are multilingual, so try French and English title/abstract terms when relevant; metadata/filter labels remain French. Use the French transliteration of Islamic terms (Tabaski not 'Eid al-Adha', charia not 'sharia', Maouloud not 'Mawlid'). Returns {results:[{id,title,url,category}], ranking}; each result's `category` names its subset and the `ranking` field documents the ordering. Pass an id to `fetch` to read the full text. For filtered queries (by country, date, or newspaper) use the search_* tools instead.
Connector
snapforge_signup
SnapForge
Create a free SnapForge account (100 renders/month) with just an email address and get the API key instantly. The key is bound to the current MCP session, so the screenshot/pdf/markdown tools work immediately after signup — no browser needed.
Connector