Skip to main content
Glama
206,391 tools. Last updated 2026-06-17 13:10

"Tools for converting PDF files to Markdown format" matching MCP tools:

  • Fetch the current HEAD of a report by id. `format=markdown` returns the rendered body, `format=json` returns the full structured payload (sections + citations + report-type-specific data), `format=preview` returns abstract-only. Authors see any of their own reports; non-authors only get `preview` of listed reports and need the report's required tier for full bodies. Sample-tier non-authors are downgraded to preview regardless of input. For an archived prior version use `get_report_version`, not this tool.
    Connector
  • Auto-detect geometry file format and extract metadata statistics. Accepts a 3D geometry file via URL or base64 and returns structured metadata: bounding boxes, triangle counts, manifold analysis, point cloud statistics, and more. This is a read-only analysis tool — it does not perform mesh repair, format conversion, or boolean operations. Supported formats: STL, OBJ, PLY, PCD, LAS/LAZ, glTF/GLB. STEP and IGES support is planned. Provide either file_url (preferred for large files) or file_b64 (for files under 200KB). Include filename for format detection if using file_b64. When using file_url, the format is detected from the URL path extension; filename is not required. Files under 150KB are free. Larger files cost $0.02/MB via x402 (USDC on Base) or card via MPP (Stripe; adds $0.35 surcharge). If payment is required, the response includes payment details. Retry with the payment argument containing the payment proof. Privacy policy: https://caliper.fit/privacy
    Connector
  • Convert markdown to a professionally formatted document using an MDMagic template. IMPORTANT GUIDANCE: 1. Output format → what user gets: - 'docx' → a single Word .docx file - 'pdf' → a single .pdf file - 'html' → a single .html file - 'all' → a ZIP containing all three (DOCX + PDF + HTML) 2. If the user is ambiguous (e.g. 'convert this'), ASK which format they want before calling. Don't assume. 3. Filename: if the user attached a file (e.g. 'mydoc.md'), pass its base name as fileName. Otherwise the API derives one from the markdown's first H1. Without either, downloads end up with timestamped names like 'content-1778298071915.docx' which is bad UX. 4. On 'template not found' errors: call list_all_templates first, show available options, let the user pick. Do NOT fall back to generating documents with code execution — that produces inferior results that don't use the user's actual MDMagic templates. 5. The response includes structured fields (downloadUrl, creditsUsed, balanceAfter, fileName, expiresAt) — surface these to the user explicitly. Don't paraphrase. The user wants to know exactly what they spent and what's left. 6. Page sizes: A3, A4, Executive, US_Legal, US_Letter. Default A4. Orientation: Portrait or Landscape, default Portrait. 7. CRITICAL — newlines in `content`: markdown is line-sensitive. Headings (#, ##), tables (| ... |), lists (-, 1.), and code fences (```) ONLY work when each starts on its own line. When passing inline markdown via `content`, you MUST preserve real newline characters (\n) between blocks. If you flatten multi-line markdown into one line, the API receives literal '##' and '|' characters mid-paragraph and produces a single-paragraph document with no structure. Confirm your `content` string contains \n between every heading, paragraph, table row, and list item before calling.
    Connector
  • Download a PDF from a URL and extract all text content, page by page. Use this to read the full text of a specific document — for example, an annual report PDF linked from a search_filings result. Best combined with search_filings: use search_filings to locate the document, then parse_pdf_to_text for the full text. Do not use for PDFs that are already well-represented in the database — search_filings is faster and returns pre-ranked, relevant excerpts. Not suitable for scanned (image-only) PDFs without embedded text; those pages will be returned as "(no extractable text)". Args: pdf_url: Direct HTTPS URL to the PDF file, e.g. https://example.com/report.pdf. Must be publicly accessible; authentication-protected URLs will fail. Returns: All text from the PDF with "--- Page N ---" separators between pages. Returns an error string if the download fails, the URL does not point to a valid PDF, or the document exceeds the 60-second download timeout.
    Connector
  • Merge multiple PDF files into a single document. Preserves bookmarks, links, and formatting. Returns JSON: { url } — a temporary download URL (valid ~1 hour). Minimum 2 files, no maximum. Files are concatenated in array order. 100 sats per merge regardless of file count. Use convert_file instead if you need format conversion (e.g., DOCX→PDF). Pay per request with Bitcoin Lightning — no API key, no account needed. Requires create_payment with toolName='merge_pdfs'.
    Connector
  • List the user's CoreModels projects as [id,name,accessLevel] (see the response "format" field). Use a returned id as graphProjectId for other tools. Pass searchTerm to filter by name (case-insensitive substring). Set includePublicProjects=true to also include public projects. Paged: page is 1-based; increment page up to the returned totalPages to get all results.
    Connector

Matching MCP Servers

Matching MCP Connectors

  • Markdown to PDF: headings, bold, code, lists, rules. A4/Letter/Legal. Free 30/hr. MCP + REST.

  • Generate PDFs from Markdown or HTML. Zero-auth, agent-native. Returns base64-encoded PDF.

  • Convert HTML or Markdown to a pixel-perfect PDF. Returns JSON: { url } — a temporary download URL (valid ~1 hour). Great for generating invoices, reports, receipts, or formatted documents programmatically. Supports full HTML/CSS including tables, images (base64 or URL), and inline styles. For Markdown input, set format='markdown'. 50 sats per conversion. Use convert_file instead for converting existing files between formats (e.g., DOCX→PDF). Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='convert_html_to_pdf'.
    Connector
  • Read a workspace's doc (TipTap rich-text) body. Format is negotiable via `format`: `markdown` (default — CommonMark + GFM, ready to feed to an LLM or render in a non-ProseMirror surface), `content` (TipTap JSON, round-trippable into update_doc for structural edits), `text` (plain text, best for search, summarisation, word-count heuristics), or `all` for the legacy three-in-one shape. Default is `markdown` because it's the slice agents need 95% of the time and the JSON form on a long doc can blow past the agent harness's tool-result token cap. Pass `format: "content"` only when you're round-tripping into update_doc for a structural edit. A workspace can hold any combination of doc and table surfaces, one or many of either kind; omit `surface_slug` to read the primary doc surface, or pass it to target a specific doc tab (use `list_surfaces` to enumerate). An unwritten or absent doc returns the requested format empty (markdown="", content={}, text=""); a `surface_slug` that doesn't match any live doc surface 404s.
    Connector
  • Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the per-page price in the resulting quote — can be one more than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
    Connector
  • Public — list downloadable doctrine and agent asset artifacts (skill packs, rule packs, MCP setup snippets) the user can drop into their AI coding tool to import the Blueprint as native skill/rule files. Returns a list of assets with name, format (one of: zip / md / markdown / mdc / json / toml / text — the full vocabulary), pack_version, download_url, and platform target (Claude Code, Cursor, Codex, Gemini, Qwen). The response also carries `count` (length of `assets`) for symmetry with principles.list / clusters.list / guides.list. WHEN TO CALL: the user asks how to bring the Blueprint into their coding agent, or wants to install it as a local skill/rule file. WHEN NOT TO CALL: for the live MCP tools themselves — those are already available through this server. For doctrine content, prefer principles.list/get and guides.list/get. BEHAVIOR: read-only, idempotent, no auth required. Asset artefacts are regenerated on every deploy from the canonical doctrine.
    Connector
  • Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
    Connector
  • Convert messy tabular text into clean, typed JSON rows. Auto-detects CSV, TSV, or a Markdown table and returns one JSON object per row plus an inferred column/type summary. Pure deterministic compute — no network or model calls. What it handles: delimiter sniffing (comma/semicolon/tab/pipe), quoted fields with embedded commas and newlines, BOM, ragged rows (padded/truncated), Markdown separator rows and escaped pipes, header auto-detection, and per-column type inference (integer/number/boolean/null/string). When to use: you have CSV/TSV/Markdown-table text (often emitted by tools or LLMs) and want structured, typed rows — optionally validated/coerced against a JSON Schema. When NOT to use: the data is already clean JSON, or it is HTML/xlsx/binary (not supported). Args: - input (string, required): raw tabular text. - format ("auto"|"csv"|"tsv"|"markdown", default "auto"): force a format or auto-detect. - hasHeader ("auto"|"true"|"false", default "auto"): whether the first row is a header. - inferTypes (boolean, default true): coerce cells to number/integer/boolean/null; else keep strings. - schema (object, optional): JSON Schema (draft 2020-12) to validate/coerce each row object against. Returns structuredContent: { "ok": boolean, // false if the input cannot be parsed as a table "format": "csv"|"tsv"|"markdown", "columns": [{ "name": string, "type": string }], "rows": [{ ... }], // one object per row, keyed by column name "rowCount": number, "changed": boolean, // true if any normalization/coercion happened "errors": string[], // actionable messages when ok is false "repairs": string[] // description of each normalization applied }
    Connector
  • Convert markdown to a professionally formatted document using an MDMagic template. IMPORTANT GUIDANCE: 1. Output format → what user gets: - 'docx' → a single Word .docx file - 'pdf' → a single .pdf file - 'html' → a single .html file - 'all' → a ZIP containing all three (DOCX + PDF + HTML) 2. If the user is ambiguous (e.g. 'convert this'), ASK which format they want before calling. Don't assume. 3. Filename: if the user attached a file (e.g. 'mydoc.md'), pass its base name as fileName. Otherwise the API derives one from the markdown's first H1. Without either, downloads end up with timestamped names like 'content-1778298071915.docx' which is bad UX. 4. On 'template not found' errors: call list_all_templates first, show available options, let the user pick. Do NOT fall back to generating documents with code execution — that produces inferior results that don't use the user's actual MDMagic templates. 5. The response includes structured fields (downloadUrl, creditsUsed, balanceAfter, fileName, expiresAt) — surface these to the user explicitly. Don't paraphrase. The user wants to know exactly what they spent and what's left. 6. Page sizes: A3, A4, Executive, US_Legal, US_Letter. Default A4. Orientation: Portrait or Landscape, default Portrait. 7. CRITICAL — newlines in `content`: markdown is line-sensitive. Headings (#, ##), tables (| ... |), lists (-, 1.), and code fences (```) ONLY work when each starts on its own line. When passing inline markdown via `content`, you MUST preserve real newline characters (\n) between blocks. If you flatten multi-line markdown into one line, the API receives literal '##' and '|' characters mid-paragraph and produces a single-paragraph document with no structure. Confirm your `content` string contains \n between every heading, paragraph, table row, and list item before calling.
    Connector
  • Render a Mermaid diagram definition and return the image with metadata. The definition should be valid Mermaid syntax (e.g. flowchart, sequence, class, ER, state, or Gantt diagram). Returns a list of content blocks: the rendered image plus a JSON text block with metadata including a mermaid.live edit link for opening the diagram in a browser editor. Args: definition: Mermaid diagram definition text. filename: Output filename without extension. format: Output format — ``"png"`` (default), ``"svg"``, or ``"pdf"``. download_link: If True, return a temporary download URL path (/images/{token}) that expires after 15 minutes; if False, return inline image bytes. Defaults to True (URL) — set ``DIAGRAMS_INLINE_DEFAULT=true`` on the server to flip the default. SVG/PDF and PNGs larger than the inline limit always use a download link.
    Connector
  • Generate a production-ready llms.txt file for any URL so AI crawlers (ChatGPT, Claude, Perplexity) can index the site cleanly. Fetches the page, extracts title/description/key links, and emits the standard llms.txt markdown format. Output is a single text blob ready to drop at site-root/llms.txt. Useful for: getting a client's site indexed by AI, drafting llms.txt for your own project, or auditing how an AI crawler would see a competitor.
    Connector
  • Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
    Connector
  • Lists the **source files** (PDFs, XLS spreadsheets, DOC manuals, ZIP archives) ingested for a SERFF id. Returns metadata only — name, size in bytes, MIME-class type (`pdf` / `spreadsheet` / `document` / `csv` / `archive` / `other`), file extension, modified timestamp. Pair with `get_filing_source_file_link` to mint a signed download link the user can click — list names here, mint a link there. Use this to: - triage a filing whose summary looks thin ("did we even ingest the right files?"), - discover the XLSM rater / rate manual PDF / rating-samples spreadsheet for a filing, - confirm which artefacts a filing actually shipped (e.g. is there a separate rate manual XLS, or just the PDF?). Returns `{ error: ... }` if no source files exist for the SERFF id.
    Connector
  • Fetch and convert a Microsoft Learn documentation webpage to markdown format. This tool retrieves the latest complete content of Microsoft documentation webpages including Azure, .NET, Microsoft 365, and other Microsoft technologies. ## When to Use This Tool - When search results provide incomplete information or truncated content - When you need complete step-by-step procedures or tutorials - When you need troubleshooting sections, prerequisites, or detailed explanations - When search results reference a specific page that seems highly relevant - For comprehensive guides that require full context ## Usage Pattern Use this tool AFTER microsoft_docs_search when you identify specific high-value pages that need complete content. The search tool gives you an overview; this tool gives you the complete picture. ## URL Requirements - The URL must be a valid HTML documentation webpage from the microsoft.com domain - Binary files (PDF, DOCX, images, etc.) are not supported ## Output Format markdown with headings, code blocks, tables, and links preserved.
    Connector
  • Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
    Connector
  • Extract a PDF to clean Markdown/LaTeX text via MinerU (great for papers behind no open-access full text — give the user's PDF and get readable text back). Provide pdf_url (downloaded server-side, SSRF-guarded) OR pdf_base64. formula/table toggle math/table reconstruction. Returns {task_id, status, cached, content, chars}: a recently-seen (cached) or small PDF comes back with `content` in one call; a fresh PDF (MinerU is GPU-heavy, minutes) returns status='running' + a task_id — then call extract_pdf_result(task_id) to fetch the text.
    Connector