Skip to main content
Glama
136,886 tools. Last updated 2026-05-21 16:31

"Tools for converting Word and PDF documents to Markdown format" matching MCP tools:

  • Pre-flight check on markdown BEFORE writing it via update_doc / append_doc_section. Returns { ok, errors, warnings, parsed } with parsed counts per format type (imageCount, videoCount, mermaidCount, mathCount, svgCount, calloutCount, crossRefCount, mentionCount, embedCount, detailsCount, headingCount, byteSize, nodeCount, depth) plus structured DocGuardError-equivalent errors (cap breaches) and non-blocking warnings (cross-refs that don't resolve, mention ids that don't resolve, oversize sources, cap-approaching counts). NEVER writes anything; pure parse + analysis. Use when iterating on rich-format markdown to catch problems before burning a write. Cross-ref + mention resolution is gated on caller's accessible workspace set, so unresolved tokens surface in warnings.
    Connector
  • Convert markdown to a professionally formatted document using an MDMagic template. IMPORTANT GUIDANCE: 1. Output format → what user gets: - 'docx' → a single Word .docx file - 'pdf' → a single .pdf file - 'html' → a single .html file - 'all' → a ZIP containing all three (DOCX + PDF + HTML) 2. If the user is ambiguous (e.g. 'convert this'), ASK which format they want before calling. Don't assume. 3. Filename: if the user attached a file (e.g. 'mydoc.md'), pass its base name as fileName. Otherwise the API derives one from the markdown's first H1. Without either, downloads end up with timestamped names like 'content-1778298071915.docx' which is bad UX. 4. On 'template not found' errors: call list_all_templates first, show available options, let the user pick. Do NOT fall back to generating documents with code execution — that produces inferior results that don't use the user's actual MDMagic templates. 5. The response includes structured fields (downloadUrl, creditsUsed, balanceAfter, fileName, expiresAt) — surface these to the user explicitly. Don't paraphrase. The user wants to know exactly what they spent and what's left. 6. Page sizes: A3, A4, Executive, US_Legal, US_Letter. Default A4. Orientation: Portrait or Landscape, default Portrait. 7. CRITICAL — newlines in `content`: markdown is line-sensitive. Headings (#, ##), tables (| ... |), lists (-, 1.), and code fences (```) ONLY work when each starts on its own line. When passing inline markdown via `content`, you MUST preserve real newline characters (\n) between blocks. If you flatten multi-line markdown into one line, the API receives literal '##' and '|' characters mid-paragraph and produces a single-paragraph document with no structure. Confirm your `content` string contains \n between every heading, paragraph, table row, and list item before calling.
    Connector
  • Convert markdown to a professionally formatted document using an MDMagic template. IMPORTANT GUIDANCE: 1. Output format → what user gets: - 'docx' → a single Word .docx file - 'pdf' → a single .pdf file - 'html' → a single .html file - 'all' → a ZIP containing all three (DOCX + PDF + HTML) 2. If the user is ambiguous (e.g. 'convert this'), ASK which format they want before calling. Don't assume. 3. Filename: if the user attached a file (e.g. 'mydoc.md'), pass its base name as fileName. Otherwise the API derives one from the markdown's first H1. Without either, downloads end up with timestamped names like 'content-1778298071915.docx' which is bad UX. 4. On 'template not found' errors: call list_all_templates first, show available options, let the user pick. Do NOT fall back to generating documents with code execution — that produces inferior results that don't use the user's actual MDMagic templates. 5. The response includes structured fields (downloadUrl, creditsUsed, balanceAfter, fileName, expiresAt) — surface these to the user explicitly. Don't paraphrase. The user wants to know exactly what they spent and what's left. 6. Page sizes: A3, A4, Executive, US_Legal, US_Letter. Default A4. Orientation: Portrait or Landscape, default Portrait. 7. CRITICAL — newlines in `content`: markdown is line-sensitive. Headings (#, ##), tables (| ... |), lists (-, 1.), and code fences (```) ONLY work when each starts on its own line. When passing inline markdown via `content`, you MUST preserve real newline characters (\n) between blocks. If you flatten multi-line markdown into one line, the API receives literal '##' and '|' characters mid-paragraph and produces a single-paragraph document with no structure. Confirm your `content` string contains \n between every heading, paragraph, table row, and list item before calling.
    Connector
  • Transcribe audio or video to text, including per-word timestamps for precise editing. Three-call flow: (1) call with `filename` to receive {job_id, payment_challenge}; (2) pay via MPP, then call with `job_id` + `payment_credential` to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns presigned download URLs for two files: role `transcript` (SRT) and role `transcript-words` (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass `format=srt|txt|vtt|json|words` to get_job_status to receive content inline — `txt` and `vtt` are derived from SRT, `json` is v1 (segments only), `words` is v2 (segments + words). Flat price: audio $0.50, video $1.00 — see /.well-known/mpp.json for the authoritative table. Use for podcasts, interviews, meetings, lectures, and especially for creating clips, multicamera edits, or edit-video-from-transcript where word boundaries matter. Retrying any call with `job_id` alone returns current state (idempotent). Failed jobs auto-refund.
    Connector
  • Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
    Connector
  • Get a complete overview of all senses for a Danish word in a single call. Replaces the common pattern of calling get_word_synsets → get_synset_info per result → get_word_synonyms, collapsing 5-15 HTTP round-trips into one SPARQL query. Only returns synsets where the word is a primary lexical member (i.e. the word itself has a direct sense in the synset), excluding multi-word expressions that merely contain the word as a component. Args: word: The Danish word to look up Returns: List of dicts, one per synset, each containing: - synset_id: Clean synset identifier (e.g. "synset-3047") - label: Human-readable synset label - definition: Synset definition (may be truncated with "…") - ontological_types: List of dnc: type URIs - synonyms: List of co-member lemmas (true synonyms only) - hypernym: Dict with synset_id and label of the immediate broader concept, or null - lexfile: WordNet lexicographer file name (e.g. "noun.animal"), or null if absent Example: overview = get_word_overview("hund") # Returns list of 4 synsets, the first being: # {"synset_id": "synset-3047", # "label": "{hund_1§1; køter_§1; vovhund_§1; vovse_§1}", # "definition": "pattedyr som har god lugtesans ...", # "ontological_types": ["dnc:Animal", "dnc:Object"], # "synonyms": ["køter", "vovhund", "vovse"], # "lexfile": "noun.animal"} # Pass synset_id to get_synset_info() for full JSON-LD data on any result: # full_data = get_synset_info(overview[0]["synset_id"])
    Connector

Matching MCP Servers

Matching MCP Connectors

  • Send transactional pdfs for AI agents via SMTP. Templates included.

  • Transform any blog post or article URL into ready-to-post social media content for Twitter/X threads, LinkedIn posts, Instagram captions, Facebook posts, and email newsletters. Pay-per-event: $0.07 for all 5 platforms, $0.03 for single platform.

  • Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
    Connector
  • Add a document to a deal's data room. Creates the deal if needed. This is the primary way to get documents into Sieve for screening. Upload a pitch deck, financials, or any document -- then call sieve_screen to analyze everything in the data room. Provide company_name to create a new deal (or find existing), or deal_id to add to an existing deal. Provide exactly one content source: file_path (local file), text (raw text/markdown), or url (fetch from URL). Args: title: Document title (e.g. "Pitch Deck Q1 2026"). company_name: Company name -- creates deal if new, finds existing if not. deal_id: Add to an existing deal (from sieve_deals or previous sieve_dataroom_add). website_url: Company website URL (used when creating a new deal). document_type: Type: 'pitch_deck', 'financials', 'legal', or 'other'. file_path: Path to a local file (PDF, DOCX, XLSX). The tool reads and uploads it. text: Raw text or markdown content (alternative to file). url: URL to fetch document from (alternative to file).
    Connector
  • Fetch and convert a Microsoft Learn documentation webpage to markdown format. This tool retrieves the latest complete content of Microsoft documentation webpages including Azure, .NET, Microsoft 365, and other Microsoft technologies. ## When to Use This Tool - When search results provide incomplete information or truncated content - When you need complete step-by-step procedures or tutorials - When you need troubleshooting sections, prerequisites, or detailed explanations - When search results reference a specific page that seems highly relevant - For comprehensive guides that require full context ## Usage Pattern Use this tool AFTER microsoft_docs_search when you identify specific high-value pages that need complete content. The search tool gives you an overview; this tool gives you the complete picture. ## URL Requirements - The URL must be a valid HTML documentation webpage from the microsoft.com domain - Binary files (PDF, DOCX, images, etc.) are not supported ## Output Format markdown with headings, code blocks, tables, and links preserved.
    Connector
  • Extract text from PDFs and images as clean Markdown. Uses Mistral OCR — handles complex layouts, tables, handwriting, multi-column documents, and mathematical notation. Preserves document hierarchy in structured Markdown. 10 sats/page. Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='extract_document' and quantity=pageCount for multi-page PDFs.
    Connector
  • Query SEC filings and financial documents from US capital markets and exchanges. This tool searches through 10-K annual reports, 10-Q quarterly reports, 8-K current reports, proxy statements, earnings call transcripts, investor presentations, and other SEC-mandated filings from US companies. Use for questions about US company financials, executive compensation, business operations, or regulatory disclosures. Limited to official SEC filings and related documents only.
    Connector
  • Estimate credit cost for a conversion BEFORE running it. Returns word count, page calculation (300 words/page), and a credit breakdown by format and template type. Use this when the user asks 'how much will this cost?' or when you suspect a conversion might exceed their balance — convert_document refuses to run if credits are insufficient, so estimating first is friendlier.
    Connector
  • Get detailed CV version including structured content, sections, word count, and audience profile. cv_version_id from ceevee_upload_cv or ceevee_list_versions. Use to inspect CV content before running analysis tools. Free.
    Connector
  • Read a workspace's doc (TipTap rich-text) body. Format is negotiable via `format`: `markdown` (default — CommonMark + GFM, ready to feed to an LLM or render in a non-ProseMirror surface), `content` (TipTap JSON, round-trippable into update_doc for structural edits), `text` (plain text, best for search, summarisation, word-count heuristics), or `all` for the legacy three-in-one shape. Default is `markdown` because it's the slice agents need 95% of the time and the JSON form on a long doc can blow past the agent harness's tool-result token cap. Pass `format: "content"` only when you're round-tripping into update_doc for a structural edit. A workspace can hold any combination of doc and table surfaces, one or many of either kind; omit `surface_slug` to read the primary doc surface, or pass it to target a specific doc tab (use `list_surfaces` to enumerate). An unwritten or absent doc returns the requested format empty (markdown="", content={}, text=""); a `surface_slug` that doesn't match any live doc surface 404s.
    Connector
  • Get synsets (word meanings) for a Danish word, returning a sorted list of lexical concepts. DanNet follows the OntoLex-Lemon model where: - Words (ontolex:LexicalEntry) evoke concepts through senses - Synsets (ontolex:LexicalConcept) represent units of meaning - Multiple words can share the same synset (synonyms) - One word can have multiple synsets (polysemy) This function returns all synsets associated with a word, effectively giving you all the different meanings/senses that word can have. Each synset represents a distinct semantic concept with its own definition and semantic relationships. Common patterns in Danish: - Nouns often have multiple senses (e.g., "kage" = cake/lump) - Verbs distinguish motion vs. state (e.g., "løbe" = run/flow) - Check synset's dns:ontologicalType for semantic classification DDO CONNECTION AND SYNSET LABELS: Synset labels are compositions of DDO-derived sense labels, showing all words that express the same meaning. For example: - "{hund_1§1; køter_§1; vovhund_§1; vovse_§1}" = all words meaning "domestic dog" - "{forlygte_§2; babs_§1; bryst_§2; patte_1§1a}" = all words meaning "female breast" Each individual sense label follows DDO structure: - "hund_1§1" = word "hund", entry 1, definition 1 in DDO (ordnet.dk) - "patte_1§1a" = word "patte", entry 1, definition 1, subdefinition a - The § notation connects directly to DDO's definition numbering system This composition reveals the semantic relationships between Danish words and their shared meanings, all traceable back to authoritative DDO lexicographic data. RETURN BEHAVIOR: This function has two possible return modes depending on search results: 1. MULTIPLE RESULTS: Returns List[SearchResult] with basic information for each synset 2. SINGLE RESULT (redirect): Returns full synset data Dict when DanNet automatically redirects to a single synset. This provides immediate access to all semantic relationships, ontological types, sentiment data, and other rich information without requiring a separate get_synset_info() call. The single-result case is equivalent to calling get_synset_info() on the synset, providing the same comprehensive RDF data structure with all semantic relations. Args: query: The Danish word or phrase to search for language: Language for labels and definitions in results (default: "da" for Danish, "en" for English when available) Note: Only Danish words can be searched regardless of this parameter Returns: MULTIPLE RESULTS: List of SearchResult objects with: - word: The lexical form - synset_id: Unique synset identifier (format: synset-NNNNN) - label: Human-readable synset label (e.g., "{kage_1§1}") - definition: Brief semantic definition (may be truncated with "...") SINGLE RESULT: Dict with complete synset data including: - All RDF properties with namespace prefixes (e.g., wn:hypernym) - dns:ontologicalType → semantic types with @set array - dns:sentiment → parsed sentiment (if present) - synset_id → clean identifier for convenience - All semantic relationships and linguistic properties Examples: # Multiple results case results = get_word_synsets("hund") # Returns list of search result dictionaries for all meanings of "hund" # => [{"word": "hund", "synset_id": "synset-3047", ...}, ...] # Single result case (redirect) result = get_word_synsets("svinkeærinde") # Returns complete synset data for unique word # => {'wn:hypernym': 'dn:synset-11677', 'dns:sentiment': {...}, ...}
    Connector
  • Estimate credit cost for a conversion BEFORE running it. Returns word count, page calculation (300 words/page), and a credit breakdown by format and template type. Use this when the user asks 'how much will this cost?' or when you suspect a conversion might exceed their balance — convert_document refuses to run if credits are insufficient, so estimating first is friendlier.
    Connector
  • Return a 15-minute presigned download URL for a report in the requested binary format. `format=md` presigns the cached markdown — instant, no compute. `format=docx` returns a branded Word document with a cover page (logo, title, ticker, tier badge), the report body (abstract, sections, citations table with clickable SEC EDGAR links), and a back page (methodology, sources, disclaimer). The DOCX is cached in R2 alongside the markdown after first build so repeat downloads are instant; pass `force_regenerate: true` to bust the cache (e.g. right after `update_report`). Tier gate mirrors `get_report`: authors always see their own reports; non-authors below the report's required tier get an upgrade prompt.
    Connector
  • Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
    Connector
  • Get autocomplete suggestions for Danish word prefixes. Useful for discovering Danish vocabulary or finding the correct spelling of words. Returns lemma forms (dictionary forms) of words. Args: prefix: The beginning of a Danish word (minimum 3 characters required) max_results: Maximum number of suggestions to return (default: 10) Returns: Comma-separated string of word completions in alphabetical order Note: Autocomplete requires at least 3 characters to prevent excessive results. Example: suggestions = autocomplete_danish_word("hyg", 5) # Returns: "hygge, hyggelig, hygiejne"
    Connector
  • Fetch a report by id. `format=markdown` returns the rendered body, `format=json` returns the full structured payload (sections + citations + report-type-specific data), `format=preview` returns abstract-only. Sample tier is downgraded to preview regardless of input.
    Connector