Skip to main content
Glama
262,495 tools. Last updated 2026-07-05 19:51

"A tool for reading and extracting content from PDF papers" matching MCP tools:

  • Extract figures, tables, and equations from PDF documents using layout detection. Perfect for extracting visual elements from academic papers on arXiv or any PDF URL. Returns base64-encoded images of detected elements with metadata.
    Connector
  • Convert any PDF document (via URL) to clean Markdown. Returns full markdown text, page count, word count, table count, metadata, and an AI summary. Useful for extracting structured content from whitepapers, reports, contracts, or research papers.
    Connector
  • Download a PDF from a URL and extract all text content, page by page. Use this to read the full text of a specific document — for example, an annual report PDF linked from a search_filings result. Best combined with search_filings: use search_filings to locate the document, then parse_pdf_to_text for the full text. Do not use for PDFs that are already well-represented in the database — search_filings is faster and returns pre-ranked, relevant excerpts. Not suitable for scanned (image-only) PDFs without embedded text; those pages will be returned as "(no extractable text)". Args: pdf_url: Direct HTTPS URL to the PDF file, e.g. https://example.com/report.pdf. Must be publicly accessible; authentication-protected URLs will fail. Returns: All text from the PDF with "--- Page N ---" separators between pages. Returns an error string if the download fails, the URL does not point to a valid PDF, or the document exceeds the 60-second download timeout.
    Connector
  • USE WHEN reading the full content of a Pine Script v6 documentation file. Returns the file content; when limit is set, a header shows the char range and offset to continue reading. AFTER calling this tool, use offset=<end> to continue if the header indicates more content is available. For large files (ta.md, strategy.md, collections.md, drawing.md, general.md), prefer list_sections() + get_section() instead. Data sourced from bundled Pine Script v6 documentation.
    Connector
  • Full profile for a single tax-exempt org by EIN: legal name, address, NTEE classification, 501(c) type, IRS ruling date, and a financial snapshot from the most recent Form 990 filing (revenue, expenses, assets, net assets, and the source PDF link). Use nonprofit_search first if you only have an org name — this tool requires an EIN. Data lags 1–2 years; the tax year is shown prominently. Data from ProPublica Nonprofit Explorer, sourced from IRS Form 990 filings.
    Connector
  • Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the selected-provider cost behind the resulting quote — can be higher than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
    Connector

Matching MCP Servers

  • F
    license
    -
    quality
    D
    maintenance
    Enables document conversion between PDF, DOCX, and Markdown formats to facilitate reading and editing complex files in AI tools like Claude Desktop or Cursor. It utilizes marker-pdf and pandoc to provide structured text versions of documents, helping to manage context and support unsupported file types.
    Last updated
    1
  • A
    license
    B
    quality
    A
    maintenance
    MCP bridge for PDF Content Search — full-text PDF search with Apple Vision OCR across thousands of documents in under a second from Claude, Cursor, or any MCP client. Advanced filters (date, category, sender, amount), wildcards, boolean operators. Bridge open-source (MIT), PDF Content Search app is commercial with free iOS+Android companion scanner apps.
    Last updated
    84
    MIT

Matching MCP Connectors

  • The verified hub for conferences and journals. Powered by AI to match your scholarly ambitions with the world's most prestigious academic opportunities.

  • Markdown to PDF: headings, bold, code, lists, rules. A4/Letter/Legal. Free 30/hr. MCP + REST.

  • GET /rooms/:roomID — Get a single room Get a single room's metadata + its latest daily AND weekly AI summaries (when they exist). **Access:** members and subscribers of the room, plus any DCer for browsable public channels/discussions/quick-questions. Private rooms, DMs, group DMs, and event/city rooms you are not a member of return 403. Reading this endpoint does **not** mark the room as read or modify any unread state. **AI summaries:** the latest daily digest is embedded under `aiSummaryDaily`, the latest weekly digest under `aiSummaryWeekly`. Rooms that don't have a given type yet return `null` for that slot. For history (older summaries), call `GET /rooms/:roomID/summaries/daily` or `/weekly`. **See also:** For specific content (`did anyone mention X?`), `POST /search/messages` with `q=` and `roomID=` is faster than paginating `/rooms/:roomID/messages` or reading summaries. The AI summaries cover broad activity per window; search is the tool for targeted lookup.
    Connector
  • Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
    Connector
  • Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
    Connector
  • Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the selected-provider cost behind the resulting quote — can be higher than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
    Connector
  • Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the selected-provider cost behind the resulting quote — can be higher than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
    Connector
  • Parse a file using Firecrawl's /v2/parse endpoint. In local/non-cloud MCP mode, this tool reads filePath from the MCP server filesystem and posts multipart data to the configured self-hosted FIRECRAWL_API_URL, preserving the existing direct-read behavior. In hosted CLOUD_SERVICE mode, this tool is a two-call flow because hosted MCP cannot read your local filesystem: 1. Call with filePath, contentType, parse options, and optional declaredSizeBytes. The hosted server mints a short-lived upload URL and returns a safe local curl PUT command plus nextToolCall. 2. Run the returned curl command locally, then call firecrawl_parse again with uploadRef and the desired parse options. The hosted server calls /v2/parse server-side with your session credential. **Best for:** Extracting content from a local document (PDF, Word, Excel, HTML, etc.); pulling structured data out of a file with JSON format; converting binary documents into markdown for downstream reasoning. **Not recommended for:** Remote URLs (use firecrawl_scrape); multiple files at once (call parse multiple times); documents that require interactive actions, screenshots, or change tracking — those aren't supported by the parse endpoint. **Common mistakes:** In hosted mode, do not pass both filePath and uploadRef. Phase 1 uses filePath only to generate upload instructions; phase 2 uses uploadRef only to parse server-side. **Supported file types:** .html, .htm, .xhtml, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls **Unsupported options:** actions, screenshot/branding/changeTracking formats, waitFor > 0, location, mobile, proxy values other than "auto" or "basic". **Privacy:** Set `redactPII: true` to return content with personally identifiable information redacted. **CRITICAL - Format Selection (same rules as firecrawl_scrape):** When the user asks for SPECIFIC data points from a document, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE document content. **Handling PDFs:** Add `"parsers": ["pdf"]` (optionally with `pdfOptions.maxPages`) when parsing a PDF so the PDF engine is invoked explicitly. For very long documents, cap `maxPages` to keep the response within token limits. **Hosted phase 1 example:** ```json { "name": "firecrawl_parse", "arguments": { "filePath": "/absolute/path/to/document.pdf", "contentType": "application/pdf", "formats": ["markdown"], "parsers": ["pdf"], "zeroDataRetention": true } } ``` **Hosted phase 2 example:** ```json { "name": "firecrawl_parse", "arguments": { "uploadRef": "upload-ref-from-phase-1", "formats": ["markdown"], "parsers": ["pdf"], "zeroDataRetention": true } } ``` **Returns:** Phase 1 hosted upload instructions or a parsed document with markdown, html, links, summary, json, or query results depending on the requested formats.
    Connector
  • Fetch and convert a Microsoft Learn documentation webpage to markdown format. This tool retrieves the latest complete content of Microsoft documentation webpages including Azure, .NET, Microsoft 365, and other Microsoft technologies. ## When to Use This Tool - When search results provide incomplete information or truncated content - When you need complete step-by-step procedures or tutorials - When you need troubleshooting sections, prerequisites, or detailed explanations - When search results reference a specific page that seems highly relevant - For comprehensive guides that require full context ## Usage Pattern Use this tool AFTER microsoft_docs_search when you identify specific high-value pages that need complete content. The search tool gives you an overview; this tool gives you the complete picture. ## URL Requirements - The URL must be a valid HTML documentation webpage from the microsoft.com domain - Binary files (PDF, DOCX, images, etc.) are not supported ## Output Format markdown with headings, code blocks, tables, and links preserved.
    Connector
  • Download a completed report as PDF. Returns base64-encoded PDF content. Confirm report status='completed' via atlas_get_report(report_id) first. report_id from atlas_start_report response or atlas_list_reports. Free.
    Connector
  • Returns all published Arco sources for a term — Lexicon entries, blog articles, wiki pages, and podcast episodes — ordered by recommended reading sequence. Read-only. Use this when you need a reading list or reference list for a term. Use cite_term instead when you need a formatted citation for a specific publication type.
    Connector
  • Prepare a paid PDF render from arbitrary Handlebars-flavoured HTML. Use only when no starter fits (one-off layouts, custom branding). Prefer render_template_to_pdf when a starter matches. Validates your HTML and returns the exact, ready-to-execute HTTP request to run against pdfzen's render endpoint — POST /v402/render/pdf (x402, $0.006 USDC on Base, no API key) or POST /v1/render/pdf (pdfzen API key). pdfzen renders are executed over HTTP, not streamed in-band over MCP; this tool is the bridge.
    Connector
  • Fetch one or more URLs and return their content as clean markdown. Use this to read articles, documentation, blog posts, or any page where you need the complete text, not just a snippet from search. Also supports PDF, DOCX, and other document formats. Costs 1 credit per URL. Max 10 URLs per request. Failed URLs are not charged. Set include_raw_html=true to also get the raw HTML source in each result. Useful for inspecting embedded URLs, data attributes, iframes, or script tags that are stripped during markdown conversion. Returns null for non-HTML content (PDF, DOCX, etc.). Same cost. Returns: results (array of {title, url, content, raw_html, published_time, success, error}), credits_used, credits_remaining. Args: urls: List of URLs to fetch (max 10) include_raw_html: Include raw HTML source in each result (default false)
    Connector
  • Return an inline PDF artifact from supplied report_meta, tables, metrics, and summary content; this read-only renderer does not persist hosted files. Use this only when a structured report payload already exists; use report_docx_generate for editable Word output or compliance_edd_report to build the memo first.
    Connector
  • Extract plain text from a PDF or image (base64-encoded). Use when you need raw text for downstream AI analysis (summarization, claim checking, structured extraction). For documents at a public URL, use extract_url instead (no base64 encoding needed). Returns: { pages: number, text: string } Example prompts: - "Extract the text from this scanned contract so I can search it." - "Give me the raw text from this PDF document." - "OCR this image and return the text content."
    Connector
  • Extract tables and forms as Markdown from a PDF or image (base64-encoded). Use when the document contains structured tabular data such as financial statements, data sheets, or forms. For plain prose documents, use extract_text instead. Returns: { pages: number, text: string } — text contains Markdown-formatted tables. Example prompts: - "Extract the tables from this financial statement." - "Pull the data table from this PDF into Markdown format." - "Get the tabular data from this form document."
    Connector