Skip to main content
Glama
207,056 tools. Last updated 2026-06-17 18:12

"Tools for Extracting Content from Websites and PDFs" matching MCP tools:

  • Run an Agent402 tool by slug (find slugs with search_tools). The 1065 pure-CPU tools execute free on this hosted connector (rate-limited). Wallet-only tools (live search, browser rendering, PDFs, durable memory) return instructions for paid access instead.
    Connector
  • Download a PDF from a URL and extract all text content, page by page. Use this to read the full text of a specific document — for example, an annual report PDF linked from a search_filings result. Best combined with search_filings: use search_filings to locate the document, then parse_pdf_to_text for the full text. Do not use for PDFs that are already well-represented in the database — search_filings is faster and returns pre-ranked, relevant excerpts. Not suitable for scanned (image-only) PDFs without embedded text; those pages will be returned as "(no extractable text)". Args: pdf_url: Direct HTTPS URL to the PDF file, e.g. https://example.com/report.pdf. Must be publicly accessible; authentication-protected URLs will fail. Returns: All text from the PDF with "--- Page N ---" separators between pages. Returns an error string if the download fails, the URL does not point to a valid PDF, or the document exceeds the 60-second download timeout.
    Connector
  • USE WHEN reading the full content of a Pine Script v6 documentation file. Returns the file content; when limit is set, a header shows the char range and offset to continue reading. AFTER calling this tool, use offset=<end> to continue if the header indicates more content is available. For large files (ta.md, strategy.md, collections.md, drawing.md, general.md), prefer list_sections() + get_section() instead. Data sourced from bundled Pine Script v6 documentation.
    Connector
  • Fetches a single URL and returns its content. Use this when you have a specific URL in mind — for example, after web.search returns a link you want to read, or when the user pastes a URL. Modes (extract): - 'auto' (default): picks the right mode based on response content type. - 'markdown': for HTML pages; returns cleaned markdown plus the page <title>. - 'text': for JSON/XML/plaintext APIs; returns the raw decoded body. - 'file': for images, PDFs, audio, video, archives, or any binary — ingests the bytes into the user's file storage and returns a file_id you can pass to messages.send (to send as an attachment), agents.add_file (to add to agent knowledge), or files.read. Use web.fetch (not files.upload) when you need the file_id immediately for the next tool call — files.upload(source_url=…) is async and won't have the file ready in the same turn. Use web.search (not web.fetch) when you don't have a specific URL yet and need to find one.
    Connector
  • Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
    Connector
  • Extract structured transaction data from a contract at a URL. Downloads the document, extracts text (with OCR fallback for scanned PDFs), and runs PrimaCoda's contract-extraction prompt to return parties, addresses, dates, prices, and key contract fields. Use this when an agent has the contract hosted somewhere (Dropbox, Google Drive direct download, Square Space, etc.) and wants to skip the upload step. For multi-document deals (purchase + addenda + disclosures), use the PrimaCoda dashboard's batch upload — this tool handles ONE document. Args: pdf_url: Direct download URL for the contract (PDF, DOCX, TXT, or image). Must be reachable from the PrimaCoda server. Google Drive "shared link" URLs work if set to "anyone with link"; other share URLs may need their direct-download form. api_key: Your PrimaCoda MCP API key (starts 'pck_').
    Connector

Matching MCP Servers

  • A
    license
    B
    quality
    B
    maintenance
    Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.
    Last updated
    1
    160
    MIT
  • A
    license
    -
    quality
    D
    maintenance
    A fully working MCP server built from scratch in plain Node.js, implementing tools, resources, prompts, notifications, and sampling according to the MCP specification, designed to connect to Claude Desktop or any MCP client.
    Last updated
    17
    MIT

Matching MCP Connectors

  • GOV.UK Content + Search APIs (every gov.uk page + full search)

  • Transform any blog post or article URL into ready-to-post social media content for Twitter/X threads, LinkedIn posts, Instagram captions, Facebook posts, and email newsletters. Pay-per-event: $0.07 for all 5 platforms, $0.03 for single platform.

  • Fetches a single URL and returns its content. Use this when you have a specific URL in mind — for example, after web.search returns a link you want to read, or when the user pastes a URL. Modes (extract): - 'auto' (default): picks the right mode based on response content type. - 'markdown': for HTML pages; returns cleaned markdown plus the page <title>. - 'text': for JSON/XML/plaintext APIs; returns the raw decoded body. - 'file': for images, PDFs, audio, video, archives, or any binary — ingests the bytes into the user's file storage and returns a file_id you can pass to messages.send (to send as an attachment), agents.add_file (to add to agent knowledge), or files.read. Use web.fetch (not files.upload) when you need the file_id immediately for the next tool call — files.upload(source_url=…) is async and won't have the file ready in the same turn. Use web.search (not web.fetch) when you don't have a specific URL yet and need to find one.
    Connector
  • Retrieve / download / get the file for a digital product after the user paid for it. Use after `pay_merchant` succeeds for digital goods (PDFs, ebooks, cheatsheets, datasets). Pass the on-chain `txHash` from `pay_merchant` OR a Coal checkout `sessionId`. Returns a verified download URL the user can click. Supported product slugs: `0g-cheatsheet` (The 0G Builder's Cheatsheet, $0.10).
    Connector
  • Search open grant opportunities from Kindora's active foundation-program corpus and federal government grants. Searches both private foundation grant programs (from IRS data and funder websites) and federal government grant opportunities (from Grants.gov). Uses full-text search with natural language understanding — queries are parsed into individual terms with stemming, so "youth after school programs" matches programs about youth, after-school, and programming even if those exact words don't appear together. Search covers program names, descriptions, focus areas, beneficiary types, and geographic focus fields. Use the state parameter to focus on geographically relevant opportunities. Query syntax: - Natural language: "affordable housing for seniors" (matches any of these terms) - Quoted phrases: '"after school"' (matches exact phrase) - Exclusion: "education -higher" (matches education, excludes higher education) - Combine: '"mental health" youth -adult' (phrase + term + exclusion) - No query: returns broadly open programs sorted by upcoming deadlines (browsing mode)
    Connector
  • Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files — for IMAGES use `files.get_base64`, for AUDIO/VIDEO it cannot be transcribed via this tool, and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error — saves you a turn to read the routing hint before deciding.
    Connector
  • Extract text from PDFs and images as clean Markdown. Uses Mistral OCR — handles complex layouts, tables, handwriting, multi-column documents, and mathematical notation. Preserves document hierarchy in structured Markdown. 10 sats/page. Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='extract_document' and quantity=pageCount for multi-page PDFs.
    Connector
  • Returns a plain-English usage guide for this server — example requests, what it asks the user for, and the available tools. Call this if the user asks how to use Abby SEO, or to orient yourself before starting. (Same content as the 'getting_started' prompt, exposed as a tool for clients that don't surface MCP prompts.) Takes no arguments.
    Connector
  • Create a job description from text within a hiring context. Returns a JD object with 'id' and stored content. Use JD content as jd_text in atlas_fit_match, atlas_fit_rank, atlas_start_jd_fit_batch, and atlas_start_jd_analysis. Requires context_id from atlas_create_context or atlas_list_contexts. Free.
    Connector
  • List all generated reports with status and summary info. Returns an array of report objects with id, report_type, status, title, and summary. Use the report id with atlas_get_report for details or atlas_download_report to download completed PDFs. Free.
    Connector
  • Search Google Scholar for academic papers, citations, and scholarly articles. Returns results with titles, authors, publication info, citation counts, and links to PDFs. Use cites parameter to find papers citing a specific work, or cluster to find all versions of a paper. For US court opinions and case law, use google_scholar_cases instead.
    Connector
  • Get detailed CV version including structured content, sections, word count, and audience profile. cv_version_id from ceevee_upload_cv or ceevee_list_versions. Use to inspect CV content before running analysis tools. Free.
    Connector
  • Get your agent's real mailing address beta endpoint when the account has explicit beta access: street address + mailbox number for approved accounts. For generally available inbound context, use list_inbound_forwarding_addresses instead; that returns a private intake alias for scans, PDFs, photos, provider notices, and notes from addresses the operator already uses.
    Connector
  • One-call complete archive research package for a document, PDF, or editorial brief. Input: title, audience, themes, archives to draw from, things to avoid, number of colours. Output: ranked colour cards with full provenance, story order, source confidence flags, pull quote, CTA line, CSS tokens, image prompt for Midjourney/Flux/DALLE, editorial argument, weakest and strongest entries identified. Replaces chaining archive_search + get_colour_card + cliche_breaker + agent_brief separately. Two Claude calls total. This is the endpoint for building premium archive documents, PDFs, briefs, and editorial content. Use this first for any document workflow.
    Connector
  • List all positioning sessions (market analysis through lens selection to targeted edits). Returns an array of session objects with id, status, cv_version_id, and created_at. Use the session id with ceevee_get_positioning_session for full details including analysis results, edits, and PDFs. Free.
    Connector
  • Load educational slides or cloud file attachments. Use laminasAnexos for educational slides/laminas (~238 items with PDFs about nutrition topics), cloudAnexos for uploaded cloud files. For guidelines/orientations specifically, use webdiet_orientacoes action=list_banco. Bulk support: accepts patient_ids for batched execution.
    Connector