207,056 tools. Last updated 2026-06-17 18:12

"Tools for Extracting Content from Websites and PDFs" matching MCP tools:

call_tool
Agent402 — pay-per-call web tools
Run an Agent402 tool by slug (find slugs with search_tools). The 1065 pure-CPU tools execute free on this hosted connector (rate-limited). Wallet-only tools (live search, browser rendering, PDFs, durable memory) return instructions for paid access instead.
Connector
parse_pdf_to_text
Nordic Financial MCP
Download a PDF from a URL and extract all text content, page by page. Use this to read the full text of a specific document — for example, an annual report PDF linked from a search_filings result. Best combined with search_filings: use search_filings to locate the document, then parse_pdf_to_text for the full text. Do not use for PDFs that are already well-represented in the database — search_filings is faster and returns pre-ranked, relevant excerpts. Not suitable for scanned (image-only) PDFs without embedded text; those pages will be returned as "(no extractable text)". Args: pdf_url: Direct HTTPS URL to the PDF file, e.g. https://example.com/report.pdf. Must be publicly accessible; authentication-protected URLs will fail. Returns: All text from the PDF with "--- Page N ---" separators between pages. Returns an error string if the download fails, the URL does not point to a valid PDF, or the document exceeds the 60-second download timeout.
Connector
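The `parse_pdf_to_text` description above says the result uses "--- Page N ---" separators and marks image-only pages as "(no extractable text)". The following is a minimal sketch of splitting such a result into pages on the client side. It assumes each marker sits on its own line and precedes its page, which the listing does not confirm; `split_pages` and `pages_without_text` are hypothetical helper names, not part of the tool.

```python
import re

# Assumption: every page is preceded by a marker on its own line.
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---[ \t]*$", re.MULTILINE)


def split_pages(text: str) -> dict[int, str]:
    """Split the tool's text output into {page_number: page_text}.

    Any text before the first marker is ignored.
    """
    pages: dict[int, str] = {}
    matches = list(PAGE_MARKER.finditer(text))
    for i, match in enumerate(matches):
        # A page body runs from the end of its marker to the next marker.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages[int(match.group(1))] = text[match.end():end].strip()
    return pages


def pages_without_text(pages: dict[int, str]) -> list[int]:
    """Return page numbers reported as image-only (no embedded text)."""
    return [n for n, body in pages.items() if body == "(no extractable text)"]
```

A caller might use `pages_without_text` to decide whether a document needs an OCR-capable tool instead.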
get_doc
Pine Script
USE WHEN reading the full content of a Pine Script v6 documentation file. Returns the file content; when limit is set, a header shows the char range and offset to continue reading. AFTER calling this tool, use offset=<end> to continue if the header indicates more content is available. For large files (ta.md, strategy.md, collections.md, drawing.md, general.md), prefer list_sections() + get_section() instead. Data sourced from bundled Pine Script v6 documentation.
Connector
web_fetch
dialogbrain
Fetches a single URL and returns its content. Use this when you have a specific URL in mind — for example, after web.search returns a link you want to read, or when the user pastes a URL. Modes (extract): - 'auto' (default): picks the right mode based on response content type. - 'markdown': for HTML pages; returns cleaned markdown plus the page <title>. - 'text': for JSON/XML/plaintext APIs; returns the raw decoded body. - 'file': for images, PDFs, audio, video, archives, or any binary — ingests the bytes into the user's file storage and returns a file_id you can pass to messages.send (to send as an attachment), agents.add_file (to add to agent knowledge), or files.read. Use web.fetch (not files.upload) when you need the file_id immediately for the next tool call — files.upload(source_url=…) is async and won't have the file ready in the same turn. Use web.search (not web.fetch) when you don't have a specific URL yet and need to find one.
Connector
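The `web_fetch` description above says the 'auto' mode picks an extract mode from the response content type. The sketch below shows one way such routing could work, following only the three modes named in the description. The function name `choose_extract_mode` and the exact MIME-type rules are assumptions for illustration; the service's real routing is not documented in this listing.

```python
def choose_extract_mode(content_type: str) -> str:
    """Map a Content-Type header value to 'markdown', 'text', or 'file'.

    Hypothetical routing, based only on the mode descriptions in the listing.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()

    # HTML pages are returned as cleaned markdown.
    if mime in ("text/html", "application/xhtml+xml"):
        return "markdown"

    # JSON, XML, and plaintext bodies are returned as raw decoded text.
    if (
        mime.startswith("text/")
        or mime in ("application/json", "application/xml")
        or mime.endswith(("+json", "+xml"))
    ):
        return "text"

    # Images, PDFs, audio, video, archives, and other binaries become files.
    return "file"
```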
files_read
DialogBrain
Read **text content** of an attached file. Works for: .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files: for IMAGES use `files.get_base64`; AUDIO/VIDEO cannot be transcribed via this tool; for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling it on a binary MIME type returns an error, so reading this routing hint before deciding saves you a turn.
Connector
extract_contract_from_url
transaction-coordinator
Extract structured transaction data from a contract at a URL. Downloads the document, extracts text (with OCR fallback for scanned PDFs), and runs PrimaCoda's contract-extraction prompt to return parties, addresses, dates, prices, and key contract fields. Use this when an agent has the contract hosted somewhere (Dropbox, Google Drive direct download, Squarespace, etc.) and wants to skip the upload step. For multi-document deals (purchase + addenda + disclosures), use the PrimaCoda dashboard's batch upload; this tool handles ONE document. Args: pdf_url: Direct download URL for the contract (PDF, DOCX, TXT, or image). Must be reachable from the PrimaCoda server. Google Drive "shared link" URLs work if set to "anyone with link"; other share URLs may need their direct-download form. api_key: Your PrimaCoda MCP API key (starts with 'pck_').
Connector

Matching MCP Servers

content-core
Web Scraping, Multimedia Processing
lfnovo
License: A, Quality: B, Maintenance: B
Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.
Last updated 2026-05-12
1
160
MIT
MCP from Scratch Server
Developer Tools, Education & Learning Tools
pguso
License: A, Quality: -, Maintenance: D
A fully working MCP server built from scratch in plain Node.js, implementing tools, resources, prompts, notifications, and sampling according to the MCP specification, designed to connect to Claude Desktop or any MCP client.
Last updated 2026-05-25
17
MIT

Matching MCP Connectors

Gov Uk Content
GOV.UK Content + Search APIs (every gov.uk page + full search)
Content to Social
Transform any blog post or article URL into ready-to-post social media content for Twitter/X threads, LinkedIn posts, Instagram captions, Facebook posts, and email newsletters. Pay-per-event: $0.07 for all 5 platforms, $0.03 for single platform.

download_product
Coal — Payments for AI agents
Retrieve / download / get the file for a digital product after the user paid for it. Use after `pay_merchant` succeeds for digital goods (PDFs, ebooks, cheatsheets, datasets). Pass the on-chain `txHash` from `pay_merchant` OR a Coal checkout `sessionId`. Returns a verified download URL the user can click. Supported product slugs: `0g-cheatsheet` (The 0G Builder's Cheatsheet, $0.10).
Connector
search_open_grants
foundation-discovery
Search open grant opportunities from Kindora's active foundation-program corpus and federal government grants. Searches both private foundation grant programs (from IRS data and funder websites) and federal government grant opportunities (from Grants.gov). Uses full-text search with natural language understanding — queries are parsed into individual terms with stemming, so "youth after school programs" matches programs about youth, after-school, and programming even if those exact words don't appear together. Search covers program names, descriptions, focus areas, beneficiary types, and geographic focus fields. Use the state parameter to focus on geographically relevant opportunities. Query syntax: - Natural language: "affordable housing for seniors" (matches any of these terms) - Quoted phrases: '"after school"' (matches exact phrase) - Exclusion: "education -higher" (matches education, excludes higher education) - Combine: '"mental health" youth -adult' (phrase + term + exclusion) - No query: returns broadly open programs sorted by upcoming deadlines (browsing mode)
Connector
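The query syntax listed for `search_open_grants` above (plain terms, quoted phrases, and `-` exclusions) can be illustrated with a small client-side parser. This is a sketch for understanding the syntax only: `ParsedQuery` and `parse_query` are hypothetical names, the code does no stemming, and it makes no claim about how the service itself parses queries.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    phrases: list[str] = field(default_factory=list)   # "quoted phrases"
    terms: list[str] = field(default_factory=list)     # plain terms
    excluded: list[str] = field(default_factory=list)  # -excluded terms


# Either a double-quoted phrase, or a word with an optional leading minus.
TOKEN = re.compile(r'"([^"]+)"|(-?)(\S+)')


def parse_query(query: str) -> ParsedQuery:
    """Split a query string into phrases, terms, and exclusions."""
    parsed = ParsedQuery()
    for match in TOKEN.finditer(query):
        phrase, minus, word = match.groups()
        if phrase is not None:
            parsed.phrases.append(phrase.lower())
        elif minus:
            parsed.excluded.append(word.lower())
        else:
            parsed.terms.append(word.lower())
    return parsed
```

An empty query yields three empty lists, which corresponds to the browsing mode described in the listing.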
extract_document
Sats4AI - Bitcoin-Powered AI Tools
Extract text from PDFs and images as clean Markdown. Uses Mistral OCR — handles complex layouts, tables, handwriting, multi-column documents, and mathematical notation. Preserves document hierarchy in structured Markdown. 10 sats/page. Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='extract_document' and quantity=pageCount for multi-page PDFs.
Connector
how_to_use
mcp-server
Returns a plain-English usage guide for this server — example requests, what it asks the user for, and the available tools. Call this if the user asks how to use Abby SEO, or to orient yourself before starting. (Same content as the 'getting_started' prompt, exposed as a tool for clients that don't surface MCP prompts.) Takes no arguments.
Connector
atlas_create_jd
CareerProof MCP
Create a job description from text within a hiring context. Returns a JD object with 'id' and stored content. Use JD content as jd_text in atlas_fit_match, atlas_fit_rank, atlas_start_jd_fit_batch, and atlas_start_jd_analysis. Requires context_id from atlas_create_context or atlas_list_contexts. Free.
Connector
atlas_list_reports
CareerProof MCP
List all generated reports with status and summary info. Returns an array of report objects with id, report_type, status, title, and summary. Use the report id with atlas_get_report for details or atlas_download_report to download completed PDFs. Free.
Connector
google_scholar
Searchapi
Search Google Scholar for academic papers, citations, and scholarly articles. Returns results with titles, authors, publication info, citation counts, and links to PDFs. Use cites parameter to find papers citing a specific work, or cluster to find all versions of a paper. For US court opinions and case law, use google_scholar_cases instead.
Connector
ceevee_get_version
CareerProof MCP
Get detailed CV version including structured content, sections, word count, and audience profile. cv_version_id from ceevee_upload_cv or ceevee_list_versions. Use to inspect CV content before running analysis tools. Free.
Connector
get_mailbox
mailbox
Get your agent's real mailing address (beta endpoint; requires explicit beta access on the account): street address and mailbox number for approved accounts. For generally available inbound context, use list_inbound_forwarding_addresses instead; that returns a private intake alias for scans, PDFs, photos, provider notices, and notes from addresses the operator already uses.
Connector
archive_report_brief
Colour Memory
One-call complete archive research package for a document, PDF, or editorial brief. Input: title, audience, themes, archives to draw from, things to avoid, number of colours. Output: ranked colour cards with full provenance, story order, source confidence flags, pull quote, CTA line, CSS tokens, image prompt for Midjourney/Flux/DALLE, editorial argument, weakest and strongest entries identified. Replaces chaining archive_search + get_colour_card + cliche_breaker + agent_brief separately. Two Claude calls total. This is the endpoint for building premium archive documents, PDFs, briefs, and editorial content. Use this first for any document workflow.
Connector
ceevee_list_positioning_sessions
CareerProof MCP
List all positioning sessions (market analysis through lens selection to targeted edits). Returns an array of session objects with id, status, cv_version_id, and created_at. Use the session id with ceevee_get_positioning_session for full details including analysis results, edits, and PDFs. Free.
Connector
webdiet_list_anexos
WebDiet
Load educational slides or cloud file attachments. Use laminasAnexos for educational slides/laminas (~238 items with PDFs about nutrition topics), cloudAnexos for uploaded cloud files. For guidelines/orientations specifically, use webdiet_orientacoes action=list_banco. Bulk support: accepts patient_ids for batched execution.
Connector