Glama
213,191 tools. Last updated 2026-06-19 13:08

"Understanding the text content of a webpage" matching MCP tools:

  • Upload a REUSABLE template containing `{{field}}` placeholders (e.g. `Dear {{name}},` or `Balance due: {{amount}}`). Choose this ONLY when the content must vary per recipient (mail merge) — recipient count is irrelevant, so a single personalized letter belongs here too. If the content is identical for everyone, use create_letter instead (this tool rejects input with no `{{fields}}`). Returns a documentId with `kind: "html_template"`, a `mergeFields` list of the detected field names, and an `estimatedPageCount`. Free; no payment required. Template source must be TEXT-BASED (html, markdown, or text) and must contain at least one `{{field}}`, or the upload is rejected — for a finished document with no merge fields, use `create_letter`. Provide the template EXACTLY ONE way: `content` (inline text), `contentBase64` (base64-encoded text), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Reuse one template documentId across recipients: call create_mail_quote ONCE PER RECIPIENT, supplying that recipient's values via `mergeVariables` (every field in `mergeFields` must have a non-empty value). The server substitutes the values and renders that recipient's personalized PDF at quote time, so `estimatedPageCount` is only a baseline — the binding page count and price are set per quote from the actual rendered output. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space automatically (page-1 content is pushed below the block and may flow onto an additional page). You do NOT need to leave the top blank yourself. See the postagent://formats resource for details.
    Connector
  • Read a workspace's doc (TipTap rich-text) body. Format is negotiable via `format`: `markdown` (default — CommonMark + GFM, ready to feed to an LLM or render in a non-ProseMirror surface), `content` (TipTap JSON, round-trippable into update_doc for structural edits), `text` (plain text, best for search, summarisation, word-count heuristics), or `all` for the legacy three-in-one shape. Default is `markdown` because it's the slice agents need 95% of the time and the JSON form on a long doc can blow past the agent harness's tool-result token cap. Pass `format: "content"` only when you're round-tripping into update_doc for a structural edit. A workspace can hold any combination of doc and table surfaces, one or many of either kind; omit `surface_slug` to read the primary doc surface, or pass it to target a specific doc tab (use `list_surfaces` to enumerate). An unwritten or absent doc returns the requested format empty (markdown="", content={}, text=""); a `surface_slug` that doesn't match any live doc surface 404s.
    Connector
  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Download a PDF from a URL and extract all text content, page by page. Use this to read the full text of a specific document, for example an annual report PDF linked from a search_filings result. Best combined with search_filings: use search_filings to locate the document, then parse_pdf_to_text for the full text. Do not use for PDFs that are already well-represented in the database; search_filings is faster and returns pre-ranked, relevant excerpts. Not suitable for scanned (image-only) PDFs without embedded text; those pages will be returned as "(no extractable text)".
    Args:
      pdf_url: Direct HTTPS URL to the PDF file, e.g. https://example.com/report.pdf. Must be publicly accessible; authentication-protected URLs will fail.
    Returns: All text from the PDF with "--- Page N ---" separators between pages. Returns an error string if the download fails, the URL does not point to a valid PDF, or the download exceeds the 60-second timeout.
    Connector
  • Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the per-page price in the resulting quote — can be one more than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
    Connector
  • Create a NEW text node, or update an existing one (pass the same `id` to overwrite content/position in place — preferred over creating a duplicate). Supports cnvs markup (Markdown-ish) and Mermaid diagrams in the content. When using Mermaid, the ENTIRE content of this text node must be a single Mermaid diagram (one ```mermaid fenced block and nothing else — no heading, no prose before or after). If you need prose + a diagram, create two separate text nodes. `postit: true` renders as a yellow sticky; `diagram: true` renders as a framed box (2px border in the text colour, centred text) — the two are mutually exclusive. Coordinates are in board-world pixels, +x right, +y DOWN; pick a spot that does not overlap existing items (check `get_preview` first). Default width auto-fits content up to ~320 px; pass `width` for explicit wrapping (160–4096). Keep content under 100 000 chars.
    Connector
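
The create_template and create_letter descriptions above both require the document to be supplied exactly one way (`content`, `contentBase64`, or `url`) and split on whether the text contains `{{field}}` placeholders. A minimal client-side sketch of those two checks follows. The helper names, the placeholder regex (identifier-style names with optional inner whitespace), and the idea of pre-validating before calling the tools are all assumptions; the listing does not show how the server itself validates input.

```python
import re

# Hypothetical helpers for illustration; not part of any listed tool's API.
SOURCE_KEYS = ("content", "contentBase64", "url")

# Assumed placeholder syntax: {{name}} with an identifier-like field name.
FIELD_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def pick_source(args: dict) -> str:
    """Return the single source key supplied; raise if none or several are given."""
    supplied = [key for key in SOURCE_KEYS if args.get(key) is not None]
    if len(supplied) != 1:
        raise ValueError(f"expected exactly one of {SOURCE_KEYS}, got {supplied}")
    return supplied[0]


def merge_fields(template: str) -> list[str]:
    """List distinct {{field}} names in first-seen order."""
    seen: dict[str, None] = {}
    for name in FIELD_PATTERN.findall(template):
        seen.setdefault(name, None)
    return list(seen)


def choose_tool(text: str) -> str:
    """Route text with merge fields to create_template, everything else to create_letter."""
    return "create_template" if merge_fields(text) else "create_letter"
```

Checking locally before the call avoids a rejected upload when a template has no fields, or when a caller passes both `content` and `url` by mistake.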

Matching MCP Servers

Matching MCP Connectors

  • Guardian Open Platform: content search, articles, sections, tags. Free dev key.

  • GOV.UK Content + Search APIs (every gov.uk page + full search)

  • Deploy a static site to a live URL — free, no account or API key required. **File content is plain text by default.** Pass HTML/CSS/JS/JSON/SVG/etc. directly in each file's `content` as a regular string. Only set `encoding: "base64"` per-file for binary content (images, fonts) — do not base64-encode text. Returns the live URL plus a claim URL (the site expires in 3 days unless claimed) — always show both to the user. To make the site private, pass `password`; always show the password to the user if you set one.
    Connector
  • Read the **text content** of an attached file. Works for .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files: for IMAGES use `files.get_base64`; AUDIO/VIDEO cannot be transcribed via this tool; and for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling on a binary mime-type returns an error, so reading this routing hint before deciding saves you a turn.
    Connector
  • Retrieve the plain-text content of a Project Gutenberg book, stripped of the standard license header and footer so the response contains only the literary work. For long works — novels routinely run 500KB–2MB — use offset and limit to read in chunks rather than fetching the whole book at once. The response reports totalChars and remainingChars so the caller can page through without guessing. Prefers UTF-8 plain text; falls back to ASCII plain text; refuses audio books (media_type "Sound") with a clear error.
    Connector
  • Fetch a webpage and extract specific information using AI. Use this when you need structured data from a page (e.g. pricing, specs, contact info) rather than the raw content. Costs 5 credits. If the page has no usable text (empty or JavaScript-rendered body), the model is NOT called: content comes back empty and usage.low_content is true, rather than a fabricated answer. Gate on usage.low_content (or usage.content_chars) to detect pages you cannot ground on.
    Returns: content (the extracted text), url, credits_used, credits_remaining, usage (input_tokens, output_tokens, content_chars, low_content).
    Args:
      url: The URL to extract from.
      prompt: What information to extract (e.g. "list all pricing tiers with features" or "extract the author name and publication date").
    Connector
  • Create a job description from text within a hiring context. Returns a JD object with 'id' and stored content. Use JD content as jd_text in atlas_fit_match, atlas_fit_rank, atlas_start_jd_fit_batch, and atlas_start_jd_analysis. Requires context_id from atlas_create_context or atlas_list_contexts. Free.
    Connector
  • Fetch and convert a Microsoft Learn documentation webpage to markdown format. This tool retrieves the latest complete content of Microsoft documentation webpages including Azure, .NET, Microsoft 365, and other Microsoft technologies.
    When to Use This Tool:
      - When search results provide incomplete information or truncated content
      - When you need complete step-by-step procedures or tutorials
      - When you need troubleshooting sections, prerequisites, or detailed explanations
      - When search results reference a specific page that seems highly relevant
      - For comprehensive guides that require full context
    Usage Pattern: Use this tool AFTER microsoft_docs_search when you identify specific high-value pages that need complete content. The search tool gives you an overview; this tool gives you the complete picture.
    URL Requirements:
      - The URL must be a valid HTML documentation webpage from the microsoft.com domain
      - Binary files (PDF, DOCX, images, etc.) are not supported
    Output Format: markdown with headings, code blocks, tables, and links preserved.
    Connector
  • Tag / extract named entities (NER) from free plain text using the RChilli NER Tagger Plugin, which identifies job titles, cities, skills, degrees, and organizations with their positions. Uses a purpose-built recruiting NER model that is more reliable than extracting entities yourself. Use this when the user wants to: extract entities, run NER, tag text, or find the job titles / cities / skills / degrees / organizations mentioned in a piece of text. Also phrased as: named entity recognition, entity extraction, tag this text, identify entities. Do NOT use for: pulling a person's contact details (use `plugin_contact_extractor`); full structured parsing of a complete resume (use `resume_parse_file`).
    Args:
      text: Plain text content to analyse (text only, not PDF/DOCX).
      userkey: RChilli API userkey. Leave blank to use the authenticated session key.
      subuserid: Sub-user identifier for multi-tenant isolation.
    Returns: A list of named entity objects, each containing: `Type` (e.g. `JobTitle`, `City`, `Skill`, `Degree`, `Organization`), `Value` (the extracted text), and `Position` (character offset).
    Connector
  • Convert a document inline — pass the content directly as a string (or base64 for binary inputs like .docx). PREFERRED route for documents, and the one to use in sandboxed agent environments (claude.ai, Claude Desktop, Cursor): it runs entirely server-side, so it never needs the S3 upload those sandboxes block. Limit: up to 4 MB of content — already huge (a 500-page book is ~1 MB of text). For anything larger, use convert_from_url with a public URL. Supported inputs: md, html, rst, txt (plain text), docx (base64). Supported outputs: docx (Word), pdf, html, txt, md, rst, xlsx. Returns a job_id — poll get_job_status until 'complete', then get_output_content (inline bytes, sandbox-safe) or get_download_url (S3 link). Flat fee $0.05 per file. TIP: if you have shell access and are NOT sandboxed (e.g. a local coding agent), the `botverse` CLI (`npx botverse convert <file> --to <fmt>`) is faster for local files — it streams from disk instead of re-emitting the content through the model.
    Connector
  • Read the full content of an existing asset so you can inspect it before patching. Text assets (js, css, json, svg, txt, html) are returned as UTF-8 strings in `content` with `encoding: "utf-8"`. Binary assets (images, fonts, pdf) are returned as base64 in `content` with `encoding: "base64"`, but only if size ≤ 1MB; above that the content is omitted and you should use the `url` field to download directly. Always returns `version_hash` (first 8 hex chars of the SHA-256 of the content) for optimistic concurrency in patch_asset.
    Connector
  • Extract a PDF to clean Markdown/LaTeX text via MinerU (useful for papers with no open-access full text: supply the user's PDF and get readable text back). Provide `pdf_url` (downloaded server-side, SSRF-guarded) OR `pdf_base64`. The `formula` and `table` options toggle math and table reconstruction. Returns {task_id, status, cached, content, chars}: a recently seen (cached) or small PDF comes back with `content` in one call; a fresh PDF (MinerU is GPU-heavy and can take minutes) returns status='running' plus a task_id, after which you call extract_pdf_result(task_id) to fetch the text.
    Connector
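
The asset-reading tool in the list above describes its `version_hash` as the first 8 hex characters of the SHA-256 of the content, used for optimistic concurrency in patch_asset. A small sketch of how a client might recompute and compare that value is given here. It assumes the hash is taken over the raw bytes (UTF-8 for text assets), and `can_patch` is a hypothetical helper; the listing does not say how patch_asset performs its own check.

```python
import hashlib


def version_hash(content: bytes) -> str:
    """First 8 hex characters of the SHA-256 digest of the content."""
    return hashlib.sha256(content).hexdigest()[:8]


def can_patch(expected_hash: str, current_content: bytes) -> bool:
    """Hypothetical guard: allow a patch only if the asset is unchanged since it was read."""
    return version_hash(current_content) == expected_hash


# Text assets are assumed to be hashed as their UTF-8 bytes.
print(version_hash("abc".encode("utf-8")))  # ba7816bf
```

An 8-character prefix is enough to catch accidental concurrent edits but is not collision-resistant, so it should not be treated as an integrity check.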