Skip to main content
Glama
213,558 tools. Last updated 2026-06-19 19:41

"Tools and methods for extracting HTML content from websites" matching MCP tools:

  • Publish HTML content to a live URL instantly. No account or API key required. Returns a public URL that anyone can visit. Sites expire after 24 hours unless the owner claims them. Use this when a user asks you to build, create, or deploy a website, landing page, invitation, portfolio, report, or any HTML content they want to share as a link. Supports bundled assets (CSS, JS, images) and multi-page sites (include additional .html files in assets, accessible at /{slug}/{pagename}). IMPORTANT: After publishing, always share the live URL and the claim URL with the user. The claim URL lets them take permanent ownership of the site. Store the edit_key from the response silently — do not show it to the user — you will need it if they ask you to make changes to the site later. If you lose the edit_key, ask the user to claim the site first (via the claim URL in the page footer), then provide you with their API key from the dashboard — you can use that instead.
    Connector
  • Fetches clean text from any public HTTPS URL. Use x711_web_search first to find the URL, then this tool to read it. Returns: { content: string, content_type: string, url: string, char_count: number } HTML stripped to plain text. JSON returned as-is. Blocked: localhost, private IPs, .internal domains.
    Connector
  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Re-pin a LIVE pane to swap its HTML (design) + event/input/record schemas in place — same URL, no new pane. Two ways: (1) pass `html` to EDIT AN INLINE PANE'S HTML in one call — the relay appends a fresh version with that HTML and re-pins (schemas you omit are inherited from the current version, so to change only the HTML pass only `html`); inline panes only. (2) pass `template_version` to re-pin to a version you already appended with the `template` tool (action: version) — for named/reusable templates. By default a strict schema-compat gate refuses an upgrade that would narrow the schema (returns schema_incompatible_upgrade + details.breaks); pass force:true to apply anyway. Returns { pane_id, template_version, upgraded, breaks, compat }.
    Connector
  • Sets or clears the default idle content for a display. Idle content is shown whenever the display has no active live content. Provide html OR url to set idle content (mutually exclusive — url is wrapped in a full-page iframe document), or omit both to clear idle content. Provide content_description to make later state reads easier for agents. When the display is currently idle (no active live content), the new idle is pushed to the display immediately; otherwise it stays dormant until the live content ends. Requires admin scope.
    Connector
  • Pushes raw HTML to one display, replacing current content. Prefer send_url only when the user explicitly wants an external web page. Include a human-readable description so get_display_content can summarize intent without reading raw HTML. Before complex content, call get_display_capabilities to match the real browser/runtime. When no design system is supplied, use premium digital-signage quality: full-screen layout, strong hierarchy, refined typography, robust fallback data, and no action buttons unless touch is requested. Exactly one of html or base64_html is required. Requires content_only scope and display management access. Returns id, name, duration, file and version.
    Connector

Matching MCP Servers

  • A
    license
    -
    quality
    C
    maintenance
    Provides MCP tool adapters for Bioconductor methods like limma, DESeq2, and fgsea, enabling statistical analysis of omics data through containerized R execution. It serves as a bridge between MCP clients and bioinformatics tools for reproducible research workflows.
    Last updated
    Apache 2.0
  • A
    license
    B
    quality
    B
    maintenance
    Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.
    Last updated
    1
    160
    MIT

Matching MCP Connectors

  • GOV.UK Content + Search APIs (every gov.uk page + full search)

  • Transform any blog post or article URL into ready-to-post social media content for Twitter/X threads, LinkedIn posts, Instagram captions, Facebook posts, and email newsletters. Pay-per-event: $0.07 for all 5 platforms, $0.03 for single platform.

  • Publish HTML content to a live URL instantly. No account or API key required. Returns a public URL that anyone can visit. Sites expire after 24 hours unless the owner claims them. Use this when a user asks you to build, create, or deploy a website, landing page, invitation, portfolio, report, or any HTML content they want to share as a link. Supports bundled assets (CSS, JS, images) and multi-page sites (include additional .html files in assets, accessible at /{slug}/{pagename}). IMPORTANT: After publishing, always share the live URL and the claim URL with the user. The claim URL lets them take permanent ownership of the site. Store the edit_key from the response silently — do not show it to the user — you will need it if they ask you to make changes to the site later. If you lose the edit_key, ask the user to claim the site first (via the claim URL in the page footer), then provide you with their API key from the dashboard — you can use that instead.
    Connector
  • Read an HTML surface's body. HTML surfaces (Surface.kind="html") store mockup or full-page content as three text fields (html, css, js) rendered together inside a sandboxed iframe. Use `list_surfaces` to enumerate html surfaces in a workspace. Omit `surface_slug` to read the primary html surface; pass it to target a specific tab. Empty (never-written) html surfaces return { html:"", css:"", js:"" }. 404 when `surface_slug` doesn't match a live html surface. Requires viewer role.
    Connector
  • Get detailed CV version including structured content, sections, word count, and audience profile. cv_version_id from ceevee_upload_cv or ceevee_list_versions. Use to inspect CV content before running analysis tools. Free.
    Connector
  • Hand the human a rich interactive UI by URL and (optionally) get structured data back. Build the UI as inline HTML (pass `name` + `html`) OR reuse a saved template (pass `template_id`). The relay hosts it and returns a URL. ALWAYS give the returned url to the human — paste it into the conversation and ask them to open it. Reach for this whenever a text reply is the wrong shape: forms, approvals, pickers, surveys, dashboards, diff/doc review, wizards. If the page captures input it emits events back to you (poll them with get_events) or mutates record collections (the record tools). BEFORE authoring: call get_skill for the events-vs-records decision + schema grammar, and the `taste` tool (action: get) for the human's house style — both shape the HTML you write. Returns { pane_id, url, urls, title, expires_at }.
    Connector
  • Fetch the full text and metadata for a single opinion cluster by cluster ID. A cluster groups all opinions filed in a case — majority, concurrence, dissent, and per curiam. Returns all opinion variants with HTML and plain text. Obtain cluster IDs from courtlistener_search_opinions, courtlistener_lookup_citation, or docket results.
    Connector
  • Convert a document inline — pass the content directly as a string (or base64 for binary inputs like .docx). PREFERRED route for documents, and the one to use in sandboxed agent environments (claude.ai, Claude Desktop, Cursor): it runs entirely server-side, so it never needs the S3 upload those sandboxes block. Limit: up to 4 MB of content — already huge (a 500-page book is ~1 MB of text). For anything larger, use convert_from_url with a public URL. Supported inputs: md, html, rst, txt (plain text), docx (base64). Supported outputs: docx (Word), pdf, html, txt, md, rst, xlsx. Returns a job_id — poll get_job_status until 'complete', then get_output_content (inline bytes, sandbox-safe) or get_download_url (S3 link). Flat fee $0.05 per file. TIP: if you have shell access and are NOT sandboxed (e.g. a local coding agent), the `botverse` CLI (`npx botverse convert <file> --to <fmt>`) is faster for local files — it streams from disk instead of re-emitting the content through the model.
    Connector
  • Fetch one or more URLs and return their content as clean markdown. Use this to read articles, documentation, blog posts, or any page where you need the complete text, not just a snippet from search. Also supports PDF, DOCX, and other document formats. Costs 1 credit per URL. Max 10 URLs per request. Failed URLs are not charged. Set include_raw_html=true to also get the raw HTML source in each result. Useful for inspecting embedded URLs, data attributes, iframes, or script tags that are stripped during markdown conversion. Returns null for non-HTML content (PDF, DOCX, etc.). Same cost. Returns: results (array of {title, url, content, raw_html, published_time, success, error}), credits_used, credits_remaining. Args: urls: List of URLs to fetch (max 10) include_raw_html: Include raw HTML source in each result (default false)
    Connector
  • Download a synthetic HTML sales report for a given period. Period logic: omit all date fields to get yesterday's report; provide y only for a full-year report; y + m for a full-month report; y + m + d for a specific day. Returns an HTML summary including total revenue, number of orders, breakdown by department, VAT summary, and payment methods.
    Connector
  • Read the full content of an existing asset so you can inspect it before patching. Text assets (js, css, json, svg, txt, html) are returned as UTF-8 strings in `content` with `encoding: "utf-8"`. Binary assets (images, fonts, pdf) are returned as base64 in `content` with `encoding: "base64"`, but only if size ≤ 1MB; above that the content is omitted and you should use the `url` field to download directly. Always returns `version_hash` (first 8 hex chars of the SHA-256 of the content) for optimistic concurrency in patch_asset.
    Connector
  • Fetch the full, untruncated definition from DDO (Den Danske Ordbog) for a synset. This tool addresses the issue that DanNet synset definitions (:skos/definition) may be capped at a certain length. It retrieves the complete definition from the authoritative DDO source by following sense source URLs. WORKFLOW: 1. Get synset information to find associated senses 2. Extract DDO source URLs from sense data (dns:source) 3. Fetch DDO HTML pages and parse for definitions 4. Find elements with class "definitionBox selected" and extract span.definition content IMPORTANT NOTES: - Looks for CSS classes "definitionBox selected" and child span.definition - DDO and DanNet have diverged over time, so source URLs may not always work - This implementation uses httpx for web requests and regex-based HTML parsing Args: synset_id: Synset identifier (e.g., "synset-1876" or just "1876") Returns: Dict containing: - synset_id: The queried synset ID - ddo_definitions: List of definitions found from DDO pages - source_urls: List of DDO URLs that were attempted - success_urls: List of URLs that successfully returned definitions - errors: List of any errors encountered - truncated_definition: The original DanNet definition for comparison Example: result = fetch_ddo_definition("synset-3047") # Check result['ddo_definitions'] for full DDO definitions # Compare with result['truncated_definition'] from DanNet
    Connector
  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Fetches any public web page and returns clean, readable plain text stripped of HTML, navigation, scripts, advertisements, and boilerplate. Returns the page title, meta description, word count, and main body text ready for analysis or summarisation. Use this tool when an agent needs to read the content of a specific web page or article URL — for example to summarise an article, extract facts from a page, verify a claim by reading the source, or convert a web page into plain text to pass to another tool. Pass article URLs returned by web_news_headlines to this tool to read full article content. Do not use this tool to discover current news headlines — use web_news_headlines instead. Does not execute JavaScript — best suited for standard HTML content pages. Will not work with paywalled, login-protected, or JavaScript-rendered single-page applications.
    Connector
  • Fetches up to 32KB of the domain's HTML and response headers from the edge, then fingerprints the content for known CMS platforms, JavaScript frameworks, CDN providers, and analytics tools. Detection is based on meta generator tags, script src patterns, response headers, and cookie names. Use this tool when: - You need to know what CMS (WordPress, Drupal, Shopify) a site runs. - You are assessing a domain's infrastructure before a security review. - You want to identify analytics or marketing tools a site embeds. Do NOT use this tool when: - You want HTTP headers and security posture — use `intel_http` instead. - You want tracker database classification — use `get_domain` instead. - You need robots.txt AI policy — use `intel_robots` instead. Inputs: - `domain` (query, required): Domain to fingerprint. Returns: - `cms`: detected content management system, or null. - `frameworks`: JavaScript/backend frameworks detected. - `cdn`: CDN provider detected, or null. - `analytics`: analytics and tracking tools detected. - `meta_generators`: raw meta generator tag values. Cost: - Free. No API key required. Latency: - Typical: 2-4s (HTML fetch), p99: 7s.
    Connector
  • Create a new surface (tab) inside a workspace. `kind` picks `table`, `doc`, or `html`. Optional `slug` (lowercase kebab-case, 3-64 chars); when omitted the server slugifies `name` and appends a numeric suffix on collision. Optional `columns` overrides the default Title/Status/Notes triple for `table` kinds; ignored for `doc` and `html`. `html` surfaces start with an empty body — write content via `update_html`. Editor role required. Emits `surface.created` so live listeners on the workspace stream see the new tab without a refetch.
    Connector