Skip to main content
Glama
256,679 tools. Last updated 2026-07-04 10:57

"Web Page Content" matching MCP tools:

  • Fetch a specific filing's metadata and document content by accession number. Returns the primary document as readable text. Use offset/next_offset for multi-page access to large filings (10-K, S-1 can exceed 1M chars): pass the next_offset from a truncated response to read the next page. Use section to jump directly to a heading (e.g. 'risk factors', 'item 7') without needing an offset.
    Connector
  • Core dossier check: Snapshot a domain's public web surface: robots.txt, sitemap.xml, and the home-page <head> metadata (title, description, OpenGraph, Twitter cards). Use for SEO audits, content discovery, or verifying metadata before sharing; for HTTP headers use dossier_headers, for redirect behavior use dossier_redirects. Fetches /, /robots.txt, and /sitemap.xml concurrently via HTTPS, 5 s each; parses <head> with a lightweight HTML parser. Returns a composite CheckResult: {status:"ok", meta:{title, description, og, twitter}, robots, sitemapPresent} or {status:"error", reason}.
    Connector
  • Search the web for current information on any topic. Returns extracted page content, not just snippets. Best for factual lookups, specific questions, or when you need a list of sources. For open-ended questions that need synthesis across many sources, use the research tool instead. For news queries (current events, breaking news, politics, world events), set topic="news" to search news sources specifically. This returns recent articles with publication dates. Set include_answer=true to get an AI-synthesized answer alongside results (adds 5 credits). This is the sweet spot for most agent tasks, e.g. basic + include_answer = 8 credits, much cheaper than a full 25-credit research call. Returns: query, answer (if requested), results (array of {title, url, content, description, fetched, published_date}), search_depth, topic, elapsed_ms, credits_used, credits_remaining, altered_query. Args: query: The search query search_depth: "basic" (default) for extracted page content (3 credits), "snippets" for SERP snippets only without page fetching (1 credit) max_results: Number of results (default 10, max 20) include_answer: Generate an AI answer that synthesizes the search results (adds 5 credits) include_domains: Only include results from these domains (max 10) exclude_domains: Exclude results from these domains (max 10) topic: "general" for web search, "news" for news articles. use "news" for current events, breaking news, politics, or any time-sensitive query freshness: Filter by recency - "day", "week", "month", "year", or "YYYY-MM-DD:YYYY-MM-DD"
    Connector
  • Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the selected-provider cost behind the resulting quote — can be higher than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
    Connector
  • Fetch the full markdown content of a single UploadKit docs page by its path, formatted with title, description, source URL, and the body. When to use: after search_docs identifies a relevant page and you need its full contents to answer a deep question — prefer search_docs first, then get_doc on the top result. Reading the full page avoids relying on snippets that may omit critical context (callbacks, env vars, edge cases). Returns: a plain-text string — "# {title}\n\n> {description}\n\nSource: {url}\n\n---\n\n{content}". If the path is unknown, returns a not-found message suggesting list_docs. Read-only, idempotent.
    Connector
  • File upload: streaming (one-shot stream-upload — DEFAULT for unknown/generated content), chunked (create-session → POST /blob → chunk → finalize — only when filesize is known exactly), web URL import, and batch (multi-small-file). Call action='describe' for the full action/param reference. Side effects: finalize/stream/stream-upload/web-import/batch create files and consume storage credits. Same-name uploads to a folder OVERWRITE the existing node in place (preserved as a recoverable version). BINARY: `content` is text-only (writes verbatim UTF-8); for binary use `content_base64` (server-decoded) or POST /blob + `blob_id`. UPLOAD STRATEGY (read top-to-bottom, pick the FIRST that matches): (1) Have a URL? → `web-import` (single call). (2) Have content but DON'T know exact size, OR generating/transforming content first? → `stream-upload` (single call, auto-finalizes, NO filesize required, size auto-detected from the bytes). (3) Have a file with KNOWN exact byte count? → `create-session` + `chunk`(s) + `finalize`. **filesize must match the bytes you actually upload — mismatch causes finalize to fail with code 10522 and you must cancel the session.** (4) Multiple small files (≤4 MB each, ≤200 total) into one folder? → `batch`. DEFAULT to `stream-upload` unless you are sure of the exact byte count. Do NOT guess `filesize` for generated content — use `stream-upload` instead. max_size is a hard ceiling that aborts mid-transfer — always overestimate or omit (server uses plan limit).
    Connector

Matching MCP Servers

  • A
    license
    A
    quality
    B
    maintenance
    Enables web search and web fetch operations using Ollama's hosted APIs, allowing MCP clients to search the web and retrieve page content.
    Last updated
    2
    MIT

Matching MCP Connectors

  • Create, edit, preview, publish, and manage web pages from MCP-capable AI clients.

  • Web Content Extract Mcp connects AI agents to real public APIs via MCP. Tools include

  • Sets or clears the default idle content for a display. Idle content is shown whenever the display has no active live content. Provide html OR url to set idle content (mutually exclusive — url is wrapped in a full-page iframe document), or omit both to clear idle content. Provide content_description to make later state reads easier for agents. When the display is currently idle (no active live content), the new idle is pushed to the display immediately; otherwise it stays dormant until the live content ends. Requires admin scope.
    Connector
  • Pushes raw HTML to one display, replacing current content. Prefer send_url only when the user explicitly wants an external web page. Include a human-readable description so get_display_content can summarize intent without reading raw HTML. Before complex content, call get_display_capabilities to match the real browser/runtime. When no design system is supplied, use premium digital-signage quality: full-screen layout, strong hierarchy, refined typography, robust fallback data, and no action buttons unless touch is requested. Exactly one of html or base64_html is required. Requires content_only scope and display management access. Returns id, name, duration, file and version.
    Connector
  • Create a hosted link-in-bio page draft from a style preset. Provide title, displayName, and a preset (or 'auto' to infer from businessType/style). The page starts with empty placeholder blocks for you to fill in via block.update — do not invent content. This tool intentionally creates a draft only and does not return a publish action. After calling it, stop and show the preview URL. Do not call page.publish in the same turn unless the user's current message explicitly asks to publish, make the page live, or get a public share link. Anonymous demo pages expire unless claimed.
    Connector
  • Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the selected-provider cost behind the resulting quote — can be higher than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
    Connector
  • Upload and normalize a FINISHED, ready-to-mail document to PDF. Choose this when the content is final and IDENTICAL for every recipient — including when you mail the same letter to many people (just quote/pay once per recipient with the same documentId). The exact bytes you give are what gets printed. Use create_template instead only when the content must vary per recipient via {{fields}}. Returns a documentId, the stored page count, byte size, and source format. Free; no payment required. Provide the document EXACTLY ONE way: `content` (inline text, for html/markdown/text), `contentBase64` (base64-encoded binary, for pdf/docx/image), or `url` (a publicly reachable URL the server fetches). Supplying none, or more than one, is an error. Maximum upload size is 31457280 bytes (~30 MB); output page size is US Letter. Any `{{...}}` text is printed LITERALLY here — it is NOT treated as a merge field. If you want personalized mail merge across recipients, use `create_template` instead. Reserved address zone: a recipient address block is printed over the top ~3 inches of page 1, so the server reserves that space for you automatically. For text/html/markdown/docx, page-1 content is pushed below the block (content may therefore flow onto an additional page); for pdf and image inputs, a blank first page is prepended. As a result the returned page count — and the selected-provider cost behind the resulting quote — can be higher than your source document (e.g. a single-page PDF is stored as 2 pages). You do NOT need to leave the top of your document blank yourself. See the postagent://formats resource for per-format details.
    Connector
  • Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction. **Best for:** Extracting specific structured data like prices, names, details from web pages. **Not recommended for:** When you need the full content of a page (use scrape); when you're not looking for specific structured data. **Arguments:** - urls: Array of URLs to extract information from - prompt: Custom prompt for the LLM extraction - schema: JSON schema for structured data extraction - allowExternalLinks: Allow extraction from external links - enableWebSearch: Enable web search for additional context - includeSubdomains: Include subdomains in extraction **Prompt Example:** "Extract the product name, price, and description from these product pages." **Usage Example:** ```json { "name": "firecrawl_extract", "arguments": { "urls": ["https://example.com/page1", "https://example.com/page2"], "prompt": "Extract product information including name, price, and description", "schema": { "type": "object", "properties": { "name": { "type": "string" }, "price": { "type": "number" }, "description": { "type": "string" } }, "required": ["name", "price"] }, "allowExternalLinks": false, "enableWebSearch": false, "includeSubdomains": false } } ``` **Returns:** Extracted structured data as defined by your schema.
    Connector
  • Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
    Connector
  • Fetches the complete markdown content of an Apollo documentation page using its slug, or everything after https://apollographql.com/docs. Documentation slugs can be obtained from the SearchDocs tool results. Use this after ApolloDocsSearch to read full pages rather than just excerpts. Content will be given in chunks with the totalCount field specifying the total number of chunks. Start with a chunkIndex of 0 and fetch each chunk.
    Connector
  • Clone a public web page into a hosted site. Fetches the URL, walks its same-origin assets (CSS, JS, images, fonts), rewrites references to local paths, and uploads everything as a working hosted copy in one shot. ========================================================================== USE THIS WHEN THE USER SAYS ========================================================================== - "clone this site / page / website" - "copy this site / page" - "mirror this site" - "duplicate this page" - "save this website" - "make me a version of <URL>" - "I want this page on my own domain" - "rip this page", "fork this site", "backup this site" If a user pastes a URL and wants their own copy of what's there — this is the tool. The agent should not try to recreate the page from memory or by describing what it sees: that is slow, lossy, and burns your context window for no benefit. `clone_site` produces a byte-accurate copy in seconds and leaves your context free for the iteration the user actually wants (rewriting copy, swapping images, restyling, etc.). ========================================================================== WHAT IT DOES ========================================================================== Default behavior is to crawl assets so the cloned page actually renders. Set `crawlAssets: false` to save only the single HTML response without following any assets — useful when you only want the markup. Only http:// and https:// URLs are allowed. Private, loopback, and cloud-metadata addresses are refused. Per-asset cap 10MB; per-clone caps 50 files and 50MB total. Cross-origin asset URLs are kept as-is (not fetched) so external CDN references still resolve. If the user wants a polished, researched site (logo, original copy, SEO, mobile-ready, multi-page) rather than a clone of someone else's page, send them to https://webzum.com for a free preview.
    Connector
  • Crawl a domain and scrape multiple pages using Firecrawl. Returns array of scraped pages with markdown content. Best for site mapping, content audits, or bulk research. Requires Authorization: Bearer <api_key>. Pricing: $0.25 standard, $0.12 lite per page crawled (up to 100 pages per request). Use iliad_web_research for single-page scrapes.
    Connector
  • Fetch a webpage and extract specific information using AI. Use this when you need structured data from a page (e.g. pricing, specs, contact info) rather than the raw content. Costs 5 credits. If the page has no usable text (empty or JavaScript-rendered body), the model is NOT called: content comes back empty and usage.low_content is true, rather than a fabricated answer. Gate on usage.low_content (or usage.content_chars) to detect pages you cannot ground on. Returns: content (the extracted text), url, credits_used, credits_remaining, usage (input_tokens, output_tokens, content_chars, low_content). Args: url: The URL to extract from prompt: What information to extract (e.g. "list all pricing tiers with features" or "extract the author name and publication date")
    Connector
  • RECOMMENDED first step for building a page. Returns the few questions worth asking (business name, primary visitor action, key content, optional source link) so you can then call page.create_from_brief. This is the simplest, most reliable path — prefer it. (page.onboarding.* offers extra category/layout/palette pickers but is a stateless planning helper, not required.) Call once per creation request; skip if the user already gave you a source URL. Does not create a page.
    Connector
  • Use this when the user asks to read, extract, get the text/content/article of, or summarize a webpage/URL. Do NOT use for a visual screenshot (use rendex_screenshot). Extracts clean reader-mode content from any webpage as Markdown, JSON, or HTML. Runs the same Chromium render pass as a screenshot, so it captures content after JavaScript runs — handles SPAs that fetch-only readers miss. Strips nav, ads, and boilerplate, returning the article body plus title, byline, and excerpt. Great for feeding page content to an LLM, summarization, or RAG ingestion. Costs 1 render credit per call.
    Connector
  • Read a web page the way `fetch` can't: render the REAL (JavaScript/SPA) page in a headless browser and return clean readability markdown. Free. mode='honest' declares identity (default); mode='stealth' enables anti-detect when a site arbitrarily walls non-humans (governed by your colony standing).
    Connector