Skip to main content
Glama
213,155 tools. Last updated 2026-06-19 12:06

"Methods to Scrape a Website and Fetch Markdown Files from GitHub" matching MCP tools:

  • Fetch observations from an ABS dataflow. dataKey is a dot-separated SDMX filter with one position per dimension (order from dataflow_structure); each position is a code, "+"-joined codes, or empty for wildcard. Pass "all" to fetch everything (can be large). Returns decoded series with their dimension labels and time-indexed values. Fetch dataflow_structure first to learn the dimension order and valid codes.
    Connector
  • Convert HTML or Markdown to a pixel-perfect PDF. Returns JSON: { url } — a temporary download URL (valid ~1 hour). Great for generating invoices, reports, receipts, or formatted documents programmatically. Supports full HTML/CSS including tables, images (base64 or URL), and inline styles. For Markdown input, set format='markdown'. 50 sats per conversion. Use convert_file instead for converting existing files between formats (e.g., DOCX→PDF). Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='convert_html_to_pdf'.
    Connector
  • Upload connector code to Core and restart — WITHOUT redeploying skills. MERGES with the GitHub state at `ref` by default (default ref: 'dev'). Sending a partial file set ONLY overlays those files — the rest of the connector is preserved from GitHub. To fully replace the connector dir (historical behavior), pass replace:true. Modes: • github:true (no files) — deploy the GitHub state at `ref` as-is. • github:true + files:[] — GitHub state at `ref` as BASE, your files overlay on top (incoming wins). • files:[] (no github) — default MERGE with GitHub state at `ref`. Refuses if no GitHub base exists (no silent nuke). • files:[] + replace:true — full replace. Wipes connector dir + writes only the provided files. Use deliberately. Common traps this design prevents: • Pre-fix bug (2026-06-06): sending just ui-dist HTML wiped server.js + node_modules — connector broke until a full re-upload. Now: those files merge with the GitHub base. • Pre-fix bug: github:true silently read from `main` even when patches were on `dev`. Now: defaults to dev; pass ref:'main' to opt into the legacy path.
    Connector
  • Submit a competitor analysis job. Analyzes a competitor's website across 15+ data sources (SEO, traffic, social, Product Hunt, GitHub, Wayback Machine history, AI-generated insights, etc.) and returns a job_id. Use get_report_status(job_id) to poll and get_report(job_id) to retrieve results when status='completed'. Typical analysis takes 2-5 minutes. Requires authentication (deducts 1 credit from your Analook balance). Args: url: Competitor website URL (e.g. 'https://linear.app' or 'lovable.dev') product_name: Optional product name override (defaults to domain) Returns: {job_id: str, status: 'started', poll_url: str} on success {error: str, hint?: str} on auth/validation failure
    Connector
  • Get a human's FULL profile including contact info (email, Telegram, Signal), crypto wallets, fiat payment methods (PayPal, Venmo, etc.), and social links. Requires agent_key from register_agent. Rate limited: PRO = 50/day. Alternative: $0.05 via x402. Use this before create_job_offer to see how to pay the human. The human_id comes from search_humans results.
    Connector
  • Fetch a URL with full reliability — retry, circuit breaker, cache, and anti-bot bypass. Returns both raw HTML and clean markdown. Automatically retries on failure with exponential backoff, falls back to plain HTTP if browser fetch fails, and circuit-breaks domains that are consistently down. Args: url: The URL to fetch use_cache: Whether to use cached results (default: true, TTL 1 hour) js_render: Whether to render JavaScript (default: true, disable for speed) wait_for: CSS selector to wait for before capturing (e.g., '.results-loaded')
    Connector

Matching MCP Servers

Matching MCP Connectors

  • Provides a platform-agnostic specification of the technical features every decent website should have

  • GitHub MCP — wraps the GitHub public REST API (no auth required for public endpoints)

  • Fetch public business page information from Facebook. Returns page details including name, category, address, phone, website, ratings, reviews, followers, and cover/profile photos. Provide exactly one of page_id, username, or url — prefer url when the user pasted any Facebook link (including mobile share links), since the tool resolves the canonical page automatically.
    Connector
  • Fetch a single file from a template version's file tree by path. ``identifier`` accepts UUID, ``@handle/slug``, or ``@handle/slug@vN``. ``path`` must match a row in the template's file manifest (use ``get_template`` to see the manifest). ``SKILL.md`` returns the template body. Text files return the decoded string in ``content``. Small binary files return base64-encoded bytes in ``content_base64``. Binary files over the size limit return metadata and an ``error`` field with no inline bytes. Non-owner responses carry the safety banner and ``safety_verification_status``. Anonymous callers may fetch files from live (published) template versions only; the liveness check is evaluated per read.
    Connector
  • Build a complete creative intelligence profile from internal brand documents — creative briefs, brand guidelines, product specs, customer research, competitive analysis. Takes any mix of file_ids (from a previous upload), document_urls (public PDF/DOCX/TXT/MD links, up to 10), or documents_inline (base64-encoded files with filename), plus an optional context_url for layering live brand context (colors, fonts, current messaging) and optional idempotency_key. Returns a job_id; poll with get_powersource. Output shape is identical to create_powersource_url: identity, offer, selling points, voice, buyer profile, tensions, angles, emotional arcs, ctas, narrative. Use this when the user says "I have a brief", "here's my brand guidelines", "use this document", drops a PDF / DOCX / strategy deck, or when the truth lives in internal materials rather than the public website. The pipeline reads text only — convert PDFs to markdown before submitting via documents_inline when possible. Costs 100 credits. Do NOT use for URL-only scans — use create_powersource_url. For URL + docs combined (highest fidelity, triangulates public messaging against internal strategy), use create_powersource_full.
    Connector
  • Fetch a single agency's full profile from Pick an Agency by its slug (the last path segment of its profile URL), including description, location, rating, services, website and a few recent client reviews. WHEN TO USE: after search_agencies or match_agencies returned a result the user wants to know more about, or when the user names a specific agency whose slug you already know. Don't guess slugs — find them via search_agencies first.
    Connector
  • Find working SOURCE CODE examples from 37 indexed Senzing GitHub repositories. REQUIRED: either `query` (string, for search) or `repo` with `file_path` or `list_files=true` — the call WILL FAIL without one. Three modes: (1) Search: pass `query` to find examples across all repos, (2) File listing: pass `repo` + `list_files=true`, (3) File retrieval: pass `repo` + `file_path`. Indexes source code (.py, .java, .cs, .rs) and READMEs — NOT build/data files. For sample data, use get_sample_data. Covers Python, Java, C#, Rust SDK patterns: initialization, ingestion, search, redo, configuration, message queues, REST APIs. Use max_lines to limit large files. Returns GitHub raw URLs for file retrieval.
    Connector
  • Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
    Connector
  • Add a document to a deal's data room. Creates the deal if needed. This is the primary way to get documents into Sieve for screening. Upload a pitch deck, financials, or any document -- then call sieve_screen to analyze everything in the data room. Provide company_name to create a new deal (or find existing), or deal_id to add to an existing deal. Provide exactly one content source: file_path (local file), text (raw text/markdown), or url (fetch from URL). Args: title: Document title (e.g. "Pitch Deck Q1 2026"). company_name: Company name -- creates deal if new, finds existing if not. deal_id: Add to an existing deal (from sieve_deals or previous sieve_dataroom_add). website_url: Company website URL (used when creating a new deal). document_type: Type: 'pitch_deck', 'financials', 'legal', or 'other'. file_path: Path to a local file (PDF, DOCX, XLSX). The tool reads and uploads it. text: Raw text or markdown content (alternative to file). url: URL to fetch document from (alternative to file).
    Connector
  • Get Kifly's website and support contact email. Call this if you are stuck, hit an unresolvable error, or the buyer asks how to reach a human. Returns the website URL and support email — always share both with the buyer.
    Connector
  • Fetch and convert a Microsoft Learn documentation webpage to markdown format. This tool retrieves the latest complete content of Microsoft documentation webpages including Azure, .NET, Microsoft 365, and other Microsoft technologies. ## When to Use This Tool - When search results provide incomplete information or truncated content - When you need complete step-by-step procedures or tutorials - When you need troubleshooting sections, prerequisites, or detailed explanations - When search results reference a specific page that seems highly relevant - For comprehensive guides that require full context ## Usage Pattern Use this tool AFTER microsoft_docs_search when you identify specific high-value pages that need complete content. The search tool gives you an overview; this tool gives you the complete picture. ## URL Requirements - The URL must be a valid HTML documentation webpage from the microsoft.com domain - Binary files (PDF, DOCX, images, etc.) are not supported ## Output Format markdown with headings, code blocks, tables, and links preserved.
    Connector
  • Fetch a file from a public URL and attach it to one of your personal notes (personal notes only; for team or shared notes use files-create_upload_url). Follows one redirect. Required: note_id (integer), url (string). Optional: filename (default: derived from URL), content_type (default: from HTTP response), description.
    Connector
  • Fetch a complete, self-contained test specification as Markdown: full item list, response scale, scoring algorithm, and the mapping from result to tuning slug. Administer the items to the user inline (bulk-paste is fine), score per the algorithm, then call get_tuning. Tests: mbti (OEJTS, 32 items, ~5 min), enneagram (OEPS, 36, ~5 min), disc (ODAT, 16, ~3 min), attachment (ECR-R, 36, ~5 min), big-five (IPIP-50, 50, ~7 min → maps to ocean files).
    Connector
  • Create a new website for a business. Pass a business candidate object from search_businesses to generate a website. Requires authentication via API key (Bearer token). Generate an API key at webzum.com/dashboard/account-settings. The site generation happens in the background. Use get_site_status to check progress. Returns the businessId which can be used to access the site at /build/{businessId}
    Connector
  • Get Kifly's website and support contact email. Call this if you are stuck, hit an unresolvable error, or the buyer asks how to reach a human. Returns the website URL and support email — always share both with the buyer.
    Connector
  • Fetch one or more URLs and return their content as clean markdown. Use this to read articles, documentation, blog posts, or any page where you need the complete text, not just a snippet from search. Also supports PDF, DOCX, and other document formats. Costs 1 credit per URL. Max 10 URLs per request. Failed URLs are not charged. Set include_raw_html=true to also get the raw HTML source in each result. Useful for inspecting embedded URLs, data attributes, iframes, or script tags that are stripped during markdown conversion. Returns null for non-HTML content (PDF, DOCX, etc.). Same cost. Returns: results (array of {title, url, content, raw_html, published_time, success, error}), credits_used, credits_remaining. Args: urls: List of URLs to fetch (max 10) include_raw_html: Include raw HTML source in each result (default false)
    Connector