213,223 tools. Last updated 2026-06-19 13:33

"Methods to Scrape a Website and Fetch Markdown Files from GitHub" matching MCP tools:

get_data
Abs Au
Fetch observations from an ABS dataflow. dataKey is a dot-separated SDMX filter with one position per dimension (order from dataflow_structure); each position is a code, "+"-joined codes, or empty for wildcard. Pass "all" to fetch everything (can be large). Returns decoded series with their dimension labels and time-indexed values. Fetch dataflow_structure first to learn the dimension order and valid codes.
Connector
convert_html_to_pdf
Sats4AI - Bitcoin-Powered AI Tools
Convert HTML or Markdown to a pixel-perfect PDF. Returns JSON: { url } — a temporary download URL (valid ~1 hour). Great for generating invoices, reports, receipts, or formatted documents programmatically. Supports full HTML/CSS including tables, images (base64 or URL), and inline styles. For Markdown input, set format='markdown'. 50 sats per conversion. Use convert_file instead for converting existing files between formats (e.g., DOCX→PDF). Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='convert_html_to_pdf'.
Connector
ateam_upload_connector
ateam-mcp
Upload connector code to Core and restart — WITHOUT redeploying skills. MERGES with the GitHub state at `ref` by default (default ref: 'dev'). Sending a partial file set ONLY overlays those files — the rest of the connector is preserved from GitHub. To fully replace the connector dir (historical behavior), pass replace:true. Modes: • github:true (no files) — deploy the GitHub state at `ref` as-is. • github:true + files:[] — GitHub state at `ref` as BASE, your files overlay on top (incoming wins). • files:[] (no github) — default MERGE with GitHub state at `ref`. Refuses if no GitHub base exists (no silent nuke). • files:[] + replace:true — full replace. Wipes connector dir + writes only the provided files. Use deliberately. Common traps this design prevents: • Pre-fix bug (2026-06-06): sending just ui-dist HTML wiped server.js + node_modules — connector broke until a full re-upload. Now: those files merge with the GitHub base. • Pre-fix bug: github:true silently read from `main` even when patches were on `dev`. Now: defaults to dev; pass ref:'main' to opt into the legacy path.
Connector
analyze_competitor
Analook — Competitor Intelligence
Submit a competitor analysis job. Analyzes a competitor's website across 15+ data sources (SEO, traffic, social, Product Hunt, GitHub, Wayback Machine history, AI-generated insights, etc.) and returns a job_id. Use get_report_status(job_id) to poll and get_report(job_id) to retrieve results when status='completed'. Typical analysis takes 2-5 minutes. Requires authentication (deducts 1 credit from your Analook balance). Args: url: Competitor website URL (e.g. 'https://linear.app' or 'lovable.dev') product_name: Optional product name override (defaults to domain) Returns: {job_id: str, status: 'started', poll_url: str} on success {error: str, hint?: str} on auth/validation failure
Connector
get_human_profile
Human Pages
Get a human's FULL profile including contact info (email, Telegram, Signal), crypto wallets, fiat payment methods (PayPal, Venmo, etc.), and social links. Requires agent_key from register_agent. Rate limited: PRO = 50/day. Alternative: $0.05 via x402. Use this before create_job_offer to see how to pay the human. The human_id comes from search_humans results.
Connector
fetch_url
SteadyFetch
Fetch a URL with full reliability — retry, circuit breaker, cache, and anti-bot bypass. Returns both raw HTML and clean markdown. Automatically retries on failure with exponential backoff, falls back to plain HTTP if browser fetch fails, and circuit-breaks domains that are consistently down. Args: url: The URL to fetch use_cache: Whether to use cached results (default: true, TTL 1 hour) js_render: Whether to render JavaScript (default: true, disable for speed) wait_for: CSS selector to wait for before capturing (e.g., '.results-loaded')
Connector

Matching MCP Servers

Website to Markdown MCP Server
Web Scraping Browser Automation Documentation Access
SunZhi-Will
A
license
B
quality
D
maintenance
Fetches website content and converts it to Markdown format with AI-powered content cleanup, ad removal, and full OpenAPI/Swagger specification support for easy processing by AI assistants.
Last updated 2025-06-27
4
17
4
MIT
markdown-to-html
Developer Tools Autonomous Agents
fashionzzZ
A
license
B
quality
D
maintenance
A Model Context Protocol server that converts Markdown content to HTML format.
Last updated 2025-06-03
1
4,009
2
MIT

Matching MCP Connectors

arjunkmrm-fetchOAuth
Fetch web pages and extract exactly the content you need. Select elements with CSS and retrieve co…
Website Spec
Provides a platform-agnostic specification of the technical features every decent website should have

facebook_business_page
Searchapi
Fetch public business page information from Facebook. Returns page details including name, category, address, phone, website, ratings, reviews, followers, and cover/profile photos. Provide exactly one of page_id, username, or url — prefer url when the user pasted any Facebook link (including mobile share links), since the tool resolves the canonical page automatically.
Connector
get_template_file
Goodeye
Fetch a single file from a template version's file tree by path. ``identifier`` accepts UUID, ``@handle/slug``, or ``@handle/slug@vN``. ``path`` must match a row in the template's file manifest (use ``get_template`` to see the manifest). ``SKILL.md`` returns the template body. Text files return the decoded string in ``content``. Small binary files return base64-encoded bytes in ``content_base64``. Binary files over the size limit return metadata and an ``error`` field with no inline bytes. Non-owner responses carry the safety banner and ``safety_verification_status``. Anonymous callers may fetch files from live (published) template versions only; the liveness check is evaluated per read.
Connector
create_powersource_docs
Heista
Build a complete creative intelligence profile from internal brand documents — creative briefs, brand guidelines, product specs, customer research, competitive analysis. Takes any mix of file_ids (from a previous upload), document_urls (public PDF/DOCX/TXT/MD links, up to 10), or documents_inline (base64-encoded files with filename), plus an optional context_url for layering live brand context (colors, fonts, current messaging) and optional idempotency_key. Returns a job_id; poll with get_powersource. Output shape is identical to create_powersource_url: identity, offer, selling points, voice, buyer profile, tensions, angles, emotional arcs, ctas, narrative. Use this when the user says "I have a brief", "here's my brand guidelines", "use this document", drops a PDF / DOCX / strategy deck, or when the truth lives in internal materials rather than the public website. The pipeline reads text only — convert PDFs to markdown before submitting via documents_inline when possible. Costs 100 credits. Do NOT use for URL-only scans — use create_powersource_url. For URL + docs combined (highest fidelity, triangulates public messaging against internal strategy), use create_powersource_full.
Connector
get_agency
pickanagency
Fetch a single agency's full profile from Pick an Agency by its slug (the last path segment of its profile URL), including description, location, rating, services, website and a few recent client reviews. WHEN TO USE: after search_agencies or match_agencies returned a result the user wants to know more about, or when the user names a specific agency whose slug you already know. Don't guess slugs — find them via search_agencies first.
Connector
find_examples
Senzing
Find working SOURCE CODE examples from 37 indexed Senzing GitHub repositories. REQUIRED: either `query` (string, for search) or `repo` with `file_path` or `list_files=true` — the call WILL FAIL without one. Three modes: (1) Search: pass `query` to find examples across all repos, (2) File listing: pass `repo` + `list_files=true`, (3) File retrieval: pass `repo` + `file_path`. Indexes source code (.py, .java, .cs, .rs) and READMEs — NOT build/data files. For sample data, use get_sample_data. Covers Python, Java, C#, Rust SDK patterns: initialization, ingestion, search, redo, configuration, message queues, REST APIs. Use max_lines to limit large files. Returns GitHub raw URLs for file retrieval.
Connector
firecrawl_scrape
xpay✦ Web Scraping Collection
Scrape content from a single URL with advanced options. This is the most powerful, fastest and most reliable scraper tool, if available you should always default to using this tool for any web scraping needs. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (call scrape multiple times or use crawl), unknown page location (use search). **Common mistakes:** Using markdown format when extracting specific data points (use JSON instead). **Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication. **CRITICAL - Format Selection (you MUST follow this):** When the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content. **Use JSON format when user asks for:** - Parameters, fields, or specifications (e.g., "get the header parameters", "what are the required fields") - Prices, numbers, or structured data (e.g., "extract the pricing", "get the product details") - API details, endpoints, or technical specs (e.g., "find the authentication endpoint") - Lists of items or properties (e.g., "list the features", "get all the options") - Any specific piece of information from a page **Use markdown format ONLY when:** - User wants to read/summarize an entire article or blog post - User needs to see all content on a page without specific extraction - User explicitly asks for the full page content **Handling JavaScript-rendered pages (SPAs):** If JSON extraction returns empty, minimal, or just navigation content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER: 1. **Add waitFor parameter:** Set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction 2. **Try a different URL:** If the URL has a hash fragment (#section), try the base URL or look for a direct page URL 3. **Use firecrawl_map to find the correct page:** Large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: If scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page. 4. **Use firecrawl_agent:** As a last resort for heavily dynamic pages where map+scrape still fails, use the agent which can autonomously navigate and research **Usage Example (JSON format - REQUIRED for specific data extraction):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/api-docs", "formats": ["json"], "jsonOptions": { "prompt": "Extract the header parameters for the authentication endpoint", "schema": { "type": "object", "properties": { "parameters": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "type": { "type": "string" }, "required": { "type": "boolean" }, "description": { "type": "string" } } } } } } } } } ``` **Prefer markdown format by default.** You can read and reason over the full page content directly — no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page. **Use JSON format when user needs:** - Structured data with specific fields (extract all products with name, price, description) - Data in a specific schema for downstream processing **Use query format only when:** - The page is extremely long and you need a single targeted answer without processing the full content - You want a quick factual answer and don't need to retain the page content **Usage Example (markdown format - default for most tasks):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com/article", "formats": ["markdown"], "onlyMainContent": true } } ``` **Usage Example (branding format - extract brand identity):** ```json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["branding"] } } ``` **Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication. **Performance:** Add maxAge parameter for 500% faster scrapes using cached data. **Returns:** JSON structured data, markdown, branding profile, or other formats as specified. **Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
Connector
sieve_dataroom_add
Sieve
Add a document to a deal's data room. Creates the deal if needed. This is the primary way to get documents into Sieve for screening. Upload a pitch deck, financials, or any document -- then call sieve_screen to analyze everything in the data room. Provide company_name to create a new deal (or find existing), or deal_id to add to an existing deal. Provide exactly one content source: file_path (local file), text (raw text/markdown), or url (fetch from URL). Args: title: Document title (e.g. "Pitch Deck Q1 2026"). company_name: Company name -- creates deal if new, finds existing if not. deal_id: Add to an existing deal (from sieve_deals or previous sieve_dataroom_add). website_url: Company website URL (used when creating a new deal). document_type: Type: 'pitch_deck', 'financials', 'legal', or 'other'. file_path: Path to a local file (PDF, DOCX, XLSX). The tool reads and uploads it. text: Raw text or markdown content (alternative to file). url: URL to fetch document from (alternative to file).
Connector
get_help
Kifly — Agentic Commerce & Payments
Get Kifly's website and support contact email. Call this if you are stuck, hit an unresolvable error, or the buyer asks how to reach a human. Returns the website URL and support email — always share both with the buyer.
Connector
microsoft_docs_fetch
xpay✦ DevTools Collection
Fetch and convert a Microsoft Learn documentation webpage to markdown format. This tool retrieves the latest complete content of Microsoft documentation webpages including Azure, .NET, Microsoft 365, and other Microsoft technologies. ## When to Use This Tool - When search results provide incomplete information or truncated content - When you need complete step-by-step procedures or tutorials - When you need troubleshooting sections, prerequisites, or detailed explanations - When search results reference a specific page that seems highly relevant - For comprehensive guides that require full context ## Usage Pattern Use this tool AFTER microsoft_docs_search when you identify specific high-value pages that need complete content. The search tool gives you an overview; this tool gives you the complete picture. ## URL Requirements - The URL must be a valid HTML documentation webpage from the microsoft.com domain - Binary files (PDF, DOCX, images, etc.) are not supported ## Output Format markdown with headings, code blocks, tables, and links preserved.
Connector
files-attach_from_url
hjarni
Fetch a file from a public URL and attach it to one of your personal notes (personal notes only; for team or shared notes use files-create_upload_url). Follows one redirect. Required: note_id (integer), url (string). Optional: filename (default: derived from URL), content_type (default: from HTTP response), description.
Connector
get_test_spec
agenttune
Fetch a complete, self-contained test specification as Markdown: full item list, response scale, scoring algorithm, and the mapping from result to tuning slug. Administer the items to the user inline (bulk-paste is fine), score per the algorithm, then call get_tuning. Tests: mbti (OEJTS, 32 items, ~5 min), enneagram (OEPS, 36, ~5 min), disc (ODAT, 16, ~3 min), attachment (ECR-R, 36, ~5 min), big-five (IPIP-50, 50, ~7 min → maps to ocean files).
Connector
create_site
WebZum - The Hosting Layer for AI-Generated Web Content
Create a new website for a business. Pass a business candidate object from search_businesses to generate a website. Requires authentication via API key (Bearer token). Generate an API key at webzum.com/dashboard/account-settings. The site generation happens in the background. Use get_site_status to check progress. Returns the businessId which can be used to access the site at /build/{businessId}
Connector
get_help
Kifly — Agentic Commerce & Payments
Get Kifly's website and support contact email. Call this if you are stuck, hit an unresolvable error, or the buyer asks how to reach a human. Returns the website URL and support email — always share both with the buyer.
Connector
fetch
Sofya
Fetch one or more URLs and return their content as clean markdown. Use this to read articles, documentation, blog posts, or any page where you need the complete text, not just a snippet from search. Also supports PDF, DOCX, and other document formats. Costs 1 credit per URL. Max 10 URLs per request. Failed URLs are not charged. Set include_raw_html=true to also get the raw HTML source in each result. Useful for inspecting embedded URLs, data attributes, iframes, or script tags that are stripped during markdown conversion. Returns null for non-HTML content (PDF, DOCX, etc.). Same cost. Returns: results (array of {title, url, content, raw_html, published_time, success, error}), credits_used, credits_remaining. Args: urls: List of URLs to fetch (max 10) include_raw_html: Include raw HTML source in each result (default false)
Connector