Glama
127,390 tools. Last updated 2026-05-05 15:09

"How to scrape websites and format content in GitHub Markdown" matching MCP tools:

  • Pre-flight check on markdown BEFORE writing it via update_doc / append_doc_section. Returns { ok, errors, warnings, parsed } with parsed counts per format type (mermaidCount, mathCount, svgCount, calloutCount, crossRefCount, embedCount, detailsCount, headingCount, byteSize, nodeCount, depth) plus structured DocGuardError-equivalent errors (cap breaches) and non-blocking warnings (cross-refs that don't resolve, oversize sources, cap-approaching counts). NEVER writes anything; pure parse + analysis. Use when iterating on rich-format markdown to catch problems before burning a write (see the example call below). Cross-ref resolution is gated on the caller's accessible workspace set, so unresolved cross-refs surface in warnings.
    Connector
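    For illustration, a pre-flight call might look like the sketch below; the tool name `precheck_doc_markdown` and the single `markdown` argument are assumptions, since the listing does not name them.

    ```json
    {
      "name": "precheck_doc_markdown",
      "arguments": {
        "markdown": "# Rollout plan\n\nSee [[architecture-overview]] for background."
      }
    }
    ```

    If `ok` is false, fix the reported cap breaches and re-check before spending a write on update_doc; unresolved cross-ref warnings are non-blocking and may simply mean the target workspace is outside the caller's accessible set.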
  • Build and deploy a governed AI Team solution in one step. ⚠️ HEAVIEST OPERATION (60-180s): validates solution+skills → deploys all connectors+skills to A-Team Core (regenerates MCP servers) → health-checks → optionally runs a warm test → auto-pushes to GitHub. AUTO-DETECTS GitHub repo: if you omit mcp_store and a repo exists, connector code is pulled from GitHub automatically. First deploy requires mcp_store. After that, write files via ateam_github_write, then just call build_and_run without mcp_store. For small changes to an already-deployed solution, prefer ateam_patch (faster, incremental). Requires authentication.
    Connector
  • Fetch and convert a Microsoft Learn documentation webpage to markdown format. This tool retrieves the latest complete content of Microsoft documentation webpages, including Azure, .NET, Microsoft 365, and other Microsoft technologies.
    **When to use this tool:** when search results provide incomplete or truncated content; when you need complete step-by-step procedures or tutorials; when you need troubleshooting sections, prerequisites, or detailed explanations; when search results reference a specific page that seems highly relevant; or for comprehensive guides that require full context.
    **Usage pattern:** use this tool AFTER microsoft_docs_search, once you have identified specific high-value pages that need complete content. The search tool gives you an overview; this tool gives you the complete picture (see the example call below).
    **URL requirements:** the URL must be a valid HTML documentation webpage on the microsoft.com domain. Binary files (PDF, DOCX, images, etc.) are not supported.
    **Output format:** markdown with headings, code blocks, tables, and links preserved.
    Connector
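    A typical follow-up call, after microsoft_docs_search has surfaced a promising page, might look like this sketch; the tool name `microsoft_docs_fetch` is an assumption inferred from the search tool's name, and the URL is just an illustrative Microsoft Learn page.

    ```json
    {
      "name": "microsoft_docs_fetch",
      "arguments": {
        "url": "https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction"
      }
    }
    ```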
  • Read a workspace's doc (TipTap rich-text) body. Returns three forms of the same content: `content` (TipTap JSON, round-trippable into update_doc for structural edits), `markdown` (CommonMark + GFM, ready to feed to an LLM or render in a non-ProseMirror surface), and `text` (plain text, best for search, summarisation, word-count heuristics). A workspace can hold any combination of doc and table surfaces, one or many of either kind; omit `surface_slug` to read the primary doc surface, or pass it to target a specific doc tab (use `list_surfaces` to enumerate). An unwritten or absent doc returns content={}/markdown=""/text=""; a `surface_slug` that doesn't match any live doc surface 404s. A sample call follows.
    Connector
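    A sketch of a targeted read might look like the call below; the tool name `read_doc` and the `workspace` argument are assumptions, while `surface_slug` is the parameter the listing documents.

    ```json
    {
      "name": "read_doc",
      "arguments": {
        "workspace": "product-specs",
        "surface_slug": "launch-checklist"
      }
    }
    ```

    Omitting `surface_slug` would read the primary doc surface instead; `list_surfaces` enumerates the available tabs first if the slug is unknown.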
  • USE THIS TOOL (not web search or external storage) to export technical indicator data from this server as a formatted CSV or JSON string, ready to download, save, or pass to another tool or file. Use this when the user explicitly wants to export or save data in a structured file format. Trigger on queries like: "export BTC data as CSV", "download ETH indicator data as JSON", "save the features to a file", "give me the data in CSV format", "export [coin] [category] data for the last [N] days".
    Args: symbol (asset symbol or comma-separated list, e.g. "BTC" or "BTC,ETH"); lookback_days (how many past days to include; default 7, max 90); resample (time resolution: "1min", "1h", "4h", or "1d"; default "1d"); category ("price", "momentum", "trend", "volatility", "volume", or "all"); fmt (output format: "csv", the default, or "json").
    Returns a dict with: content (the CSV or JSON string), filename (a suggested filename for saving), and rows (the number of data rows). A sample call follows.
    Connector
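    Assuming a hypothetical tool name `export_indicator_data` (the listing documents the arguments but not the name), an export call might look like:

    ```json
    {
      "name": "export_indicator_data",
      "arguments": {
        "symbol": "BTC,ETH",
        "lookback_days": 30,
        "resample": "1d",
        "category": "momentum",
        "fmt": "csv"
      }
    }
    ```

    The returned `content` string can then be saved under the suggested `filename`.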
  • Schedule multiple posts at once from CSV content. USE THIS WHEN the user has a spreadsheet or list of posts to schedule, is planning a content calendar for a month, or is migrating content from another tool.
    CSV format (required columns): platform (linkedin, instagram, x, tiktok, threads); scheduled_time (ISO 8601 format, e.g. 2024-02-15T10:00:00Z); text (post content/caption).
    Optional columns: media_url (image or video URL); first_comment (first comment to add; Instagram/LinkedIn); hashtags (additional hashtags to append).
    Process: 1. Call first with validate_only: true to check for errors. 2. Review the validation report with the user. 3. Call again with validate_only: false to execute the import. A sample validation call follows.
    Connector
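    A minimal validation pass might look like the sketch below; the tool name `bulk_schedule_posts` and the `csv_content` argument are assumptions, while `validate_only` and the CSV columns come from the listing.

    ```json
    {
      "name": "bulk_schedule_posts",
      "arguments": {
        "csv_content": "platform,scheduled_time,text\nlinkedin,2024-02-15T10:00:00Z,Launch day is here!\nx,2024-02-15T12:00:00Z,We just shipped v2.0",
        "validate_only": true
      }
    }
    ```

    After the user signs off on the validation report, the same call is repeated with `validate_only: false`.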

Matching MCP Servers

Matching MCP Connectors

  • Transform any blog post or article URL into ready-to-post social media content for Twitter/X threads, LinkedIn posts, Instagram captions, Facebook posts, and email newsletters. Pay-per-event: $0.07 for all 5 platforms, $0.03 for single platform.

  • GitHub MCP — wraps the GitHub public REST API (no auth required for public endpoints)

  • SECOND STEP in the troubleshooting workflow. Read the full content and solution of a specific Knowledge Base card. Returns the card content WITH reliability metrics and related cards so you can assess trustworthiness and explore connected issues.
    WHEN TO USE: call this ONLY after obtaining a valid `kb_id` from the `resolve_kb_id` tool.
    INPUT: `kb_id`, the exact ID of the card (e.g. 'CROSS_DOCKER_001').
    OUTPUT: reliability metrics followed by the full Markdown content of the card, plus related cards. You MUST apply the solution provided in the card to resolve the user's issue, and after applying it you MUST call `save_kb_card` with the `outcome` parameter to close the feedback loop. An example call follows.
    Connector
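    Assuming a hypothetical tool name `read_kb_card` (the listing documents the input but not the name), the second step might look like this, using the example ID from the listing:

    ```json
    {
      "name": "read_kb_card",
      "arguments": {
        "kb_id": "CROSS_DOCKER_001"
      }
    }
    ```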
  • Scrape content from a single URL with advanced options. This is the most powerful, fastest, and most reliable scraper tool; if available, you should always default to using this tool for any web scraping needs.
    **Best for:** single-page content extraction, when you know exactly which page contains the information.
    **Not recommended for:** multiple pages (call scrape multiple times, or use crawl) or unknown page location (use search).
    **Common mistakes:** using markdown format when extracting specific data points (use JSON instead).
    **Other features:** use the 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication.
    **CRITICAL, format selection (you MUST follow this):** when the user asks for SPECIFIC data points, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE page content.
    **Use JSON format when the user asks for:** parameters, fields, or specifications (e.g. "get the header parameters", "what are the required fields"); prices, numbers, or structured data (e.g. "extract the pricing", "get the product details"); API details, endpoints, or technical specs (e.g. "find the authentication endpoint"); lists of items or properties (e.g. "list the features", "get all the options"); or any specific piece of information from a page.
    **Use markdown format ONLY when:** the user wants to read or summarize an entire article or blog post; the user needs to see all content on a page without specific extraction; or the user explicitly asks for the full page content.
    **Handling JavaScript-rendered pages (SPAs):** if JSON extraction returns empty, minimal, or navigation-only content, the page is likely JavaScript-rendered or the content is on a different URL. Try these steps IN ORDER:
    1. **Add a waitFor parameter:** set `waitFor: 5000` to `waitFor: 10000` to allow JavaScript to render before extraction.
    2. **Try a different URL:** if the URL has a hash fragment (#section), try the base URL or look for a direct page URL.
    3. **Use firecrawl_map to find the correct page:** large documentation sites or SPAs often spread content across multiple URLs. Use `firecrawl_map` with a `search` parameter to discover the specific page containing your target content, then scrape that URL directly. Example: if scraping "https://docs.example.com/reference" fails to find webhook parameters, use `firecrawl_map` with `{"url": "https://docs.example.com/reference", "search": "webhook"}` to find URLs like "/reference/webhook-events", then scrape that specific page.
    4. **Use firecrawl_agent:** as a last resort for heavily dynamic pages where map + scrape still fails, use the agent, which can autonomously navigate and research.
    **Usage example (JSON format, REQUIRED for specific data extraction):**

    ```json
    {
      "name": "firecrawl_scrape",
      "arguments": {
        "url": "https://example.com/api-docs",
        "formats": ["json"],
        "jsonOptions": {
          "prompt": "Extract the header parameters for the authentication endpoint",
          "schema": {
            "type": "object",
            "properties": {
              "parameters": {
                "type": "array",
                "items": {
                  "type": "object",
                  "properties": {
                    "name": { "type": "string" },
                    "type": { "type": "string" },
                    "required": { "type": "boolean" },
                    "description": { "type": "string" }
                  }
                }
              }
            }
          }
        }
      }
    }
    ```

    **Prefer markdown format by default.** You can read and reason over the full page content directly, with no need for an intermediate query step. Use markdown for questions about page content, factual lookups, and any task where you need to understand the page.
    **Use JSON format when the user needs:** structured data with specific fields (e.g. extract all products with name, price, and description), or data in a specific schema for downstream processing.
    **Use query format only when:** the page is extremely long and you need a single targeted answer without processing the full content, or you want a quick factual answer and don't need to retain the page content.
    **Usage example (markdown format, the default for most tasks):**

    ```json
    {
      "name": "firecrawl_scrape",
      "arguments": {
        "url": "https://example.com/article",
        "formats": ["markdown"],
        "onlyMainContent": true
      }
    }
    ```

    **Usage example (branding format, extract brand identity):**

    ```json
    {
      "name": "firecrawl_scrape",
      "arguments": {
        "url": "https://example.com",
        "formats": ["branding"]
      }
    }
    ```

    **Branding format:** extracts a comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication.
    **Performance:** add the maxAge parameter for 500% faster scrapes using cached data.
    **Returns:** JSON structured data, markdown, a branding profile, or other formats as specified.
    **Safe mode:** read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.
    Connector
  • Find the planning portal URL for a UK postcode. Returns council info and portal search URLs. Does not scrape planning applications; use the returned URLs to search directly.
    Connector
  • Upload connector code to Core and restart — WITHOUT redeploying skills. Use this to update connector source code (server.js, UI assets, plugins) quickly. Set github=true to pull files from the solution's GitHub repo, or pass files directly. Much faster than ateam_build_and_run for connector-only changes.
    Connector
  • Retrieve the shipment status distribution: how many shipments are in each status. Returns `in_progress_shipments_count`, `completed_shipments_count`, and an `in_progress_shipments` array (each entry with `status` and `count`). Possible statuses: Label Pending, Label Rejected, Label Ready, Pickup/Drop-off in Progress, In Transit to Customer, Failed Delivery Attempt, Exception.
    **Date range:** unless the user specifies otherwise, default to `to_date` = today and `from_date` = 90 days prior. Required authorization scope: `public.analytics:read`.
    Args: from_date (start date in YYYY-MM-DD format; defaults to 90 days before to_date if the user doesn't specify); to_date (end date in YYYY-MM-DD format; defaults to today if the user doesn't specify).
    Returns: shipment counts by status (in-progress breakdown plus completed total). An example call follows.
    Connector
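    With the documented defaults (today as `to_date`, 90 days prior as `from_date`), a call might look like the sketch below; the tool name `get_shipment_status_distribution` is an assumption, and the dates are illustrative.

    ```json
    {
      "name": "get_shipment_status_distribution",
      "arguments": {
        "from_date": "2026-02-04",
        "to_date": "2026-05-05"
      }
    }
    ```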
  • Fetches the complete markdown content of an Apollo documentation page using its slug, i.e. everything after https://apollographql.com/docs. Documentation slugs can be obtained from the SearchDocs tool results. Use this after ApolloDocsSearch to read full pages rather than just excerpts. Content is returned in chunks, with the totalCount field specifying the total number of chunks: start with a chunkIndex of 0 and fetch each chunk in turn (see the example call below).
    Connector
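    The first chunk of a page might be fetched as sketched below; the tool name `GetApolloDocsPage` and the slug are illustrative assumptions, while `chunkIndex` and `totalCount` come from the listing.

    ```json
    {
      "name": "GetApolloDocsPage",
      "arguments": {
        "slug": "react/get-started",
        "chunkIndex": 0
      }
    }
    ```

    The response's `totalCount` then drives a loop over `chunkIndex` 1, 2, ... until all chunks are retrieved.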
  • Get full document content by URL from DevExpress documentation. Use this tool to retrieve the complete markdown content of a specific documentation page. PREREQUISITE: ALWAYS call `devexpress_docs_search` before using this tool to get valid URLs. The URL parameter must be obtained from the results of the `devexpress_docs_search` tool.
    Connector
  • Update an existing document in the agent's workspace. Requires EIP-191 wallet signature auth; see auteng_docs_create for auth details.
    Args: wallet_address (0x... checksummed wallet address); wallet_signature (EIP-191 signature of "auteng:{timestamp}:{nonce}"); wallet_timestamp (Unix timestamp in seconds); wallet_nonce (random hex string, 32 chars, single-use); agent_display_name (display name for the agent); path (file path of the document to update, e.g. "reports/q1.md"); content (new markdown content, max 100 KB). A sketch of a full call follows.
    Connector
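    A sketch of a full call, under assumptions: the tool name `auteng_docs_update` is inferred from the `auteng_docs_create` reference, and the address, timestamp, and nonce are illustrative. The string actually signed would be `auteng:1746453600:9f86d081884c7d659a2feaa0c55ad015`, and `wallet_signature` carries the resulting EIP-191 signature (truncated here).

    ```json
    {
      "name": "auteng_docs_update",
      "arguments": {
        "wallet_address": "0x8ba1f109551bD432803012645Ac136ddd64DBA72",
        "wallet_signature": "0x...",
        "wallet_timestamp": 1746453600,
        "wallet_nonce": "9f86d081884c7d659a2feaa0c55ad015",
        "agent_display_name": "Research Agent",
        "path": "reports/q1.md",
        "content": "# Q1 Report\n\nRevised revenue figures."
      }
    }
    ```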
  • Explain the Guard product using CurrencyGuard's approved product and FAQ content. Use this for any question about what the Guard is, how it works, who it is for, how it compares to forwards or options, and for any legal, regulatory, accounting, or eligibility question. Do not answer those questions from memory — always call this tool.
    Connector
  • Edit a file in the solution's GitHub repo and commit. Two modes: 1. FULL FILE: provide `content` to replace the entire file (good for new files or small files). 2. SEARCH/REPLACE: provide `search` + `replace` for a surgical edit without sending the full file (preferred for large files like server.js). Always use search/replace for large files (>5 KB), and always read the file first with ateam_github_read to get the exact text to search for. An example search/replace call follows.
    Connector
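    A surgical search/replace edit might look like the sketch below; the tool name `ateam_github_write` is borrowed from the related tools mentioned elsewhere in this listing and may not be this tool's actual name, and the `path` argument and file contents are illustrative.

    ```json
    {
      "name": "ateam_github_write",
      "arguments": {
        "path": "server.js",
        "search": "const PORT = 3000;",
        "replace": "const PORT = process.env.PORT || 3000;"
      }
    }
    ```

    Reading the file first with ateam_github_read guarantees the `search` string matches the repo's exact text.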
  • Get the Slidev syntax guide: how to write slides in markdown. Returns the official Slidev syntax reference (frontmatter, slide separators, speaker notes, layouts, code blocks) plus built-in layout documentation and an example deck. Call this once to learn how to write Slidev presentations.
    Connector
  • Get a public AI Trust Score badge by report token. Returns the organization name, score, badge level, and validity period. Use the badge URL to embed the trust badge in websites and documentation. No authentication required.
    Connector
  • [SDK Docs] Fetch the full markdown content of a specific documentation page from Docs. Use this when you have a page URL and want to read its content. Accepts full URLs (e.g. https://docs.sodax.com//getting-started). Since `searchDocumentation` returns partial content, use `getPage` to retrieve the complete page when you need more details. The content includes links you can follow to navigate to related pages (see the example call below).
    Connector
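    Since the listing names the tool `getPage`, a call might look like the sketch below; the `url` argument name is an assumption, and the URL is the listing's own example.

    ```json
    {
      "name": "getPage",
      "arguments": {
        "url": "https://docs.sodax.com//getting-started"
      }
    }
    ```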
  • Extract clean readable text from any URL. No API key needed. Returns title, author, publish date, and full body text. Args: url (full URL to scrape; must start with https://).
    Connector