acrawl
Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| navigate | Navigate to a URL and return the page content as fit_markdown (default, prunes boilerplate for token efficiency), structured markdown, plain text, or raw HTML. Automatically escalates from a fast HTTP fetch to a full headless browser when JavaScript rendering is detected (React, Next.js, Vue, Angular markers, or short bodies). Returns content with an embedded page_map of headings, links, forms, and interactive elements for subsequent tool calls. Use this as the primary tool for accessing any web page. |
| go_back | Navigate the browser back to the previous page in history (equivalent to the browser back button). Returns the URL navigated to and a page_state object with headings, landmarks, and links of the resulting page. Use after clicking into a page to return to a listing or search results without re-navigating by URL. |
| refresh | Reload the current page. Returns page_state after reload. Use after setting intercept rules to replay the page with rules active. The seq counter increments on reload, which supports temporal observation queries. |
| scroll | Scroll the current page up or down by a specified pixel amount to reveal content beyond the visible viewport. Returns updated page_state after scrolling, reflecting any newly loaded lazy content. Use to reveal below-the-fold content, trigger infinite scroll loading, or navigate long pages section by section. |
| switch_tab | Switch the browser focus to a different open tab by its zero-based index. Returns the tab count and a page_state object reflecting the switched-to tab's content (headings, landmarks, links). Use to access pages opened by link targets, popups, or forked sub-agents without re-navigating. |
| wait | Wait for a DOM element to reach a specified state (visible, hidden, attached, detached) or pause for a fixed duration. Use after actions that trigger asynchronous page changes such as form submissions, AJAX requests, or animations. Returns post-action page_state showing the resulting URL, title, and structural diff once the condition is met or the timeout expires. |
| click | Click on a page element identified by CSS selector, @eN reference, or visible label text. May trigger navigation, form submission, or dynamic content changes. Returns post-action page_state. Use 'selector' for CSS/ref-based targeting; use 'text' (with optional 'role' and 'region') to activate a button, tab, checkbox, or link by its visible label, which is useful for SPA admin UIs and modals where CSS paths are fragile. |
| click_at | Click at specific viewport coordinates (x, y) using a real mouse event. Use exclusively for elements without stable CSS selectors: canvas drawings, interactive maps, SVG regions, or coordinate-based UIs. Returns post-action page_state. Prefer the selector-based 'click' tool for normal DOM elements; it is more reliable and does not require coordinate calculation. |
| fill_form | Fill one or more form fields with values and optionally submit the form. Accepts field identifiers as CSS selectors, field names/IDs, or @eN references from page_map. Also resolves fields by visible label text page-wide, so it works in modals and div-based UIs without a form boundary. Returns post-action page_state with the resulting URL and structural diff. Use form_selector to disambiguate when the page contains multiple forms. |
| select_option | Select an option from a native or custom ARIA/portal dropdown. Identify the target control via CSS selector or @eN ref, then specify which option to select by its value attribute, visible label text, or zero-based index. Omit value, label, and index to open the dropdown, enumerate the currently available options, and return them without selecting. Returns post-action page_state showing any page changes triggered by the selection (e.g. dependent dropdowns updating). |
| hover | Hover the mouse over a page element to trigger hover-dependent UI such as tooltips, dropdown menus, or expandable content. Returns post-action page_state showing any newly revealed elements. Use this before click when content only appears on mouseover; use click instead if the element needs activation rather than hover. |
| press_key | Dispatch a keyboard key press event on the page or a targeted element. Supports named keys (Enter, Escape, Tab, ArrowDown, Backspace) and character keys. Returns post-action page_state reflecting any DOM changes caused by the keypress. Use for form submission (Enter), closing modals (Escape), focus navigation (Tab), or keyboard shortcuts. |
| execute_js | Execute arbitrary JavaScript in the page context and return the evaluation result. The script runs synchronously in the browser's main frame with full access to the DOM, window, and page APIs. Use as a last resort when CSS selectors and other tools cannot achieve the interaction; prefer click, fill_form, and select_option for standard interactions. |
| page_map | Get a comprehensive structural map of the current page including headings (h1–h6 with section sizes), landmark regions, forms with field details, links (text + href, capped at 50), and interactive elements (buttons, inputs, selects with state and @eN refs). Also returns a regions hierarchy (sidebar/main/dialog), the active_dialog, and non-form controls alongside headings/landmarks/links/interactive. Use to discover page structure before interacting, or with scope to inspect a specific modal/dialog without background noise. Scope accepts semantic tokens ('dialog', 'main', 'sidebar') or a region handle (@r1) in addition to a raw CSS selector. Each interactive element returns a stable @eN reference for use in click, fill_form, hover, press_key, and select_option. |
| read_content | Extract plain text content from a specific page section identified by heading name or CSS selector. Supports pagination via offset and max_chars for large sections. Returns content, total character count, and whether more content is available. Use after page_map to read specific sections without re-fetching the entire page; use navigate instead when you need the full page content. |
| list_resources | List all discoverable resources on the current page: links (with href and text), images (with src and alt), and forms (with action and method). Returns the complete set without caps; use this when page_map's 50-link limit is insufficient or when you need image URLs. No parameters required. |
| list_network_activity | List observed network requests buffered during this browser session. Supports temporal filtering by seq window, request-state filters, URL substring filtering, adjective-based sorting such as slowest/fastest or newest/oldest, and an inline content_type field on each row. Returns stable @rN refs for follow-up inspection with inspect_request. |
| inspect_request | Inspect a previously listed network request by its @rN id from list_network_activity. Returns the captured request metadata, coarse timing summary, initiator type, and notes about unavailable headers/bodies. |
| list_page_logs | List buffered console logs for the current page with optional level filtering and seq-based temporal filtering. Group by exact message text (default, deduplicated with @logN IDs), source, or level. |
| inspect_log | Inspect a deduplicated console log group from list_page_logs and return concrete instances with timestamps, stack traces, and source locations. |
| list_websocket_activity | List WebSocket connections and their message counts. Returns connections with @wsN IDs. Use inspect_websocket to see actual message content. |
| inspect_websocket | Inspect the actual WebSocket messages for a connection. Provide the @wsN ID from list_websocket_activity. Supports a direction filter, pattern search, and sort_by (newest/oldest). |
| screenshot | Capture a screenshot of the current page viewport, a specific element, or the full scrollable page. Returns base64-encoded image data by default, or saves to disk when save=true. Use as a LAST RESORT only after text-based tools (page_map, read_content, execute_js) have failed to provide the needed information, because screenshots are expensive and cannot be searched or parsed programmatically. |
| save_file | Download a resource from a URL and save it to the local output directory. Handles any file type (images, PDFs, CSVs, etc.) via HTTP GET. Returns the absolute path of the saved file. Use to persist crawl artifacts; path traversal is blocked for security. |
| get_page_performance | Get page performance metrics using Navigation Timing and Resource Timing APIs. Returns TTFB, DOM timings, and a breakdown of the top 20 resources by transfer size. Works on both browsers and SPAs. |
| inspect_cookies | Inspect cookies on the current page with security analysis. Returns all cookies with domain, path, expiry, secure/httponly flags, and detected security issues (missing_secure, missing_httponly, sameSite_none_without_secure, excessive_lifetime, overly_broad_domain). Includes third-party detection and filtering options. |
| inspect_storage | Inspect browser storage (localStorage and sessionStorage) on the current page. Returns all key-value pairs with size information. Supports filtering by storage type and key pattern. |
| measure_coverage | Measure JavaScript and CSS code coverage on the current page. Returns per-file byte usage showing how much code was actually executed/applied versus total loaded. Useful for identifying unused bundles, oversized dependencies, and performance optimization opportunities. |
| audit_accessibility | Run an axe-core WCAG accessibility audit on the current page. Returns violations grouped by impact level with selectors and descriptions. Use scope to limit the audit to a specific DOM subtree. |
| intercept_network | Manage network interception rules. Block or mock requests matching URL glob patterns. Rules are additive: each call adds a rule. Use refresh() after adding rules to replay the page load with rules active. |
| run_script | Execute a deterministic multi-step script without per-step LLM round-trips, running on a cloned browser tab. Scripts support loops (for/while/forEach), conditionals (if/else), error handling (try/catch), parallel branches, and variable capture. Returns a script_id immediately; use wait_for_scripts to collect results. Provide either an inline script definition or a name to load a previously saved script. Use when you detect a repetitive pattern (same operation on 3+ pages/items). |
| script_status | Check the current execution status of a running or completed script without blocking. Returns the script's state (running, completed, failed, cancelled), current step count, extracted data so far, and any error message. Use to monitor long-running scripts between other actions; use wait_for_scripts to block until completion. |
| wait_for_scripts | Block until one or more scripts finish execution and return their collected results. Returns a JSON array of ScriptResult objects with extracted_data, yielded checkpoints, step count, and status. If script_ids is omitted, waits for ALL active scripts. Use after run_script to collect final results. |
| cancel_script | Abort a running script immediately, closing its browser tab and discarding any partial results not yet yielded. The script transitions to 'cancelled' status. Use when a script is stuck, taking too long, or its results are no longer needed. |
| save_script | Persist a script definition to disk at ~/.acrawl/scripts/.json for reuse across sessions. Once saved, run it later with run_script using name instead of providing the full inline definition. Use for complex extraction patterns you want to apply repeatedly. |
| list_scripts | List all previously saved scripts with their metadata (name, creation date, last modified, size). Returns a JSON array. Use to discover available scripts before running them with run_script by name, or to audit what scripts exist on disk. |
| read_script | Read the full JSON definition of a previously saved script. Returns the complete script object including version, steps, and limits. Use to inspect a saved script's logic before running it, or to understand what an existing script does before modifying and re-saving it. |
| set_device | Switch browser device emulation between mobile and desktop modes. Recreates the browser context with new viewport, user agent, and touch settings. Cookies and localStorage are preserved. Use preset device names for convenience or provide custom parameters. Returns page_state showing the page as rendered in the new device mode. Cannot be used while sub-agents are running. |
| run_goal | Execute a high-level crawl goal autonomously. The agent plans, navigates, and extracts data using its own LLM loop. Returns structured results when done. Use this for complex multi-page tasks; use individual tools (navigate, click, etc.) for fine-grained control. |
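
As a rough illustration of how a client might invoke these tools, the sketch below builds MCP JSON-RPC 2.0 `tools/call` requests for a typical navigate, inspect, click, read sequence. The helper `tool_call` is hypothetical, and the argument names `url` and `selector` (on `read_content`) are assumptions; `scope`, `text`, `role`, `offset`, and `max_chars` are taken from the tool descriptions above, but their exact schemas are not confirmed here.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC expects a distinct id per request.
_ids = itertools.count(1)


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


calls = [
    # `url` is an assumed argument name.
    tool_call("navigate", url="https://example.com"),
    # 'dialog' is one of the semantic scope tokens listed for page_map.
    tool_call("page_map", scope="dialog"),
    # Click by visible label; `text` and `role` come from the click description.
    tool_call("click", text="Save", role="button"),
    # `selector` is assumed; `offset` and `max_chars` are named in the table.
    tool_call("read_content", selector="main", offset=0, max_chars=2000),
]

for call in calls:
    print(json.dumps(call))
```

Each printed line is one request a client could send over its MCP transport; the server's published tool schemas remain the authority on argument names.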
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/Mingye-Lu/AgenticCrawler'
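
The same request can be sketched in Python with only the standard library. This assumes the endpoint returns a JSON body, which the page does not state; `build_request` and `fetch_server_info` are illustrative names, not part of any published client.

```python
import json
import urllib.request

# URL copied from the curl command above.
API_URL = "https://glama.ai/api/mcp/v1/servers/Mingye-Lu/AgenticCrawler"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Create the GET request without sending it."""
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone is enough to inspect the URL and headers offline.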