WebReaper
Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
No arguments | |||
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {
"listChanged": true
} |
| logging | {} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| mapA | Discover URLs on a site via sitemap.xml + root-page link extraction. Returns a newline-separated list of URLs. |
| extractA | Extract structured fields from a URL using a JSON schema. The schema mirrors the WebReaper Schema shape: { field, children: [ { field, selector, type, is_list }, ... ] }. Returns the extracted record(s) as JSON Lines. |
| crawlA | Crawl a whole site: recursively follow on-domain links from the start URL and return one Markdown record per page as JSON Lines. WARNING: this is a single long BLOCKING call, bounded by max_pages (default 50, hard cap 1000). It emits MCP progress notifications per page for clients that render them (e.g. Claude Desktop); blocking clients like n8n just wait for the result. For a large site prefer 'map' to list URLs, then 'scrape' each URL, so every call stays short and you keep per-URL control. |
| scrapeA | Fetch a URL and return its main content as LLM-ready Markdown. The lowest-cost call against any site. Useful for reading a page into context. |
| extract_with_promptA | Extract structured data from a URL with an LLM, using a natural-language instruction instead of a CSS schema (e.g. "each person's name, title, and email"). Returns the extracted record(s) as JSON Lines. Requires an OpenAI-compatible LLM endpoint configured on the MCP host: set WEBREAPER_LLM_MODEL and WEBREAPER_LLM_BASE_URL (e.g. https://api.openai.com/v1 or http://localhost:11434/v1), with the API key in WEBREAPER_LLM_API_KEY (or OPENAI_API_KEY). The optional model parameter overrides WEBREAPER_LLM_MODEL for this call. Costs one LLM call. |
| extract_inferredA | Extract structured data from a URL WITHOUT writing a schema: an LLM infers the schema from the page (optionally steered by a goal), then WebReaper extracts deterministically. Cheaper and more consistent than extract_with_prompt across similarly shaped pages. Requires an OpenAI-compatible LLM endpoint on the host: WEBREAPER_LLM_MODEL + WEBREAPER_LLM_BASE_URL, key in WEBREAPER_LLM_API_KEY (or OPENAI_API_KEY). Returns the extracted record(s) as JSON Lines. |
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
No resources | |
Latest Blog Posts
- Your AI Chatbot Just Exposed Your CEO's Salary to an InternBy Om-Shree-0709 on .Agent IdentityMCP SecurityOAuth Delegation
- Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)By Om-Shree-0709 on .Agentic AiPrompt InjectionWebAssembly
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/alex-on-ai/WebReaper'
If you have feedback or need assistance with the MCP directory API, please join our Discord server