Schema | WebReaper

WebReaper

Overview Schema Related Servers Score Discussions

Server Configuration

Describes the environment variables required to run the server.

Name	Required	Description	Default
No arguments

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": true }
`logging`	{}

Tools

Functions exposed to the LLM to take actions

Name	Description
mapA	Discover URLs on a site via sitemap.xml + root-page link extraction. Returns a newline-separated list of URLs.
extractA	Extract structured fields from a URL using a JSON schema. The schema mirrors the WebReaper Schema shape: { field, children: [ { field, selector, type, is_list }, ... ] }. Returns the extracted record(s) as JSON Lines.
crawlA	Crawl a whole site: recursively follow on-domain links from the start URL and return one Markdown record per page as JSON Lines. WARNING: this is a single long BLOCKING call, bounded by max_pages (default 50, hard cap 1000). It emits MCP progress notifications per page for clients that render them (e.g. Claude Desktop); blocking clients like n8n just wait for the result. For a large site prefer 'map' to list URLs, then 'scrape' each URL, so every call stays short and you keep per-URL control.
scrapeA	Fetch a URL and return its main content as LLM-ready Markdown. The lowest-cost call against any site. Useful for reading a page into context.
extract_with_promptA	Extract structured data from a URL with an LLM, using a natural-language instruction instead of a CSS schema (e.g. "each person's name, title, and email"). Returns the extracted record(s) as JSON Lines. Requires an OpenAI-compatible LLM endpoint configured on the MCP host: set WEBREAPER_LLM_MODEL and WEBREAPER_LLM_BASE_URL (e.g. https://api.openai.com/v1 or http://localhost:11434/v1), with the API key in WEBREAPER_LLM_API_KEY (or OPENAI_API_KEY). The optional model parameter overrides WEBREAPER_LLM_MODEL for this call. Costs one LLM call.
extract_inferredA	Extract structured data from a URL WITHOUT writing a schema: an LLM infers the schema from the page (optionally steered by a goal), then WebReaper extracts deterministically. Cheaper and more consistent than extract_with_prompt across similarly shaped pages. Requires an OpenAI-compatible LLM endpoint on the host: WEBREAPER_LLM_MODEL + WEBREAPER_LLM_BASE_URL, key in WEBREAPER_LLM_API_KEY (or OPENAI_API_KEY). Returns the extracted record(s) as JSON Lines.

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
No resources

Server Configuration
Capabilities
Tools
Prompts
Resources

Latest Blog Posts

Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly
Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
OpenAI
open source

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/alex-on-ai/WebReaper'

If you have feedback or need assistance with the MCP directory API, please join our Discord server