What can you do with this server?

The webclaw server lets you extract, crawl, and analyze web content for AI agents, RAG pipelines, and developer workflows. Here's what you can do:

* Scrape – Extract content from a single URL as markdown, LLM-optimized text, plain text, HTML, or JSON; supports CSS selector filtering, main-content extraction, and auto-fallback to the cloud API when bot protection or JS rendering is detected.
* Crawl – Breadth-first crawl of a website from a seed URL, following links up to a configurable depth and page limit, with optional sitemap seeding and concurrent requests.
* Batch – Scrape multiple URLs concurrently and return extracted content for all of them at once.
* Map – Discover all URLs from a website's sitemaps (via robots.txt + sitemap.xml) without fully extracting every page.
* Extract – Extract structured data from a web page using an LLM, guided by a JSON schema or a natural language prompt.
* Summarize – Generate a concise LLM-powered summary of a web page, with configurable sentence count.
* Diff – Compare a URL's current content against a previously saved extraction snapshot to highlight what has changed.
* Brand – Extract brand identity assets (colors, fonts, logo, favicon) from a website's HTML and CSS.
* Search – Search the web for a query and return structured results (requires API key).
* Research – Run a deep, multi-source research investigation on a topic or question, with an optional deep mode for more thorough results (requires API key).

Which integrations are available for this server?

* Ollama – Integrates with local Ollama instances for private, AI-powered structured data extraction and content summarization.
* OpenAI – Uses OpenAI's API to perform schema-enforced extraction and content summarization on scraped data.
* YouTube – Provides specialized extraction of structured metadata and content details from YouTube video pages.

webclaw

by 0xMassi

Rust

Hybrid

Most web scraping tools give your agent one of two bad outputs:

* a blocked page, login wall, or empty app shell
* raw HTML full of nav, scripts, styling, ads, and duplicated boilerplate

webclaw.io is the hosted web extraction API for webclaw. This repo contains the open-source CLI, MCP server, extraction engine, and self-hostable server.

webclaw turns a URL into clean content your tools can actually use.

webclaw https://example.com --format markdown

# Example Domain

This domain is for use in illustrative examples in documents.

You may use this domain in literature without prior coordination or asking for permission.

Use it from the terminal, wire it into Claude/Cursor through MCP, call the hosted API from your app, or self-host the OSS server.

Install

Agent setup

The fastest way to connect webclaw to Claude Code, Claude Desktop, Cursor, Windsurf, OpenCode, Codex CLI, and other MCP-compatible tools:

npx create-webclaw

The installer detects supported clients and configures the MCP server for you.

Homebrew

brew tap 0xMassi/webclaw
brew install webclaw

Prebuilt binaries

Download macOS, Linux, and Windows binaries from GitHub Releases.

Docker

docker run --rm ghcr.io/0xmassi/webclaw https://example.com

Cargo

cargo install --git https://github.com/0xMassi/webclaw.git webclaw-cli
cargo install --git https://github.com/0xMassi/webclaw.git webclaw-mcp

If building from source fails because native build tools are missing, install the platform prerequisites:

| OS | Command |
| --- | --- |
| Debian / Ubuntu | `sudo apt install -y pkg-config libssl-dev cmake clang git build-essential` |
| Fedora / RHEL | `sudo dnf install -y pkg-config openssl-devel cmake clang git make gcc` |
| Arch | `sudo pacman -S pkg-config openssl cmake clang git base-devel` |
| macOS | `xcode-select --install` |

Quick Start

Scrape one page

webclaw https://stripe.com --format markdown

Return LLM-optimized text

webclaw https://docs.anthropic.com --format llm

Keep only the main content

webclaw https://example.com/blog/post --only-main-content

Include or exclude selectors

webclaw https://example.com \
  --include "article, main, .content" \
  --exclude "nav, footer, .sidebar, .ad"
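
When the CLI is called from a script, it can help to assemble the argument list in one place. The sketch below uses only flags that appear in this README (`--format`, `--only-main-content`, `--include`, `--exclude`). The helper names `build_scrape_command` and `run_scrape` are hypothetical, the code assumes a `webclaw` binary on your `PATH`, and whether every flag combination is accepted together is an assumption to verify against `webclaw --help`.

```python
import subprocess
from typing import List, Optional, Sequence


def build_scrape_command(
    url: str,
    fmt: str = "markdown",
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
    only_main_content: bool = False,
) -> List[str]:
    """Build an argv list for the webclaw CLI (hypothetical helper)."""
    cmd = ["webclaw", url, "--format", fmt]
    if only_main_content:
        cmd.append("--only-main-content")
    if include:
        # Selectors go in as one comma-separated string, as in the example above.
        cmd += ["--include", ", ".join(include)]
    if exclude:
        cmd += ["--exclude", ", ".join(exclude)]
    return cmd


def run_scrape(cmd: List[str]) -> str:
    """Run the command and return stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing an argv list instead of a shell string avoids quoting problems when selectors contain spaces or commas.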

Crawl a documentation site

webclaw https://docs.rust-lang.org --crawl --depth 2 --max-pages 50
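
To make the effect of `--depth` and `--max-pages` concrete, here is a minimal breadth-first sketch over an in-memory link graph. It is not webclaw's crawler: the function name `bfs_crawl` is hypothetical, the `links` dictionary stands in for real fetching, and it assumes the seed page is depth 0 and counts toward the page limit.

```python
from collections import deque
from typing import Dict, List


def bfs_crawl(links: Dict[str, List[str]], seed: str, max_depth: int, max_pages: int) -> List[str]:
    """Return pages in the order a breadth-first crawl would collect them.

    `links` maps each URL to the URLs it links to (a stand-in for fetching).
    Assumption: the seed is depth 0 and counts toward `max_pages`.
    """
    if max_pages <= 0:
        return []
    visited = [seed]
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are kept, but their links are not followed
        for nxt in links.get(url, []):
            if nxt in seen:
                continue  # never collect the same page twice
            seen.add(nxt)
            visited.append(nxt)
            queue.append((nxt, depth + 1))
            if len(visited) >= max_pages:
                break
    return visited
```

With both limits in place, a crawl stops at whichever bound it reaches first, which keeps runs on large documentation sites predictable.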

Workflow examples

Extract brand assets

webclaw https://github.com --brand

Compare a page over time

webclaw https://example.com/pricing --format json > pricing-old.json
webclaw https://example.com/pricing --diff-with pricing-old.json
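
The idea behind snapshot comparison can be sketched with Python's standard `difflib`. This is not webclaw's diff algorithm or output format; it assumes both snapshots are available as plain text (for example, the markdown of each extraction), and `changed_lines` is a hypothetical helper name.

```python
import difflib
from typing import List


def changed_lines(old: str, new: str) -> List[str]:
    """Return only the added (+) and removed (-) lines between two text snapshots."""
    diff = list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
            n=0,  # no surrounding context lines, only the changes themselves
        )
    )
    body = diff[2:]  # drop the "--- old" / "+++ new" file header
    # Hunk markers start with "@@"; changes start with "+" or "-".
    return [line for line in body if line.startswith(("+", "-"))]
```

For a pricing page, a result such as `-Pro: $20/month` followed by `+Pro: $25/month` is usually enough to trigger an alert or a closer look.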

MCP Server

webclaw ships with an MCP server for AI agents.

npx create-webclaw

Manual config:

{
  "mcpServers": {
    "webclaw": {
      "command": "~/.webclaw/webclaw-mcp"
    }
  }
}
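
If you maintain the client config by hand, a small script can merge the entry without overwriting other servers. The sketch below assumes the config is JSON with a top-level `mcpServers` object, as in the snippet above. The helper name `add_webclaw_server` is hypothetical, and where the config file lives depends on your client, so no path is assumed here. It also expands `~` to an absolute path, on the assumption that some clients launch the command without a shell and would not expand it themselves.

```python
import json
import os
from typing import Any, Dict


def add_webclaw_server(
    config: Dict[str, Any], binary: str = "~/.webclaw/webclaw-mcp"
) -> Dict[str, Any]:
    """Return a copy of an MCP client config with a `webclaw` server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is not mutated
    # JSON has no notion of "~", so resolve it to an absolute path up front.
    servers["webclaw"] = {"command": os.path.expanduser(binary)}
    merged["mcpServers"] = servers
    return merged


# Example: print the merged config for a client that already has one server.
existing = {"mcpServers": {"other": {"command": "other-mcp"}}}
print(json.dumps(add_webclaw_server(existing), indent=2))
```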

Then ask your agent things like:

Scrape these competitor pricing pages and summarize the differences.

Crawl this documentation site and prepare clean context for a RAG index.

Extract the brand colors, fonts, and logos from this company website.

Use as an agent skill

Add webclaw to Claude Code, Cursor, Windsurf, and other MCP agents in one command:

npx skills add 0xMassi/webclaw-skill

Your agent gets scrape, crawl, map, extract, summarize, diff, brand, and search as native tools. Most sites extract locally with no API key. Set WEBCLAW_API_KEY to handle bot-protected and JavaScript-rendered pages.

Find it on skills.sh.

Tools

| Tool | What it does | Local |
| --- | --- | --- |
| `scrape` | Extract one URL as markdown, text, JSON, LLM format, or HTML | Yes |
| `crawl` | Follow same-origin links and extract discovered pages | Yes |
| `map` | Discover URLs without extracting every page | Yes |
| `batch` | Scrape multiple URLs in parallel | Yes |
| `extract` | Convert page content into structured data | Yes, with local or configured LLM |
| `summarize` | Summarize a page | Yes, with local or configured LLM |
| `diff` | Compare page content snapshots | Yes |
| `brand` | Extract colors, fonts, logos, and metadata | Yes |
| `search` | Search the web and scrape results | Hosted API |
| `research` | Multi-source research workflow | Hosted API |

SDKs

npm install @webclaw/sdk
pip install webclaw
go get github.com/0xMassi/webclaw-go

import { Webclaw } from "@webclaw/sdk";

const client = new Webclaw({ apiKey: process.env.WEBCLAW_API_KEY! });

const page = await client.scrape({
  url: "https://example.com",
  formats: ["markdown"],
  only_main_content: true,
});

console.log(page.markdown);

from webclaw import Webclaw

client = Webclaw(api_key="wc_your_key")

page = client.scrape(
    "https://example.com",
    formats=["markdown"],
    only_main_content=True,
)

print(page.markdown)

curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer $WEBCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"],
    "only_main_content": true
  }'
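
If you would rather not install an SDK, the same request can be built with Python's standard library. The endpoint, headers, and body below mirror the curl example. The helper names are hypothetical, and the sketch assumes the API responds with JSON without assuming anything about the fields in that response.

```python
import json
import urllib.request
from typing import Any


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {"url": url, "formats": ["markdown"], "only_main_content": True}
    return urllib.request.Request(
        "https://api.webclaw.io/v1/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call would look like `send(build_scrape_request("https://example.com", os.environ["WEBCLAW_API_KEY"]))`, with `os` imported. Keeping request construction separate from sending makes the payload easy to inspect and test offline.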

Output Formats

| Format | Use it when you need |
| --- | --- |
| `markdown` | Clean page content with structure preserved |
| `llm` | Compact context for agents and RAG pipelines |
| `text` | Plain text with minimal formatting |
| `json` | Structured metadata, links, images, and extracted fields |
| `html` | Cleaned HTML for custom processing |

Local First, Hosted When Needed

The CLI and MCP server work locally without an account for the core extraction path.

Use the hosted API at webclaw.io when you need:

* protected-site access without managing infrastructure
* JavaScript rendering
* async crawl and research jobs
* web search
* watches and production usage tracking
* SDKs for application code

export WEBCLAW_API_KEY=wc_your_key

webclaw https://example.com --cloud
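
In a script, you may want the local-first choice to be explicit. The sketch below tries a local run first and retries with `--cloud` only when the output looks empty and `WEBCLAW_API_KEY` is set. The length threshold is a hypothetical heuristic, not webclaw's detection logic, and the names `run_cli` and `scrape_local_first` are illustrative. The runner is injectable so the logic can be tested without the binary.

```python
import os
import subprocess
from typing import Callable, List, Mapping, Optional

Runner = Callable[[List[str]], str]


def run_cli(argv: List[str]) -> str:
    """Default runner: execute the CLI and return stdout ('' on a non-zero exit)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


def scrape_local_first(
    url: str,
    runner: Runner = run_cli,
    env: Optional[Mapping[str, str]] = None,
    min_chars: int = 200,
) -> str:
    """Try a local extraction, then retry with --cloud if the result looks empty."""
    env = os.environ if env is None else env
    local = runner(["webclaw", url])
    # Hypothetical heuristic: very short output suggests a blocked page or an empty app shell.
    looks_ok = len(local.strip()) >= min_chars
    if looks_ok or not env.get("WEBCLAW_API_KEY"):
        return local
    return runner(["webclaw", url, "--cloud"])
```

Without an API key, the function returns the local result unchanged, so the core extraction path keeps working with no account.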

What You Can Build

| Use case | Example |
| --- | --- |
| AI agent web access | Give Claude, Cursor, or another MCP client clean page context |
| RAG ingestion | Crawl docs, help centers, blogs, and knowledge bases |
| Competitor monitoring | Track pricing pages, changelogs, docs, and product pages |
| Structured extraction | Turn messy pages into typed JSON for automations |
| Research workflows | Search, scrape, summarize, and cite multiple sources |
| Brand intelligence | Extract logos, colors, fonts, and social metadata |

Architecture

webclaw/
  crates/
    webclaw-core     HTML to markdown, text, JSON, and LLM-ready output
    webclaw-fetch    Fetching, crawling, batching, and mapping
    webclaw-llm      Local and hosted LLM provider support
    webclaw-pdf      PDF text extraction
    webclaw-mcp      MCP server for AI agents
    webclaw-cli      Command-line interface

webclaw-core is pure extraction logic: no network I/O, small surface area, and usable independently from the fetching layer.
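
One practical benefit of keeping extraction free of network I/O is that it can be tested on plain strings. The Python sketch below illustrates that separation by analogy. It is not the `webclaw-core` API, the set of skipped tags is an assumption chosen for the example, and real extraction involves far more than dropping a few elements.

```python
from html.parser import HTMLParser
from typing import List

# Assumption for this sketch: treat these elements as boilerplate.
SKIPPED_TAGS = {"script", "style", "nav", "footer"}


class _TextExtractor(HTMLParser):
    """Collect visible text while ignoring everything inside skipped elements."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Pure function: HTML string in, text out. No network or file I/O."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Because `extract_text` takes a string, the fetching layer (HTTP client, proxy, crawler) can change without touching extraction or its tests.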

Configuration

| Variable | Description |
| --- | --- |
| `WEBCLAW_API_KEY` | Hosted API key |
| `OLLAMA_HOST` | Ollama URL for local LLM features |
| `OPENAI_API_KEY` | OpenAI-compatible LLM provider key |
| `OPENAI_BASE_URL` | OpenAI-compatible base URL |
| `ANTHROPIC_API_KEY` | Anthropic-compatible LLM provider key |
| `ANTHROPIC_BASE_URL` | Anthropic-compatible base URL |
| `WEBCLAW_PROXY` | Single proxy URL |
| `WEBCLAW_PROXY_FILE` | Proxy pool file |
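
A wrapper script can read the same variables to see which LLM provider is configured before calling `extract` or `summarize`. The sketch below uses only the variable names from the table. The precedence (Ollama, then OpenAI, then Anthropic) is an assumption made for illustration, since this README does not state the order webclaw itself uses, and `LlmProvider` and `pick_llm_provider` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LlmProvider:
    name: str
    base_url: Optional[str]


def pick_llm_provider(env: Mapping[str, str]) -> Optional[LlmProvider]:
    """Pick an LLM provider from the environment variables listed above.

    Assumed precedence: local Ollama first, then OpenAI, then Anthropic.
    """
    if env.get("OLLAMA_HOST"):
        return LlmProvider("ollama", env["OLLAMA_HOST"])
    if env.get("OPENAI_API_KEY"):
        return LlmProvider("openai", env.get("OPENAI_BASE_URL"))
    if env.get("ANTHROPIC_API_KEY"):
        return LlmProvider("anthropic", env.get("ANTHROPIC_BASE_URL"))
    return None  # no LLM configured; LLM-backed features would be unavailable
```

Calling `pick_llm_provider(os.environ)`, with `os` imported, at startup gives an early, readable error path instead of a failure in the middle of a batch.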

Contributing

The most useful contributions right now are practical and small:

* add examples for real agent and RAG workflows
* improve SDK snippets
* report pages that extract poorly
* add failing fixtures for messy HTML
* improve docs for MCP clients and local setup
* test the CLI on more Linux/macOS environments

If a page extracts badly, include:

URL:
Command or API request:
Expected output:
Actual output:
Format used: markdown / llm / text / json / html
CLI, MCP, SDK, or API:

Please remove secrets, cookies, private tokens, and customer data from logs before posting.

Community Plugins

Third-party plugins that integrate webclaw with AI agent platforms:

| Plugin | Platform | What it does |
| --- | --- | --- |
| openclaw-webclaw | OpenClaw | Native webclaw v1 API plugin with 9 tools: scrape, search, crawl, extract, summarize, diff, map, batch, brand |
| hermes-webclaw | Hermes Agent | Web search provider and 9 dedicated tools for the full v1 API surface. Install with `hermes plugins install jal-co/hermes-webclaw` |

Built a webclaw integration? Open a PR to add it here.

Contributors

Thanks to everyone improving webclaw through issues, examples, docs, bug reports, and pull requests.

License

AGPL-3.0
