webpeel
by webpeel

The Problem

Every AI agent that touches the web rebuilds the same brittle stack: HTTP fetch → headless browser → anti-bot bypass → HTML cleanup → markdown conversion → token budgeting. Each layer fails differently. Sites change. Cloudflare rotates challenges. Your agent gets empty strings at 2 AM and your pipeline breaks.

WebPeel replaces that entire stack with one function call. It handles engine selection, anti-bot escalation, domain-specific extraction, and token optimization so your agent gets clean, structured data every time — without managing browsers, proxies, or parsing logic.



Quick Start

# Zero-install — just run it
npx webpeel "https://example.com"

# Search the web
npx webpeel search "latest AI agent frameworks"

# Crawl an entire site
npx webpeel crawl docs.example.com --max-pages 50

# Screenshot any page
npx webpeel screenshot "https://stripe.com/pricing" --full-page

# Ask a question about any page
npx webpeel ask "https://arxiv.org/abs/2401.00001" "What is the main contribution?"

Or install globally:

npm install -g webpeel

Use as a library:

import { peel } from 'webpeel';

const result = await peel('https://news.ycombinator.com');
console.log(result.markdown);   // Clean markdown, ready for your LLM
console.log(result.metadata);   // Title, tokens saved, timing, etc.

Use via API:

curl "https://api.webpeel.dev/v1/fetch?url=https://stripe.com/pricing" \
  -H "Authorization: Bearer $WEBPEEL_API_KEY"

Example response:

{
  "url": "https://stripe.com/pricing",
  "markdown": "# Stripe Pricing\n\n**Integrated per-transaction fees**...",
  "metadata": {
    "title": "Pricing & Fees | Stripe",
    "tokens": 420,
    "tokensOriginal": 8200,
    "savingsPct": 94.9
  }
}
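When the target URL carries its own query string, it is safer to percent-encode it before passing it as the `url` parameter. The sketch below only builds the request and does not send it. It reuses the endpoint and header from the curl example above; the helper name `build_fetch_request` is hypothetical, and the assumption that the API accepts a percent-encoded `url` value follows standard HTTP practice rather than anything stated here.

```python
import os
import urllib.parse
import urllib.request

API_BASE = "https://api.webpeel.dev/v1/fetch"  # endpoint taken from the curl example


def build_fetch_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the fetch endpoint."""
    # Percent-encode the target so its own '?', '&' and '#' characters are not
    # mistaken for parts of the API request's query string.
    query = urllib.parse.urlencode({"url": target_url})
    return urllib.request.Request(
        f"{API_BASE}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = build_fetch_request(
    "https://example.com/search?q=web+agents&page=2",
    os.environ.get("WEBPEEL_API_KEY", "wp_your_key_here"),
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` and decoding the body with `json.loads` would then give a dictionary shaped like the example response.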

Get your free API key → · No credit card required · 500 requests/week free


Why WebPeel

🧠 55+ Domain Extractors — Not Just HTML-to-Markdown

Generic scrapers convert raw HTML to markdown and call it a day. WebPeel has purpose-built extractors for 55+ domains — Reddit, GitHub, YouTube, Amazon, ArXiv, Hacker News, Wikipedia, StackOverflow, Zillow, Polymarket, ESPN, and more. Each extractor understands the site's structure and returns clean, structured data without browser rendering.

⚡ 65–98% Token Savings

Domain extractors strip navigation, ads, sidebars, and boilerplate before content reaches your agent. Less context consumed = lower costs, faster inference, and longer agent chains.

| Site | Raw HTML tokens | WebPeel tokens | Savings |
| --- | --- | --- | --- |
| News article | 18,000 | 640 | 96% |
| Reddit thread | 24,000 | 890 | 96% |
| Wikipedia page | 31,000 | 2,100 | 93% |
| GitHub README | 5,200 | 1,800 | 65% |
| E-commerce product | 14,000 | 310 | 98% |
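The savings column follows directly from the two token counts: savings = 1 - (WebPeel tokens / raw HTML tokens). A minimal sketch that recomputes the figures from the table above; the function name `savings_pct` is illustrative and not part of WebPeel.

```python
def savings_pct(raw_tokens: int, clean_tokens: int) -> int:
    """Percentage of tokens removed, rounded to the nearest whole percent."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return round(100 * (1 - clean_tokens / raw_tokens))


# (raw HTML tokens, WebPeel tokens) copied from the table above
ROWS = {
    "News article": (18_000, 640),
    "Reddit thread": (24_000, 890),
    "Wikipedia page": (31_000, 2_100),
    "GitHub README": (5_200, 1_800),
    "E-commerce product": (14_000, 310),
}

for site, (raw, clean) in ROWS.items():
    print(f"{site}: {savings_pct(raw, clean)}%")
```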

🔄 6-Layer Engine Escalation

WebPeel doesn't just try one method — it automatically escalates through 6 engines until it gets a good result:

Simple HTTP → Domain API → Browser render → Stealth browser → Cloaked browser → Search cache fallback

No manual --render flags for most sites. WebPeel knows which sites need JavaScript, which need stealth, and which have anti-bot protection — and picks the right engine automatically.
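The escalation idea can be illustrated with a generic fallback loop. This is a sketch of the pattern only, not WebPeel's actual implementation: the `escalate` function, the `min_chars` quality check, and the three stub engines are all hypothetical stand-ins.

```python
from typing import Callable, Iterable, Optional, Tuple

Engine = Callable[[str], Optional[str]]


def escalate(
    url: str,
    engines: Iterable[Tuple[str, Engine]],
    min_chars: int = 200,
) -> Tuple[str, str]:
    """Try each engine in order; return (engine_name, content) for the first usable result."""
    errors = []
    for name, engine in engines:
        try:
            content = engine(url)
        except Exception as exc:  # one failing engine should not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        # Treat empty or very short output as a failed attempt (for example a challenge page).
        if content and len(content.strip()) >= min_chars:
            return name, content
        errors.append(f"{name}: result too short")
    raise RuntimeError("all engines failed: " + "; ".join(errors))


# Hypothetical stub engines, cheapest first.
def simple_http(url: str) -> Optional[str]:
    raise ConnectionError("403 Forbidden")


def browser_render(url: str) -> Optional[str]:
    return ""  # rendered, but the page body came back empty


def stealth_browser(url: str) -> Optional[str]:
    return "# Page title\n\n" + "Body text. " * 30


name, content = escalate(
    "https://example.com",
    [
        ("simple_http", simple_http),
        ("browser_render", browser_render),
        ("stealth_browser", stealth_browser),
    ],
)
print(name)  # prints: stealth_browser
```

Ordering the engines from cheapest to most expensive means the heavier browser-based methods only run when the lighter ones fail.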

🔌 Firecrawl-Compatible Migration Path

Already using Firecrawl-style workflows? WebPeel supports compatible /v1/scrape, /v2/scrape, /v1/crawl, /v1/search, and /v1/map endpoints, which makes migration dramatically easier than rebuilding your pipeline from scratch.


Agent-Native Integrations

MCP Server (Claude, Cursor, Windsurf, VS Code)

Give any MCP-compatible AI the ability to browse, search, and extract from the web.

{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["-y", "webpeel", "mcp"],
      "env": { "WEBPEEL_API_KEY": "wp_your_key_here" }
    }
  }
}

7 MCP tools exposed: webpeel_read · webpeel_find · webpeel_see · webpeel_extract · webpeel_monitor · webpeel_act · webpeel_crawl

Full MCP setup guide →
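If your MCP client already has other servers configured, the WebPeel entry needs to be merged in rather than pasted over the file. A minimal sketch, assuming the client stores its configuration as JSON with a top-level `mcpServers` object as shown above; the helper name `add_webpeel_server` and the `other` server entry are hypothetical.

```python
import json
from copy import deepcopy


def add_webpeel_server(config: dict, api_key: str) -> dict:
    """Return a copy of an MCP client config with the webpeel server entry added."""
    merged = deepcopy(config)  # leave the caller's config untouched
    servers = merged.setdefault("mcpServers", {})
    servers["webpeel"] = {
        "command": "npx",
        "args": ["-y", "webpeel", "mcp"],
        "env": {"WEBPEEL_API_KEY": api_key},
    }
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_webpeel_server(existing, "wp_your_key_here"), indent=2))
```

The location of the config file differs per client, so reading and writing the file is left out here.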

LangChain

import { WebPeelLoader } from 'webpeel/integrations/langchain';

const loader = new WebPeelLoader({ url: 'https://example.com', render: true });
const docs = await loader.load();

LlamaIndex

import { WebPeelReader } from 'webpeel/integrations/llamaindex';

const reader = new WebPeelReader();
const docs = await reader.loadData('https://example.com');

Python SDK

pip install webpeel

from webpeel import WebPeel

wp = WebPeel(api_key="wp_...")
result = wp.fetch("https://example.com")
print(result.markdown)

Full Feature Set

| Capability | CLI | API | Details |
| --- | --- | --- | --- |
| Fetch & extract | webpeel "url" | GET /v1/fetch | Clean markdown from any URL |
| Web search | webpeel search "query" | GET /v1/search | DuckDuckGo (free) or Brave (BYOK) |
| Smart search | | POST /v1/search/smart | AI-powered structured results |
| Crawl sites | webpeel crawl "url" | POST /v1/crawl | Depth/page limits, rate control |
| Screenshots | webpeel screenshot "url" | POST /v1/screenshot | Full-page, multi-viewport, visual diff, filmstrip |
| Structured extraction | --extract-schema | POST /v1/extract | JSON schema → structured data |
| Q&A | webpeel ask "url" "q" | POST /v1/answer | Answer questions about any page |
| Deep research | | POST /v1/deep-research | Multi-query autonomous research |
| Content monitoring | webpeel monitor "url" | POST /v1/watch | Change detection with webhooks |
| Browser sessions | | POST /v1/session | Persistent sessions for login flows |
| Browser actions | --action 'click:.btn' | actions field | Click, type, scroll, wait |
| Batch scrape | webpeel batch file | POST /v1/batch/scrape | Parallel multi-URL processing |
| URL discovery | webpeel map "url" | POST /v1/map | Sitemap and link discovery |
| YouTube transcripts | auto-detected | auto-detected | Multiple export formats |
| PDF extraction | auto-detected | auto-detected | Text, tables, structure |
| Research agent | | POST /v1/agent | Autonomous multi-step research |


Use Cases for Agent Builders

RAG pipelines — Fetch docs, articles, or entire sites as clean markdown ready for chunking, embedding, and retrieval.

Price monitoring — Track product pages across major commerce sites with structured extraction and change detection.

Competitive intel — Monitor competitor pages, pricing tables, and job boards. Visual diff screenshots catch layout changes CSS selectors would miss.

Research agents — Give Claude, Codex, Cursor, or your own agent grounded web access through the API or MCP server.

Lead enrichment — Pull company details, public links, and page structure from business sites without writing per-site parsers.

Content aggregation — Crawl and extract from communities, docs sites, and publications with domain-native extractors that understand each site's structure.
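For the RAG use case, the markdown returned by a fetch usually has to be split before embedding. A minimal sketch of heading-aware chunking with the standard library; `chunk_markdown` and its `max_chars` default are illustrative choices, not part of WebPeel. It assumes ATX-style headings (`#`, `##`, ...) and does not special-case `#` lines inside fenced code blocks.

```python
import re
from typing import List


def chunk_markdown(markdown: str, max_chars: int = 1500) -> List[str]:
    """Split markdown at heading boundaries, packing adjacent sections up to max_chars."""
    # Split just before every heading line so each section keeps its own heading.
    sections = [
        s.strip()
        for s in re.split(r"(?m)^(?=#{1,6} )", markdown)
        if s.strip()
    ]
    chunks: List[str] = []
    current = ""
    for section in sections:
        # +2 accounts for the blank line used to join sections.
        if current and len(current) + 2 + len(section) > max_chars:
            chunks.append(current)
            current = section
        else:
            current = f"{current}\n\n{section}" if current else section
    if current:
        chunks.append(current)
    return chunks


doc = "# Intro\n\nShort overview.\n\n## Setup\n\nInstall steps.\n\n## Usage\n\nExamples."
for i, chunk in enumerate(chunk_markdown(doc, max_chars=40)):
    print(i, repr(chunk))
```

A section longer than `max_chars` is emitted whole rather than cut mid-paragraph; a production pipeline would likely add a second split on paragraph boundaries.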


Architecture

Your Agent
    ↓
 WebPeel (npm / API / MCP)
    ↓
┌─────────────────────────────────┐
│  Engine Ranker                  │
│  HTTP → Domain API → Browser   │
│  → Stealth → Cloaked → Cache   │
├─────────────────────────────────┤
│  55+ Domain Extractors          │
│  reddit · github · youtube      │
│  amazon · arxiv · zillow · ...  │
├─────────────────────────────────┤
│  Content Pipeline               │
│  Readability → Turndown →       │
│  Token budgeting → Chunking     │
└─────────────────────────────────┘
    ↓
 Clean markdown / structured JSON
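The token budgeting stage of the content pipeline can be pictured as trimming output to fit a target context size. How WebPeel does this internally is not described here, so the sketch below is only one plausible approach: it keeps leading paragraphs until an estimated budget is used up, with a rough 4-characters-per-token heuristic standing in for a real tokenizer. The name `fit_to_budget` and both parameters are hypothetical.

```python
def fit_to_budget(markdown: str, max_tokens: int, chars_per_token: float = 4.0) -> str:
    """Keep leading paragraphs until an estimated token budget is reached."""
    budget_chars = int(max_tokens * chars_per_token)
    kept = []
    used = 0
    for para in markdown.split("\n\n"):
        # Count the blank-line separator for every paragraph after the first.
        cost = len(para) + (2 if kept else 0)
        if used + cost > budget_chars:
            break
        kept.append(para)
        used += cost
    return "\n\n".join(kept)


page = "# Title\n\nFirst paragraph of the page.\n\nSecond paragraph with more detail."
print(fit_to_budget(page, max_tokens=10))
```

Cutting at paragraph boundaries keeps the remaining text readable; swapping the heuristic for the target model's tokenizer would make the budget exact.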

Reliability

WebPeel is built for production agent workflows, not just one-off demos.

  • Automated evals in-repo — smart search and fetch eval suites ship with the codebase

  • Post-deploy gate — critical checks run before calling a deploy healthy

  • Engine fallback chain — when one fetch method fails, WebPeel escalates instead of giving up

  • Multiple surfaces, one core — CLI, API, SDK, and MCP all ride the same extraction pipeline


Security

  • SSRF protection — blocks localhost, private IPs, metadata endpoints, file:// schemes

  • Helmet.js — HSTS, X-Frame-Options, nosniff, XSS protection on all responses

  • Webhook signing — HMAC-SHA256 on all outbound webhooks

  • API key hashing — SHA-256 with granular scopes

  • Rate limiting — sliding window, per-tier

  • Audit logging — every API call logged with IP, key, and action

  • GDPR compliant: DELETE /v1/account for full data erasure

Security policy → · SLA (99.9% uptime) →
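On the receiving side, a signed webhook is only useful if the signature is checked. A minimal verification sketch using Python's standard `hmac` module. The list above only states that HMAC-SHA256 is used; the assumptions here are that the signature covers the raw request body and arrives as a hex digest, and the function names are hypothetical. The header that carries the signature is not specified, so it is left out.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute the HMAC-SHA256 of the raw body as a hex string."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Return True if the received signature matches the body."""
    expected = sign_payload(secret, body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode(), received_signature.encode())


secret = b"example-webhook-secret"  # hypothetical shared secret
body = b'{"event": "page.changed", "url": "https://example.com"}'
signature = sign_payload(secret, body)

print(verify_webhook(secret, body, signature))         # prints: True
print(verify_webhook(secret, body + b" ", signature))  # prints: False
```

Verify against the raw bytes as received, before any JSON parsing, since re-serializing the payload can change whitespace and invalidate the signature.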


Why teams choose WebPeel instead of stitching a stack together

| Approach | What it gives you | Where it breaks down |
| --- | --- | --- |
| Raw HTTP + HTML parsing | Cheap, simple fetches | Falls apart on JS-heavy sites, anti-bot pages, and noisy HTML |
| Pure browser automation | Maximum control | Expensive, slow, fragile, and high-maintenance for large-scale use |
| Search-only APIs | Great discovery | Weak page extraction, limited structured output, limited downstream actions |
| Single-purpose scrapers | Fast on one job | You end up composing 4–6 tools for real agent workflows |
| WebPeel | Fetch + search + crawl + extraction + screenshots + monitoring in one layer | Opinionated toward agent workflows rather than generic scraping |




Contributing

Pull requests welcome. Please open an issue first to discuss major changes.

git clone https://github.com/webpeel/webpeel.git
cd webpeel && npm install
npm run build && npm test

License

WebPeel SDK License — free for personal and commercial use with attribution.
