servo-fetch
servo-fetch embeds the Servo browser engine. It executes JavaScript, computes CSS layout, captures screenshots with a software renderer, and extracts clean content. It is available as a CLI, a Rust library, and a Python SDK.
```shell
# CLI
servo-fetch "https://example.com" # clean Markdown
servo-fetch "https://example.com" --screenshot page.png # PNG screenshot
```

```rust
// Rust
let md = servo_fetch::markdown("https://example.com")?;
```

```python
# Python
import servo_fetch

page = servo_fetch.fetch("https://example.com")
print(page.markdown)
```

Why servo-fetch
Zero dependencies: a single binary, with no Chromium and no API key
Real JS execution: SpiderMonkey runs JavaScript and a parallel CSS engine computes layout
Layout- and visibility-aware extraction: strips navbars, sidebars, and footers by rendered position, plus cookie banners, modals, and hidden content (`opacity: 0`, `aria-hidden`, `sr-only`)
Schema-driven JSON: a declarative CSS-selector schema pulls out structured data
Parallel batch fetch: multiple URLs are fetched concurrently
Site crawling: BFS link traversal with robots.txt support, same-site scope, and rate limiting
URL discovery: sitemap-based URL mapping without rendering (fast and lightweight)
Screenshots without a GPU: a software renderer captures PNG and full-page screenshots anywhere
Accessibility tree: AccessKit integration with roles, names, and bounding boxes
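To make the crawling bullet concrete, here is a small, self-contained Python sketch of a breadth-first crawl. It is not servo-fetch's implementation. `crawl_bfs` and the caller-supplied `get_links` callback are hypothetical, "same-site" is simplified to "same hostname", and rate limiting is reduced to a fixed delay between pages. The sketch only shows how a FIFO frontier, a robots.txt check (via Python's standard `urllib.robotparser`), and a page limit fit together.

```python
import time
from collections import deque
from urllib.parse import urljoin, urlsplit
from urllib.robotparser import RobotFileParser


def crawl_bfs(start, get_links, robots_txt="", user_agent="MyBot/1.0",
              limit=20, delay=0.0):
    """Breadth-first crawl limited to the start URL's hostname.

    get_links(url) -> list of hrefs is a hypothetical, caller-supplied function
    (in a real crawler this is where the page would be fetched and rendered).
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    host = urlsplit(start).hostname
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()  # FIFO queue gives breadth-first order
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments so duplicates collapse.
            link = urljoin(url, href).split("#", 1)[0]
            if urlsplit(link).hostname == host and link not in seen:
                seen.add(link)
                queue.append(link)
        if delay:
            time.sleep(delay)  # naive fixed delay standing in for rate limiting
    return visited
```

Replacing `queue.popleft()` with `queue.pop()` would turn the same loop into a depth-first crawl; the FIFO queue is what keeps pages close to the start URL first.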
Performance and quality
Measured on an Apple M3 Pro against Playwright (the typical AI-agent stack):

| Benchmark | servo-fetch | playwright:optimized |
|---|---|---|
| Time (static-small) | ~231 ms | ~645 ms |
| Time (spa-heavy) | ~331 ms | ~798 ms |
| Memory (peak RSS) | 51–64 MB | 300–328 MB |
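For a sense of scale, the table works out to the approximate ratios below. These are computed here from the rounded figures above, not reported by the benchmark itself. Because memory is given only as ranges, the memory ratio is bounded by pairing the extremes of each range:

$$
\frac{645\ \text{ms}}{231\ \text{ms}} \approx 2.8\times \quad (\text{static-small}), \qquad
\frac{798\ \text{ms}}{331\ \text{ms}} \approx 2.4\times \quad (\text{spa-heavy})
$$

$$
4.7 \approx \frac{300\ \text{MB}}{64\ \text{MB}} \;\le\; \frac{\text{peak RSS}_{\text{Playwright}}}{\text{peak RSS}_{\text{servo-fetch}}} \;\le\; \frac{328\ \text{MB}}{51\ \text{MB}} \approx 6.4
$$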
Extraction quality: servo-fetch scores a mean word-F1 of 0.819 versus Readability's 0.728 across eight page-type fixtures, and a `without[]` boilerplate-removal rate of 95.0% versus 78.6%. Direct-binary engine peers (chrome-headless-shell, Lightpanda, curl) are opt-in in the benchmark suite.
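Word-level F1 is commonly defined as the harmonic mean of bag-of-words precision and recall against a reference text. The sketch below assumes that definition with simple lowercase whitespace tokenization; the benchmark's exact normalization and the way its `without[]` rate is computed are described in benchmarks/README.md. Treat `word_f1` and the hypothetical `boilerplate_removal_rate` helper as illustrations of the metrics, not the benchmark's code.

```python
from collections import Counter


def word_f1(extracted, reference):
    """Bag-of-words F1 (assumed definition: lowercase, whitespace-split tokens)."""
    pred = Counter(extracted.lower().split())
    gold = Counter(reference.lower().split())
    overlap = sum((pred & gold).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


def boilerplate_removal_rate(extracted, without):
    """Hypothetical metric: share of `without` strings absent from the output."""
    if not without:
        return 1.0
    removed = sum(1 for s in without if s not in extracted)
    return removed / len(without)
```

A "mean word-F1" is then the average of `word_f1` over all fixtures, so each page type counts equally regardless of its length.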
Methodology, the three-axis breakdown, per-fixture F1 scores, and raw JSON results are in benchmarks/README.md and benchmarks/results/.
Install
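| Interface | Install | Docs |
|---|---|---|
| CLI | `cargo binstall servo-fetch-cli` | servo-fetch-cli |
| Rust | `cargo add servo-fetch` | servo-fetch |
| Python | `pip install servo-fetch` | bindings/python |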
```shell
cargo binstall servo-fetch-cli # prebuilt binary
cargo install servo-fetch-cli # build from source
```

Or download a prebuilt binary from GitHub Releases.
Linux: install the runtime dependencies, and run under xvfb-run on headless servers:

```shell
sudo apt install -y libegl1 libfontconfig1 libfreetype6
xvfb-run --auto-servernum servo-fetch "https://example.com"
```

Windows: `cargo binstall` does not copy sidecar files (cargo-binstall#353), so the installed servo-fetch.exe fails at startup with a missing libEGL.dll. Download the .zip from Releases instead; it bundles libEGL.dll and libGLESv2.dll.

macOS: no extra setup needed.
Quick Start
CLI
```shell
servo-fetch "https://example.com" # Markdown (default)
servo-fetch "https://example.com" --format json # Structured JSON
servo-fetch "https://example.com" --screenshot page.png # PNG screenshot
servo-fetch "https://example.com" --js "document.title" # Run JavaScript
servo-fetch "https://example.com" --schema schema.json # Schema-driven JSON
servo-fetch URL1 URL2 URL3 # Parallel batch
servo-fetch "https://example.com" --output page.md # Save to a single file
servo-fetch URL1 URL2 --output-dir ./out/ # Save each URL to its own file
servo-fetch crawl "https://docs.example.com" --limit 20 # Crawl a site
servo-fetch crawl URL --output-dir ./pages/ # Save each crawled page to its own file
servo-fetch map "https://example.com" # Discover URLs via sitemap
servo-fetch mcp # MCP server (stdio)
servo-fetch serve # HTTP API server
```

Full CLI reference → servo-fetch-cli
Rust
```shell
cargo add servo-fetch
```

```rust
use servo_fetch::{fetch, FetchOptions};
use std::time::Duration;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // URL → Markdown in one line
    let md = servo_fetch::markdown("https://example.com")?;

    // Fetch with options
    let page = fetch(FetchOptions::new("https://example.com").timeout(Duration::from_secs(60)))?;
    println!("{}", page.html);
    let md = page.markdown()?;

    // Crawl a site
    servo_fetch::crawl_each(
        servo_fetch::CrawlOptions::new("https://docs.example.com")
            .limit(100)
            .user_agent("MyBot/1.0"),
        |result| match &result.outcome {
            Ok(page) => println!("{}: {} chars", result.url, page.content.len()),
            Err(e) => eprintln!("{}: {e}", result.url),
        },
    )?;

    // Discover URLs via sitemap (no rendering)
    let urls = servo_fetch::map(
        servo_fetch::MapOptions::new("https://example.com").limit(1000),
    )?;
    for u in &urls {
        println!("{}", u.url);
    }

    Ok(())
}
```

Full API reference → servo-fetch
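`map` discovers URLs from sitemaps rather than by rendering pages. As a rough illustration of why that is cheap, here is a standard-library Python sketch that walks a sitemap or sitemap index (sitemaps.org 0.9 namespace) and collects its `<loc>` URLs. `parse_sitemap`, `map_urls`, and the `fetch_text` callback are hypothetical names, and servo-fetch's own discovery logic may differ.

```python
import xml.etree.ElementTree as ET
from collections import deque

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (page_urls, child_sitemap_urls) for a <urlset> or <sitemapindex>."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(f"{NS}loc") if el.text]
    if root.tag == f"{NS}sitemapindex":
        return [], locs
    return locs, []


def map_urls(sitemap_url, fetch_text, limit=1000):
    """Collect page URLs from a sitemap tree without rendering any page.

    fetch_text(url) -> str is a hypothetical caller-supplied plain HTTP GET.
    """
    urls, seen_pages, seen_maps = [], set(), set()
    queue = deque([sitemap_url])
    while queue and len(urls) < limit:
        current = queue.popleft()
        if current in seen_maps:
            continue
        seen_maps.add(current)
        pages, children = parse_sitemap(fetch_text(current))
        queue.extend(children)  # nested sitemap indexes are walked breadth-first
        for url in pages:
            if len(urls) >= limit:
                break
            if url not in seen_pages:
                seen_pages.add(url)
                urls.append(url)
    return urls
```

Only XML documents are downloaded and parsed here; no page is ever executed or laid out, which is the property the "(no rendering)" comment in the Rust example refers to.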
Python
```shell
pip install servo-fetch
```

```python
import servo_fetch

page = servo_fetch.fetch("https://example.com")
print(page.markdown)

# Schema extraction
from servo_fetch import Schema, Field

schema = Schema(
    base_selector=".product",
    fields=[
        Field(name="title", selector="h2", type="text"),
        Field(name="price", selector=".price", type="text"),
    ],
)
page = servo_fetch.fetch("https://shop.example.com", schema=schema)
print(page.extracted)
```

Full API reference → bindings/python
MCP Server
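servo-fetch has a built-in Model Context Protocol (MCP) server with six tools: fetch, batch_fetch, crawl, map, screenshot, and execute_js. To run it over stdio, register it in your MCP client's configuration:

```json
{
  "mcpServers": {
    "servo-fetch": {
      "command": "servo-fetch",
      "args": ["mcp"]
    }
  }
}
```

To serve it over Streamable HTTP instead, run `servo-fetch mcp --port 8080`.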
Full MCP tool reference → servo-fetch-cli README
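Under the stdio transport, an MCP client exchanges newline-delimited JSON-RPC 2.0 messages with the `servo-fetch mcp` process: an `initialize` request, an `notifications/initialized` notification, then `tools/call`. The Python sketch below shows that exchange without an MCP SDK. The protocol version string, the client name, and the assumption that the `fetch` tool takes a single `url` argument are all assumptions; a `tools/list` request returns each tool's real input schema.

```python
import json
import subprocess

# Assumption: a protocol version the server accepts; it replies with the one it will use.
PROTOCOL_VERSION = "2025-06-18"


def request(msg_id, method, params=None):
    """Build one JSON-RPC 2.0 message; msg_id=None makes it a notification."""
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


def fetch_messages(url):
    """initialize -> notifications/initialized -> tools/call for the fetch tool."""
    return [
        request(1, "initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        }),
        request(None, "notifications/initialized"),
        # Assumption: the fetch tool's argument is named `url`; confirm via tools/list.
        request(2, "tools/call", {"name": "fetch", "arguments": {"url": url}}),
    ]


def call_fetch(url, binary="servo-fetch"):
    """Spawn `servo-fetch mcp` and talk to it over stdin/stdout."""
    init, initialized, call = fetch_messages(url)
    with subprocess.Popen([binary, "mcp"], stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE, text=True) as proc:
        def send(line):
            proc.stdin.write(line + "\n")
            proc.stdin.flush()

        def wait_for(msg_id):
            for raw in proc.stdout:
                msg = json.loads(raw)
                if msg.get("id") == msg_id:  # skip notifications and other replies
                    return msg
            raise RuntimeError("server closed stdout before replying")

        send(init)
        wait_for(1)  # wait for the initialize result before continuing
        send(initialized)
        send(call)
        response = wait_for(2)
        proc.stdin.close()  # closing stdin signals the server to shut down
    return response
```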
HTTP API
REST endpoints for containerized deployments and HTTP clients:
```shell
servo-fetch serve # 127.0.0.1:3000
servo-fetch serve --host 0.0.0.0 --port 80 # expose to the network
curl -X POST http://127.0.0.1:3000/v1/fetch \
  -H 'content-type: application/json' \
  -d '{"url":"https://example.com"}'
```

Endpoints: GET /health, GET /version, POST /v1/fetch, POST /v1/batch_fetch, POST /v1/screenshot, POST /v1/execute_js, POST /v1/crawl, POST /v1/map.
Full HTTP API reference → servo-fetch-cli README
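The curl call above translates directly to a dependency-free Python client. This is a minimal sketch using only `urllib`; the helper names are hypothetical, and it assumes the response body is JSON (check the HTTP API reference for the actual response shape). `urlopen` raises `urllib.error.HTTPError` on non-2xx responses.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:3000"  # default `servo-fetch serve` address


def build_fetch_request(url, base=BASE):
    """POST /v1/fetch with the same JSON body as the curl example."""
    return urllib.request.Request(
        f"{base}/v1/fetch",
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def fetch_via_api(url, base=BASE, timeout=60.0):
    """Send the request and decode the reply, assuming the body is JSON."""
    with urllib.request.urlopen(build_fetch_request(url, base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, with the server or the Docker image running: `result = fetch_via_api("https://example.com")`.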
Docker
Multi-arch image on GitHub Container Registry (linux/amd64, linux/arm64):
```shell
docker run --rm -p 3000:3000 ghcr.io/konippi/servo-fetch:latest
curl -X POST http://127.0.0.1:3000/v1/fetch \
  -H 'content-type: application/json' \
  -d '{"url":"https://example.com"}'
```

The container runs as a non-root user (UID 1001). Images are signed with cosign (keyless signing) and published with SLSA provenance and SBOM attestations.
Agent Skills
servo-fetch ships with an Agent Skills package for AI coding agents:
```shell
npx skills add https://github.com/konippi/servo-fetch/tree/main/skills/servo-fetch
```

Security
servo-fetch blocks all private and reserved IP ranges (RFC 6890), strips credentials from URLs, disables HTTP redirects to prevent SSRF bypass, and sanitizes all output against terminal escape injection and Unicode bidirectional-control spoofing ("Trojan Source", CVE-2021-42574). See SECURITY.md for details.
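The three defenses above can be illustrated with Python's standard library. This is a sketch of the techniques, not servo-fetch's code: the function names are hypothetical, `ipaddress`'s `is_global` follows the IANA special-purpose address registries that RFC 6890 defines, and a real SSRF guard must check every address a hostname resolves to, at connect time, so DNS rebinding cannot slip a private address past the check.

```python
import ipaddress
import unicodedata
from urllib.parse import urlsplit, urlunsplit


def is_public_ip(addr):
    """True only for globally routable addresses (IANA special-purpose registry)."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its embedded IPv4 address
    return ip.is_global


def strip_credentials(url):
    """Drop any user:password@ part of the URL's authority."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Bidi embedding/override/isolate controls named in CVE-2021-42574 ("Trojan Source").
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def sanitize_for_terminal(text):
    """Remove control characters (ESC, CR, C1 CSI, ...) and bidi controls; keep newlines and tabs."""
    return "".join(
        ch for ch in text
        if ch in "\n\t"
        or (ch not in BIDI_CONTROLS and unicodedata.category(ch) != "Cc")
    )
```

Removing ESC leaves the rest of an ANSI sequence (for example `[31m`) as inert printable text, which is enough to stop the terminal from interpreting it.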
Limitations
Sites behind login walls or CAPTCHAs are not supported.
Contributing
See CONTRIBUTING.md for development setup, commit conventions, and PR guidelines.
License
MIT OR Apache-2.0