# nab

Fetch any URL as clean markdown — with your browser cookies, anti-bot evasion, and 25x fewer tokens than raw HTML.

nab is a local, token-optimized HTTP client built for LLM pipelines. It converts web pages to clean markdown, injects your real browser session cookies for authenticated content, and spoofs browser fingerprints to bypass bot detection. No API keys. No cloud. Just fast, authenticated, LLM-ready output.
## Why nab?
| Feature | nab | Firecrawl | Crawl4AI | Playwright | Jina Reader | curl |
|---|---|---|---|---|---|---|
| Clean markdown output | Built-in (25x savings) | Markdown | Markdown | Raw HTML | Markdown | Raw HTML |
| Browser cookie auth | Auto-detect (6 browsers) | None | None | Requires login script | API key | Manual |
| Anti-bot evasion | Fingerprint spoofing | Cloud proxy | Stealth plugin | Stealth plugin | Cloud-side | None |
| JS rendering | QuickJS (1MB, local) | Cloud browser | Chromium (300MB+) | Chromium (300MB+) | Cloud-side | None |
| Speed (typical page) | ~50ms | ~1-3s | ~2-5s | ~2-5s | ~500ms | ~100ms |
| Token output (typical) | ~500 | ~1,500 | ~1,500 | ~12,500 | ~2,000 | ~12,500 |
| Runs locally | Yes (single binary) | Cloud API | Yes (Python + Chrome) | Yes (Node + Chrome) | Cloud API | Yes |
| HTTP/3 (QUIC) | Yes | No | No | No | N/A | Build-dependent |
| Site-specific APIs | 11 built-in providers | None | None | None | None | None |
| 1Password / Passkeys | Native | None | None | None | None | None |
| Cost | Free (local) | $0.004/page | Free (local) | Free (local) | Free tier / paid | Free (local) |
| Install size | ~15MB binary | Cloud service | ~300MB+ | ~300MB+ | Cloud service | ~5MB |
## Quick Start

```sh
# Install (pick one)
brew install MikkoParkkola/tap/nab   # Homebrew
cargo install nab                    # From crates.io
cargo binstall nab                   # Pre-built binary
```

### Fetch a page as clean markdown

```sh
nab fetch https://example.com
```

### Access authenticated content with your browser cookies

```sh
# Auto-detects your default browser and injects session cookies
nab fetch https://github.com/notifications --cookies brave
```

No login flows. No API keys. nab reads your existing browser cookies (Brave, Chrome, Firefox, Safari, Edge, Dia) and uses them for the request. You stay logged in — nab just borrows the session.
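Under the hood, "borrowing the session" amounts to reading the browser's cookie store and turning the cookies that match the target host into a `Cookie` header. A minimal Python sketch of the idea — the Chromium-style `Cookies` SQLite schema shown here is an assumption, and real browsers usually encrypt cookie values, which nab must decrypt and this sketch does not:

```python
import sqlite3

def cookie_header_for(db_path: str, host: str) -> str:
    """Build a Cookie header from a Chromium-style cookie store for one host."""
    conn = sqlite3.connect(db_path)
    parts = []
    for host_key, name, value in conn.execute(
        "SELECT host_key, name, value FROM cookies"
    ):
        # host_key is either an exact host ("github.com") or a leading-dot
        # domain cookie (".github.com") that also covers subdomains.
        bare = host_key.lstrip(".")
        if host == bare or host.endswith("." + bare):
            parts.append(f"{name}={value}")
    conn.close()
    return "; ".join(parts)
```

The resulting string goes straight into the outgoing request's `Cookie` header.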
### Bypass bot detection with fingerprint spoofing

```sh
# Realistic Chrome/Firefox/Safari profiles — not a headless browser signature
nab fetch https://protected-site.com
```

nab ships with anti-fingerprinting by default: realistic TLS fingerprints, browser-accurate headers, and randomized profiles. Sites see a normal browser, not a scraping tool.
## Features

- **11 Site Providers** — Specialized extractors for Twitter/X, Reddit, Hacker News, GitHub, Google Workspace, YouTube, Wikipedia, StackOverflow, Mastodon, LinkedIn, and Instagram. API-backed where possible for structured output.
- **Google Workspace Extraction** — Fetch Google Docs, Sheets, and Slides as clean markdown using browser cookies. Extracts comments and suggested edits from OOXML (docx/xlsx/pptx).
- **HTML-to-Markdown** — Automatic conversion with boilerplate removal. 25x token savings vs raw HTML.
- **PDF Extraction** — PDF-to-markdown with heading and table detection (requires pdfium).
- **Browser Cookie Auth** — Auto-detects your default browser (Brave, Chrome, Firefox, Safari, Edge, Dia) and injects session cookies. Zero config.
- **1Password Integration** — Credential lookup, auto-login with CSRF handling, TOTP/MFA support.
- **Passkey/WebAuthn** — Native passkey authentication via 1Password's open-source library.
- **HTTP/3 (QUIC)** — 0-RTT connection resumption, HTTP/2 multiplexing, TLS 1.3.
- **Anti-Fingerprinting** — Realistic Chrome/Firefox/Safari browser profiles to avoid bot detection.
- **JS Engine (QuickJS)** — Lightweight embedded JavaScript for pages that need it, without a full browser.
- **Compression** — Brotli, Zstd, Gzip, Deflate decompression built in.
- **Query-Focused Extraction** — BM25-lite scoring extracts only the sections relevant to your query. Send `focus="authentication"` and get back just the auth docs, not the entire page.
- **Token Budget** — Structure-aware truncation respects headings, code blocks, and tables. Never splits mid-block. Set `max_tokens=2000` to fit any context window.
- **Prefetch Link Graph** — Extract same-site links from fetched pages, scored by relevance to your focus query. eTLD+1 filtering via Mozilla's public suffix list.
- **Persistent Sessions** — Named sessions with automatic cookie persistence across requests. LRU eviction (32 slots), cookie seeding from browser jars.
- **CSS Extractor Plugins** — Define custom extractors in `plugins.toml` using CSS selectors — no Rust code required.
- **MCP Server** — `nab-mcp` binary for direct integration with Claude Code and other MCP clients.
- **Batch Fetching** — Parallel URL fetching with connection pooling.
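The token-budget behavior can be approximated as follows. This is an illustrative Python sketch, not nab's Rust implementation: the 4-characters-per-token estimate is an assumption, and it treats blank-line-separated chunks and fenced code blocks as indivisible units.

```python
def truncate_markdown(md: str, max_tokens: int) -> str:
    """Keep whole markdown blocks until the token budget is spent.

    Blocks are split on blank lines, except fenced code blocks,
    which are kept intact from ``` to ```.
    """
    est = lambda text: len(text) // 4 + 1  # crude token estimate

    # Group lines into blocks, never splitting inside a code fence.
    blocks, current, in_fence = [], [], False
    for line in md.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence
            current.append(line)
            if not in_fence:  # fence just closed: the block is complete
                blocks.append("\n".join(current))
                current = []
            continue
        if line.strip() == "" and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))

    # Accumulate whole blocks until the budget runs out.
    kept, used = [], 0
    for block in blocks:
        cost = est(block)
        if used + cost > max_tokens:
            break
        kept.append(block)
        used += cost
    return "\n\n".join(kept)
```

Because truncation happens at block boundaries, the output never contains a dangling code fence or half a table.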
## Site Providers

nab detects URLs for these platforms and uses their APIs or structured data instead of scraping HTML:

| Provider | Method |
|---|---|
| Twitter/X | FxTwitter API |
| Reddit | JSON API |
| Hacker News | Firebase API |
| GitHub | REST API |
| Google Workspace | Export API + OOXML |
| YouTube | oEmbed |
| Wikipedia | REST API |
| StackOverflow | API |
| Mastodon | ActivityPub |
| LinkedIn | oEmbed |
| Instagram | oEmbed |
If no provider matches, nab falls back to standard HTML fetch + markdown conversion.
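The dispatch logic is simple to picture: try each provider's URL patterns in turn, and fall back to the generic HTML path when nothing matches. A rough Python sketch — the pattern table below is illustrative, not nab's actual routing rules:

```python
import re

# Illustrative pattern table: a tiny subset, not nab's real routing rules.
PROVIDERS = [
    ("hackernews", r"news\.ycombinator\.com/item\?id=\d+"),
    ("github",     r"github\.com/[^/]+/[^/]+/(issues|pull)/\d+"),
    ("wikipedia",  r"[a-z]+\.wikipedia\.org/wiki/.+"),
]

def route(url: str) -> str:
    """Return the provider name for a URL, or 'html' for the generic fallback."""
    for name, pattern in PROVIDERS:
        if re.search(pattern, url):
            return name
    return "html"
```

First match wins, so more specific patterns should sit earlier in the table.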
## Usage

```sh
# Basic fetch (auto-cookies, markdown output)
nab fetch https://example.com

# Force specific browser cookies
nab fetch https://github.com/notifications --cookies brave

# With 1Password credentials
nab fetch https://internal.company.com --1password

# Google Docs (markdown with comments and suggested edits)
nab fetch --cookies brave "https://docs.google.com/document/d/DOCID/edit"

# Google Sheets (CSV rendered as markdown table)
nab fetch --cookies brave "https://docs.google.com/spreadsheets/d/SHEETID/edit"

# Google Slides (plain text with comments)
nab fetch --cookies brave "https://docs.google.com/presentation/d/SLIDEID/edit"

# Raw HTML output (skip markdown conversion)
nab fetch https://example.com --raw-html

# JSON output format
nab fetch https://api.example.com --format json

# Batch benchmark
nab bench "https://example.com,https://httpbin.org/get" -i 10

# Get OTP code from 1Password
nab otp github.com

# Generate browser fingerprint profiles
nab fingerprint -c 5
```

## CLI Reference
| Command | Description |
|---|---|
| `fetch` | Fetch a URL and convert to clean markdown |
| | Extract data from JavaScript-heavy SPA pages |
| `submit` | Submit a form with smart field extraction and CSRF handling |
| `login` | Auto-login to a website using 1Password credentials |
| | Stream media from various providers (Yle, NRK, SVT, DR) |
| | Analyze video with transcription and vision pipeline |
| | Add subtitles and overlays to video |
| `bench` | Benchmark fetching with timing statistics |
| `fingerprint` | Generate and display browser fingerprint profiles |
| | Test 1Password credential lookup for a URL |
| `validate` | Run validation tests against real websites |
| `otp` | Get OTP code from 1Password |
| | Export browser cookies in Netscape format |
Common flags for `fetch`:

| Flag | Description |
|---|---|
| `--cookies` | Use cookies from browser: `brave`, `chrome`, `firefox`, `safari`, `edge`, `dia` |
| `--1password` | Use 1Password credentials for this URL |
| `--proxy` | HTTP or SOCKS5 proxy URL |
| `--format` | Output format: `markdown` (default), `json` |
| `--raw-html` | Output raw HTML instead of markdown |
| | Extract links only |
| | Show what changed since the last fetch |
| | Disable SPA data extraction |
| | Batch fetch URLs from file (one per line) |
| | Max concurrent requests for batch mode (default: 5) |
| | HTTP method: GET, POST, PUT, DELETE, PATCH |
| | Request body data (for POST/PUT/PATCH) |
| | Custom request header (repeatable) |
| `-o` | Save body to file |
| | Enable verbose debug logging |
## PDF Extraction

nab converts PDF files to markdown with heading detection and table reconstruction. Requires pdfium (ships with Chromium, or install via Homebrew).

```sh
# Fetch a PDF and convert to markdown
nab fetch https://example.com/report.pdf

# Save PDF conversion to file
nab fetch https://arxiv.org/pdf/2301.00001 -o paper.md
```

The PDF pipeline extracts character positions via pdfium, reconstructs text lines, detects tables through column alignment, and renders clean markdown. Target performance is ~10ms/page. Maximum input size is 50 MB.
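Column-alignment table detection can be pictured with a toy example. This Python sketch is illustrative only: the real pipeline works on pdfium character positions, while here each line is pre-tokenized into `(x_position, word)` pairs, and the 2-unit clustering threshold is an assumption.

```python
def lines_to_markdown_table(lines):
    """Turn lines of (x_position, word) tuples into a markdown table.

    Words whose x positions line up across rows are treated as one
    column: a toy version of column-alignment table detection.
    """
    # Collect candidate column start positions (cluster within 2 units).
    starts = []
    for line in lines:
        for x, _ in line:
            if not any(abs(x - s) <= 2 for s in starts):
                starts.append(x)
    starts.sort()

    # Assign each word to its nearest column.
    rows = []
    for line in lines:
        cells = [""] * len(starts)
        for x, word in line:
            col = min(range(len(starts)), key=lambda i: abs(starts[i] - x))
            cells[col] = word
        rows.append(cells)

    # Render as a markdown table, first row as the header.
    header = "| " + " | ".join(rows[0]) + " |"
    sep = "|" + "---|" * len(starts)
    body = ["| " + " | ".join(r) + " |" for r in rows[1:]]
    return "\n".join([header, sep] + body)
```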
## Proxy Support

nab supports HTTP and SOCKS5 proxies via the `--proxy` flag or environment variables.

```sh
# Explicit proxy
nab fetch https://example.com --proxy socks5://127.0.0.1:1080
nab fetch https://example.com --proxy http://proxy.company.com:8080

# Environment variables (checked in this order)
export HTTPS_PROXY=http://proxy:8080
export HTTP_PROXY=http://proxy:8080
export ALL_PROXY=socks5://proxy:1080
```

The `--proxy` flag takes precedence over environment variables. Both uppercase and lowercase variants (`HTTPS_PROXY` / `https_proxy`) are recognized.
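The resolution order described above can be sketched in a few lines of Python. Checking the uppercase variant before the lowercase one within each name is an assumption of this sketch; the documented behavior only says both are recognized.

```python
import os

def resolve_proxy(cli_proxy=None, env=os.environ):
    """Pick a proxy: the --proxy flag first, then env vars in documented order."""
    if cli_proxy:
        return cli_proxy
    for name in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY"):
        # Both case variants are recognized; uppercase-first is an assumption.
        for candidate in (name, name.lower()):
            if env.get(candidate):
                return env[candidate]
    return None
```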
## Environment Variables

| Variable | Purpose |
|---|---|
| `HTTPS_PROXY` | HTTPS proxy URL |
| `HTTP_PROXY` | HTTP proxy URL |
| `ALL_PROXY` | Proxy for all protocols |
| | Claude API key |
| | Logging level |
| | Pushover notifications for MFA |
| | Telegram notifications for MFA |
## Configuration

nab requires no configuration files. It uses smart defaults: auto-detected browser cookies, randomized fingerprints, and markdown output.

Optional plugin configuration at `~/.config/nab/plugins.toml`:

```toml
# Binary plugin — external process (original format)
[[plugins]]
name = "my-provider"
binary = "/usr/local/bin/nab-plugin-example"
patterns = ["example\\.com/.*"]

# CSS extractor — no external binary needed (new in v0.5)
[[plugins]]
name = "internal-wiki"
type = "css"
patterns = ["wiki\\.corp\\.com/.*"]

[plugins.content]
selector = "div.wiki-content"
remove = ["nav", ".sidebar"]

[plugins.metadata]
title = "h1.page-title"
author = ".author-name"
published = "time.published"
```

Binary plugins receive a URL as JSON on stdin and return markdown on stdout. CSS extractors run in-process using configurable selectors — no code required.
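A binary plugin is just a process that speaks the stdin/stdout contract described above. A minimal Python sketch — the exact JSON request shape beyond a `url` field is an assumption:

```python
import json
import sys

def handle(request: dict) -> str:
    """Turn a plugin request into markdown.

    The request is assumed to carry at least a "url" field;
    everything else here is illustrative.
    """
    url = request["url"]
    # A real plugin would fetch and transform the page; this one just
    # emits a markdown stub proving the round trip works.
    return f"# Extracted from {url}\n\n(plugin output goes here)\n"

if __name__ == "__main__":
    # Contract: JSON request on stdin, markdown response on stdout.
    sys.stdout.write(handle(json.load(sys.stdin)))
```

Point the `binary` key of a `[[plugins]]` entry at a script like this and nab will invoke it for matching URLs.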
Persistent state is stored in `~/.nab/`:

| Path | Purpose |
|---|---|
| | Content snapshots for diff mode |
| | Saved login sessions |
| | Cached browser versions (auto-updates every 14 days) |
## MCP Server

nab ships a native Rust MCP server (`nab-mcp`) for integration with Claude Code and other MCP clients.

To set it up, add this to your MCP client configuration:

```json
{
  "mcpServers": {
    "nab": {
      "command": "nab-mcp"
    }
  }
}
```

Available tools:
| Tool | Description |
|---|---|
| `fetch` | Fetch a URL and convert to markdown |
| `fetch_batch` | Fetch multiple URLs in parallel |
| `submit` | Submit a web form with CSRF extraction |
| `login` | Auto-login via 1Password |
| `auth_lookup` | Look up 1Password credentials |
| `fingerprint` | Generate browser fingerprints |
| `validate` | Run validation test suite |
| `benchmark` | Benchmark URL fetching |
The MCP server uses MCP protocol 2025-11-25 (latest) over stdio and shares a single `AcceleratedClient` across all tool calls for connection pooling.
Protocol features:

- **Tool annotations** — read-only, destructive, and open-world hints on all 8 tools
- **Structured output** — `outputSchema` + `structured_content` on all 8 tools (machine-parseable JSON alongside human-readable text)
- **URL elicitation** — OAuth/SSO login sends the user to the auth URL in-browser (Google, GitHub, Microsoft, Apple, and 9 more)
- **Form elicitation** — interactive credential input and multi-select cookie source picker
- **Task-augmented execution** — `fetch_batch` can run asynchronously with progress notifications
- **Server icons** — globe SVG in light/dark themes
## Benchmarks

HTML-to-markdown conversion throughput (via `cargo bench`):

| Payload | Throughput |
|---|---|
| 1 KB HTML | 2.8 MB/s |
| 10 KB HTML | 14.5 MB/s |
| 50 KB HTML | 22.3 MB/s |
| 200 KB HTML | 28.1 MB/s |

Arena allocator vs `Vec<String>` for response buffering:

| Benchmark | Arena (bumpalo) | Vec | Speedup |
|---|---|---|---|
| Realistic 10 KB response | 4.2 µs | 9.3 µs | 2.2x |
| 1 MB large response | 380 µs | 890 µs | 2.3x |
| 1000 small allocations | 12 µs | 28 µs | 2.3x |

Run the benchmarks yourself with `cargo bench`.
## Install

### Homebrew (macOS/Linux)

```sh
brew tap MikkoParkkola/tap
brew install nab
```

### From crates.io (requires Rust 1.93+)

```sh
cargo install nab
```

### Pre-built binary (cargo-binstall)

```sh
cargo binstall nab
```

Or download a pre-built binary for your platform (macOS Apple Silicon, macOS Intel, Linux x86_64, Linux ARM64, Windows x64) directly from GitHub Releases.

### From source

```sh
git clone https://github.com/MikkoParkkola/nab.git
cd nab && cargo install --path .
```

## Library Usage
```rust
use nab::AcceleratedClient;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = AcceleratedClient::new()?;
    let html = client.fetch_text("https://example.com").await?;
    println!("Fetched {} bytes", html.len());
    Ok(())
}
```

## Requirements
- Rust 1.93+ (for building from source)
- ffmpeg (optional, for streaming/analyze commands): `brew install ffmpeg`
- 1Password CLI (optional): Install guide
## Architecture

See `docs/ARCHITECTURE.md` for the full internal architecture, module organization, data flow diagrams, and extension points.

## Contributing

See `CONTRIBUTING.md` for development setup, code style guidelines, testing instructions, and pull request process.

## Responsible Use

This tool includes browser cookie extraction and fingerprint spoofing capabilities. These features are intended for legitimate use cases such as accessing your own authenticated content and automated testing. Use responsibly and only on sites where you have authorization.

## License

MIT License - see `LICENSE` for details.