nab
nab is a local, token-optimized HTTP client server that fetches web content as clean markdown for LLM pipelines, with browser authentication, anti-bot evasion, and structured output.
fetch— Fetch any URL and convert to clean markdown (HTML, PDF, JSON, plain text, SPA data), with:Browser cookie injection (Brave, Chrome, Firefox, Safari, Edge, Dia)
Focus mode (BM25 scoring) to extract only relevant sections
Token budget with structure-aware truncation
Diff mode to return only changed content vs. a previous snapshot
Named sessions for persistent cookie state across requests
HTTP/2, HTTP/3 (QUIC), TLS 1.3, and realistic browser fingerprinting
fetch_batch— Fetch multiple URLs concurrently with HTTP/2 multiplexing and connection pooling, returning per-URL timing resultssubmit— Submit web forms by auto-extracting hidden fields and CSRF tokens, merging user-provided fields, and POSTing; supports session-based cookie persistencelogin— Auto-login to websites using 1Password credentials, with MFA/TOTP handling and session cookie storage for subsequent authenticated requestsauth_lookup— Look up 1Password credentials for a URL/domain, returning username and TOTP availability without exposing the passwordfingerprint— Generate realistic browser fingerprint profiles (Chrome, Firefox, Safari) including User-Agent, Sec-CH-UA, and Accept-Language headersvalidate— Run a validation suite against real websites to verify HTTP/2, HTTP/3, compression, fingerprinting, TLS 1.3, and 1Password integrationsbenchmark— Benchmark URL fetching over multiple iterations, returning min/avg/max response times and error counts
Enables credential lookup, auto-login with CSRF handling, and TOTP support for accessing authenticated websites.
Used as the underlying protocol for extracting structured data from Mastodon instances.
Enables authenticated web fetching by injecting session cookies from the Brave browser.
Allows defining custom extraction rules for specific web elements using CSS selectors in plugins.
Utilized to retrieve structured discussion data from Hacker News via its Firebase-backed API.
Facilitates authenticated access and bot evasion through cookie injection and browser fingerprinting.
Extracts structured data from issues and pull requests via the GitHub REST API.
Provides specialized extraction for Google Workspace services including Docs, Sheets, and Slides.
Converts documents into clean markdown while preserving comments and suggested edits.
Renders spreadsheet data as markdown tables for easy processing by LLMs.
Extracts text content and comments from presentations into clean markdown.
Retrieves structured content from posts and reels using the oEmbed API.
Enables data extraction from Single Page Applications (SPAs) and lightweight rendering of dynamic content.
The primary output format, converting complex HTML into token-efficient markdown optimized for LLM context windows.
Fetches structured social media content from Mastodon users and statuses via ActivityPub.
Retrieves subreddit discussions and comments using the Reddit JSON API.
Supports authenticated fetching and anti-fingerprinting using Safari's browser profiles and cookies.
Used for configuring custom site extractors and plugins via plugins.toml.
Provides native passkey authentication support for secure website logins.
Extracts structured article content using the Wikipedia REST API.
Retrieves video metadata and information via the YouTube oEmbed API.
nab
Token-optimized web fetcher + multilingual ASR + URL watcher. MCP 2025-11-25 compliant. Rust. macOS arm64 first, cross-platform.

nab is a single Rust binary that does three things very well: it fetches any URL as clean markdown (with your real browser cookies and anti-bot evasion), it analyzes any audio or video file with on-device multilingual ASR and speaker diarization, and it watches any URL for changes and pushes notifications when content moves. Everything runs locally. There are no API keys to set up by default. The output is shaped for LLM context windows.
Why nab
Token-lean by design. nab returns only what an LLM actually needs — clean markdown, BM25-lite query-focused extraction, and structure-aware token budgets — cutting the token cost of web research instead of dumping raw HTML into your context window.
Multimodal, fully on-device. Transcribe and diarize any audio or video (FluidAudio / Parakeet TDT v3 on the Apple Neural Engine — 131× realtime on a 2-hour clip, 25 EU languages, word-level timestamps, optional Qwen3-ASR for zh/ja/ko/vi) and OCR images via Apple Vision (15 languages, ~10–50 ms). No cloud, no API keys.
Authenticated reach. Real browser cookies, 1Password auto-login with TOTP/MFA, WebAuthn passkeys, fingerprint spoofing and WAF evasion — reach internal dashboards, SaaS apps, and paywalled research with the same command as a public URL.
Watch the web. Subscribe to any URL via MCP resources — conditional GETs, semantic diff, adaptive backoff. RSS for the entire web.
Prompt-injection defense, on by default. Hidden instructions addressed to your AI are surfaced to you, not silently executed by your model — see Security.
Everything is a single local Rust binary. No cloud backend, no API keys by default, output shaped for LLM context windows.
Related MCP server: fetch-guard
Quick start
Tell your AI assistant (recommended):
Read https://github.com/MikkoParkkola/nab and install nab as my web fetching and audio analysis MCP server
Your agent will install the binary, wire itself up, and start fetching. Works in Claude Code, Cursor, Windsurf, and any AI with terminal access.
Or install and try manually:
brew install MikkoParkkola/tap/nab # install
nab fetch https://news.ycombinator.com # fetch as markdown
nab models fetch fluidaudio # download ASR model
nab analyze interview.mp4 --diarize # transcribe + identify speakers
nab watch add https://status.openai.com --interval 5m # subscribe to changesFeatures
Command | What it does |
| Fetch any URL as clean markdown. HTTP/3, browser cookie injection (Brave / Chrome / Firefox / Safari / Edge / Dia), 1Password auto-login, fingerprint spoofing, fetch-time YARA-X redaction for prompt-injection/exfil signatures, 12 site providers. MCP fetch also supports query-focused extraction, readability, and token budgets. |
| Explicit opt-in browser rendering for JS-heavy pages through a configured Chrome DevTools Protocol WebSocket endpoint. No Chromium is bundled and default |
| Transcribe and diarize. FluidAudio (Parakeet TDT v3) on Apple Neural Engine, 131x realtime on a 2-hour clip, word-level timestamps, 25 EU languages, optional Qwen3-ASR for zh/ja/ko/vi, optional active reading via MCP sampling. |
| Monitor a URL and push notifications via subscribable MCP resources. RSS for the entire web. Conditional GETs, semantic diff, adaptive backoff. |
| Persistent install of inference model binaries. Supports |
| MCP 2025-11-25 server. stdio + Streamable HTTP. 12 tools, 4 prompts, 2+N resources, structured logging, sampling, roots, elicitation. |
| Apple Vision OCR engine. 15 languages. Apple Neural Engine accelerated. ~10-50 ms per image. macOS only. |
Security: prompt-injection defense
Web pages increasingly carry instructions written for the AI, not for you — concealed in HTML comments, display:none / aria-hidden text, data-ai / data-mcp / data-agent attribute payloads, or WebMCP manifests. Fetch such a page with a naive tool and those hidden instructions land straight in your model's context, where they can be acted on. This is the prompt-injection-as-phishing class of attack.
nab treats every fetched page as hostile input and runs two local, non-networked guards before any content reaches your agent — on by default, no flag, no setup:
Secure Ingestion guard — detects and strips machine-targeted markup that is invisible to humans (AI-addressed comments, hidden
display:none/aria-hiddentext, agent-onlydata-*payloads, WebMCP advertisements) and reports each detection atInfo/Warn/Blockseverity, so you see what a page tried to tell your agent instead of it being silently executed.YARA-X signature guard — scans every returned body for prompt-injection, exfiltration, secret-leak, and obfuscation signatures, redacting matched sections by default. Set
NAB_YARA_ACTION=refuseto block the fetch outright (orNAB_YARA_BYPASS=1as an audited emergency opt-out).
The net effect: hidden instructions become visible to you, not executed by your model — a strong reason to point your agent at nab fetch instead of a built-in web-fetch tool.
Licensing: both guards are Enterprise Edition modules — free for personal and non-commercial use under PolyForm Noncommercial 1.0.0; commercial / business use requires a commercial license (see COMMERCIAL.md and the License section).
Installation
Homebrew (macOS, recommended)
brew tap MikkoParkkola/tap
brew install nabPre-built binary (no Rust toolchain required)
Most users want this path — these are ready-to-run binaries; nothing is compiled on your machine.
If you have cargo-binstall, it fetches the right pre-built binary automatically:
cargo binstall nabOtherwise download directly from GitHub Releases. Both the nab CLI and the nab-mcp server ship for every platform below, alongside checksums-sha256.txt:
Platform | CLI binary | MCP server binary |
macOS Apple Silicon |
|
|
macOS Intel |
|
|
Linux x86_64 (glibc) |
|
|
Linux x86_64 (static musl) |
|
|
Linux ARM64 (glibc) |
|
|
Linux ARM64 (static musl) |
|
|
Windows x64 |
|
|
Example install for macOS Apple Silicon (substitute the filename for your platform):
shasum -a 256 -c checksums-sha256.txt --ignore-missing
chmod +x nab-aarch64-apple-darwin
mv nab-aarch64-apple-darwin /usr/local/bin/nab
xattr -d com.apple.quarantine /usr/local/bin/nab 2>/dev/null || trueFrom crates.io (compiles from source)
Builds nab locally — requires the Rust toolchain (1.95 or newer) and takes a few minutes:
cargo install nabFrom source
git clone https://github.com/MikkoParkkola/nab.git
cd nab
cargo install --path .Avoiding duplicate installs
If you install nab through more than one channel (for example a Homebrew tap
and cargo install), the copy that wins depends on PATH order. On many
setups /opt/homebrew/bin comes before ~/.cargo/bin, so a Homebrew binary can
shadow a newer cargo-installed one — and nab --version then reports the older
version.
Run the built-in diagnostic to see every nab on your PATH, which one wins,
and their versions:
nab doctorIf the binary on your PATH is the stale one, its doctor may predate this
command; invoke the newer install by full path to diagnose, e.g.
~/.cargo/bin/nab doctor. To resolve, keep a single install channel
(brew uninstall nab or cargo uninstall nab), or reorder PATH so the
directory of the install you want comes first.
MCP Configuration
Add to your MCP client config (Claude Desktop, Cursor, Windsurf, etc.):
{
"mcpServers": {
"nab": {
"command": "nab-mcp"
}
}
}Or use the auto-installer:
nab mcp install # Claude Desktop (default)
nab mcp install --client claude-code # Claude Code
nab mcp install --client cursor # Cursor
nab mcp install --client windsurf # Windsurf
nab mcp install --client codex # OpenAI Codex CLI
nab mcp install --client vscode # VS Code Copilot
nab mcp install --client zed # Zed
nab mcp install --dry-run # preview without writingAlso supported: gemini, amazon-q, lm-studio.
See MCP integration below for the full list of tools, capabilities, and HTTP transport.
Claude Code plugin
This repository includes a local Claude Code plugin in plugin/. It bundles nab MCP auto-registration with the Claude Elite research, url-insight, wayback, ia, and oreilly skills.
claude --plugin-dir ./pluginThe plugin exposes the /nab workflow shape for fetch, authenticated Brave-cookie fetches, archive retrieval, and multi-source research. It keeps nab's auth-aware path front and center: nab fetch --cookies brave <url> for existing browser sessions and nab fetch --1password <url> for 1Password/TOTP flows.
Usage
Fetch
# Basic fetch — auto-detects browser, returns markdown
nab fetch https://example.com
# Use cookies from a specific browser
nab fetch https://github.com/notifications --cookies brave
# 1Password auto-login (TOTP/MFA supported)
nab fetch https://internal.company.com --1password
# Google Workspace (Docs, Sheets, Slides) with comments
nab fetch --cookies brave "https://docs.google.com/document/d/DOCID/edit"
# Output JSON with confidence scores
nab fetch https://example.com --format json
# Batch fetch with parallelism
nab fetch --batch urls.txt --parallel 8
# Explicit browser rendering for JS-heavy pages
NAB_BROWSER_CDP_WS=wss://... nab browser https://example.com
nab fetch https://example.com --render --browser-cdp-url wss://...Common flags for fetch:
Flag | Description |
|
|
| 1Password credential lookup + auto-login |
| HTTP or SOCKS5 proxy |
|
|
| Skip markdown conversion |
| Force readability extraction for generic HTML pages |
| Apply an output token envelope; returned markdown uses 80% for headroom |
| Opt in to remote thin-content recovery via |
| Opt in to configured CDP browser rendering for JS-heavy pages; requires |
| Show what changed since the last fetch |
| HTTP method + body |
| Write body to file |
MCP fetch additionally supports focus, readability, max_tokens, and session parameters for query-focused extraction, readability extraction, structure-aware token budgets, and persistent encrypted cookie sessions.
Analyze
nab analyze transcribes audio and video files locally. The default backend on macOS arm64 is FluidAudio, which runs Parakeet TDT v3 on the Apple Neural Engine.
# Download the ASR model (~600 MB, one-time)
nab models fetch fluidaudio
# Transcribe a video
nab analyze interview.mp4
# Add speaker diarization (PyAnnote community-1)
nab analyze interview.mp4 --diarize
# Force a language hint (BCP-47)
nab analyze podcast.mp3 --language fi
# Word-level timestamps
nab analyze talk.mp4 --word-timestamps
# Active reading: nab uses MCP sampling to look up references mentioned in the audio
nab analyze interview.mp4 --active-reading
# Expose speaker embeddings for matching against hebb's voiceprint database
nab analyze interview.mp4 --diarize --include-embeddings
# Output JSON
nab analyze podcast.mp3 --format jsonReal numbers from a 2 h 09 m English audio file (Karen Hao interview, MacBook Pro M-series):
Metric | Value |
Wall time | 59.6 s |
Realtime factor | 131x |
FluidAudio mean confidence | 97.18 % |
Audio extraction (ffmpeg) | ~650x realtime |
Backend | Platform | Languages | Diarization |
| macOS arm64 | 25 EU languages, +zh/ja/ko/vi via Qwen3-ASR (opt-in) | PyAnnote community-1 |
| Linux/x86, macOS, Windows | Parakeet ONNX, 25+ langs | sherpa-onnx pyannote-seg-3.0 |
| Universal fallback | whisper-large-v3-turbo, 99 langs | none |
Watch
nab watch turns any URL into a subscribable resource. MCP clients receive notifications/resources/updated when the content changes.
nab watch add https://news.ycombinator.com --interval 10m
nab watch add https://example.com/pricing --interval 1h --selector "table.pricing"
nab watch add https://api.openai.com/status --interval 5m --notify-on regression
nab watch list
nab watch logs <id>
nab watch remove <id>Per-watch options:
Flag | Default | Description |
| 1h | Polling interval ( |
| none | CSS selector to scope diff to one element |
|
|
|
|
|
|
The poller uses conditional GETs (If-None-Match, If-Modified-Since), so 304 responses cost effectively nothing. Watches with five consecutive failures auto-mute. Adaptive backoff applies on 429 and 503.
Models
nab models list # show installed model versions
nab models fetch fluidaudio # download FluidAudio binary + Parakeet weights
nab models update fluidaudio # check for upstream updates
nab models verify fluidaudio # checksum + smoke testBoth whisper and sherpa-onnx ship as cross-platform fallbacks alongside the macOS-default fluidaudio backend.
MCP integration
nab-mcp is a native Rust MCP server. It runs over stdio (default) or Streamable HTTP. It is fully compliant with MCP protocol version 2025-11-25.
Quick setup (recommended)
nab mcp install # Claude Desktop (default)
nab mcp install --client claude-code # Claude Code
nab mcp install --client cursor # Cursor
nab mcp install --client windsurf # Windsurf
nab mcp install --client codex # OpenAI Codex CLI
nab mcp install --client vscode # VS Code Copilot
nab mcp install --client zed # Zed
nab mcp install --dry-run # preview what would changeAlso supported: gemini, amazon-q, lm-studio. This auto-detects the nab-mcp binary path, backs up your existing config, and adds the nab entry. Restart your client after installing.
Manual setup
Add to your MCP client configuration (~/.config/claude/mcp.json or equivalent):
{
"mcpServers": {
"nab": {
"command": "nab-mcp"
}
}
}HTTP transport
nab mcp serve --http 127.0.0.1:8765
# or directly:
nab-mcp --http 127.0.0.1:8765Bind to localhost by default. Origin checks and MCP-Protocol-Version header validation are enforced per spec.
MCP capabilities
Capability | Status |
Tools | 12 tools with structured output schemas, annotations, validation errors |
Prompts | 4 prompts ( |
Resources | 2 static + N dynamic watch resources, all subscribable |
Logging |
|
Sampling | nab calls back to the host LLM for active reading, focus extraction, form auto-fill |
Roots |
|
Elicitation | Form mode + URL mode for OAuth/SSO |
Argument completion |
|
Server icons | Light + dark SVG |
Transports | stdio + Streamable HTTP (resumable, session-scoped) |
The 12 MCP tools:
Tool | Description |
| Fetch URL → markdown, with cookies, focus, token budget, session |
| Parallel multi-URL fetch with task-augmented async execution |
| Submit a form with CSRF + smart field extraction |
| 1Password auto-login with TOTP support |
| Look up 1Password credentials for a URL |
| Generate browser fingerprint profiles |
| Run the validation test suite |
| Time URL fetches with stats |
| Transcribe and diarize audio/video |
| Create a URL watch and subscribe |
| Manage watches |
Site providers
nab detects URLs for 12 platforms and uses APIs or stable structured page data instead of broad HTML scraping.
Provider | URL pattern | Method |
Twitter / X |
| FxTwitter API |
| JSON API | |
Hacker News |
| Firebase API |
GitHub |
| REST API |
Google Workspace | Docs, Sheets, Slides | Export API + OOXML |
YouTube |
| oEmbed |
Wikipedia |
| REST API |
StackOverflow |
| API |
Mastodon |
| ActivityPub |
| oEmbed | |
| oEmbed | |
Substack |
| Article DOM ( |
If no provider matches, nab falls back to standard HTML fetch + markdown conversion.
Architecture
nab is built around a small set of orthogonal subsystems: cmd/ (CLI), bin/mcp_server/ (MCP server), content/ (HTML / PDF / OCR pipeline), analyze/ (ASR + diarization + vision), watch/ (URL monitoring + subscriptions), auth/ (cookies + 1Password + WebAuthn), site/ (per-site providers), and the shared AcceleratedClient (HTTP/3 + connection pool + fingerprint store).
See:
docs/ARCHITECTURE.md — full module map and data flow
docs/sovereign-stack.md — how nab composes with hebb to form a local-first multimodal stack
docs/getting-started.md — new user onboarding
Design notes
The docs/design/ directory tracks recent design proposals:
analyze-v2.md — multilingual ASR + diarization + vision pipeline
url-watch-resources.md — URL watch as MCP subscribable resources
active-reading.md — active reading via MCP sampling
mcp-spec-closure.md — closing the last MCP 2025-11-25 spec gaps
Companion tools
nab is half of a sovereign multimodal stack. The other half is hebb, a neuroscience-inspired memory MCP server. Composition examples:
nab analyze --diarize --include-embeddings→hebb voice_match→ speakers labeled with namesnab fetch URL→hebb kv_set→ personal sovereign web memorynab watch add URL→hebb kv_set(on update) → time-series of changes to any web page
See docs/sovereign-stack.md for the full composition story.
Configuration
nab requires no configuration files. It uses smart defaults: auto-detected browser cookies, randomized fingerprints, and markdown output.
Persistent state lives in ~/.nab/:
Path | Purpose |
| Content snapshots for |
| AES-256-GCM encrypted named-session jars (non-Windows) |
| Locally generated master key for session encryption (non-Windows) |
| Cached browser versions (auto-updates every 14 days) |
| URL watch state |
| Installed inference model binaries |
Optional plugin configuration at ~/.config/nab/plugins.toml. See docs/getting-started.md for plugin examples.
Environment variables
Variable | Purpose |
| HTTPS proxy URL |
| HTTP proxy URL |
| Proxy for all protocols |
| Logging level (e.g., |
| Pushover notifications for MFA |
| Telegram notifications for MFA |
Library usage
use nab::AcceleratedClient;
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let client = AcceleratedClient::new()?;
let html = client.fetch_text("https://example.com").await?;
println!("Fetched {} bytes", html.len());
Ok(())
}Requirements
Rust 1.95+ for building from source
ffmpeg for
analyzeandstreamcommands:brew install ffmpeg1Password CLI (optional, for credential integration): see 1Password docs
Contributing
See CONTRIBUTING.md for development setup, code style guidelines, testing instructions, and pull request process.
Responsible use
This tool includes browser cookie extraction and fingerprint spoofing capabilities. They are intended for legitimate use cases — accessing your own authenticated content, automated testing, sites where you have authorization. Use responsibly.
Troubleshooting
MCP server not connecting? Run nab-mcp directly in your terminal to see errors. Verify the binary exists with which nab-mcp. If installed via cargo install nab, both nab and nab-mcp should be on your $PATH.
Cookie extraction failing? Grant Full Disk Access to your terminal in System Settings > Privacy & Security > Full Disk Access (macOS). Browser cookies are stored in protected directories. Use --cookies brave to target a specific browser.
ASR model not found? Run nab models fetch fluidaudio to download the model (~542 MB). The model directory is ~/.nab/models/. Use nab models list to see what's installed.
Fetch returning HTML instead of markdown? Some sites block automated access. Try nab fetch URL --cookies brave to use your browser session, or nab fetch URL --1password for sites that need login.
Fetch returning thin content from a JavaScript app? Default nab fetch stays local-first and HTTP-only. For pages that need DOM execution, configure an external CDP endpoint with NAB_BROWSER_CDP_WS or --browser-cdp-url, then run nab browser URL or nab fetch URL --render. Remote browser providers may receive the URL and rendered page content; local browser cookies are not automatically available to remote browsers.
YARA-X guard redacted a fetch? nab fetch and MCP fetch scan returned bodies by default before saving or returning content. NAB_YARA_ACTION=refuse blocks instead of redacting. NAB_YARA_BYPASS=1 is an audited emergency opt-out.
"too many open files" on watch? Increase your ulimit: ulimit -n 4096. The default macOS limit (256) is too low for many concurrent watches.
Ecosystem
nab is part of a suite of MCP tools:
Tool | Description |
Universal MCP gateway — compact 12-15 tool surface replaces 100+ registrations | |
AI travel agent — 36 MCP tools for flights, hotels, ground transport | |
Web content extraction — fetch any URL with cookies + anti-bot bypass | |
macOS GUI automation — 34 MCP tools via Accessibility API |
License
nab is dual-licensed as of v0.9.0:
Scope | License | File |
Core fetch / analyze / watch / MCP server / public web fetching | MIT | |
Designated Enterprise Edition modules (authenticated reach + anti-bot) | PolyForm Noncommercial 1.0.0 |
EE-designated paths (every file carries // SPDX-License-Identifier: PolyForm-Noncommercial-1.0.0):
src/auth/— 1Password, WebAuthn, and browser-cookie injection (premium authenticated reach)src/fingerprint/— browser fingerprint spoofing (anti-bot evasion)src/waf/— WAF challenge handlingsrc/site/— per-site provider integrations (proprietary domain knowledge)src/security/— Secure Ingestion guard for stripping machine-targeted HTML directives and hidden metadatacrates/nab-yara-engine/— fetch-time YARA-X signature guard for prompt injection, exfiltration, secrets, and obfuscation
What this means in practice:
Free for noncommercial use, modification, redistribution.
Commercial use of EE modules requires a separate commercial license.
Companies can buy a standard commercial-use license via GitHub Sponsors at EUR 500/month per named project.
See COMMERCIAL.md for business use, forks, wrappers, shared services, and managed-service deployments.
All releases prior to v0.9.0 remain entirely MIT and stay MIT forever.
Maintenance
Latest Blog Posts
- Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)By Om-Shree-0709 on .Agentic AiPrompt InjectionWebAssembly
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/MikkoParkkola/nab'
If you have feedback or need assistance with the MCP directory API, please join our Discord server