BrowseAI Dev


Research infrastructure for AI agents with Grounded Intelligence — real-time web search, evidence extraction, verification, and structured citations. Every claim is backed by a URL. Every answer has a confidence score.

Agent → BrowseAI Dev → Internet → Verified answers + sources

Website · Playground · API Docs · Alternatives · Discord

Package names: npm: browseai-dev · PyPI: browseaidev · LangChain: langchain-browseaidev — Previously browse-ai and browseai. Old names still work and redirect automatically.


How It Works

search → fetch pages → neural rerank → extract claims → verify → cited answer (streamed)

Every answer goes through a multi-step verification pipeline: nothing is hallucinated, and every claim is backed by a real source.

Verification & Confidence Scoring

Confidence scores are evidence-based — not LLM self-assessed. After the LLM extracts claims and sources, a post-extraction verification engine checks every claim against the actual source page text:

  1. Atomic claim decomposition — Compound claims are auto-split into individual verifiable facts. "Tesla had $96B revenue and 1.8M deliveries" becomes two atomic claims, each verified independently.

  2. Hybrid retrieval combining keyword and semantic matching — For each claim, keyword matching finds lexical matches and dense embeddings find semantic matches from source text. Rankings are fused to catch paraphrased evidence that keyword matching alone misses (e.g., "prevents fabricated answers" matching "reduces hallucinations"). Premium tier only, with graceful keyword-only fallback.

  3. Semantic evidence reranking — Top candidates per claim are reranked by a purpose-built verification model trained on 1.4M+ claim-evidence pairs that improves with every query. Selects the best supporting evidence, applies contradiction penalties and paraphrase boosts.

  4. Multi-provider search — Parallel search across multiple providers for broader source diversity. More independent sources = stronger cross-reference = higher confidence.

  5. Domain authority scoring — 10,000+ domains across 5 tiers (institutional .gov/.edu → major news → tech journalism → community → low-quality). Dynamic scoring that improves from real verification data.

  6. Source quote verification — LLM-extracted quotes verified against actual page text using multi-strategy matching.

  7. Cross-source consensus — Each claim verified against all available page texts. Claims supported by 3+ independent domains get "strong consensus". Single-source claims flagged as "weak".

  8. Contradiction detection — Claim pairs analyzed for semantic conflicts using topic overlap and contradiction classification. Detected contradictions surfaced in the response and penalize confidence.

  9. Multi-pass consistency — In thorough mode, claims are cross-checked across independent extraction passes. Claims confirmed by both passes get boosted; inconsistent claims are penalized.

  10. Auto-calibrated confidence — Multi-factor confidence formula auto-adjusts from real user feedback. Predicted confidence aligns with actual accuracy over time. Factors: verification rate, domain authority, source count, consensus, domain diversity, claim grounding, source recency, and citation depth.

  11. Per-claim evidence retrieval — Weak claims get targeted search queries generated by LLM, then searched individually across all providers. Each claim gets its own evidence pool instead of sharing the same corpus.

  12. Counter-query verification — Verified claims are stress-tested with adversarial "what would disprove this?" search queries. If counter-evidence is found, claim confidence is penalized.

  13. Iterative confidence-gated retrieval — Thorough mode uses a confidence-gated loop: verify → if weak claims remain → generate targeted query → search → re-verify. Loops up to 3 iterations with early termination when queries repeat or confidence meets threshold.

Claims include `verified`, `verificationScore`, `consensusCount`, and `consensusLevel` fields. Sources include `verified` and `authority`. Detected contradictions are returned at the top level. Agents can use these fields to make trust decisions programmatically.

Graceful fallback: When premium keys are not set, the system runs keyword-only verification. Semantic retrieval and reranking are transparent premium enhancements — no degradation, no errors.
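As a minimal sketch of how an agent might act on these per-claim fields, the helper below filters a response down to well-supported claims. The field names match the docs above; the score threshold and the choice to accept only moderate-or-strong consensus are illustrative, not part of the API.

```python
# Sketch: keep only claims that pass verification checks.
# Thresholds here are illustrative, not service defaults.

def trusted_claims(result: dict, min_score: float = 0.8) -> list[dict]:
    """Return claims that are verified, score above a threshold,
    and have at least moderate cross-source consensus."""
    return [
        c for c in result.get("claims", [])
        if c.get("verified")
        and c.get("verificationScore", 0.0) >= min_score
        and c.get("consensusLevel") in ("moderate", "strong")
    ]

result = {
    "claims": [
        {"claim": "A", "verified": True, "verificationScore": 0.91, "consensusLevel": "strong"},
        {"claim": "B", "verified": True, "verificationScore": 0.62, "consensusLevel": "weak"},
        {"claim": "C", "verified": False, "verificationScore": 0.40, "consensusLevel": "weak"},
    ]
}
print([c["claim"] for c in trusted_claims(result)])  # → ['A']
```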

Depth Modes

Three depth levels control research thoroughness:

| Depth | Behavior | Use case |
|---|---|---|
| `fast` (default) | Single search → extract → verify pass | Quick lookups, real-time agents |
| `thorough` | Iterative confidence-gated loop (up to 3 passes), per-claim evidence retrieval, counter-query verification, multi-pass consistency checking | Important research, fact-checking |
| `deep` | Premium multi-step agentic research: iterative think-search-extract-evaluate cycles (up to 4 total steps). Gap analysis identifies missing info and generates follow-up queries. Claims/sources merged across steps with final re-verification. Target confidence: 0.85. Requires BAI key + sign-in. Falls back to thorough when quota is exhausted. | Complex research questions, comprehensive analysis |

```bash
# Thorough mode
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "What is quantum computing?", "depth": "thorough"}'

# Deep mode (uses premium features)
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "Compare CRISPR approaches for sickle cell disease", "depth": "deep"}'
```

Deep mode runs iterative think-search-extract-evaluate cycles: each step performs gap analysis to identify what's missing, generates targeted follow-up queries, and merges claims/sources across steps with a final re-verification pass. It targets a confidence threshold of 0.85 (DEEP_CONFIDENCE_THRESHOLD) and runs up to 3 follow-up steps (MAX_FOLLOW_UP_STEPS, 4 total including the initial pass). Uses semantic reranking, multi-provider search, and multi-pass consistency. Each deep query costs 3x quota (~33 deep queries/day on the free tier). When quota is exhausted, deep mode gracefully falls back to thorough. Without a BAI key, deep mode also falls back to thorough.

Deep mode responses include reasoningSteps showing the multi-step research process (step number, query, gap analysis, claim count, confidence per step).

Streaming API

Get real-time progress with per-token answer streaming. The streaming endpoint sends Server-Sent Events (SSE) as each pipeline step completes. Deep mode steps are grouped by research pass for clean progress display:

```bash
curl -N -X POST https://browseai.dev/api/browse/answer/stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "What is quantum computing?"}'
```

Events: trace (progress), sources (discovered early), token (streamed answer text), result (final answer), done.
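A minimal sketch of consuming those events client-side, assuming standard SSE framing (`event:` and `data:` lines, blank line between events). The event names (`trace`, `sources`, `token`, `result`, `done`) come from the docs above; with `httpx`, `Response.iter_lines()` can feed this parser directly.

```python
# Sketch: parse an SSE stream into (event, payload) pairs.
import json

def parse_sse(lines):
    """Yield (event, payload) tuples from an iterable of SSE lines."""
    event, data = "message", []
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:  # blank line terminates an event
            yield event, json.loads("\n".join(data))
            event, data = "message", []

# Simulated stream for illustration:
stream = [
    'event: token', 'data: {"text": "Quantum"}', '',
    'event: done', 'data: {}', '',
]
for ev, payload in parse_sse(stream):
    print(ev, payload)
```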

Retry with Backoff

All external API calls (search providers, LLM, page fetching) automatically retry on transient failures (429 rate limits, 5xx server errors) with exponential backoff and jitter. Auth errors (401/403) fail immediately — no wasted retries.
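The same policy is easy to apply to your own calls against the API. The sketch below shows exponential backoff with full jitter and fail-fast on auth errors; the delay constants and attempt count are illustrative, not the service's actual values.

```python
# Sketch: retry on transient failures (429/5xx) with exponential
# backoff + jitter; fail immediately on auth errors (401/403).
import random
import time

RETRYABLE = {429, 500, 502, 503, 504}
FATAL = {401, 403}

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retry(fn, max_attempts: int = 4, sleep=time.sleep):
    """fn returns (status_code, body); retried per the policy above."""
    for attempt in range(max_attempts):
        status, body = fn()
        if status in FATAL:
            raise PermissionError(f"auth error {status}: not retrying")
        if status not in RETRYABLE:
            return body
        sleep(backoff_delay(attempt))
    raise TimeoutError("retries exhausted")
```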

Research Memory (Sessions)

Persistent research sessions that accumulate knowledge across multiple queries. Later queries automatically recall prior verified claims, building deeper understanding over time.

Sessions require a BrowseAI Dev API key (bai_xxx) for identity and ownership. Get a free key at browseai.dev/dashboard. For MCP, set BROWSE_API_KEY env var. For Python SDK, pass api_key="bai_xxx". For REST API, use Authorization: Bearer bai_xxx.

```python
# Python SDK
session = client.session("quantum-research")
r1 = session.ask("What is quantum entanglement?")       # 13 claims stored
r2 = session.ask("How is entanglement used in computing?")  # 12 claims recalled!
knowledge = session.knowledge()  # Export all accumulated claims

# Share with other agents or humans
share = session.share()  # Returns shareId + URL
# Another agent forks and continues the research
forked = client.fork_session(share.share_id)
```

```bash
# REST API
curl -X POST https://browseai.dev/api/session \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"name": "my-research"}'

# Returns session ID, then:
curl -X POST https://browseai.dev/api/session/{id}/ask \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "What is quantum entanglement?"}'

# Share a session publicly
curl -X POST https://browseai.dev/api/session/{id}/share \
  -H "Authorization: Bearer bai_xxx"

# Fork a shared session (copies all knowledge)
curl -X POST https://browseai.dev/api/session/share/{shareId}/fork \
  -H "Authorization: Bearer bai_xxx"
```

Each session response includes recalledClaims and newClaimsStored. Sessions can be shared publicly and forked by other agents — enabling collaborative, multi-agent research workflows.

Query Planning

Complex queries are automatically decomposed into focused sub-queries with intent labels (definition, evidence, comparison, counterargument, technical, historical). Each sub-query targets a different aspect of the question, maximizing source diversity. Simple factual queries skip planning entirely — no added latency.

Self-Improving Accuracy

The entire verification pipeline improves automatically with usage:

  • Domain authority — Dynamic scoring adjusts domain trust scores as evidence accumulates. Static tier scores dominate initially, then real verification rates take over.

  • Adaptive verification thresholds — Claim verification thresholds tune per query type based on observed verification rates. Too strict? Loosens up. Too lenient? Tightens.

  • Consensus threshold tuning — Cross-source agreement thresholds adapt based on query type performance.

  • Confidence weight optimization — The multi-factor confidence formula rebalances weights per query type when user feedback indicates inaccuracy.

  • Page count optimization — Source fetch counts adjust based on confidence outcomes per query type.

Feedback Loop

Submit feedback on results to accelerate learning. Agents and users can rate results as good, bad, or wrong — this feeds directly into the adaptive threshold engine.

```bash
curl -X POST https://browseai.dev/api/browse/feedback \
  -H "Content-Type: application/json" \
  -d '{"resultId": "abc123", "rating": "good"}'
```

```python
client.feedback(result_id="abc123", rating="good")
# Or flag a specific wrong claim:
client.feedback(result_id="abc123", rating="wrong", claim_index=2)
```

Quick Start

Python SDK

```bash
pip install browseaidev
```

```python
from browseaidev import BrowseAIDev

client = BrowseAIDev(api_key="bai_xxx")

# Research with citations
result = client.ask("What is quantum computing?")
print(result.answer)
print(f"Confidence: {result.confidence:.0%}")
for source in result.sources:
    print(f"  - {source.title}: {source.url}")

# Thorough mode — auto-retries if confidence < 60%
thorough = client.ask("What is quantum computing?", depth="thorough")

# Deep mode — multi-step reasoning with gap analysis (requires BAI key)
deep = client.ask("Compare CRISPR approaches for sickle cell disease", depth="deep")
for step in deep.reasoning_steps or []:
    print(f"  Step {step.step}: {step.query} ({step.confidence:.0%})")
```

LangChain integration: (PyPI)

```bash
pip install langchain-browseaidev
```

```python
from langchain_browseaidev import BrowseAIDevAnswerTool, BrowseAIDevSearchTool

# Use with any LangChain agent
tools = [
    BrowseAIDevAnswerTool(api_key="bai_xxx"),   # Verified search with citations
    BrowseAIDevSearchTool(api_key="bai_xxx"),   # Basic web search
]

# Standalone usage
tool = BrowseAIDevAnswerTool(api_key="bai_xxx")
result = tool.invoke({"query": "What is quantum computing?", "depth": "thorough"})
```

5 tools available: BrowseAIDevSearchTool, BrowseAIDevAnswerTool (verified), BrowseAIDevExtractTool, BrowseAIDevCompareTool, BrowseAIDevClarityTool (anti-hallucination).

MCP Server (Claude Desktop, Cursor, Windsurf)

```bash
npx browseai-dev setup
```

Or manually add to your MCP config:

```json
{
  "mcpServers": {
    "browseai-dev": {
      "command": "npx",
      "args": ["-y", "browseai-dev"],
      "env": {
        "BROWSE_API_KEY": "bai_xxx"
      }
    }
  }
}
```

Get a free API key at browseai.dev/dashboard.

REST API

```bash
# Basic query
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "What is quantum computing?"}'

# Thorough mode (auto-retries if confidence < 60%)
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "What is quantum computing?", "depth": "thorough"}'

# Deep mode (multi-step reasoning)
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "Compare CRISPR approaches", "depth": "deep"}'
```

Self-Host

The MCP server and frontend are open-source and can be run locally. The verification engine is a hosted service — all API requests are processed by the BrowseAI Dev cloud infrastructure.

```bash
git clone https://github.com/BrowseAI-HQ/BrowseAI-Dev.git
cd BrowseAI-Dev
pnpm install
pnpm dev:web    # Run the frontend locally (API calls go to browseai.dev)
```

API Keys

All API access requires a BrowseAI Dev API key (bai_xxx). Sign up for free at browseai.dev/dashboard.

| Method | How | Verification | Limits |
|---|---|---|---|
| BrowseAI Dev API Key (Free) | `Authorization: Bearer bai_xxx` | Full premium — semantic verification, multi-provider, multi-pass consistency | Generous quota with graceful fallback |
| BrowseAI Dev API Key (Pro) | `Authorization: Bearer bai_xxx` | Full premium — unlimited, no fallback | Unlimited + priority queue, managed keys, team seats |
| Demo (website) | No auth needed | Keyword verification | 1 query/hour per IP |

The free tier includes 100 premium queries/day (or ~33 deep queries/day at 3x cost each). When the quota is reached, queries gracefully fall back to keyword verification (or deep falls back to thorough) — still works, just basic matching. Quota resets every 24 hours. Pro removes all limits.

API responses include quota info when using a BAI key:

```json
{
  "success": true,
  "result": { ... },
  "quota": { "used": 12, "limit": 100, "premiumActive": true }
}
```

Project Structure

```
/apps/mcp              MCP server (stdio transport, npm: browseai-dev)
/packages/shared       Shared types, Zod schemas, constants
/packages/python-sdk   Python SDK (PyPI: browseaidev)
/src                   React frontend (Vite, port 8080)
/supabase              Database migrations
```

The verification engine (API server) is in a separate private repository (BrowseAI-HQ/browseaidev-engine) and runs as a hosted service.

API Endpoints

| Endpoint | Description |
|---|---|
| `POST /browse/search` | Search the web |
| `POST /browse/open` | Fetch and parse a page |
| `POST /browse/extract` | Extract structured claims from a page |
| `POST /browse/answer` | Full pipeline: search + extract + cite. `depth`: `"fast"`, `"thorough"`, or `"deep"` |
| `POST /browse/answer/stream` | Streaming answer via SSE — real-time token streaming + progress events |
| `POST /browse/compare` | Compare raw LLM vs evidence-backed answer |
| `POST /browse/clarity` | Clarity — anti-hallucination answer engine. Three modes: `mode: "prompt"` (enhanced prompts only), `mode: "answer"` (LLM answer, default), `mode: "verified"` (LLM + web fusion). Legacy `verify: true` = `mode: "verified"` |
| `GET /browse/share/:id` | Get a shared result |
| `GET /browse/stats` | Total queries answered |
| `GET /browse/sources/top` | Top cited source domains |
| `GET /browse/analytics/summary` | Usage analytics (authenticated) |
| `POST /session` | Create a research session |
| `POST /session/:id/ask` | Research with session memory (recalls + stores claims) |
| `POST /session/:id/recall` | Query session knowledge without new search |
| `GET /session/:id/knowledge` | Export all session claims |
| `POST /session/:id/share` | Share a session publicly (returns shareId) |
| `GET /session/share/:shareId` | View a shared session (public, no auth) |
| `POST /session/share/:shareId/fork` | Fork a shared session into your account |
| `GET /session/:id` | Get session details |
| `GET /sessions` | List your sessions (authenticated) |
| `DELETE /session/:id` | Delete a session (authenticated) |
| `POST /browse/feedback` | Submit feedback on a result (good/bad/wrong) |
| `GET /browse/learning/stats` | Self-learning engine stats |
| `GET /user/stats` | Your query stats (authenticated) |
| `GET /user/history` | Your query history (authenticated) |
| `DELETE /user/data` | Delete all your data (GDPR right to erasure) |

MCP Tools

| Tool | Description |
|---|---|
| `browse_search` | Search the web for information on any topic |
| `browse_open` | Fetch and parse a web page into clean text |
| `browse_extract` | Extract structured claims from a page |
| `browse_answer` | Full pipeline: search + extract + cite. `depth`: `"fast"`, `"thorough"`, or `"deep"` |
| `browse_compare` | Compare raw LLM vs evidence-backed answer |
| `browse_clarity` | Anti-hallucination answer engine — three modes: `prompt` (prompts only), `answer` (LLM), `verified` (LLM + web fusion) |
| `browse_session_create` | Create a research session (persistent memory) |
| `browse_session_ask` | Research within a session (recalls prior knowledge) |
| `browse_session_recall` | Query session knowledge without new web search |
| `browse_session_share` | Share a session publicly (returns share URL) |
| `browse_session_knowledge` | Export all claims from a session |
| `browse_session_fork` | Fork a shared session to continue the research |
| `browse_feedback` | Submit feedback on a result to improve accuracy |

Python SDK

| Method | Description |
|---|---|
| `client.search(query)` | Search the web |
| `client.open(url)` | Fetch and parse a page |
| `client.extract(url, query=)` | Extract claims from a page |
| `client.ask(query, depth=)` | Full pipeline with citations. `depth`: `"fast"`, `"thorough"`, or `"deep"` |
| `client.compare(query)` | Raw LLM vs evidence-backed |
| `client.session(name)` | Create a research session |
| `session.ask(query, depth=)` | Research with memory recall |
| `session.recall(query)` | Query session knowledge |
| `session.knowledge()` | Export all session claims |
| `session.share()` | Share session publicly (returns shareId + URL) |
| `client.get_session(id)` | Resume an existing session by ID |
| `client.list_sessions()` | List all your sessions |
| `client.fork_session(share_id)` | Fork a shared session into your account |
| `session.delete()` | Delete a session |
| `client.feedback(result_id, rating)` | Submit feedback (good/bad/wrong) to improve accuracy |

Async support: AsyncBrowseAIDev with the same API.
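Assuming `AsyncBrowseAIDev` mirrors the sync `ask()` signature as stated above, a common pattern is fanning out several queries concurrently. The sketch below is generic `asyncio` code; the client import and call shape are assumptions based on the sync API.

```python
# Sketch: concurrent research fan-out with the async client (assumed
# to mirror the sync API). Works with any object exposing async ask().
import asyncio

async def research_all(client, queries: list[str], depth: str = "fast"):
    """Run several queries concurrently; results come back in order."""
    return await asyncio.gather(
        *(client.ask(q, depth=depth) for q in queries)
    )

# Hypothetical usage:
# from browseaidev import AsyncBrowseAIDev
# client = AsyncBrowseAIDev(api_key="bai_xxx")
# results = asyncio.run(research_all(client, ["q1", "q2"], depth="thorough"))
```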

Enterprise Search Providers

Use BrowseAI Dev with your own data sources instead of — or alongside — public web search. Supports Elasticsearch, Confluence, and custom endpoints with optional zero data retention for compliance.

```python
# Elasticsearch
result = client.ask("What is our refund policy?", search_provider={
    "type": "elasticsearch",
    "endpoint": "https://es.internal.company.com/kb/_search",
    "authHeader": "Bearer es-token-xxx",
    "index": "docs",
})

# Confluence
result = client.ask("PCI compliance process?", search_provider={
    "type": "confluence",
    "endpoint": "https://company.atlassian.net/wiki/rest/api",
    "authHeader": "Basic base64-creds",
    "spaceKey": "ENG",
})

# Zero data retention (nothing stored, cached, or logged)
result = client.ask("Patient protocols", search_provider={
    "type": "elasticsearch",
    "endpoint": "https://es.hipaa.company.com/medical/_search",
    "authHeader": "Bearer token",
    "dataRetention": "none",
})
```

```bash
# REST API — enterprise search
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{
    "query": "What is our refund policy?",
    "searchProvider": {
      "type": "elasticsearch",
      "endpoint": "https://es.internal.company.com/kb/_search",
      "authHeader": "Bearer es-token-xxx",
      "index": "docs"
    }
  }'
```

Response Structure

Every answer includes structured fields for programmatic trust decisions:

```json
{
  "answer": "Quantum computing uses qubits...",
  "confidence": 0.82,
  "shareId": "abc123def456",
  "effectiveDepth": "thorough",
  "claims": [
    {
      "claim": "Qubits can exist in superposition",
      "sources": ["https://en.wikipedia.org/wiki/Qubit"],
      "verified": true,
      "verificationScore": 0.87,
      "consensusCount": 3,
      "consensusLevel": "strong"
    }
  ],
  "sources": [
    {
      "url": "https://en.wikipedia.org/wiki/Qubit",
      "title": "Qubit - Wikipedia",
      "domain": "en.wikipedia.org",
      "quote": "A qubit is the basic unit of quantum information...",
      "verified": true,
      "authority": 0.70
    }
  ],
  "contradictions": [
    {
      "claimA": "Quantum computers are faster for all tasks",
      "claimB": "Quantum advantage only applies to specific problems",
      "topic": "quantum computing performance",
      "nliConfidence": 0.89
    }
  ],
  "reasoningSteps": [
    { "step": 1, "query": "quantum computing basics", "gapAnalysis": "Initial research pass", "claimCount": 8, "confidence": 0.65 },
    { "step": 2, "query": "quantum computing vs classical comparison", "gapAnalysis": "Missing classical vs quantum comparison", "claimCount": 14, "confidence": 0.82 }
  ],
  "trace": [
    { "step": "Search Web", "duration_ms": 423, "detail": "5 results" },
    { "step": "Fetch Pages", "duration_ms": 1205, "detail": "4 pages" }
  ],
  "quota": { "used": 12, "limit": 50, "premiumActive": true }
}
```

Key fields:

  • `confidence` — evidence-based score (0-1), not LLM self-assessed

  • `shareId` — unique ID for sharing this result (use with `/browse/share/:id`)

  • `effectiveDepth` — actual depth used (`"fast"`, `"thorough"`, or `"deep"`) — may differ from requested depth due to fallback

  • `claims[].verified` — whether the claim was verified against source text

  • `claims[].consensusLevel` — `"strong"` (3+ sources), `"moderate"`, or `"weak"`

  • `contradictions` — detected conflicts between claims (with confidence score)

  • `reasoningSteps` — deep mode only: multi-step research iterations with gap analysis

  • `trace` — execution timeline for debugging and monitoring

  • `quota` — premium quota usage (BAI key users only): `used`, `limit`, `premiumActive`
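Putting those fields together, here is one hedged sketch of a trust gate: escalate (to a human, or to a re-query at higher depth) when confidence is low, contradictions were detected, or the service silently fell back to a shallower depth. The field names match the response structure above; the 0.7 threshold and the escalation policy are illustrative.

```python
# Sketch: a programmatic trust gate over the response fields above.
# The threshold and policy are illustrative, not recommendations.

def should_escalate(result: dict, requested_depth: str = "fast",
                    min_confidence: float = 0.7) -> bool:
    """True when the answer is low-confidence, contradicted, or the
    effective depth differs from what was requested (fallback)."""
    return (
        result.get("confidence", 0.0) < min_confidence
        or bool(result.get("contradictions"))
        or result.get("effectiveDepth", requested_depth) != requested_depth
    )

result = {"confidence": 0.82, "contradictions": [], "effectiveDepth": "thorough"}
print(should_escalate(result, requested_depth="thorough"))  # → False
```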

Examples

See the examples/ directory for ready-to-run agent recipes:

Agent Recipes

| Example | Description |
|---|---|
| `research-agent.py` | Simple research agent with citations |
| `deep-research-agent.py` | Multi-step deep reasoning with gap analysis |
| `streaming-agent.py` | Real-time SSE streaming with progress events |
| `contradiction-detector.py` | Surface contradictions across sources |
| `enterprise-search.py` | Custom data sources + zero retention mode |
| `code-research-agent.py` | Research libraries/docs before writing code |
| `hallucination-detector.py` | Compare raw LLM vs evidence-backed answers |
| `langchain-agent.py` | BrowseAI Dev as a LangChain tool |
| `crewai-research-team.py` | Multi-agent research team with CrewAI |
| `research-session.py` | Research sessions with persistent memory |

Tutorials

| Tutorial | What You'll Build |
|---|---|
| `coding-agent/` | Agent that researches before writing code — never recommends deprecated libraries |
| `support-agent/` | Agent that verifies answers before responding — escalates when confidence is low |
| `content-agent/` | Agent that writes blog posts where every stat has a citation |
| `fact-checker-bot/` | Discord bot that verifies any claim with `!verify` and `!compare` |
| `is-this-true/` | Web app — paste any sentence, get a confidence score and sources |
| `debate-settler/` | CLI tool — two claims battle it out, evidence decides the winner |
| `docs-verifier/` | Verify every factual claim in your README or docs |
| `podcast-prep/` | Research brief builder for podcast interviews |

Environment Variables

These are for running the MCP server or frontend locally. The verification engine runs as a hosted service and does not require local configuration.

| Variable | Required | Description |
|---|---|---|
| `BROWSE_API_KEY` | Yes (MCP) | BrowseAI Dev API key (`bai_xxx`) — get one at browseai.dev/dashboard |

Tech Stack

  • API: Node.js, TypeScript, Fastify, Zod

  • Search: Multi-provider (parallel search across sources)

  • Parsing: @mozilla/readability + linkedom

  • AI: LLM via OpenRouter

  • Caching: Redis or in-memory with intelligent TTL (time-sensitive queries get shorter TTL)

  • Frontend: React, Tailwind CSS, shadcn/ui, Framer Motion

  • Verification: Hybrid keyword + semantic matching with evidence reranking

  • MCP: @modelcontextprotocol/sdk

  • Python SDK: httpx, Pydantic

  • Database: Supabase (PostgreSQL)

Agent Skills

Pre-built skills that teach AI coding agents (Claude Code, Codex, Cursor, etc.) when and how to use BrowseAI Dev:

```bash
npx skills add BrowseAI-HQ/browseAIDev_Skills
```

| Skill | What it does |
|---|---|
| `browse-research` | Evidence-backed answers with citations and confidence |
| `browse-fact-check` | Compare raw LLM vs evidence-backed, verify claims |
| `browse-extract` | Structured claim extraction from URLs |
| `browse-sessions` | Multi-query research with persistent knowledge |
| `browse-deep-dive` | Multi-step agentic research with reasoning chains and gap analysis |
| `browse-compare-claims` | Settle factual disputes — evidence-backed vs raw LLM side-by-side |
| `browse-monitor` | Track evolving topics over time, diff against prior knowledge |
| `browse-cite` | Generate formatted citations (APA/MLA) with authority scores |
| `browse-clarity` | Clarity — anti-hallucination answer engine with optional web verification |

View all skills →

Community

Contributing

See CONTRIBUTING.md for setup instructions, coding conventions, and PR process.

License

This project uses an open-core model:

| Component | License | What it means |
|---|---|---|
| SDKs, MCP server, integrations, frontend (this repo) | Apache 2.0 | Use freely, modify, redistribute |
| Verification engine (separate private repo) | BSL 1.1 | Hosted service — free to use via API, but source is not public. Converts to Apache 2.0 on 2030-03-25 |

See the LICENSE file for details on this repository.
