What can you do with this server?

The footnote-mcp server is a comprehensive 42-tool MCP server for source-grounded web research, focused on searching, extracting, and verifying that claims are actually supported by their sources.

Web Discovery & Research

* Search the web using multiple providers (Tavily, Brave, Google, Bing/DuckDuckGo) with optional recency filtering and semantic reranking
* Deep search that fetches top pages and reranks chunks for LLM-ready context
* Fetch and extract text from URLs (with caching and provenance)
* Search scholarly/encyclopedic sources (arXiv, Wikipedia)
* Retrieve archived pages from the Wayback Machine
* Fetch authenticated/gated pages using custom cookies or headers
* Crawl websites breadth-first (up to 50 pages)
* Generate specialized search queries (e.g., site:, filetype:csv, API variants)

Structured Data Extraction

* Parse HTML tables into structured rows/columns with source provenance
* Detect and parse downloadable files (CSV, XLSX, PDF, JSON, XML)
* Fetch and parse JSON/API endpoints directly
* Validate date coverage, resolve units/currencies, and reject incompatible rows
* Align time series, compute deltas, and flag outliers/missing data
* Export datasets to CSV, XLSX, or JSON

Source Quality & Claim Verification

* Classify sources (official, aggregator, blog, forum, blocked, etc.)
* Check claim entailment against source excerpts (heuristic, Ollama LLM, or local NLI model)
* Corroborate claims across multiple sources (corroborated/conflicting/single-source/etc.)
* Locate exact supporting spans with character offsets and containment scores
* Read/write a persistent source cache; generate research debug reports; run health checks

Custom Extraction Recipes

* Propose, generate, validate, and run task-specific extraction recipes in a sandboxed subprocess
* Promote successful recipes as reusable memory and manage a recipe registry

Browser Automation (JS-Heavy Pages)

* Navigate a headless Chromium browser; capture page state via accessibility tree
* Click, type, scroll, and extract text or tables from interactive pages
* Set date ranges and submit forms; take screenshots with optional OCR

Which integrations are available for this server?

* Provides scholarly search capabilities for papers on arXiv.
* Provides web search capabilities using Brave Search API for independent web indexing.
* Provides web search capabilities via DuckDuckGo (scraped) as a fallback search engine.
* Provides web search capabilities using Google Custom Search API.
* Provides local LLM-based claim entailment verification and semantic reranking using Ollama models.
* Provides encyclopedic search capabilities on Wikipedia.

How do I use footnote-mcp?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@footnote-mcp verify claim 'vaccines save lives' from credible sources"

That's it! The server will respond to your query, and you can continue using it as needed.

footnote-mcp

by KazKozDev


Python

Local

An MCP server for source-grounded web research. It searches the web, fetches and extracts pages, pulls structured data out of tables/files/APIs, and (the part that sets it apart) verifies that a claim is actually supported by its source instead of trusting a snippet. It exposes 45 tools over stdio MCP and can be driven by any MCP client (Claude Desktop, Cursor) or by the companion Scholiast research agent.

The design priority is trustworthiness over convenience: search snippets are treated as discovery only, every fetched page is cached with provenance, and claims are checked against the source text before they count. It also degrades gracefully — with no API keys and no config it still works (scraped search + an automatic headless-browser fallback + an offline verification heuristic); keys and env vars only make it better.

Quick start

From PyPI (Python ≥ 3.10):

pip install footnote-mcp
python -m playwright install chromium   # the headless browser used by the fetch fallback
footnote-mcp                            # start the server (speaks MCP over stdio)

Or from source:

python3 -m venv .venv && source .venv/bin/activate
pip install -e .                        # installs the `footnote-mcp` console script + deps
python -m playwright install chromium   # the headless browser used by the fetch fallback
footnote-mcp                            # start the server (speaks MCP over stdio)

footnote-mcp now waits for an MCP client on stdio. Point a client at it by dropping this into its MCP settings (Claude Desktop: claude_desktop_config.json; Cursor: ~/.cursor/mcp.json):

{
  "mcpServers": {
    "footnote": { "command": "footnote-mcp" }
  }
}
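
The same entry can also be written by a script. This is a minimal sketch, assuming the client's settings file is plain JSON with a top-level mcpServers object like the one above; the helper name add_footnote_server is hypothetical and not part of footnote-mcp.

```python
import json
from pathlib import Path


def add_footnote_server(config_path, env=None):
    """Merge a "footnote" entry into an MCP client settings file.

    Hypothetical helper (not part of footnote-mcp). Other servers already
    in the file are preserved; the file is created if it does not exist.
    """
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    entry = {"command": "footnote-mcp"}
    if env:
        # Optional environment variables passed to the server process.
        entry["env"] = dict(env)
    servers["footnote"] = entry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it with the path to the settings file, for example add_footnote_server(Path.home() / ".cursor" / "mcp.json"). Back up the file first if it already holds other servers.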


Hosted MCP endpoint (Render)

This repository also includes an authenticated Streamable HTTP deployment. The stdio command above remains the recommended local option; the hosted endpoint is for MCP clients that support remote servers.

1. Push this repository to GitHub, then create a Render Blueprint from it. Render reads render.yaml, builds the included Dockerfile (including Chromium and Tesseract), and exposes the health check at /healthz.
2. Choose a unique service name, then set FOOTNOTE_MCP_PUBLIC_URL in Render to its exact public origin, for example https://my-footnote-mcp.onrender.com. FOOTNOTE_MCP_API_KEY is generated by the Blueprint; it is the owner key. Keep it secret and use it only for administration/testing.
3. Connect an MCP client to https://my-footnote-mcp.onrender.com/mcp with:

Authorization: Bearer <FOOTNOTE_MCP_API_KEY>

The API key is required: a public, unauthenticated research and browser server would let strangers consume its outbound traffic and call its tools. The service also validates its configured public host and browser Origin to protect the MCP endpoint from DNS rebinding.
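
To make the header concrete, here is a sketch that builds (but does not send) an authenticated initialize request with the standard library. The JSON-RPC body, the protocolVersion value, and the clientInfo fields are assumptions based on the general MCP Streamable HTTP conventions, not taken from this repository; a real client would normally use an MCP SDK instead.

```python
import json
import urllib.request


def build_initialize_request(endpoint, api_key):
    """Build (but do not send) an authenticated MCP initialize request."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed value; use the protocol version your client speaks.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


req = build_initialize_request(
    "https://my-footnote-mcp.onrender.com/mcp", "fn_example_key"
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would send it; a missing or wrong key should then be rejected by the server.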

Render Free is suitable for demos, not production: it sleeps after 15 minutes of inactivity, cold starts take about a minute, and both outbound bandwidth and instance hours are limited. The filesystem is ephemeral, so do not rely on the source cache or browser profile persisting across restarts.

Giving access to other people

Never distribute the owner key. Create a separate random key for each person:

python -m footnote_mcp.keygen

Then update FOOTNOTE_MCP_API_KEYS in Render's Environment settings and redeploy. Its value is JSON, where each user has a key and a personal requests-per-minute limit:

{
  "alice": {"key": "fn_alice_key_here", "rpm": 20},
  "bob": {"key": "fn_bob_key_here", "rpm": 10}
}

Give each person only their own value. To revoke access, remove that user from the JSON and redeploy; the other keys keep working. Limits are held in memory, which is appropriate for this one-instance Free service and reset on restart.
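
The per-user limits can be pictured with a small in-memory limiter. This is a sketch of one possible approach (a sliding one-minute window), assuming the JSON shape shown above; the class name PerUserRateLimiter is hypothetical and the server's actual limiter may work differently.

```python
import json
import time
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Sliding one-minute window per user, held in memory only."""

    def __init__(self, keys_json, clock=time.monotonic):
        users = json.loads(keys_json)
        # Map each secret key to (user name, requests-per-minute limit).
        self._by_key = {u["key"]: (name, u["rpm"]) for name, u in users.items()}
        # User name -> timestamps of that user's recent requests.
        self._hits = defaultdict(deque)
        self._clock = clock

    def allow(self, key):
        """Return True if the request may proceed; False for an unknown key or a user over the limit."""
        if key not in self._by_key:
            return False
        name, rpm = self._by_key[key]
        now = self._clock()
        hits = self._hits[name]
        while hits and now - hits[0] >= 60:
            hits.popleft()  # drop requests that left the one-minute window
        if len(hits) >= rpm:
            return False
        hits.append(now)
        return True
```

Because the counters live in a plain dictionary, they vanish on restart, which matches the behavior described above. Removing a user from the JSON makes their key unknown, so allow returns False for it.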

For local use, no search-provider API keys are required to start: search falls back to scraping Bing + DuckDuckGo. Add keys later under "env" in the MCP client config (see Search backends). Pass --headed to watch the browser tier work.

Optional runtime variables are documented in .env.example. Copy it to .env for local shells, or paste selected variables into your MCP client config:

{
  "mcpServers": {
    "footnote": {
      "command": "footnote-mcp",
      "env": {
        "TAVILY_API_KEY": "..."
      }
    }
  }
}

To run without installing, straight from the source tree:

PYTHONPATH=src python -m footnote_mcp

Verifying claims — the differentiator

The reason to use this over a plain search tool is evidence_entailment and its companion tools: they distinguish a claim that a source supports from one it does not. benchmarks/run_benchmark.py measures that on a labeled set of claim/source pairs (and demos corroborate_claim and locate_claim_span):

python benchmarks/run_benchmark.py                    # offline heuristic (deterministic)
python benchmarks/run_benchmark.py --backend ollama   # LLM judge (needs ollama)

Offline-heuristic result on the labeled set (benchmarks/REPORT.md):

Set	n	Accuracy	Unsupported-claim catch rate	Precision on "supported"
Data domain (numeric + factual)	15	100%	100%	100%
Overall (incl. semantic)	18	83%	78%	80%

On its design domain (numeric and factual data claims) the offline heuristic classified all 15 labeled pairs correctly: it accepted no unsupported claim and rejected no supported one. Its blind spot is purely semantic negation and paraphrase; for those, evidence_entailment with backend="ollama" (a local LLM judge) is intended to close the gap. Run the --backend ollama line above to score that path on your own machine.
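
For readers who want to reproduce the three columns on their own labels, the sketch below shows how such metrics are commonly computed. The definitions are an interpretation of the column names, not code from benchmarks/run_benchmark.py, and the two-label scheme ("supported" / "unsupported") is an assumption.

```python
def score(pairs):
    """Score (gold, predicted) label pairs, each "supported" or "unsupported"."""
    pairs = list(pairs)
    correct = sum(1 for gold, pred in pairs if gold == pred)
    # Claims that the labels say the source does NOT support.
    unsupported = [(g, p) for g, p in pairs if g == "unsupported"]
    # Claims the checker accepted as supported.
    predicted_supported = [(g, p) for g, p in pairs if p == "supported"]

    def ratio(hits, total):
        return hits / total if total else None

    return {
        "accuracy": ratio(correct, len(pairs)),
        "unsupported_catch_rate": ratio(
            sum(1 for g, p in unsupported if p == "unsupported"), len(unsupported)
        ),
        "precision_supported": ratio(
            sum(1 for g, p in predicted_supported if g == "supported"),
            len(predicted_supported),
        ),
    }


# Toy labels for illustration only; these are not benchmark data.
toy = [
    ("supported", "supported"),
    ("unsupported", "unsupported"),
    ("unsupported", "supported"),
    ("supported", "supported"),
]
print(score(toy))
```

The catch rate and the precision answer different questions: the first asks how many unsupported claims were rejected, the second how many accepted claims were really supported.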

Tools

Tool	Description
`web_search`	Configured SearXNG first, then keyed providers, then scraped Bing + DuckDuckGo. Snippets are discovery only.
`web_search_recent`	Search restricted to a recency window (day/week/month/year).
`web_deep_search`	Automatically route across web/papers/encyclopedia/GitHub/archive sources, then fetch, extract, rerank, and return source context.
`web_read`	Fetch one URL, extract text, classify source quality, persist cache metadata.
`papers_search`	Search Crossref and arXiv through one normalized, zero-key paper contract.
`encyclopedia_search`	Search Wikipedia/Wikidata entities or run read-only Wikidata SPARQL.
`github_search`	Search public repositories, issues, code, or commits; authentication is optional.
`archive_search`	Find URL captures through Wayback Machine and Common Crawl, optionally extracting archived text.
`web_archive_fetch`	Find the closest Wayback Machine snapshot for a dead/changed URL.
`web_fetch_authenticated`	Fetch a page that needs cookies or custom headers.
`web_crawl`	Breadth-first crawl from a start URL, on-host by default (≤ 50 pages).
`generate_search_queries`	Generate operator queries (`site:`, `filetype:csv`, API/data-table variants).

Tool	Description
`web_extract_tables`	Parse HTML tables into `columns`/`rows` with source-URL provenance.
`web_detect_downloads`	Detect linked CSV/TSV/XLS/XLSX/PDF/JSON/XML files.
`web_parse_file`	Download and parse CSV/TSV/XLS/XLSX/PDF/JSON.
`web_fetch_json`	Fetch direct API/JSON endpoints into parsed JSON.
`check_date_completeness`	Validate required date coverage (day/week/month).
`resolve_units`	Detect currencies, currency pairs, measurement units.
`validate_unit_rows`	Reject rows with incompatible units or currency pairs.
`reconcile_time_series`	Align series on a key, compute deltas, flag missing keys/outliers.
`export_dataset`	Write consolidated rows to a `csv`/`xlsx`/`json` file.
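
As an illustration of what aligning series on a key and computing deltas means, here is a minimal sketch. It is not the reconcile_time_series implementation; the function name, argument names, and output shape are all hypothetical.

```python
def reconcile(series_a, series_b, tolerance=0.0):
    """Align two {key: value} series; report per-key deltas and missing keys."""
    keys = sorted(set(series_a) | set(series_b))
    rows, missing = [], []
    for key in keys:
        a, b = series_a.get(key), series_b.get(key)
        if a is None or b is None:
            # Present in only one series: cannot compute a delta.
            missing.append(key)
            continue
        delta = b - a
        rows.append(
            {"key": key, "a": a, "b": b, "delta": delta,
             "mismatch": abs(delta) > tolerance}
        )
    return {"rows": rows, "missing_keys": missing}


# Made-up monthly values from two hypothetical sources.
source_a = {"2024-01": 100, "2024-02": 110}
source_b = {"2024-01": 100, "2024-02": 115, "2024-03": 120}
result = reconcile(source_a, source_b)
print(result["missing_keys"], [r["delta"] for r in result["rows"]])
```

ISO-style keys such as "2024-01" sort chronologically as plain strings, which keeps the alignment step simple.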

Tool	Description
`classify_source`	Classify official / aggregator / blog / forum / interactive / blocked / error.
`evidence_entailment`	Strict claim-vs-source checker: `heuristic`, `auto`, `ollama`, optional `local_nli`.
`corroborate_claim`	Triangulate a claim across excerpts (corroborated / conflicting / single_source / …).
`locate_claim_span`	Locate supporting sentence(s) with char offsets and a containment score.
`source_cache_get` / `source_cache_put`	Inspect and write persistent source-cache entries.
`build_research_debug_report`	Compact report of queries, URLs, source quality, verification gaps.
`startup_health_check`	Check parser, OCR, browser, and cache dependencies.
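
To show what "char offsets and a containment score" can look like in practice, here is a rough sketch. It assumes containment means the fraction of claim tokens found in a sentence; that definition, the function name, and the output keys are assumptions, and locate_claim_span itself may score spans differently.

```python
import re


def locate_span(claim, source_text):
    """Return the best-matching sentence with character offsets and a containment score."""
    claim_tokens = set(re.findall(r"\w+", claim.lower()))
    best = None
    # Naive sentence split on . ! ? while keeping offsets.
    # Limitation: this also splits decimals such as "3.5".
    for match in re.finditer(r"[^.!?]+[.!?]?", source_text):
        sentence = match.group()
        tokens = set(re.findall(r"\w+", sentence.lower()))
        if not claim_tokens or not tokens:
            continue
        # Fraction of claim tokens that appear in this sentence.
        score = len(claim_tokens & tokens) / len(claim_tokens)
        if best is None or score > best["containment"]:
            stripped = sentence.strip()
            start = match.start() + sentence.index(stripped)
            best = {"start": start, "end": start + len(stripped),
                    "text": stripped, "containment": score}
    return best


source = "GDP grew 3 percent in 2023. Inflation fell."
span = locate_span("GDP grew 3 percent", source)
print(span)
```

Token overlap says nothing about negation or numbers that differ, so a high containment score only locates a candidate sentence; it does not establish entailment.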

When generic parsers fail, synthesize a sandboxed parser:

Tool	Description
`tool_spec_propose`	Propose a task-specific extraction recipe spec.
`tool_code_generate`	Generate a starter `extract(source_text, input_payload)` recipe.
`tool_code_validate`	Validate recipe code against a static safety allowlist.
`tool_code_run_sandboxed`	Run validated code in a limited subprocess (JSON output only).
`tool_promote`	Save a validated recipe as reusable memory (no server edit).
`recipe_registry`	Manage promoted recipes: `list` / `get` / `run` / `delete`.
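
A recipe is a function with the extract(source_text, input_payload) signature that returns JSON-serializable data. The sketch below is an illustrative example of that shape, not output of tool_code_generate; the "labels" payload key is hypothetical, and whether re is permitted by the static allowlist is an assumption to check with tool_code_validate.

```python
import json
import re


def extract(source_text, input_payload):
    """Pull "label: number" pairs out of plain text.

    input_payload["labels"] is a hypothetical key naming the labels to keep;
    when it is absent or empty, every numeric pair is returned.
    """
    wanted = {label.lower() for label in input_payload.get("labels", [])}
    rows = []
    for line in source_text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(-?\d+(?:\.\d+)?)\s*$", line)
        if not match:
            continue  # skip lines without a trailing numeric value
        label, value = match.group(1).strip(), float(match.group(2))
        if not wanted or label.lower() in wanted:
            rows.append({"label": label, "value": value})
    return {"rows": rows}  # plain dicts, lists, strings, floats: JSON-safe


sample = "Revenue: 120.5\nNotes: n/a\nCosts: 80"
output = extract(sample, {"labels": ["revenue", "costs"]})
print(json.dumps(output))
```

Keeping the recipe a pure function of its two arguments fits the sandbox model described above: no network, no file access, JSON out.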

A controlled Chromium session for JS-heavy or interactive pages:

Tool	Description
`web_navigate` · `web_snapshot` · `web_click` · `web_type` · `web_extract` · `web_scroll`	Drive a page via stable element refs.
`browser_set_date_range` · `browser_extract_tables` · `browser_extract_tables_for_date_range`	Set a date range, submit, extract visible tables.
`web_screenshot`	Save a PNG and optionally OCR text locked inside the image.

Search backends

web_search routes through a provider layer. A configured zero-key SearXNG instance is tried first, followed by keyed providers and finally scraped Bing + DuckDuckGo. Results are normalized to one shape regardless of backend.

Provider	Env vars	Notes
SearXNG	`FOOTNOTE_SEARXNG_URL` (or `SEARXNG_URL`)	Zero-key JSON API; instance must enable JSON output.
Tavily	`TAVILY_API_KEY`	LLM-oriented search API.
Brave	`BRAVE_API_KEY`	Independent web index.
Google	`GOOGLE_API_KEY` + `GOOGLE_CSE_ID`	Programmable Search (Custom Search JSON API).
Bing + DuckDuckGo	none	Default fallback; scraped, no key.

auto (default) tries configured providers in order SearXNG → Tavily → Brave → Google, then scrapes. Force one with the provider argument (searxng/tavily/brave/google/scrape).
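
The auto order can be sketched as a function of the environment. The variable names come from the table above; the function name and the idea of returning a list are illustrative assumptions, not the provider layer's actual code.

```python
import os


def configured_providers(env):
    """Return provider names in the order the auto mode is described as trying them."""
    order = []
    if env.get("FOOTNOTE_SEARXNG_URL") or env.get("SEARXNG_URL"):
        order.append("searxng")
    if env.get("TAVILY_API_KEY"):
        order.append("tavily")
    if env.get("BRAVE_API_KEY"):
        order.append("brave")
    # Google needs both the API key and the search engine ID.
    if env.get("GOOGLE_API_KEY") and env.get("GOOGLE_CSE_ID"):
        order.append("google")
    order.append("scrape")  # keyless fallback is always last
    return order


print(configured_providers(os.environ))
```

With nothing set, the list contains only "scrape", which matches the zero-configuration behavior described earlier.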

Specialized zero-key discovery

The public MCP surface is organized by user intent rather than by HTTP API:

Intent tool	Backends	Routing notes
`papers_search`	Crossref + arXiv	`source=auto` queries both; force either backend when needed.
`encyclopedia_search`	Wikipedia + Wikidata	Entity search by default; optional read-only SPARQL for structured facts.
`github_search`	GitHub REST search	Public zero-key requests work at GitHub's unauthenticated rate limit; `GITHUB_TOKEN` is optional.
`archive_search`	Wayback + Common Crawl	Accepts a URL/host pattern. `fetch_text=true` attempts archived-content extraction.

All four return title, url, snippet, published, authors, source, and source_type where those fields apply. web_deep_search accepts an optional sources array (web, papers, encyclopedia, github, archive). With an empty array it always uses general web discovery and adds specialized sources when the query signals their intent.

Semantic reranking. Pass semantic: true to web_search to reorder by meaning rather than keyword overlap: it over-fetches, embeds query and results with a local ollama model, and sorts by cosine similarity (each result gains semantic_score). Best-effort — if ollama is unavailable the original order is returned. Model: FOOTNOTE_EMBED_MODEL (default bge-m3).
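
The reranking step reduces to embedding, scoring by cosine similarity, and sorting. The sketch below assumes a caller-supplied embed function and that the snippet is the text being embedded; both are assumptions, and the code is not the server's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query, results, embed):
    """Sort results by similarity to the query; keep the original order on failure.

    embed maps a list of strings to a list of vectors, for example a wrapper
    around a local embedding model (hypothetical, supplied by the caller).
    """
    try:
        vectors = embed([query] + [r["snippet"] for r in results])
    except Exception:
        return results  # best-effort: embedding backend unavailable
    query_vec, result_vecs = vectors[0], vectors[1:]
    scored = [
        dict(r, semantic_score=cosine(query_vec, v))
        for r, v in zip(results, result_vecs)
    ]
    return sorted(scored, key=lambda r: r["semantic_score"], reverse=True)
```

Returning the untouched list when embedding fails mirrors the best-effort behavior described above: a missing model degrades ranking quality but never breaks the search.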

Fetching & anti-bot ladder

web_read fetches through an escalation ladder (scraper.py): the cheapest method runs first and escalates only when a result looks blocked or empty. A block/quality detector decides when to escalate; a per-domain rate limiter, circuit breaker, and negative cache keep it polite. The tier used and the full attempt trace come back in fetch_tier / scrape_tiers.

Tier	Method	Enabled by
1	HTTP (curl_cffi TLS impersonation)	always
2	HTTP through a rotating proxy	`FOOTNOTE_PROXIES` set
3	Headless Chromium (runs JavaScript)	`FOOTNOTE_BROWSER_FALLBACK=1` (default on)
4	Chromium through a proxy	proxies + browser
5	Hosted scrape API (Firecrawl / ScrapingBee)	`FOOTNOTE_SCRAPE_API` set

With nothing configured it is the plain HTTP path plus an automatic browser fallback for JavaScript-rendered pages.
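
The escalation idea can be sketched in a few lines. This is not scraper.py: the detector here is a crude length check, the tier functions are supplied by the caller, and the shape of each trace entry is an assumption. Only the fetch_tier and scrape_tiers field names come from the text above.

```python
def looks_blocked(text, thin_chars=200):
    """Crude stand-in for the block/quality detector: missing or very short text."""
    return text is None or len(text.strip()) < thin_chars


def fetch_with_ladder(url, tiers):
    """Try each (name, fetch_fn) pair in order; stop at the first usable result."""
    trace = []
    for name, fetch in tiers:
        try:
            text = fetch(url)
        except Exception as exc:
            # A failing tier is recorded and the ladder moves on.
            trace.append({"tier": name, "ok": False, "error": str(exc)})
            continue
        ok = not looks_blocked(text)
        trace.append({"tier": name, "ok": ok})
        if ok:
            return {"text": text, "fetch_tier": name, "scrape_tiers": trace}
    return {"text": None, "fetch_tier": None, "scrape_tiers": trace}


# Stub tiers for illustration: plain HTTP returns an empty shell,
# the browser tier returns rendered content.
tiers = [
    ("http", lambda url: ""),
    ("browser", lambda url: "rendered page text " * 30),
]
result = fetch_with_ladder("https://example.com", tiers)
print(result["fetch_tier"], result["scrape_tiers"])
```

Ordering the list from cheapest to most expensive is the whole policy: a tier runs only when every cheaper one has failed or returned unusable content.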

Env var	Default	Purpose
`FOOTNOTE_BROWSER_FALLBACK`	`1`	Escalate blocked/JS pages to headless Chromium.
`FOOTNOTE_PROXIES`	(none)	Comma-separated proxy URLs; sticky per domain with health tracking.
`FOOTNOTE_SCRAPE_API`	(none)	`firecrawl` or `scrapingbee` (needs the matching API key).
`FOOTNOTE_DOMAIN_RPS` / `_BURST`	`3` / `5`	Per-domain rate limit (token bucket).
`FOOTNOTE_BREAKER_THRESHOLD` / `_COOLDOWN`	`5` / `120`	Per-domain circuit breaker.
`FOOTNOTE_NEGCACHE_TTL`	`300`	Seconds to remember a blocked URL.
`FOOTNOTE_THIN_CONTENT_CHARS`	`200`	Below this extracted length, a script-heavy page counts as a JS shell.
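
A token bucket with the default values above (3 requests per second, burst of 5) behaves as in the following sketch. It illustrates the general technique only; the class name and method are hypothetical and the server's limiter may differ in detail.

```python
import time


class TokenBucket:
    """Allow bursts up to `burst` requests, refilling at `rps` tokens per second."""

    def __init__(self, rps=3.0, burst=5, clock=time.monotonic):
        self.rps, self.burst, self._clock = rps, burst, clock
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._last = clock()

    def try_acquire(self):
        """Take one token if available; return whether the request may go out now."""
        now = self._clock()
        # Refill for the elapsed time, capped at the burst size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rps)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A per-domain limiter is then just a dictionary from hostname to its own TokenBucket, so a slow or strict site does not throttle requests to other hosts.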

Runtime data

~/.footnote-mcp/source_cache/        # persistent page cache (with provenance)
~/.footnote-mcp/research_memory.json # persistent research memory

Override the cache location with FOOTNOTE_SOURCE_CACHE=/path/to/cache footnote-mcp.

check_date_completeness supports the calendars calendar, business_day, crypto_24_7, forex_weekday, us_business_day, and ru_business_day (pass explicit holidays for source-specific ones; the us_/ru_ variants use the optional holidays package).
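
A date-completeness check boils down to enumerating the expected dates and subtracting the observed ones. The sketch below covers only the calendar and business_day calendars with explicit holidays; the function name and parameters are hypothetical and do not mirror the check_date_completeness arguments.

```python
from datetime import date, timedelta


def missing_dates(observed, start, end, calendar="business_day", holidays=()):
    """Return expected dates in [start, end] that are absent from observed."""
    if calendar not in ("calendar", "business_day"):
        raise ValueError(f"unsupported calendar in this sketch: {calendar}")
    observed, holidays = set(observed), set(holidays)
    missing = []
    day = start
    while day <= end:
        if calendar == "calendar":
            expected = True  # every day counts
        else:
            # Monday to Friday, minus explicitly listed holidays.
            expected = day.weekday() < 5 and day not in holidays
        if expected and day not in observed:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Made-up observations for the week of Monday 2024-01-01.
seen = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 5)]
gaps = missing_dates(
    seen, date(2024, 1, 1), date(2024, 1, 7), holidays=[date(2024, 1, 1)]
)
print(gaps)
```

Choosing the right calendar matters more than the loop itself: a crypto series checked against business days would hide weekend gaps, and a stock series checked against calendar days would report false ones.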

Other install paths

Docker bundles Chromium and tesseract — nothing else to install:

docker build -t footnote-mcp .
docker run -i --rm footnote-mcp        # the client launches this; see MCP config below

Published images are available from GitHub Container Registry:

docker run -i --rm ghcr.io/kazkozdev/footnote-mcp:0.2.3
docker run -i --rm ghcr.io/kazkozdev/footnote-mcp:latest

{
  "mcpServers": {
    "footnote": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "ghcr.io/kazkozdev/footnote-mcp:latest"]
    }
  }
}

pipx / uvx (isolated install of the entry point):

pipx install /path/to/footnote-mcp          # or: pipx install git+<repo-url>
uvx --from /path/to/footnote-mcp footnote-mcp   # ad-hoc, no install

OCR. PDF/image OCR uses pytesseract plus the system tesseract binary (brew install tesseract on macOS). Local NLI. The local_nli backend of evidence_entailment (backend="local_nli") needs pip install -r requirements-nli.txt; select the model via FOOTNOTE_NLI_MODEL. Either way, startup_health_check reports what is actually available. Runtime dependency ranges are declared in pyproject.toml and mirrored in requirements.txt.

Tests

pip install -r requirements-dev.txt
python -m pytest -q          # offline unit + smoke tests; no network or keys needed

tests/test_mcp_smoke.py launches the server over real MCP stdio and exercises the tools end to end against a local HTTP fixture; the rest are offline unit tests of the parsers, fetch ladder, search providers, and dispatch. The live search test is opt-in:

RUN_LIVE_WEB_TESTS=1 python -m pytest -m live

CI runs the same suite (.github/workflows/tests.yml).

License

MIT — see LICENSE.
