nesift-mcp
Enables ingestion of PDF documents from arXiv, automatically detecting PDF URLs and extracting content for semantic search.
Integrates with SearXNG to perform web searches, fetch results, index pages, and provide answers in a single command.
Allows ingestion of Wikipedia articles for semantic search and querying, supporting hybrid retrieval and context budget trimming.
nesift
Fast, local semantic search over web content for AI agents. Sifts the net for signal — uses ~90% fewer tokens than raw web_fetch.
What it does
When an AI agent researches the web, the usual flow is: search → fetch 10 pages → drown in 100k+ tokens of irrelevant prose. nesift sits between the web and the agent: it ingests pages on the fly, indexes them with hybrid BM25 + embeddings, deduplicates redundant content across sources, and returns only the chunks that fit your token budget.
Local: runs on CPU, no API keys, no cloud calls (other than the page fetch itself).
Zero setup: pip install -e ., no database, no daemon.
Session-scoped: index lives in /tmp and is per-session by default.
Hybrid retrieval: BM25 + potion-retrieval-32M embeddings fused via RRF.
Context budget mode: --budget N trims results to N tokens.
Cross-page dedup: collapses near-identical chunks, notes source count.
SearXNG bridge: nesift search "..." does search + filter + fetch + index + answer in one command.
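To make the cross-page dedup and context budget features above more concrete, here is a minimal sketch of one way they could work. It is not nesift's actual implementation: the `Chunk` class, the word-trigram Jaccard similarity, the 0.8 threshold, and the 4-characters-per-token estimate are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical chunk record; nesift's real data model may differ."""
    text: str
    source: str
    score: float
    sources: set = field(default_factory=set)  # every page that carried this content


def _shingles(text, n=3):
    """Set of word n-grams used to compare two chunks."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def _jaccard(a, b):
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def dedup(chunks, threshold=0.8):
    """Keep the best-scoring copy of near-identical chunks and record their sources."""
    kept = []  # list of (chunk, shingles) pairs
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        sh = _shingles(chunk.text)
        for other, other_sh in kept:
            if _jaccard(sh, other_sh) >= threshold:
                other.sources.add(chunk.source)  # collapse into the kept copy
                break
        else:
            chunk.sources.add(chunk.source)
            kept.append((chunk, sh))
    return [c for c, _ in kept]


def estimate_tokens(text):
    """Rough heuristic (assumed): about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_to_budget(chunks, budget):
    """Greedily keep ranked chunks while the running total stays within the budget."""
    out, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk.text)
        if used + cost > budget:
            continue  # a smaller chunk further down may still fit
        out.append(chunk)
        used += cost
    return out
```

In this sketch, `len(chunk.sources)` plays the role of the "source count" and `budget` corresponds to the `N` passed to `--budget N`.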
Install
git clone git@github.com:scottgl9/nesift.git
cd nesift
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
Requires Python 3.11+.
Quickstart
# Index a page and ask about it
nesift add https://en.wikipedia.org/wiki/Retrieval-augmented_generation
nesift query "what is RAG used for" --budget 1500
nesift answer "how does RAG reduce hallucinations"
# Pre-fetch scoring — rank snippets before downloading
nesift score "vector database" "Pinecone is a vector DB" "How to bake bread"
# One-shot SearXNG search + ingest + answer
NESIFT_SEARXNG_URL=http://127.0.0.1:8888 \
nesift search "retry logic in distributed systems" --top 5 --budget 2000
nesift list
nesift clear
See docs/cli.md for every command and flag.
How it works
URL → trafilatura extract → heading-aware chunker → triage summary
→ BM25 index + potion-retrieval-32M embeddings (CPU)
→ query: RRF fusion + dedup + budget trim → ranked chunks or synthesized answer
See docs/architecture.md.
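The RRF fusion step in the pipeline above can be illustrated with a short sketch of reciprocal rank fusion. It assumes each retriever (BM25 and the embedding model) has already produced a ranked list of chunk IDs. The function name `rrf_fuse` and the constant `k=60` are assumptions (60 is the value commonly used in the literature); nesift's own parameters may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of IDs with reciprocal rank fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused score means better.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Example: lexical and dense retrievers disagree on the ordering.
bm25_ranking = ["c1", "c2", "c3"]
dense_ranking = ["c2", "c3", "c1"]
fused = rrf_fuse([bm25_ranking, dense_ranking])
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity, which is one reason it is a common choice for hybrid retrieval.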
MCP server
pip install "nesift[mcp]"
nesift-mcp # stdio MCP server
Tools exposed: score_snippets, add_page, add_batch, query, answer, list_pages, clear, search. See docs/mcp.md.
PDF ingestion
nesift add https://arxiv.org/pdf/2005.11401.pdf
Content type is auto-detected; .pdf URLs (or any response with the PDF signature) route through pypdf.
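As a rough illustration of the detection rule described above, the sketch below checks the URL suffix first and then the `%PDF-` signature at the start of the response body. The function names `looks_like_pdf` and `extract_pdf_text` are hypothetical and are not taken from nesift's code. The extraction helper uses pypdf's `PdfReader`, imported lazily so the detection logic runs without pypdf installed.

```python
import io
from urllib.parse import urlparse

PDF_MAGIC = b"%PDF-"  # every PDF file begins with this header


def looks_like_pdf(url, body):
    """Return True if the URL path ends in .pdf or the body carries the PDF signature."""
    if urlparse(url).path.lower().endswith(".pdf"):
        return True
    # Tolerate leading whitespace before the header (an assumption of this sketch).
    return body[:1024].lstrip().startswith(PDF_MAGIC)


def extract_pdf_text(body):
    """Extract plain text from PDF bytes with pypdf (third-party dependency)."""
    from pypdf import PdfReader  # lazy import: only needed for real PDFs

    reader = PdfReader(io.BytesIO(body))
    # extract_text() can return an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Checking the signature as well as the suffix covers download links that serve a PDF from a URL without a `.pdf` extension.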
Multilingual
nesift add https://es.wikipedia.org/wiki/... --lang
--lang swaps in potion-multilingual-128M (101 languages).
License
GPL-2.0-only — see LICENSE.