sifter-mcp
sifter-mcp is an MCP server for Sifter, an open-source document intelligence engine that extracts structured data from documents and lets you query and manage the extracted records.
Sift Management
list_sifts, get_sift, create_sift, update_sift, delete_sift — create and manage sifts (extraction configurations) using natural language instructions (e.g. "client, date, total amount")
Document & Folder Management
list_folders, get_folder — browse and inspect document folders
upload_document — upload a Base64-encoded file to a folder (the folder is auto-created; linked sifts process the document automatically)
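Because upload_document expects the file content as a Base64 string, a client has to encode the raw bytes before calling the tool. The following is a minimal sketch using only the Python standard library. The argument names (folder, filename, content_base64) are hypothetical and are not taken from the sifter-mcp tool schema.

```python
import base64


def build_upload_arguments(folder: str, filename: str, data: bytes) -> dict:
    """Build a tool-call argument dict with the file content Base64-encoded.

    The key names are illustrative assumptions, not the real sifter-mcp schema.
    """
    return {
        "folder": folder,
        "filename": filename,
        # b64encode returns bytes; decode to ASCII so the value is JSON-safe.
        "content_base64": base64.b64encode(data).decode("ascii"),
    }


args = build_upload_arguments("invoices", "acme.pdf", b"%PDF-1.7 example bytes")
# Round-trip check: decoding gives back the original bytes.
assert base64.b64decode(args["content_base64"]) == b"%PDF-1.7 example bytes"
```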
Extraction
run_extraction — enqueue a document for extraction against a specific sift
get_extraction_status — check whether extraction is queued, running, completed, or failed
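Since extraction is enqueued rather than run inline, a caller typically polls the status until it reaches a terminal state. The sketch below shows one way to do that. The get_status callable stands in for a real get_extraction_status call, and the exact status strings are assumed to match the four states listed above.

```python
import time
from typing import Callable

# Assumed terminal states, based on the four states named in the tool description.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_extraction(
    get_status: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"extraction still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a real status call: replays a fixed sequence of states.
fake_statuses = iter(["queued", "running", "running", "completed"])
result = wait_for_extraction(lambda: next(fake_statuses), poll_interval=0.0)
print(result)  # completed
```

Using time.monotonic() for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.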
Querying & Retrieval
list_records — paginate through extracted structured records (cursor-based)
find_records — filter records using structured MongoDB-style criteria (e.g. {"total": {"$gt": 1000}})
query_sift — ask a natural language question over a sift's records; Sifter generates and runs the query automatically
aggregate_sift — run a raw MongoDB aggregation pipeline directly for custom analytics
get_record_citations — retrieve per-field citation details (page, bounding box, source text) for any extracted record
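To make the find_records criteria format concrete, here is a small pure-Python matcher for a subset of MongoDB comparison operators. It only illustrates how a filter such as {"total": {"$gt": 1000}} selects records. It is not Sifter's implementation, and it simplifies MongoDB's rules (for example, a record missing the field never matches).

```python
import operator
from typing import Any

# Subset of MongoDB comparison operators, mapped to Python equivalents.
OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def matches(record: dict[str, Any], criteria: dict[str, Any]) -> bool:
    """Return True if the record satisfies every field condition in criteria."""
    for field, condition in criteria.items():
        if field not in record:
            return False
        value = record[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gt": 1000}; all operators must hold.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain value form, e.g. {"client": "Acme Corp"}, means equality.
            return False
    return True


# Sample records, made up for illustration.
records = [
    {"client": "Acme Corp", "total": 1500.0},
    {"client": "Globex", "total": 800.0},
]
large = [r for r in records if matches(r, {"total": {"$gt": 1000}})]
print(large)  # [{'client': 'Acme Corp', 'total': 1500.0}]
```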
Sifter
Your documents are a dark database.
Open-source document intelligence engine — schema-driven extraction, NL query, MCP server, Python and TypeScript SDKs. Self-hostable under MIT.

Why not RAG?
RAG is built for retrieval — find me chunks similar to this query. It breaks on homogeneous collections like invoices, contracts, or receipts where every document looks alike and the question is an aggregation, not a search.

Sifter's approach: extract structured fields once (client, date, total), store them as typed records, query with real filters and aggregations. The answer is exact and reproducible — because it's a database query, not a similarity search.
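The difference is easiest to see on an aggregation question. The sketch below uses made-up sample records and shows the question "total billed per client in January 2024" both as a MongoDB-style pipeline (as plain data) and as an equivalent pure-Python computation. The field names follow the example fields above; the pipeline is an illustration and not output generated by Sifter.

```python
from collections import defaultdict

# Typed records as they might look after extraction (sample data, made up).
records = [
    {"client": "Acme Corp", "date": "2024-01-15", "total_amount": 1500.0},
    {"client": "Acme Corp", "date": "2024-02-03", "total_amount": 250.0},
    {"client": "Globex", "date": "2024-01-20", "total_amount": 800.0},
]

# The same question expressed as a MongoDB-style aggregation pipeline.
pipeline = [
    {"$match": {"date": {"$gte": "2024-01-01", "$lt": "2024-02-01"}}},
    {"$group": {"_id": "$client", "total": {"$sum": "$total_amount"}}},
]


def total_per_client(rows, start, end):
    """Pure-Python equivalent of the pipeline: filter by date, then sum per client."""
    totals = defaultdict(float)
    for row in rows:
        # ISO 8601 date strings compare correctly as plain strings.
        if start <= row["date"] < end:
            totals[row["client"]] += row["total_amount"]
    return dict(totals)


print(total_per_client(records, "2024-01-01", "2024-02-01"))
# {'Acme Corp': 1500.0, 'Globex': 800.0}
```

Running the same computation twice over the same records gives the same totals, which is the reproducibility property that a similarity search over text chunks does not provide.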
Quickstart
git clone https://github.com/sifter-ai/sifter
cd sifter/code
cp server/.env.example server/.env.local # set SIFTER_DEFAULT_API_KEY (required)
docker compose up -d
Open http://localhost:3000 — create a sift, upload documents, query results.
Python SDK
pip install sifter-ai
from sifter import Sifter
s = Sifter(api_key="sk-...")
sift = s.create_sift("Invoices", "client name, date, total amount")
sift.upload("./invoices/")
sift.wait()
for record in sift.records():
    print(record["extracted_data"])
    # {"client": "Acme Corp", "date": "2024-01-15", "total_amount": 1500.0}
TypeScript SDK
npm install @sifter-ai/sdk
import { Sifter } from "@sifter-ai/sdk";
const client = new Sifter({ apiKey: "sk-..." });
const sift = await client.createSift("Invoices", "client, date, total amount");
await sift.upload("./invoices/");
await sift.wait();
const records = await sift.records();
console.log(records);
MCP server (Claude Desktop / Cursor / AI agents)
{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": ["sifter-mcp", "--base-url", "http://localhost:8000"],
      "env": { "SIFTER_API_KEY": "sk-dev" }
    }
  }
}
Then ask Claude: "What's the total unpaid across all invoices from last quarter?"
Want a remote MCP URL without running a local server? → Sifter Cloud
What's included
Schema-driven extraction — describe what to extract in natural language; schema is inferred automatically and exported as Pydantic / TypeScript types
NL query — ask questions in plain language; Sifter generates inspectable MongoDB aggregation pipelines
MCP server — stdio transport, read + write tools, zero custom integration code
REST API + SDKs — full OpenAPI spec, typed clients for Python and TypeScript
Webhooks — HMAC-signed HTTP callbacks on every extraction event
Spec-driven dashboards — short NL spec → auto-generated board (KPI, breakdown, table, time series)
CLI — sifter extract, sifter records, sifter sifts for terminal workflows and CI
Self-hostable — Docker Compose, bring your own MongoDB and LLM API key
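For the HMAC-signed webhooks listed above, the receiving side needs to recompute the signature over the raw request body and compare it with the one it received. The sketch below assumes HMAC-SHA256 with a hex-encoded signature. The hash algorithm, the encoding, the secret, and the payload shape are all assumptions for illustration, so check the Sifter docs for the actual scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Compare signatures in constant time to avoid timing side channels."""
    return hmac.compare_digest(sign(secret, body), received)


secret = b"example-webhook-secret"  # hypothetical shared secret
body = b'{"event": "extraction.completed"}'  # hypothetical payload
signature = sign(secret, body)  # what the sender would attach to the request

assert is_valid_signature(secret, body, signature)
# Any change to the body invalidates the signature.
assert not is_valid_signature(secret, body + b" ", signature)
```

The signature should be verified against the raw bytes of the request, before any JSON parsing, because re-serializing the payload can change whitespace or key order.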
Don't want to run infrastructure?
Sifter Cloud is the managed version — no Mongo, no ops, remote MCP endpoint, Google Drive and email ingress. Free tier available.
Docs
Full documentation at docs.sifter.run — quickstart, SDK reference, MCP guide, cookbook, self-hosting.
License
MIT — see LICENSE.
Created by Bruno Fortunato.