Sifter

Structure any document. Query it like a database. Build on top via API.

Open-source document intelligence engine — schema-driven extraction, NL query, MCP server, Python and TypeScript SDKs. Self-hostable under MIT.


Why not RAG?

RAG is built for retrieval: find me the chunks most similar to this query. It breaks down on homogeneous collections such as invoices, contracts, or receipts, where every document looks alike and the question is an aggregation, not a search.

Sifter's approach: extract structured fields once (client, date, total), store them as typed records, query with real filters and aggregations. The answer is exact and reproducible — because it's a database query, not a similarity search.
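To make the "database query, not a similarity search" point concrete, here is a minimal stdlib-only Python sketch of the query half of the idea. The Invoice dataclass, its field names, and the sample values are hypothetical stand-ins for what an extraction step might produce. Sifter itself runs such queries as MongoDB aggregation pipelines, not in Python.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Invoice:
    # Hypothetical typed record: one row per extracted document.
    client: str
    issued: date
    total: float


def total_by_client(invoices, start, end):
    """Sum invoice totals per client for start <= issued < end.

    A real filter plus a real aggregation: the same input always yields
    the same output, unlike a top-k similarity search over chunks.
    """
    totals = defaultdict(float)
    for inv in invoices:
        if start <= inv.issued < end:
            totals[inv.client] += inv.total
    return dict(totals)


records = [
    Invoice("Acme Corp", date(2024, 1, 15), 1500.0),
    Invoice("Globex", date(2024, 2, 3), 820.0),
    Invoice("Acme Corp", date(2024, 3, 28), 300.0),
    Invoice("Globex", date(2024, 4, 2), 99.0),  # outside Q1, so excluded
]

print(total_by_client(records, date(2024, 1, 1), date(2024, 4, 1)))
# {'Acme Corp': 1800.0, 'Globex': 820.0}
```

Every invoice in the date range contributes to the answer, which is exactly what a retriever returning the k most similar chunks cannot guarantee.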


Quickstart

git clone https://github.com/sifter-ai/sifter
cd sifter/code
cp server/.env.example server/.env.local    # set SIFTER_DEFAULT_API_KEY (required)
docker compose up -d

Open http://localhost:3000 — create a sift, upload documents, query results.
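Because the server needs SIFTER_DEFAULT_API_KEY, a pre-flight check before docker compose up -d can save a failed start. A minimal sketch, assuming server/.env.local uses plain KEY=value lines with # comments; the helpers parse_env, missing_keys, and check_env_file are hypothetical and not part of Sifter.

```python
from pathlib import Path

REQUIRED = ("SIFTER_DEFAULT_API_KEY",)


def parse_env(text):
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline " # comment" and surrounding quotes from the value.
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def missing_keys(text, required=REQUIRED):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


def check_env_file(path="server/.env.local"):
    """Missing required keys for an env file; all of them if the file is absent."""
    env_file = Path(path)
    if not env_file.exists():
        return list(REQUIRED)
    return missing_keys(env_file.read_text())
```

Run from the sifter/code directory, print(check_env_file()) returns an empty list once the key is set.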


Python SDK

pip install sifter-ai

from sifter import Sifter

s = Sifter(api_key="sk-...")

sift = s.create_sift("Invoices", "client, date, total amount")
sift.upload("./invoices/")
sift.wait()

for record in sift.records():
    print(record["extracted_data"])
# {'client': 'Acme Corp', 'date': '2024-01-15', 'total_amount': 1500.0}
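Each record's extracted_data is a plain dict, as in the sample output above, so handing results to a spreadsheet takes only the standard library. A sketch, assuming records are mappings with an "extracted_data" key as shown; the function name records_to_csv is hypothetical. The header is the union of field names across records, so a field missing from one document becomes an empty cell.

```python
import csv
import io


def records_to_csv(records, out):
    """Write each record's extracted_data as one CSV row to a text stream.

    Columns are the union of field names in first-seen order; missing
    fields are written as empty cells. Returns the number of rows written.
    """
    rows = [dict(r["extracted_data"]) for r in records]
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)
    writer = csv.DictWriter(out, fieldnames=fields, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Hand-written records shaped like the SDK output above; with a live
# server you would pass sift.records() instead.
sample = [
    {"extracted_data": {"client": "Acme Corp", "date": "2024-01-15", "total_amount": 1500.0}},
    {"extracted_data": {"client": "Globex", "total_amount": 820.0}},
]
buf = io.StringIO()
records_to_csv(sample, buf)
print(buf.getvalue())
```

To write a file instead, open it with open("invoices.csv", "w", newline="") as the csv module recommends, and pass that handle as out.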

TypeScript SDK

npm install @sifter-ai/sdk

import { Sifter } from "@sifter-ai/sdk";

const client = new Sifter({ apiKey: "sk-..." });

const sift = await client.createSift("Invoices", "client, date, total amount");
await sift.upload("./invoices/");
await sift.wait();

const records = await sift.records();
console.log(records);

MCP server (Claude Desktop / Cursor / AI agents)

{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": ["sifter-mcp", "--base-url", "http://localhost:8000"],
      "env": { "SIFTER_API_KEY": "sk-dev" }
    }
  }
}

Then ask Claude: "What's the total unpaid across all invoices from last quarter?"
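If your MCP client config already lists other servers, the sifter entry has to be merged in rather than pasted over the whole file. A minimal stdlib sketch of that merge: the config path is deliberately a parameter because it differs per client and OS, and the helpers sifter_entry and add_sifter_server are hypothetical. The entry itself mirrors the JSON above.

```python
import json
from pathlib import Path


def sifter_entry(base_url="http://localhost:8000", api_key="sk-dev"):
    """The same server definition as the JSON snippet above."""
    return {
        "command": "uvx",
        "args": ["sifter-mcp", "--base-url", base_url],
        "env": {"SIFTER_API_KEY": api_key},
    }


def add_sifter_server(config_path, **kwargs):
    """Insert or replace mcpServers.sifter, keeping every other key and server."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})["sifter"] = sifter_entry(**kwargs)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Running it twice is harmless: the second run simply overwrites the sifter entry. Note that the file is re-serialized with two-space indentation, so custom formatting is not preserved.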

Want a remote MCP URL without running a local server? → Sifter Cloud


What's included

  • Schema-driven extraction — describe what to extract in natural language; schema is inferred automatically and exported as Pydantic / TypeScript types

  • NL query — ask questions in plain language; Sifter generates inspectable MongoDB aggregation pipelines

  • MCP server — stdio transport, read + write tools, zero custom integration code

  • REST API + SDKs — full OpenAPI spec, typed clients for Python and TypeScript

  • Webhooks — HMAC-signed HTTP callbacks on every extraction event

  • Spec-driven dashboards — short NL spec → auto-generated board (KPI, breakdown, table, time series)

  • CLI: sifter extract, sifter records, and sifter sifts for terminal workflows and CI

  • Self-hostable — Docker Compose, bring your own MongoDB and LLM API key
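On the receiving end, an HMAC-signed webhook is normally checked by recomputing the signature over the raw request body and comparing it in constant time. The sketch below assumes HMAC-SHA256 with a hex-encoded digest and a shared secret; the header name, digest algorithm, and encoding Sifter actually uses are not stated here, so confirm them in the webhook docs before relying on this. The helpers sign and verify_signature are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (assumed signing scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Recompute the signature and compare it in constant time.

    Verify the raw bytes exactly as received: re-serializing parsed JSON
    can change whitespace or key order and break the match.
    """
    expected = sign(secret, body).encode()
    # Compare as bytes so unexpected non-ASCII input returns False instead of raising.
    return hmac.compare_digest(expected, received.strip().lower().encode())
```

hmac.compare_digest avoids leaking, through response timing, how many leading characters of a forged signature were correct.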


Don't want to run infrastructure?

Sifter Cloud is the managed version — no Mongo, no ops, remote MCP endpoint, Google Drive and email ingress. Free tier available.


Docs

Full documentation at docs.sifter.run — quickstart, SDK reference, MCP guide, cookbook, self-hosting.


License

MIT — see LICENSE.

Created by Bruno Fortunato.
