SAGA

SAGA (Self-organizing Archive for Generative Agents) is a self-organizing, AI-native document archive for RAG agents. It ingests any text-convertible document, enriches it with LLM-extracted metadata, keeps it in Postgres (the system of record), projects it into OpenSearch (keyword + vector), and lets agents query it via fused hybrid search and organise it over MCP.

Status: 1.0.0 — the full ingestion-to-search pipeline is implemented (storage, REST API, conversion, LLM analysis, chunking/embeddings/indexing, hybrid search + MCP, and backup/export). See the implementation plan.


What it does

  • Ingest documents in many formats (PDF, Office, HTML, images, scanned docs, …) via the REST API.

  • Convert to Markdown using containerised Docling (PDF) and Kreuzberg (everything else, incl. OCR) — routing is configurable.

  • Enrich with an LLM: doc-type classification (first-class types), extraction of identifiers/numbers (invoice/contract numbers, phone numbers, IBANs, dates, amounts, …), a short summary, and placement into folders (hierarchical, n:m) using document-similarity voting.

  • Store the authoritative record in Postgres (documents, folders, doc-types, notes, memberships) and the original binary in MinIO; project text + summary + filter fields into an OpenSearch document index and chunk vectors into a separate vector index.

  • Search via an MCP server offering performant fused hybrid (keyword + semantic, RRF) retrieval, metadata filtering, and folder-tree browsing — plus write tools to reorganise documents, folders, doc-types, and notes.

  • Back up everything to a directory tree via a paginated export API + script.

See the architecture for details.
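The fused hybrid retrieval mentioned above combines a keyword ranking and a semantic ranking with Reciprocal Rank Fusion (RRF). The following is a minimal sketch of the general RRF technique, not SAGA's actual implementation: the function name `rrf_fuse`, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. `k` dampens the influence of top ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists, e.g. from a keyword query and a vector query.
keyword_hits = ["doc-a", "doc-b", "doc-c"]
vector_hits = ["doc-b", "doc-d", "doc-a"]

fused = rrf_fuse([keyword_hits, vector_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF only uses ranks, the keyword and vector scores never need to be normalised onto a common scale; a document that appears in both lists simply accumulates two contributions.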

Architecture at a glance

Client ──REST(Bearer)──► API ──┬─► Postgres (system of record)
                               ├─► MinIO (originals)
                               ├─► OpenSearch (doc projection + vector index)
                               └─► Redis ──► Worker (ARQ)
                                              ├─► Docling / Kreuzberg (convert)
                                              ├─► LLM (classify type / extract / summarise / place)
                                              ├─► Embeddings
                                              ├─► Postgres (persist)
                                              └─► OpenSearch (project + index)
Agent ──MCP(HTTP, Bearer)──► MCP server ──► Postgres + OpenSearch (fused hybrid search) + Embeddings

Providers (LLM + embeddings) are pluggable: Ollama (default), OpenAI, or Azure OpenAI — all configurable.
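One common way to make providers pluggable is a small interface plus a registry keyed by the configured provider name. The sketch below illustrates that pattern only; `EmbeddingProvider`, `FakeEmbeddings`, `PROVIDERS`, and `make_provider` are hypothetical names and do not describe SAGA's real classes or configuration keys.

```python
from collections.abc import Callable
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Hypothetical interface that every embedding backend would satisfy."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class FakeEmbeddings:
    """Stand-in backend for this example: one length-based feature per text."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]


# Registry mapping a configured provider name to a factory.
# A real registry would hold entries for the supported backends instead.
PROVIDERS: dict[str, Callable[[], EmbeddingProvider]] = {
    "fake": FakeEmbeddings,
}


def make_provider(name: str) -> EmbeddingProvider:
    """Look up and instantiate the provider selected in configuration."""
    try:
        factory = PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None
    return factory()


vectors = make_provider("fake").embed(["ab", "abc"])
```

With this shape, the worker and the MCP server depend only on the interface, so switching backends is a configuration change rather than a code change.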

Quick start (Docker)

cp .env.example .env          # then edit the secrets
docker compose up -d          # starts the full stack

# Pull the default Ollama models (first run only)
docker compose exec ollama ollama pull llama3.1:8b
docker compose exec ollama ollama pull nomic-embed-text

Local development

# Install uv: https://docs.astral.sh/uv/
uv sync                       # create venv + install deps (app + dev)
uv run ruff check .           # lint
uv run ruff format --check .  # format check
uv run mypy                   # strict type check
uv run pytest                 # unit tests

Configuration

All behaviour is driven by YAML in config/ with ${ENV} overrides for secrets:

| File | Purpose |
| --- | --- |
| config/config.yaml | API, MCP, security, OpenSearch, Postgres, MinIO, Redis, chunking, dedup, similarity. |
| config/converters.yaml | File-type → converter routing (PDF→Docling, else→Kreuzberg). |
| config/providers.yaml | LLM + embedding provider selection and models. |
| config/logging.yaml | Log levels, colour, categories. |

Prompts and MCP tool descriptions live in prompts/.

Documentation

Start at the documentation index.

License

Apache-2.0 — see LICENSE.
