corpuskit

Config-driven knowledge corpus runtime + compression orchestration for AI coding agents — multi-agent via MCP, zero-config on any docs folder.

It turns a folder of markdown into a searchable, constraint-aware institutional memory your agent can query (instead of re-deriving from scratch), and orchestrates external token-compression tools. Generalized from a working Claude-Code system; a project is just one corpus.yaml.

What you get

  • Manifest + FTS5 search over full document bodies (SQLite, stdlib — no embeddings/torch needed). Query with BM25.

  • Machine-readable constraints: @constraint TYPE | target= | rule= | adr= | severity= lines in your decision/ADR docs, queryable by component.

  • MCP server exposing kr_search / kr_constraints (names/descriptions configurable) — works with any MCP-capable agent (Claude Code, Cursor, …).

  • Agent install — idempotent registration of the MCP server + write-back (SessionEnd) + constraint-injection (UserPromptSubmit) hooks.

  • Compression orchestration — discover/install/lifecycle/health over external RTK (shell-output) + Headroom (transport proxy), used AS-IS (never vendored).
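To make the FTS5 + BM25 bullet concrete, here is a minimal sketch using only Python's stdlib sqlite3 module. It is not corpuskit's actual schema: the table name, column names, and sample documents are assumptions made up for illustration, and it presumes your Python's SQLite build ships with FTS5 (most do).

```python
import sqlite3

# Hypothetical documents; a real corpus would read these from *.md files.
docs = [
    ("specs/rate-limiter.md", "Token bucket rate limiter design for the payments API."),
    ("decisions/adr-007.md", "We chose SQLite over Postgres for the local index."),
    ("daily/2024-01-05.md", "Reviewed the rate limiter rollout and limiter alerts."),
]

conn = sqlite3.connect(":memory:")
# FTS5 virtual table; `path` is stored but excluded from the full-text index.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")
conn.executemany("INSERT INTO docs (path, body) VALUES (?, ?)", docs)


def search(query, limit=5):
    """Return (path, score) pairs, best match first (lower bm25 is better)."""
    rows = conn.execute(
        "SELECT path, bm25(docs) AS score FROM docs "
        "WHERE docs MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


results = search("rate limiter")
for path, score in results:
    print(f"{score:8.3f}  {path}")
```

FTS5 treats space-separated terms as an implicit AND, so only documents containing both "rate" and "limiter" come back. No embedding model is involved at any point, which is what keeps the core install dependency-free.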

Install

pip install corpuskit            # core (CLI + manifest/index/constraints)
pip install "corpuskit[mcp]"     # + MCP server
pip install "corpuskit[all]"
# until published: pip install git+https://github.com/SupaKang/corpuskit

Quickstart (zero-config)

cd my-docs/                 # any folder of *.md
corpus index build         # project_key = top-level dirname; full-body FTS5 index
corpus index query "rate limiter design"
corpus constraints --component payments-api

No corpus.yaml needed — defaults to auto_layout + dirname keys + standard YAML/bulleted frontmatter.
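The constraints query in the quickstart relies on @constraint lines in decision docs. The sketch below shows one way such lines could be parsed and filtered by component. It is an illustration of the line format described above, not corpuskit's parser: the Constraint class, the helper names, and the TYPE values (FORBID, REQUIRE) are hypothetical, and it assumes field values never contain a | character.

```python
import re
from dataclasses import dataclass, field

# Matches "@constraint TYPE | key=value | ..." anywhere on a line.
CONSTRAINT_RE = re.compile(r"@constraint\s+(?P<type>\S+)\s*\|(?P<fields>.*)$")


@dataclass
class Constraint:
    type: str
    fields: dict = field(default_factory=dict)


def parse_constraints(text):
    """Collect every @constraint line in a document body."""
    found = []
    for line in text.splitlines():
        match = CONSTRAINT_RE.search(line)
        if not match:
            continue
        fields = {}
        for part in match.group("fields").split("|"):
            key, sep, value = part.partition("=")
            if sep:  # skip segments without a key=value shape
                fields[key.strip()] = value.strip()
        found.append(Constraint(match.group("type"), fields))
    return found


def for_component(constraints, component):
    """Filter constraints whose target names the given component."""
    return [c for c in constraints if c.fields.get("target") == component]


# Hypothetical ADR text; the TYPE and field values are invented for illustration.
adr = """\
# ADR-012: Payments API storage
@constraint FORBID | target=payments-api | rule=no direct DB writes | adr=ADR-012 | severity=high
@constraint REQUIRE | target=search-api | rule=use read replica | adr=ADR-012 | severity=low
"""

hits = for_component(parse_constraints(adr), "payments-api")
for c in hits:
    print(c.type, c.fields["rule"], c.fields["severity"])
```

Keeping each constraint on a single pipe-delimited line means it stays readable in the rendered ADR while remaining trivially machine-parseable.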

Config (corpus.yaml) — opt in when you need it

knowledge:
  keyed_roots: { specs: spec, decisions: decision }   # relpath -> doc_type (key = subdir)
  flat_roots:  { daily: daily }
  frontmatter: { style: auto, fields: { project: [project], status: [status] } }
  constraints: { decisions_root: decisions }
agent: { type: claude-code }       # claude-code | standalone | (cursor/cline stubs)
compression: { enabled: false }

corpus init scaffolds one. See examples/overmind.yaml for a full localized (Korean) instance.

Agent integration

corpus install --agent claude-code     # idempotent: MCP + SessionEnd + UserPromptSubmit (backs up settings.json)
corpus status --agent claude-code
corpus uninstall --agent claude-code

Restart your agent; it gains the kr_search / kr_constraints tools, auto-injected active constraints, and a self-updating index on session end.
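For readers curious what "idempotent" and "backs up settings.json" can mean in practice, here is a rough sketch of the idea. It is not corpuskit's installer. The register_hook helper and the command string are hypothetical, and the nested hooks layout is an assumption about the agent's settings format that you should check against your agent's documentation.

```python
import json
import shutil
import tempfile
from pathlib import Path


def register_hook(settings_path, event, command):
    """Add a command hook for `event` unless it is already present.

    Returns True when the file was changed, False when nothing needed doing.
    """
    path = Path(settings_path)
    settings = json.loads(path.read_text()) if path.exists() else {}
    entries = settings.setdefault("hooks", {}).setdefault(event, [])

    # Idempotence check: look for the exact command under this event.
    for entry in entries:
        if any(h.get("command") == command for h in entry.get("hooks", [])):
            return False

    if path.exists():
        # Keep a copy of the previous file before rewriting it.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    entries.append({"hooks": [{"type": "command", "command": command}]})
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return True


# Demo in a temporary directory, so no real settings file is touched.
workdir = Path(tempfile.mkdtemp())
settings_file = workdir / "settings.json"
settings_file.write_text('{"model": "example"}\n')

# "corpus-hook-placeholder" is a made-up command, not one corpuskit registers.
first = register_hook(settings_file, "SessionEnd", "corpus-hook-placeholder")
second = register_hook(settings_file, "SessionEnd", "corpus-hook-placeholder")
print(first, second)
```

The second call finds the existing entry and leaves the file alone, which is why re-running an install of this shape is safe. Unrelated keys in the settings file are preserved because the whole document is loaded, modified, and written back.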

Compression

corpus compression install   # ensure RTK + Headroom present
corpus compression start     # launch Headroom proxy, print ANTHROPIC_BASE_URL
corpus compression health    # versions, native-Windows degradation, port

CLI

corpus init | index build|query | constraints | serve-mcp | install|uninstall|status | compression … | doctor

License

Apache-2.0 (this code). External tools RTK and Headroom are separate Apache-2.0 projects, used as-is — see THIRD_PARTY.md.
