ContextLattice

context-lattice MCP server

Why Context Lattice

Context Lattice is built for teams running high-volume memory writes where durability and retrieval quality matter more than prompt bloat.

  • One ingress contract (/memory/write) with validated + normalized payloads.

  • Durable outbox fanout to specialized sinks (Qdrant, Mongo raw, MindsDB, Letta, memory-bank), plus fast retrieval indexes (topic_rollups, postgres_pgvector) in the staged read lane.

  • Retrieval orchestration that merges multi-source recall and improves ranking through a learning loop.

  • Code-context enrichment + reranking (symbol overlap, file-path proximity, recency) behind env-gated controls.

  • Local-first operation with optional cloud BYO for specific sinks.
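The single-ingress contract above can be exercised from a short script. This is a sketch only: the payload field names (`project`, `topic`, `content`, `tags`) are illustrative assumptions, not a confirmed schema.

```python
import json
import urllib.request

def build_memory_write(project: str, topic: str, content: str, tags=None) -> dict:
    """Assemble a compact /memory/write payload (field names are assumed)."""
    return {
        "project": project,
        "topic": topic,
        "content": content.strip(),
        "tags": tags or [],
    }

def post_memory_write(payload: dict, api_key: str, base="http://127.0.0.1:8075"):
    """POST the payload to the ingress endpoint with the orchestrator API key."""
    req = urllib.request.Request(
        f"{base}/memory/write",
        data=json.dumps(payload).encode(),
        headers={"content-type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

payload = build_memory_write("default", "deploy/notes", "  Rolled back v3.2.12  ")
```

Keep writes compact (summaries and decisions, not transcripts), matching the operator guidance later in this document.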

Architecture Snapshot

Quickstart

Prerequisites

  • A Compose v2-compatible container runtime (docker compose), such as Docker Desktop, Docker Engine, or another runtime that supports Compose v2

  • Supported host environments: macOS, Linux, or Windows (WSL2)

  • Host machine sized for selected profile (lite vs full) with enough CPU, RAM, and disk

  • gmake, jq, rg, python3, curl

  • Tested baseline: macOS 13+ with Docker Desktop

Distribution Options (Less technical + dev users)

  • Less technical macOS users: DMG bootstrap launcher
    https://github.com/sheawinkler/ContextLattice/releases/latest/download/ContextLattice-macOS-universal.dmg

  • Less technical Windows users: MSI bootstrap installer
    https://github.com/sheawinkler/ContextLattice/releases/latest/download/ContextLattice-windows-x64.msi

  • Less technical Linux users: bootstrap tarball
    https://github.com/sheawinkler/ContextLattice/releases/latest/download/ContextLattice-linux-bootstrap.tar.gz

  • Technical/dev users (default): repo clone or main ZIP

  • CLI fallback already exists and remains first-class: gmake quickstart

Release operator note:

gmake dmg-build
# output: dist/ContextLattice-macOS-universal.dmg
gmake msi-build
# output: dist/ContextLattice-windows-x64.msi
gmake linux-bundle-build
# output: dist/ContextLattice-linux-bootstrap.tar.gz
# attach this file to the latest GitHub release

Personal computer requirements + app versions

| App lane | Recommended profile | CPU | RAM | Storage |
| --- | --- | --- | --- | --- |
| Public v3.2.x (current public release v3.2.13) | Glama-lite (single container) | 2-4 vCPU | 4-8 GB | 20-50 GB SSD |
| Public v3.2.x (current public release v3.2.13) | Full | 6-8 vCPU | 16-24 GB | 120-200 GB SSD |
| Private v4 tuning lane | Full baseline + tuning headroom | Start from Full baseline | Start from Full baseline | Start from Full baseline; external NVMe strongly recommended |

Notes:

  • Public operators should use v3.2.13 sizing targets above.

  • Private v4 work adds benchmark-heavy tuning and should be treated as heavier than public Full mode.

1) Configure environment

cp .env.example .env
ln -svf ../../.env infra/compose/.env

Strict runtime lock (prevents tuning drift across restarts):

gmake env-lock-apply
gmake env-lock-check

config/env/strict_runtime.env is the single source of truth for critical runtime/tuning keys. gmake up, gmake mem-up, and release/lite launch targets auto-apply this lock before compose starts.

Canonical config layout:

  • config/env/ -> runtime/tuning lockfiles

  • config/mcp/ -> MCP hub/proxy/client config files

Optional Letta backlog auto-prune tuning in .env:

LETTA_AUTO_PRUNE_ENABLED=true
LETTA_AUTO_PRUNE_INTERVAL_SECS=75
LETTA_AUTO_PRUNE_BACKLOG_TRIGGER=1000
LETTA_AUTO_PRUNE_LIMIT=20000
LETTA_AUTO_PRUNE_TIMEOUT_SECS=45
LETTA_AUTO_PRUNE_STATUSES=pending,retrying

Optional code-context and agent capability surfaces:

ORCH_CODE_CONTEXT_ENRICH_ENABLED=true
ORCH_MCP_CAPABILITY_MAP_ENABLED=true
ORCH_BROWSER_CONTEXT_INGEST_ENABLED=true

Fastembed adapter runtime (service-backed):

ORCH_ADAPTER_FASTEMBED_RS_ENABLED=true
ORCH_FASTEMBED_RS_BASE_URL=http://fastembed-sidecar:8080
ORCH_FASTEMBED_RS_ROUTE=/embed
ORCH_FASTEMBED_RS_MODEL=BAAI/bge-small-en-v1.5
ORCH_FASTEMBED_RS_TIMEOUT_SECS=2.5
ORCH_ADAPTER_FASTEMBED_RS_REQUIRE_GATE=true
ORCH_ADAPTER_FASTEMBED_RS_GATE_FILE=/app/data/gates/fastembed_gate_latest.json
ORCH_ADAPTER_FASTEMBED_RS_GATE_MAX_AGE_SECS=172800
ORCH_ADAPTER_FASTEMBED_RS_PROMOTE_OVERRIDE=true
ORCH_ADAPTER_FASTEMBED_RS_PROMOTE_REASON=manual_16pct_promotion_2026-03-16
FASTEMBED_DEFAULT_MODEL=BAAI/bge-small-en-v1.5
FASTEMBED_MAX_BATCH=256

When enabled, orchestrator Qdrant write fanout uses batched embeddings (embed_text_batch) to reduce per-item adapter overhead. If gate mode is enabled, fastembed activates only when the benchmark gate artifact reports passed=true. Manual promotion override is available for explicitly approved cases; telemetry still reports the raw gate result and marks override activation separately. fastembed-gate-refresh now runs this refresh loop automatically in compose; manual command remains available:

python3 bench/perf_shortlist_matrix.py \
  --api-key "$ORCH_KEY" \
  --runs 12 \
  --gate-warmups 1 \
  --gate-repeats 3 \
  --gate-aggregate median \
  --baseline bench/results/perf_shortlist_matrix_baseline.json \
  --gate-output /app/data/gates/fastembed_gate_latest.json
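The batched-embedding write path mentioned above (embed_text_batch) amortizes adapter overhead by chunking to FASTEMBED_MAX_BATCH. A minimal sketch, with a stand-in embedder in place of the real sidecar call:

```python
FASTEMBED_MAX_BATCH = 256  # mirrors the .env default above

def chunk(items, size=FASTEMBED_MAX_BATCH):
    """Split a list of texts into adapter-sized batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def embed_text_batch(texts, embed_batch):
    """Call the (assumed) sidecar batch endpoint once per chunk instead of per item."""
    vectors = []
    for batch in chunk(texts):
        vectors.extend(embed_batch(batch))
    return vectors

# usage with a fake embedder standing in for the fastembed sidecar
fake = lambda batch: [[len(t)] for t in batch]
vecs = embed_text_batch([f"doc {i}" for i in range(600)], fake)
```

With 600 items and a batch size of 256, the adapter is called three times instead of six hundred.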

If the gate refresher starts before orchestrator readiness, it retries quickly via:

GATE_REFRESH_FAILURE_RETRY_SECS=45

Gateway staged retrieval now returns continuation_async.events_url when slow-source continuation is scheduled. Subscribe via SSE to get non-blocking completion updates:

GET /memory/search/continuations/{token}/events
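A subscriber to the continuation events endpoint only needs a minimal SSE reader: decode `data:` lines, skip comments and blanks. The event payload shape (`status` field) is an assumption for illustration.

```python
import json

def parse_sse(lines):
    """Yield decoded `data:` payloads from an SSE stream (minimal parser)."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

# sample stream as it might arrive from the events endpoint
sample = [
    'data: {"status": "running"}',
    ': heartbeat comment',
    'data: {"status": "succeeded"}',
]
events = list(parse_sse(sample))
```

In practice the lines would come from an open HTTP response iterated with `iter_lines`-style streaming rather than a list.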

Optional lexical guard for staged retrieval (policy-aware slow-source deferral):

GO_RETRIEVAL_LEXICAL_GUARD_ENABLED=true
GO_RETRIEVAL_LEXICAL_GUARD_MIN_COVERAGE=0.55
GO_RETRIEVAL_LEXICAL_GUARD_MIN_RESULTS=1
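The guard's intent under those two thresholds can be sketched as follows. The exact coverage formula inside gateway-go is not documented here; this sketch assumes coverage means the fraction of query terms present in the fast-lane results.

```python
def lexical_guard_defers(query: str, fast_results: list[str],
                         min_coverage: float = 0.55, min_results: int = 1) -> bool:
    """Return True when slow sources may stay deferred (guard satisfied)."""
    if len(fast_results) < min_results:
        return False
    terms = set(query.lower().split())
    if not terms:
        return False
    blob = " ".join(fast_results).lower()
    covered = sum(1 for t in terms if t in blob)
    return covered / len(terms) >= min_coverage

ok = lexical_guard_defers("deployment rollback notes",
                          ["rollback notes for the deployment of v3.2.12"])
```

When coverage or result count falls short, the staged read would pull slow sources synchronously instead of deferring them.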

Optional mode-aware Qdrant tuning:

ORCH_QDRANT_SEARCH_MODE_HNSW_EF={"fast":48,"balanced":96,"deep":128}
ORCH_QDRANT_SEARCH_MODE_LIMIT_CAPS={"fast":80,"balanced":120,"deep":180}
ORCH_QDRANT_FILTERLESS_LIMIT_CAP=96
ORCH_QDRANT_WARMUP_ENABLED=true
ORCH_QDRANT_WARMUP_DELAY_SECS=2
ORCH_QDRANT_WARMUP_TIMEOUT_SECS=20
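One way the JSON-map envs above might be consumed: parse once, then resolve per-mode `hnsw_ef` and clamp the requested limit. The parsing shown is an assumption, not the orchestrator's actual code.

```python
import json
import os

os.environ.setdefault("ORCH_QDRANT_SEARCH_MODE_HNSW_EF",
                      '{"fast":48,"balanced":96,"deep":128}')
os.environ.setdefault("ORCH_QDRANT_SEARCH_MODE_LIMIT_CAPS",
                      '{"fast":80,"balanced":120,"deep":180}')

def qdrant_params(mode: str, requested_limit: int):
    """Resolve hnsw_ef and a mode-capped result limit for a Qdrant search."""
    ef_map = json.loads(os.environ["ORCH_QDRANT_SEARCH_MODE_HNSW_EF"])
    cap_map = json.loads(os.environ["ORCH_QDRANT_SEARCH_MODE_LIMIT_CAPS"])
    return ef_map[mode], min(requested_limit, cap_map[mode])

ef, limit = qdrant_params("balanced", 500)
```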

Deep async durability + telemetry store routing:

ORCH_RECALL_DEEP_ASYNC_PERSIST_ENABLED=true
ORCH_RECALL_DEEP_ASYNC_STORE_BACKEND=mongo
ORCH_RECALL_DEEP_ASYNC_MONGO_DB=contextlattice_raw
ORCH_RECALL_DEEP_ASYNC_MONGO_COLLECTION=recall_deep_async_jobs
ORCH_TELEMETRY_DB=contextlattice_raw
ORCH_TELEMETRY_COLLECTION=retrieval_telemetry
ORCH_TELEMETRY_PERSIST_ENABLED=true
ORCH_RETRIEVAL_MEMORY_BANK_DEFAULT_ENABLED=true
ORCH_MEMORY_BANK_SEARCH_BACKEND=icm_spike
ORCH_MEMORY_BANK_SPIKE_FALLBACK_BACKEND=surrealdb_spike
ORCH_MEMORY_BANK_SPIKE_FALLBACK_BACKENDS=surrealdb_spike,memvid_spike,shodh_spike,quickwit_spike
ORCH_MEMORY_BANK_SPIKE_HTTP_URL=http://memory-bank-spike-rs:8096
ORCH_MEMORY_BANK_SPIKE_SEARCH_ROUTE=/search
MEMORY_BANK_SPIKE_RS_MEILI_URL=http://meilisearch:7700
MEMORY_BANK_SPIKE_RS_MEILI_INDEX=contextlattice_memory
MEMORY_BANK_SPIKE_RS_MEILI_TASK_TIMEOUT_SECS=30
MEMORY_BANK_SPIKE_RS_PORT=8096
2) Launch

gmake quickstart

This command:

  • creates .env if missing

  • links compose env

  • generates CONTEXTLATTICE_ORCHESTRATOR_API_KEY if missing

  • applies secure local defaults

  • applies strict runtime tuning lock

  • boots the stack

  • runs smoke + auth-safe health checks

Easy monitoring after launch:

gmake monitor-open
# CLI-only checks:
gmake monitor-check
3) Smoke-check endpoints

ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"

curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq '.service,.sinks'
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/ops/capabilities | jq

Expected:

  • /health returns {"ok": true, ...}

  • /status returns service and sink states (with API key)

4) Manual bootstrap (optional)

BOOTSTRAP=1 scripts/first_run.sh

MINDSDB_REQUIRED now defaults automatically from COMPOSE_PROFILES.

5) Other launch profiles

# launch using current COMPOSE_PROFILES from .env
gmake mem-up

# explicit modes
gmake mem-up-lite
gmake mem-up-full
gmake mem-up-core

# persist profile mode for future gmake mem-up
gmake mem-mode-full
gmake mem-mode-core

6) Verify health and telemetry

ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"

curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq '.lettaAutoPrune'
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
  "http://127.0.0.1:8075/telemetry/memory/cleanup-low-value/chunked?dry_run=true&project_batch=10&per_project_limit=250" | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
  "http://127.0.0.1:8075/telemetry/fanout/letta/auto-prune/run?force=false" | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
  "http://127.0.0.1:8075/maintenance/telemetry/purge?dry_run=true&include_qdrant=true&include_mindsdb=true&include_letta=true" | jq

7) First-run toggles (optional)

scripts/first_run.sh --allow-secrets-storage
scripts/first_run.sh --block-secrets-storage
scripts/first_run.sh --insecure-local

scripts/first_run.sh now enforces secure local-first defaults unless explicitly overridden:

  • loopback-only host port binding (HOST_BIND_ADDRESS=127.0.0.1)

  • production auth posture (CONTEXTLATTICE_ENV=production, strict API key requirement)

  • private status/docs/webhook endpoints

  • secrets-safe writes (SECRETS_STORAGE_MODE=redact)

Security toggles:

  • --allow-secrets-storage

  • --block-secrets-storage

  • --insecure-local (explicit opt-out)

Agent Operator Prompt (Paste Once)

Paste this into any new agent session (ChatGPT app, Claude chat apps, Claude Code, Codex):

You must use Context Lattice as the memory/context layer.

Runtime:
- Orchestrator: http://127.0.0.1:8075
- API key: CONTEXTLATTICE_ORCHESTRATOR_API_KEY from my local .env

Required behavior:
1) Before planning, call POST /memory/search with compact query + project/topic filters.
2) During long tasks, checkpoint major decisions/outcomes via POST /memory/write.
2.1) Submit outcome feedback with POST /tools/feedback_submit (include idempotencyKey).
3) Before final answer, run one more POST /memory/search for recency.
4) Keep writes compact (summary, decisions, diffs), never full transcripts.
5) If memory endpoints fail, continue task and report degraded-memory mode explicitly.
6) Use read-call timeouts that match retrieval mode:
   - fast: 25s
   - balanced: 60s
   - deep (or explicit `letta`/`memory_bank` sources): 75s
   Fast/balanced modes keep slow sources async by default unless explicitly requested (`sources=[...]`).
   Deep mode now defaults to async completion: you get immediate partial results plus `job_id`/`poll_url`/`events_url`, then fetch final results from `GET /memory/search/jobs/{job_id}` (or `/memory/search/async/{job_id}`) or stream updates from `GET /memory/search/jobs/{job_id}/events`.
   Read responses expose `retrieval_lifecycle` for explicit status (`queued|running|partial|succeeded|failed`) and source availability.
   If a deep read returns partials, show those immediately and poll once after 5-15s for warmed slow-source completion.
7) Set endpoint vars explicitly at session start:
   - `export CONTEXTLATTICE_ORCHESTRATOR_URL=http://127.0.0.1:8075`
8) Set a stable agent identity for profile defaults:
   - `export CONTEXTLATTICE_AGENT_ID=codex_gpt5`

Detailed playbook: docs/human_agent_instruction_playbook.md
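The timeout rules in step 6 of the prompt can be encoded as a small helper (a sketch, not part of the repo):

```python
def read_timeout_secs(mode: str, sources=()) -> int:
    """Timeouts matching the operator prompt: fast 25s, balanced 60s, deep 75s.
    Explicitly requested slow sources (letta/memory_bank) also get the deep timeout."""
    if mode == "deep" or {"letta", "memory_bank"} & set(sources):
        return 75
    return {"fast": 25, "balanced": 60}[mode]

t = read_timeout_secs("balanced")
```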

Lifecycle-aware local helper:

python3 scripts/agent_orchestration.py search-lifecycle \
  "profitability tuning baseline ladder" \
  contextlattice \
  deep \
  wait

Codex-first preflight helper:

python3 scripts/agent_orchestration.py preflight contextlattice runbooks/codex-integration

Profile-aware preflight helpers:

python3 scripts/agent_orchestration.py preflight-agent claude-code contextlattice
python3 scripts/agent_orchestration.py preflight-agent opencode contextlattice
python3 scripts/agent_orchestration.py preflight-agent hermes-agent contextlattice

ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS -H "content-type: application/json" -H "x-api-key: ${ORCH_KEY}" \
  -d '{"agent":"chatgpt-web","project":"contextlattice"}' \
  http://127.0.0.1:8075/v1/agents/preflight | jq

Unified Orchestrator Client + Tool Role Keys

  • Service traffic remains Go-first on http://127.0.0.1:8075; Python helpers are compatibility shims for operator scripts only.

  • Shared script client helper: scripts/contextlattice_client.py (legacy shim: scripts/orchestrator_helper.py).

  • Default tool policy is liberal/default-open (GO_TOOL_CALLS_ALLOW_ALL=true) to prevent startup friction.

  • Optional role split for tool lanes:

    • CONTEXTLATTICE_ORCHESTRATOR_API_KEY: orchestrator/admin lane.

    • CONTEXTLATTICE_WORKER_API_KEY: worker lane.

    • GO_TOOL_CALLS_ROLE_SPLIT_AUTO=true enables role split automatically only when both keys are present and distinct.

    • Worker defaults: allow capability_map,ops_queue_status; deny memory_write_batch,feedback_submit.

    • Orchestrator defaults: allow all unless explicitly restricted.
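The role-split activation rule and worker-lane defaults above can be sketched as a small policy check. Behavior for tools outside both lists is not specified here, so this sketch denies them conservatively (an assumption).

```python
def role_split_active(orch_key, worker_key, auto: bool = True) -> bool:
    """Role split engages only when auto mode is on and both keys are present and distinct."""
    return bool(auto and orch_key and worker_key and orch_key != worker_key)

WORKER_ALLOW = {"capability_map", "ops_queue_status"}
WORKER_DENY = {"memory_write_batch", "feedback_submit"}

def worker_may_call(tool: str) -> bool:
    """Worker-lane default policy from the list above; unknown tools denied (assumed)."""
    if tool in WORKER_DENY:
        return False
    return tool in WORKER_ALLOW

active = role_split_active("k1", "k2")
```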

Agent-specific template blocks:

  • docs/public_overview/templates/agents/codex.md

  • docs/public_overview/templates/agents/claude-code.md

  • docs/public_overview/templates/agents/opencode.md

  • docs/public_overview/templates/agents/hermes-agent.md

  • docs/public_overview/templates/agents/chatgpt-web-desktop.md

  • docs/public_overview/templates/agents/claude-web-desktop.md

Agent profile defaults source:

  • config/agents/agent_profiles.json

External Agent Task Routing (Generic)

Context Lattice can queue and route tasks to external runners (Codex, OpenCode, Claude Code) and still supports internal application workers.

  • External-first pattern: set agent to the external runner id (codex, opencode, claude-code, or any custom worker name).

  • Internal app workers remain supported: use agent=internal or leave unassigned (agent empty / any) for orchestrator workers.

  • Practical default: external runners as primary path, internal workers as fallback/secondary for high-resource systems.

ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"

# 1) Create a task targeted to any external runner id.
curl -fsS -X POST http://127.0.0.1:8075/agents/tasks \
  -H "content-type: application/json" \
  -H "x-api-key: ${ORCH_KEY}" \
  -d '{
    "title":"summarize deployment notes",
    "project":"default",
    "agent":"codex",
    "priority":3,
    "payload":{
      "action":"memory_search",
      "query":"deployment notes",
      "project":"default",
      "limit":8
    }
  }'

# 2) Runner claims only tasks assigned to its worker id (plus unassigned/any tasks).
curl -fsS -X POST "http://127.0.0.1:8075/agents/tasks/next?worker=codex" \
  -H "x-api-key: ${ORCH_KEY}"

# 3) Runner reports completion.
curl -fsS -X POST http://127.0.0.1:8075/agents/tasks/<TASK_ID>/status \
  -H "content-type: application/json" \
  -H "x-api-key: ${ORCH_KEY}" \
  -d '{"status":"succeeded","message":"completed by external runner","metadata":{"worker":"codex"}}'

Performance Profile

  • Sustained write throughput target: 100+ messages/second for typical memory payloads on modern laptop-class hardware.

  • Outbox protection: fanout retries, coalescing windows, and target-level backpressure to protect core durability.

  • Storage pressure controls: retention runner, low-value TTL pruning, optional snapshot pruning, and external NVMe cold path support.

  • Retrieval path: parallel source reads with orchestrator merge/rank loop and preference-learning feedback.

  • Telemetry routing guards (default-on): telemetry-like writes are filtered out of qdrant/mindsdb/letta fanout.

  • Memory-bank policy: promoted source (ORCH_RETRIEVAL_MEMORY_BANK_DEFAULT_ENABLED=true) with default icm_spike and fallback chain surrealdb_spike,memvid_spike,shodh_spike,quickwit_spike.

Version Lanes (Launch Clarity)

v3.2 (public) and v4 (private) are intentionally different lanes:

| Area | Public v3.2 | Private v4 |
| --- | --- | --- |
| Runtime frontdoor | gateway-go on :8075 | gateway-go on :8075 |
| Fallback lane | Python orchestrator on :18075 | Python orchestrator on :18075 |
| Rust/Go posture | Enabled by default | Enabled by default |
| Retrieval policy | staged fast-return + async slow continuation | staged + aggressive adaptive experiments |
| Memory-bank default | icm_spike | icm_spike with active candidate promotions |
| Release intent | stable public baseline | experimental/tuning lane behind hard gates |
| Promotion rule | benchmark + parity proof in release notes | benchmark + parity + operational soak before public sync |

Telemetry routing/cleanup toggles:

ORCH_MEMORY_BANK_TELEMETRY_GUARD_ENABLED=true
ORCH_MEMORY_BANK_TELEMETRY_TOPIC_PREFIXES=telemetry,metrics,signals,overrides
ORCH_MEMORY_BANK_TELEMETRY_MARKERS=telemetry,metrics,__state__,__stats__,__snapshots__,__health__,__allocations__,_agg-,queue__
ORCH_QDRANT_TELEMETRY_GUARD_ENABLED=true
ORCH_MINDSDB_TELEMETRY_GUARD_ENABLED=true
ORCH_LETTA_TELEMETRY_GUARD_ENABLED=true
MINDSDB_LOW_VALUE_RETENTION_HOURS=48

v2.0.0 Runtime Comparison (v1 legacy vs v2 cutover)

Live A/B benchmark on POST /memory/search using bench/phase1_runtime_comparison.py with 8 requests and 20s timeout:

  • v2 cutover (USE_RUST_* = true, USE_GO_ORCHESTRATOR = true):

    • mean 3557ms, p50 2334ms, p95 8494ms, errors 0/8

  • v1-style legacy path (USE_RUST_* = false, USE_GO_ORCHESTRATOR = false):

    • mean 17565ms, p50 20006ms, p95 20008ms, errors 7/8 (timeouts)

  • Observed improvement:

    • mean 4.94x faster (about 5x)

    • p50 8.57x faster

    • p95 2.36x faster

Artifacts:

  • bench/results/phase1_ab_rustgo_on_fast_20260304T182812Z.json

  • bench/results/phase1_ab_rustgo_off_fast_20260304T182916Z.json

V3 Roadmap (Issues 68-72)

V3 is focused on application efficacy, not speed in isolation:

  • lower deep-read p95/p99 tails and timeout rates

  • higher recall quality for agent decisions

  • stronger runner interoperability and task-lifecycle visibility

  • ANE sidecar acceleration path (M-series macOS) with automatic fallback

Roadmap documents:

  • full plan: docs/v3-roadmap.md

  • ultra DB stack recommendation: docs/perf-candidate-notes/ultra_db_stack_recommendation_2026-03-16.md

  • public roadmap page: https://contextlattice.io/roadmap.html

Program graph:

V3 Objective: Context Efficacy at Scale
  ├─ Track A (Issues #69 + #72): performance + deep-read stability
  ├─ Track B (Issues #70 + #72): recall quality + memory semantics
  └─ Track C (Issues #68 + #71): runner interop + compute backend
      -> unified security/benchmark/recall gates -> staged cutover

Migration Runtime (Phases 1-8)

The orchestrator now runs Rust+Go as the default runtime path. Python remains in place as a legacy fallback when a proxy is unavailable.

  • Runtime interfaces: Codec, MemoryStore, Retriever, Scheduler, StateDelta

  • Status endpoint: GET /migration/runtime

  • Flags:

    • USE_RUST_CODEC

    • USE_RUST_MEMORY

    • USE_RUST_RETRIEVAL

    • ORCH_RUST_RETRIEVAL_VECTOR_BACKEND (auto|qdrant_remote|usearch_ann)

    • ORCH_RUST_RETRIEVAL_LEXICAL_BACKEND (auto|none|tantivy_lexical)

    • ORCH_RUST_RETRIEVAL_BACKEND_STRICT

    • ORCH_MEMORY_BANK_SEARCH_BACKEND (native|disabled|meilisearch_spike|quickwit_spike|tantivy_spike|lancedb_spike|trieve_spike|helixdb_spike|icm_spike|shodh_spike|memvid_spike|surrealdb_spike)

    • ORCH_MEMORY_BANK_SPIKE_FALLBACK_BACKEND

    • ORCH_MEMORY_BANK_SPIKE_HTTP_URL

    • MEMORY_BANK_SPIKE_RS_MEILI_URL

    • MEMORY_BANK_SPIKE_RS_MEILI_INDEX

    • MEMORY_BANK_SPIKE_RS_MEILI_TASK_TIMEOUT_SECS

    • GO_RETRIEVAL_LEXICAL_GUARD_ENABLED

    • GO_RETRIEVAL_LEXICAL_GUARD_MIN_COVERAGE

    • GO_RETRIEVAL_LEXICAL_GUARD_MIN_RESULTS

    • ORCH_RETRIEVAL_SYNC_ASYNC_MIN_FAST_RESULTS_BY_MODE (JSON map, e.g. {"fast":1,"balanced":2,"deep":3})

    • GO_RETRIEVAL_DISABLE_SYNC_SLOW_FALLBACK

    • GO_RETRIEVAL_SLOW_SYNC_TIMEOUT_CAP_SECS

    • GO_RETRIEVAL_RUST_LANE_PROMOTION_ENABLED

    • GO_RETRIEVAL_TOPIC_PREFILTER_ENABLED

    • USE_GO_ORCHESTRATOR

    • CONTEXTLATTICE_ENGINE_MODE (embedded or service)

    • CONTEXTLATTICE_ENGINE_URL

    • CONTEXTLATTICE_GO_ORCHESTRATOR_URL

    • MIGRATION_SHADOW_DUAL_RUN

    • MIGRATION_CANARY_ENABLED

V4 stack reference:

  • docs/perf-candidate-notes/v4_stack_and_rust_exploration_plan_2026-03-16.md

Migration scaffolding:

  • Rust crates: crates/context_codec, crates/context_engine, crates/context_retrieval

  • Service contract: proto/contextlattice_engine.proto

  • Go services: services/orchestrator-go, services/gateway-go

  • API docs: docs/engine-api.md, docs/migration-phase-status.md

Default cutover toggles:

USE_RUST_CODEC=true
USE_RUST_MEMORY=true
USE_RUST_RETRIEVAL=true
USE_GO_ORCHESTRATOR=true
CONTEXTLATTICE_ENGINE_MODE=service
CONTEXTLATTICE_ENGINE_URL=http://contextlattice-orchestrator:8075
CONTEXTLATTICE_GO_ORCHESTRATOR_URL=http://orchestrator-go:8090
MIGRATION_SHADOW_DUAL_RUN=true
MIGRATION_CANARY_ENABLED=true

Rollback/legacy toggles (temporary fallback only):

USE_RUST_CODEC=false
USE_RUST_MEMORY=false
USE_RUST_RETRIEVAL=false
USE_GO_ORCHESTRATOR=false

Pathway cache backend modes:

  • ORCH_RETRIEVAL_PATHWAY_CACHE_BACKEND=memory (in-memory only)

  • ORCH_RETRIEVAL_PATHWAY_CACHE_BACKEND=redis (read/write Redis backend)

  • ORCH_RETRIEVAL_PATHWAY_CACHE_BACKEND=redis_mirror (write-through mirror only; read path stays in-memory)
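The three modes differ only in where writes and reads go. A minimal sketch, with a plain dict standing in for a Redis client (an assumption):

```python
class PathwayCache:
    """Sketch of the three backend modes; `redis` is any dict-like client."""
    def __init__(self, backend: str = "memory", redis=None):
        self.backend, self.redis, self.local = backend, redis, {}

    def put(self, key, value):
        self.local[key] = value
        if self.backend in ("redis", "redis_mirror") and self.redis is not None:
            self.redis[key] = value  # write-through in both redis modes

    def get(self, key):
        if self.backend == "redis" and self.redis is not None and key in self.redis:
            return self.redis[key]
        return self.local.get(key)  # redis_mirror reads stay in-memory

mirror = {}
cache = PathwayCache("redis_mirror", redis=mirror)
cache.put("p1", [1, 2, 3])
```

In redis_mirror mode the mirror can warm a fresh process after restart without putting Redis on the read hot path.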

Dashboard retrieval observability:

  • contextlattice-dashboard status page now includes a retrieval flow panel with:

    • fast/deep mode selection

    • returned/pending/warming/failed source chips

    • continuation SSE event stream view

    • rollup-first result ordering and raw evidence drill-down (/v1/memory/get)

Model Runtime

  • Ships with a sane local default (qwen3.5:9b via Ollama).

  • Default task inference provider is auto:

    • on Apple Silicon (M-series macOS), auto selects ollama/coreml

    • on other hosts, auto selects standard ollama

  • Public v3 keeps ANE sidecar disabled by default.

  • Any OpenAI-compatible endpoint can be used when preferred.

  • BYO model runtimes supported through:

    • Ollama

    • LM Studio

    • llama.cpp compatible server

    • hosted OpenAI-compatible providers

Security defaults

  • SECRETS_STORAGE_MODE=redact redacts secret-like material before memory persistence/fanout.

  • SECRETS_STORAGE_MODE=block rejects writes containing secret-like material (422).

  • SECRETS_STORAGE_MODE=allow stores write payloads as-is (operator opt-in).

  • Compose host bindings default to loopback via HOST_BIND_ADDRESS=127.0.0.1.

  • Production strict mode requires CONTEXTLATTICE_ORCHESTRATOR_API_KEY.
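The three SECRETS_STORAGE_MODE values above amount to a pass/redact/reject decision before persistence. A sketch under an illustrative secret matcher (the real detector's rules are not published here):

```python
import re

# Illustrative secret-like patterns; the production detector is an assumption.
SECRET_RE = re.compile(r"(sk-[A-Za-z0-9]{16,}|AKIA[A-Z0-9]{16})")

class SecretsBlocked(Exception):
    """Maps to the documented 422 rejection in block mode."""

def apply_secrets_policy(mode: str, text: str) -> str:
    if mode == "allow":
        return text  # operator opt-in: store as-is
    if SECRET_RE.search(text):
        if mode == "block":
            raise SecretsBlocked("write contains secret-like material")
        return SECRET_RE.sub("[REDACTED]", text)  # redact mode (default)
    return text

out = apply_secrets_policy("redact", "token sk-abcdefghijklmnopqr leaked")
```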

Main branch release gate

Enforce PR-only merges on main with CODEOWNERS approval (.github/CODEOWNERS is * @sheawinkler):

scripts/enable_main_branch_protection.sh main 1

If GitHub returns "Upgrade to GitHub Pro or make this repository public", change the repository plan or visibility, then rerun the command.

Web 3 Ready

  • IronClaw can be enabled as an optional messaging surface without changing the core local-first deployment.

  • OpenClaw/ZeroClaw surfaces now run with strict secret-leakage protections by default.

  • IronClaw docs and architecture conventions are excellent references for operator-facing completeness.

# optional IronClaw bridge
IRONCLAW_INTEGRATION_ENABLED=true
IRONCLAW_DEFAULT_PROJECT=messaging

# strict secret guard for openclaw/zeroclaw/ironclaw messaging surfaces
MESSAGING_OPENCLAW_STRICT_SECURITY=true

Ingress endpoints:

  • POST /integrations/messaging/openclaw

  • POST /integrations/messaging/ironclaw

  • POST /integrations/messaging/command

  • @ContextLattice task create|status|list|approve|replay|deadletter|runtime

API Surface (selected)

  • POST /memory/write

  • POST /memory/search

  • POST /memory/context-pack

  • GET /memory/search/continuations/{token}/events

  • POST /tools/feedback_submit

  • POST /integrations/messaging/command

  • POST /integrations/messaging/openclaw

  • POST /integrations/messaging/ironclaw

  • POST /integrations/telegram/webhook

  • POST /integrations/slack/events

  • POST /agents/tasks

  • GET /agents/tasks

  • GET /agents/tasks/runtime

  • GET /agents/tasks/deadletter

  • POST /agents/tasks/{task_id}/replay

  • POST /agents/tasks/recover-leases

  • GET /telemetry/memory

  • GET /telemetry/fanout

  • POST /telemetry/fanout/letta/auto-prune/run

  • GET /telemetry/retention

  • POST /telemetry/retention/run

  • POST /maintenance/telemetry/purge

Agent Context Expansion Runtime

Task workers and generic agent runners now execute a context-expansion loop by default:

  1. Pre-inference POST /memory/context-pack preflight.

  2. Budgeted context layers:

    • L0 factual snippets

    • L1 topic rollups

    • L2 raw file refs for detail dives

  3. Adaptive expansion:

    • one broadened scope pass (drop topic scope once)

    • deep async escalation when coverage is still low

  4. Tool-aware context slices exported via TASK_TOOL_CONTEXT_SLICES.

  5. Post-run checkpoint writeback to stable topic path (agent/checkpoints fallback).

  6. Fail-open lifecycle reporting with pending-source visibility.

Tune with:

CONTEXT_EXPANSION_ENABLED=true
CONTEXT_EXPANSION_L0_BUDGET_TOKENS=1200
CONTEXT_EXPANSION_L1_BUDGET_TOKENS=800
CONTEXT_EXPANSION_L2_BUDGET_TOKENS=400
CONTEXT_EXPANSION_DEEP_ESCALATION_ENABLED=true
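The per-layer token budgets above can be consumed greedily when packing context. The 4-characters-per-token estimate below is an assumption for illustration.

```python
def fill_layer(snippets, budget_tokens, est=lambda s: max(1, len(s) // 4)):
    """Greedily pack snippets into a layer's token budget (est is a rough tokenizer)."""
    picked, used = [], 0
    for s in snippets:
        cost = est(s)
        if used + cost > budget_tokens:
            break  # budget exhausted; remaining snippets fall to lower layers
        picked.append(s)
        used += cost
    return picked, used

# L0 factual snippets packed against the 1200-token budget from .env
l0, used0 = fill_layer(["fact " * 50, "fact " * 50, "fact " * 2000], 1200)
```

The same helper would be applied to L1 rollups (800 tokens) and L2 raw refs (400 tokens) in turn.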

Docs Index

  • Release notes:

    • docs/releases/v3.2.13.md (Glama-lite sqlite acceleration lane + capability detection)

    • docs/releases/v3.2.3.md (final install/deployment docs alignment for staged runtime lanes)

    • docs/releases/v3.2.2.md (README/website graphics + runtime ownership alignment)

    • docs/releases/v3.2.1.md (config canonicalization + Python fallback audit)

    • docs/releases/v3.2.0.md (public V3 Go-first cutover; Python removed from primary read path; includes A/B benchmark)

    • docs/releases/v3.1.0.md (post-v3.0.0 public, non-V4 integration/runtime updates)

  • Audits:

    • docs/audits/python_fallback_audit_v3.2.1.md (fallback-critical vs utility Python validation)

  • Phase 0 performance baseline: docs/perf-baseline.md

  • Migration plan: docs/migration-plan.md

  • Migration interfaces (Phase 1 proposal): docs/migration-interfaces.md

  • Benchmark harness docs: bench/README.md

  • Public overview site source: docs/public_overview/README.md

  • Legal and licensing: docs/legal/README.md

  • Glama release compliance: docs/glama-release-compliance.md

Pre-submit verifier:

gmake submission-preflight
python3 scripts/submission_preflight.py --online
gmake launch-lock
gmake launch-lock-public

Private/Public Sync Notes

This repository (sheawinkler/ContextLattice) is the primary codebase. Public landing collateral is published from the gh-pages branch of sheawinkler/ContextLattice.

  • Source: docs/public_overview/

  • Sync script: scripts/sync_public_overview.sh

  • Primary URL: https://contextlattice.io/

  • Fallback URL: https://sheawinkler.github.io/ContextLattice/

  • Historical mirror repository sheawinkler/memmcp-overview is archived and not used for live hosting.

License

Apache License 2.0. See LICENSE.

Commercial terms for hosted offerings and private enterprise agreements are documented in docs/legal/README.md.

If you have feedback or need assistance with the MCP directory API, please join our Discord server