ContextLattice is a local-first memory orchestration system for AI systems. It offers the following capabilities:

- **Health Check:** Query the orchestrator's health status to verify it is operational.
- **Write Memory:** Store new memory items by providing a project name, file name, content, and an optional topic path for hierarchical organization.
- **Search Memory:** Search contextual memory entries by project and query, with optional filters for agent ID, topic path, grounding info, and retrieval debug details.
- **Durable Storage:** Orchestrates memory writes with outbox fanout to specialized sinks (e.g., Qdrant, Mongo, MindsDB, Letta), targeting 100+ messages/second throughput.
- **Intelligent Retrieval:** Multi-source recall with result merging, ranking, and a learning loop for continuous improvement.
- **Code Context Enrichment:** Reranks code context based on symbol overlap, file-path proximity, and recency.
- **Agent Task Management:** Queue, route, and manage task lifecycles (create, status, replay, recover leases) for external and internal agent runners.
- **Context Expansion:** Dynamically expands agent context with budgeted layers (factual snippets, topic rollups, raw file refs) and async deep escalation.
- **Telemetry & Maintenance:** Access fanout/retention telemetry, clean up low-value memory, and purge telemetry data.
- **Security Controls:** Enforce secret storage policies (redaction, blocking, or allowing) with API key authentication.
- **Web3 Integration:** Supports Web3 messaging surfaces such as IronClaw, OpenClaw, and ZeroClaw.
# ContextLattice

## Why Context Lattice
Context Lattice is built for teams running high-volume memory writes where durability and retrieval quality matter more than prompt bloat.
- One ingress contract (`/memory/write`) with validated and normalized payloads.
- Durable outbox fanout to specialized sinks (Qdrant, Mongo raw, MindsDB, Letta, memory-bank), plus fast retrieval indexes (`topic_rollups`, `postgres_pgvector`) in the staged read lane.
- Retrieval orchestration that merges multi-source recall and improves ranking through a learning loop.
- Code-context enrichment and reranking (symbol overlap, file-path proximity, recency) behind env-gated controls.
- Local-first operation with optional cloud BYO for specific sinks.
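The code-context reranking idea above (symbol overlap, file-path proximity, recency) can be sketched in a few lines. This is an illustration only — the scoring weights, field names, and half-life are invented for the example, not the shipped implementation:

```python
from dataclasses import dataclass
import time

@dataclass
class CodeSnippet:
    path: str
    symbols: set
    mtime: float  # unix seconds

def rerank(snippets, query_symbols, query_path, now=None, half_life_days=14.0):
    """Order snippets by symbol overlap, file-path proximity, and recency."""
    now = now or time.time()

    def score(s):
        # symbol overlap: fraction of query symbols present in the snippet
        overlap = len(s.symbols & query_symbols) / max(len(query_symbols), 1)
        # path proximity: shared leading directory components
        a, b = s.path.split("/"), query_path.split("/")
        shared = sum(1 for x, y in zip(a, b) if x == y)
        proximity = shared / max(len(a), len(b))
        # recency: exponential decay with a half-life in days
        age_days = max(now - s.mtime, 0) / 86400
        recency = 0.5 ** (age_days / half_life_days)
        return 0.5 * overlap + 0.3 * proximity + 0.2 * recency

    return sorted(snippets, key=score, reverse=True)
```

A snippet sharing the query's symbols and directory outranks an unrelated, stale one even if both match lexically.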
Related MCP server: Context Fabric
## Architecture Snapshot

## Quickstart

### Prerequisites
- Container runtime requirement: a Compose v2-compatible container runtime (`docker compose`), such as Docker Desktop, Docker Engine, or another runtime that supports Compose v2.
- Supported host environments: macOS, Linux, or Windows (WSL2).
- Host machine sized for the selected profile (`lite` vs `full`) with enough CPU, RAM, and disk.
- Required tools: `gmake`, `jq`, `rg`, `python3`, `curl`.
- Tested baseline: macOS 13+ with Docker Desktop.
### Distribution Options (less technical + dev users)

- Less technical macOS users: DMG bootstrap launcher
  https://github.com/sheawinkler/ContextLattice/releases/latest/download/ContextLattice-macOS-universal.dmg
- Less technical Windows users: MSI bootstrap installer
  https://github.com/sheawinkler/ContextLattice/releases/latest/download/ContextLattice-windows-x64.msi
- Less technical Linux users: bootstrap tarball
  https://github.com/sheawinkler/ContextLattice/releases/latest/download/ContextLattice-linux-bootstrap.tar.gz
- Technical/dev users (default): repo clone or main ZIP

The CLI fallback already exists and remains first-class:

```
gmake quickstart
```
Release operator note:

```
gmake dmg-build
# output: dist/ContextLattice-macOS-universal.dmg
gmake msi-build
# output: dist/ContextLattice-windows-x64.msi
gmake linux-bundle-build
# output: dist/ContextLattice-linux-bootstrap.tar.gz
# attach this file to the latest GitHub release
```

### Personal computer requirements + app versions
| App lane | Recommended profile | CPU | RAM | Storage |
| --- | --- | --- | --- | --- |
| Public | Glama-lite (single container) | | | |
| Public | Full | | | |
| Private | Full baseline + tuning headroom | Start from Full baseline | Start from Full baseline | Start from Full baseline; external NVMe strongly recommended |
Notes:

- Public operators should use the `v3.2.13` sizing targets above.
- Private `v4` work adds benchmark-heavy tuning and should be treated as heavier than public Full mode.
### 1) Configure environment

```
cp .env.example .env
ln -svf ../../.env infra/compose/.env
```

Strict runtime lock (prevents tuning drift across restarts):

```
gmake env-lock-apply
gmake env-lock-check
```

`config/env/strict_runtime.env` is the single source of truth for critical runtime/tuning keys.
`gmake up`, `gmake mem-up`, and release/lite launch targets auto-apply this lock before compose starts.
Canonical config layout:

- `config/env/` -> runtime/tuning lockfiles
- `config/mcp/` -> MCP hub/proxy/client config files
Optional Letta backlog auto-prune tuning in `.env`:

```
LETTA_AUTO_PRUNE_ENABLED=true
LETTA_AUTO_PRUNE_INTERVAL_SECS=75
LETTA_AUTO_PRUNE_BACKLOG_TRIGGER=1000
LETTA_AUTO_PRUNE_LIMIT=20000
LETTA_AUTO_PRUNE_TIMEOUT_SECS=45
LETTA_AUTO_PRUNE_STATUSES=pending,retrying
```

Optional code-context and agent capability surfaces:
```
ORCH_CODE_CONTEXT_ENRICH_ENABLED=true
ORCH_MCP_CAPABILITY_MAP_ENABLED=true
ORCH_BROWSER_CONTEXT_INGEST_ENABLED=true
```

Fastembed adapter runtime (service-backed):
```
ORCH_ADAPTER_FASTEMBED_RS_ENABLED=true
ORCH_FASTEMBED_RS_BASE_URL=http://fastembed-sidecar:8080
ORCH_FASTEMBED_RS_ROUTE=/embed
ORCH_FASTEMBED_RS_MODEL=BAAI/bge-small-en-v1.5
ORCH_FASTEMBED_RS_TIMEOUT_SECS=2.5
ORCH_ADAPTER_FASTEMBED_RS_REQUIRE_GATE=true
ORCH_ADAPTER_FASTEMBED_RS_GATE_FILE=/app/data/gates/fastembed_gate_latest.json
ORCH_ADAPTER_FASTEMBED_RS_GATE_MAX_AGE_SECS=172800
ORCH_ADAPTER_FASTEMBED_RS_PROMOTE_OVERRIDE=true
ORCH_ADAPTER_FASTEMBED_RS_PROMOTE_REASON=manual_16pct_promotion_2026-03-16
FASTEMBED_DEFAULT_MODEL=BAAI/bge-small-en-v1.5
FASTEMBED_MAX_BATCH=256
```

When enabled, orchestrator Qdrant write fanout uses batched embeddings (`embed_text_batch`) to reduce per-item adapter overhead.
If gate mode is enabled, fastembed activates only when the benchmark gate artifact reports passed=true.
Manual promotion override is available for explicitly approved cases; telemetry still reports the raw gate result and marks override activation separately.
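The gate semantics above can be sketched roughly as follows. This is an illustrative sketch only: apart from `passed`, the artifact field names (e.g. `generated_at`) and the reason strings are assumptions, not the shipped schema:

```python
import json
import time

def fastembed_allowed(gate_path, max_age_secs=172800, promote_override=False, now=None):
    """Return (allowed, reason) based on a benchmark gate artifact.

    The adapter activates only when the artifact reports passed=true and is
    fresh enough; a manual promotion override can force activation.
    """
    now = now if now is not None else time.time()
    try:
        with open(gate_path) as f:
            gate = json.load(f)
    except (OSError, json.JSONDecodeError):
        gate = None
    if promote_override:
        # Override wins, but callers should still log the raw gate result.
        return True, "manual_promotion_override"
    if gate is None:
        return False, "gate_artifact_missing"
    if not gate.get("passed"):
        return False, "gate_failed"
    if now - gate.get("generated_at", 0) > max_age_secs:
        return False, "gate_artifact_stale"
    return True, "gate_passed"
```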
`fastembed-gate-refresh` now runs this refresh loop automatically in compose; the manual command remains available:

```
python3 bench/perf_shortlist_matrix.py \
  --api-key "$ORCH_KEY" \
  --runs 12 \
  --gate-warmups 1 \
  --gate-repeats 3 \
  --gate-aggregate median \
  --baseline bench/results/perf_shortlist_matrix_baseline.json \
  --gate-output /app/data/gates/fastembed_gate_latest.json
```

If the gate refresher starts before orchestrator readiness, it retries quickly via:

```
GATE_REFRESH_FAILURE_RETRY_SECS=45
```

Gateway staged retrieval now returns `continuation_async.events_url` when slow-source continuation is scheduled. Subscribe via SSE to get non-blocking completion updates:

```
GET /memory/search/continuations/{token}/events
```

Optional lexical guard for staged retrieval (policy-aware slow-source deferral):
```
GO_RETRIEVAL_LEXICAL_GUARD_ENABLED=true
GO_RETRIEVAL_LEXICAL_GUARD_MIN_COVERAGE=0.55
GO_RETRIEVAL_LEXICAL_GUARD_MIN_RESULTS=1
```

Optional mode-aware Qdrant tuning:
```
ORCH_QDRANT_SEARCH_MODE_HNSW_EF={"fast":48,"balanced":96,"deep":128}
ORCH_QDRANT_SEARCH_MODE_LIMIT_CAPS={"fast":80,"balanced":120,"deep":180}
ORCH_QDRANT_FILTERLESS_LIMIT_CAP=96
ORCH_QDRANT_WARMUP_ENABLED=true
ORCH_QDRANT_WARMUP_DELAY_SECS=2
ORCH_QDRANT_WARMUP_TIMEOUT_SECS=20
```

Deep async durability + telemetry store routing:
```
ORCH_RECALL_DEEP_ASYNC_PERSIST_ENABLED=true
ORCH_RECALL_DEEP_ASYNC_STORE_BACKEND=mongo
ORCH_RECALL_DEEP_ASYNC_MONGO_DB=contextlattice_raw
ORCH_RECALL_DEEP_ASYNC_MONGO_COLLECTION=recall_deep_async_jobs
ORCH_TELEMETRY_DB=contextlattice_raw
ORCH_TELEMETRY_COLLECTION=retrieval_telemetry
ORCH_TELEMETRY_PERSIST_ENABLED=true
ORCH_RETRIEVAL_MEMORY_BANK_DEFAULT_ENABLED=true
ORCH_MEMORY_BANK_SEARCH_BACKEND=icm_spike
ORCH_MEMORY_BANK_SPIKE_FALLBACK_BACKEND=surrealdb_spike
ORCH_MEMORY_BANK_SPIKE_FALLBACK_BACKENDS=surrealdb_spike,memvid_spike,shodh_spike,quickwit_spike
ORCH_MEMORY_BANK_SPIKE_HTTP_URL=http://memory-bank-spike-rs:8096
ORCH_MEMORY_BANK_SPIKE_SEARCH_ROUTE=/search
MEMORY_BANK_SPIKE_RS_MEILI_URL=http://meilisearch:7700
MEMORY_BANK_SPIKE_RS_MEILI_INDEX=contextlattice_memory
MEMORY_BANK_SPIKE_RS_MEILI_TASK_TIMEOUT_SECS=30
MEMORY_BANK_SPIKE_RS_PORT=8096
```

### 2) One-command quickstart (recommended)
```
gmake quickstart
```

This command:

- creates `.env` if missing
- links compose env
- generates `CONTEXTLATTICE_ORCHESTRATOR_API_KEY` if missing
- applies secure local defaults
- applies the strict runtime tuning lock
- boots the stack
- runs smoke + auth-safe health checks

Easy monitoring after launch:

```
gmake monitor-open
# CLI-only checks:
gmake monitor-check
```

### 3) 60-second verify (recommended)
```
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq '.service,.sinks'
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/ops/capabilities | jq
```

Expected:

- `/health` returns `{"ok": true, ...}`
- `/status` returns service and sink states (with API key)
### 4) Manual bootstrap (optional)

```
BOOTSTRAP=1 scripts/first_run.sh
```

`MINDSDB_REQUIRED` now defaults automatically from `COMPOSE_PROFILES`.
5) Other launch profiles
# launch using current COMPOSE_PROFILES from .env
gmake mem-up
# explicit modes
gmake mem-up-lite
gmake mem-up-full
gmake mem-up-core
# persist profile mode for future gmake mem-up
gmake mem-mode-full
gmake mem-mode-core6) Verify health and telemetry
```
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq '.lettaAutoPrune'
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
  "http://127.0.0.1:8075/telemetry/memory/cleanup-low-value/chunked?dry_run=true&project_batch=10&per_project_limit=250" | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
  "http://127.0.0.1:8075/telemetry/fanout/letta/auto-prune/run?force=false" | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
  "http://127.0.0.1:8075/maintenance/telemetry/purge?dry_run=true&include_qdrant=true&include_mindsdb=true&include_letta=true" | jq
```

### 7) First-run toggles (optional)
```
scripts/first_run.sh --allow-secrets-storage
scripts/first_run.sh --block-secrets-storage
scripts/first_run.sh --insecure-local
```

`scripts/first_run.sh` now enforces secure local-first defaults unless explicitly overridden:

- loopback-only host port binding (`HOST_BIND_ADDRESS=127.0.0.1`)
- production auth posture (`CONTEXTLATTICE_ENV=production`, strict API key requirement)
- private status/docs/webhook endpoints
- secrets-safe writes (`SECRETS_STORAGE_MODE=redact`)

Security toggles:

- `--allow-secrets-storage`
- `--block-secrets-storage`
- `--insecure-local` (explicit opt-out)
## Agent Operator Prompt (Paste Once)
Paste this into any new agent session (ChatGPT app, Claude chat apps, Claude Code, Codex):
```
You must use Context Lattice as the memory/context layer.
Runtime:
- Orchestrator: http://127.0.0.1:8075
- API key: CONTEXTLATTICE_ORCHESTRATOR_API_KEY from my local .env
Required behavior:
1) Before planning, call POST /memory/search with compact query + project/topic filters.
2) During long tasks, checkpoint major decisions/outcomes via POST /memory/write.
2.1) Submit outcome feedback with POST /tools/feedback_submit (include idempotencyKey).
3) Before final answer, run one more POST /memory/search for recency.
4) Keep writes compact (summary, decisions, diffs), never full transcripts.
5) If memory endpoints fail, continue task and report degraded-memory mode explicitly.
6) Use read-call timeouts that match retrieval mode:
   - fast: 25s
   - balanced: 60s
   - deep (or explicit `letta`/`memory_bank` sources): 75s
   Fast/balanced modes keep slow sources async by default unless explicitly requested (`sources=[...]`).
   Deep mode now defaults to async completion: you get immediate partial results plus `job_id`/`poll_url`/`events_url`, then fetch final results from `GET /memory/search/jobs/{job_id}` (or `/memory/search/async/{job_id}`) or stream updates from `GET /memory/search/jobs/{job_id}/events`.
   Read responses expose `retrieval_lifecycle` for explicit status (`queued|running|partial|succeeded|failed`) and source availability.
   If a deep read returns partials, show those immediately and poll once after 5-15s for warmed slow-source completion.
7) Set endpoint vars explicitly at session start:
   - `export CONTEXTLATTICE_ORCHESTRATOR_URL=http://127.0.0.1:8075`
8) Set a stable agent identity for profile defaults:
   - `export CONTEXTLATTICE_AGENT_ID=codex_gpt5`
```

Detailed playbook: `docs/human_agent_instruction_playbook.md`
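The staged deep-read handling in step 6 reduces to a small decision rule. This is an illustrative sketch — the action names are invented for the example, not part of any shipped client:

```python
TERMINAL = {"succeeded", "failed"}

def next_action(lifecycle_status, partial_results):
    """Decide what an agent should do with a staged deep-read response.

    lifecycle_status is one of: queued | running | partial | succeeded | failed.
    """
    if lifecycle_status in TERMINAL:
        return "use_results" if lifecycle_status == "succeeded" else "report_degraded"
    # queued | running | partial: surface partials now, then poll once for
    # warmed slow-source completion.
    return "show_partials_then_poll" if partial_results else "poll"
```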
Lifecycle-aware local helper:

```
python3 scripts/agent_orchestration.py search-lifecycle \
  "profitability tuning baseline ladder" \
  contextlattice \
  deep \
  wait
```

Codex-first preflight helper:

```
python3 scripts/agent_orchestration.py preflight contextlattice runbooks/codex-integration
```

Profile-aware preflight helpers:

```
python3 scripts/agent_orchestration.py preflight-agent claude-code contextlattice
python3 scripts/agent_orchestration.py preflight-agent opencode contextlattice
python3 scripts/agent_orchestration.py preflight-agent hermes-agent contextlattice
```

```
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS -H "content-type: application/json" -H "x-api-key: ${ORCH_KEY}" \
  -d '{"agent":"chatgpt-web","project":"contextlattice"}' \
  http://127.0.0.1:8075/v1/agents/preflight | jq
```

## Unified Orchestrator Client + Tool Role Keys
- Service traffic remains Go-first on `http://127.0.0.1:8075`; Python helpers are compatibility shims for operator scripts only.
- Shared script client helper: `scripts/contextlattice_client.py` (legacy shim: `scripts/orchestrator_helper.py`).
- Default tool policy is liberal/default-open (`GO_TOOL_CALLS_ALLOW_ALL=true`) to prevent startup friction.
- Optional role split for tool lanes:
  - `CONTEXTLATTICE_ORCHESTRATOR_API_KEY`: orchestrator/admin lane.
  - `CONTEXTLATTICE_WORKER_API_KEY`: worker lane.
  - `GO_TOOL_CALLS_ROLE_SPLIT_AUTO=true` enables role split automatically only when both keys are present and distinct.
- Worker defaults: allow `capability_map`, `ops_queue_status`; deny `memory_write_batch`, `feedback_submit`.
- Orchestrator defaults: allow all unless explicitly restricted.
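The role-split defaults above can be sketched as a pure policy function. This is an assumption-laden illustration (the fallback-to-default-open behavior for unlisted worker tools is a guess), not the Go implementation:

```python
WORKER_ALLOW = {"capability_map", "ops_queue_status"}
WORKER_DENY = {"memory_write_batch", "feedback_submit"}

def role_split_active(orch_key, worker_key, auto=True):
    """Role split engages only when both keys are present and distinct."""
    return bool(auto and orch_key and worker_key and orch_key != worker_key)

def tool_allowed(tool, role, allow_all=True):
    """Default-open policy; the worker lane gets an allow/deny list when split."""
    if role == "orchestrator":
        return True  # allow all unless explicitly restricted
    if tool in WORKER_DENY:
        return False
    # Unlisted tools fall back to the default-open posture (assumption).
    return tool in WORKER_ALLOW or allow_all
```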
Agent-specific template blocks:

- docs/public_overview/templates/agents/codex.md
- docs/public_overview/templates/agents/claude-code.md
- docs/public_overview/templates/agents/opencode.md
- docs/public_overview/templates/agents/hermes-agent.md
- docs/public_overview/templates/agents/chatgpt-web-desktop.md
- docs/public_overview/templates/agents/claude-web-desktop.md

Agent profile defaults source: `config/agents/agent_profiles.json`
## External Agent Task Routing (Generic)
Context Lattice can queue and route tasks to external runners (Codex, OpenCode, Claude Code) and still supports internal application workers.
- External-first pattern: set `agent` to the external runner id (`codex`, `opencode`, `claude-code`, or any custom worker name).
- Internal app workers remain supported: use `agent=internal` or leave unassigned (`agent` empty / `any`) for orchestrator workers.
- Practical default: external runners as the primary path, internal workers as fallback/secondary for high-resource systems.
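The claim semantics — a worker receives only tasks assigned to its id, plus unassigned/`any` tasks, highest priority first — can be sketched as follows (an in-memory illustration, not the orchestrator's queue code; task field names mirror the API payloads but are assumptions here):

```python
def claimable(task, worker_id):
    """A worker may claim tasks assigned to it, plus unassigned/'any' tasks."""
    agent = task.get("agent") or "any"
    return task.get("status") == "queued" and agent in (worker_id, "any")

def next_task(queue, worker_id):
    """Return the best claimable task (lower priority number = more urgent)."""
    candidates = [t for t in queue if claimable(t, worker_id)]
    if not candidates:
        return None
    task = min(candidates, key=lambda t: t.get("priority", 5))
    task["status"] = "claimed"
    task["worker"] = worker_id
    return task
```

An external `codex` runner would thus drain its own tasks and any unassigned work, while never touching tasks routed to `internal` workers.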
```
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"

# 1) Create a task targeted to any external runner id.
curl -fsS -X POST http://127.0.0.1:8075/agents/tasks \
  -H "content-type: application/json" \
  -H "x-api-key: ${ORCH_KEY}" \
  -d '{
    "title":"summarize deployment notes",
    "project":"default",
    "agent":"codex",
    "priority":3,
    "payload":{
      "action":"memory_search",
      "query":"deployment notes",
      "project":"default",
      "limit":8
    }
  }'

# 2) Runner claims only tasks assigned to its worker id (plus unassigned/any tasks).
curl -fsS -X POST "http://127.0.0.1:8075/agents/tasks/next?worker=codex" \
  -H "x-api-key: ${ORCH_KEY}"

# 3) Runner reports completion.
curl -fsS -X POST http://127.0.0.1:8075/agents/tasks/<TASK_ID>/status \
  -H "content-type: application/json" \
  -H "x-api-key: ${ORCH_KEY}" \
  -d '{"status":"succeeded","message":"completed by external runner","metadata":{"worker":"codex"}}'
```

## Performance Profile
- Sustained write throughput target: `100+ messages/second` for typical memory payloads on modern laptop-class hardware.
- Outbox protection: fanout retries, coalescing windows, and target-level backpressure to protect core durability.
- Storage pressure controls: retention runner, low-value TTL pruning, optional snapshot pruning, and external NVMe cold-path support.
- Retrieval path: parallel source reads with an orchestrator merge/rank loop and preference-learning feedback.
- Telemetry routing guards (default-on): telemetry-like writes are filtered out of `qdrant`/`mindsdb`/`letta` fanout.
- Memory-bank policy: promoted source (`ORCH_RETRIEVAL_MEMORY_BANK_DEFAULT_ENABLED=true`) with default `icm_spike` and fallback chain `surrealdb_spike,memvid_spike,shodh_spike,quickwit_spike`.
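The merge step of the retrieval path can be pictured as deduplicating hits across sources and keeping the best weighted score per item. This is an illustrative sketch — the source weights and hit field names are invented, not the orchestrator's ranking code:

```python
def merge_results(per_source, source_weights=None):
    """Merge multi-source recall hits: dedupe by id, keep the best weighted score.

    per_source maps source name -> list of {"id": ..., "score": ...} hits.
    """
    weights = source_weights or {}
    best = {}
    for source, hits in per_source.items():
        w = weights.get(source, 1.0)
        for hit in hits:
            score = hit["score"] * w
            cur = best.get(hit["id"])
            if cur is None or score > cur["score"]:
                best[hit["id"]] = {"id": hit["id"], "score": score, "source": source}
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

A learning loop would then adjust `source_weights` from feedback rather than leaving them static.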
## Version Lanes (Launch Clarity)

v3.2 (public) and v4 (private) are intentionally different lanes:

| Area | Public | Private |
| --- | --- | --- |
| Runtime frontdoor | | |
| Fallback lane | Python orchestrator on | Python orchestrator on |
| Rust/Go posture | Enabled by default | Enabled by default |
| Retrieval policy | staged fast-return + async slow continuation | staged + aggressive adaptive experiments |
| Memory-bank default | | |
| Release intent | stable public baseline | experimental/tuning lane behind hard gates |
| Promotion rule | benchmark + parity proof in release notes | benchmark + parity + operational soak before public sync |
Telemetry routing/cleanup toggles:

```
ORCH_MEMORY_BANK_TELEMETRY_GUARD_ENABLED=true
ORCH_MEMORY_BANK_TELEMETRY_TOPIC_PREFIXES=telemetry,metrics,signals,overrides
ORCH_MEMORY_BANK_TELEMETRY_MARKERS=telemetry,metrics,__state__,__stats__,__snapshots__,__health__,__allocations__,_agg-,queue__
ORCH_QDRANT_TELEMETRY_GUARD_ENABLED=true
ORCH_MINDSDB_TELEMETRY_GUARD_ENABLED=true
ORCH_LETTA_TELEMETRY_GUARD_ENABLED=true
MINDSDB_LOW_VALUE_RETENTION_HOURS=48
```

## v2.0.0 Runtime Comparison (v1 legacy vs v2 cutover)
Live A/B benchmark on `POST /memory/search` using `bench/phase1_runtime_comparison.py` with 8 requests and a 20s timeout:

- v2 cutover (`USE_RUST_* = true`, `USE_GO_ORCHESTRATOR = true`):
  mean `3557ms`, p50 `2334ms`, p95 `8494ms`, errors `0/8`
- v1-style legacy path (`USE_RUST_* = false`, `USE_GO_ORCHESTRATOR = false`):
  mean `17565ms`, p50 `20006ms`, p95 `20008ms`, errors `7/8` (timeouts)

Observed improvement:

- mean `4.94x` faster (about `5x`)
- p50 `8.57x` faster
- p95 `2.36x` faster
Artifacts:

- bench/results/phase1_ab_rustgo_on_fast_20260304T182812Z.json
- bench/results/phase1_ab_rustgo_off_fast_20260304T182916Z.json
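The improvement ratios above follow directly from the raw latencies:

```python
def speedup(legacy_ms, cutover_ms):
    """Ratio of legacy latency to cutover latency, rounded to 2 decimals."""
    return round(legacy_ms / cutover_ms, 2)

assert speedup(17565, 3557) == 4.94  # mean
assert speedup(20006, 2334) == 8.57  # p50
assert speedup(20008, 8494) == 2.36  # p95
```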
## V3 Roadmap (Issues 68-72)

V3 is focused on application efficacy, not speed in isolation:

- lower deep-read p95/p99 tails and timeout rates
- higher recall quality for agent decisions
- stronger runner interoperability and task-lifecycle visibility
- ANE sidecar acceleration path (M-series macOS) with automatic fallback
Roadmap documents:

- full plan: docs/v3-roadmap.md
- ultra DB stack recommendation: docs/perf-candidate-notes/ultra_db_stack_recommendation_2026-03-16.md
- public roadmap page: https://contextlattice.io/roadmap.html
Program graph:

```
V3 Objective: Context Efficacy at Scale
├─ Track A (Issues #69 + #72): performance + deep-read stability
├─ Track B (Issues #70 + #72): recall quality + memory semantics
└─ Track C (Issues #68 + #71): runner interop + compute backend
-> unified security/benchmark/recall gates -> staged cutover
```

## Migration Runtime (Phases 1-8)
The orchestrator now runs Rust+Go as the default runtime path. Python remains in place as a legacy fallback when a proxy is unavailable.
Runtime interfaces: `Codec`, `MemoryStore`, `Retriever`, `Scheduler`, `StateDelta`

Status endpoint: `GET /migration/runtime`

Flags:

- `USE_RUST_CODEC`
- `USE_RUST_MEMORY`
- `USE_RUST_RETRIEVAL`
- `ORCH_RUST_RETRIEVAL_VECTOR_BACKEND` (`auto|qdrant_remote|usearch_ann`)
- `ORCH_RUST_RETRIEVAL_LEXICAL_BACKEND` (`auto|none|tantivy_lexical`)
- `ORCH_RUST_RETRIEVAL_BACKEND_STRICT`
- `ORCH_MEMORY_BANK_SEARCH_BACKEND` (`native|disabled|meilisearch_spike|quickwit_spike|tantivy_spike|lancedb_spike|trieve_spike|helixdb_spike|icm_spike|shodh_spike|memvid_spike|surrealdb_spike`)
- `ORCH_MEMORY_BANK_SPIKE_FALLBACK_BACKEND`
- `ORCH_MEMORY_BANK_SPIKE_HTTP_URL`
- `MEMORY_BANK_SPIKE_RS_MEILI_URL`
- `MEMORY_BANK_SPIKE_RS_MEILI_INDEX`
- `MEMORY_BANK_SPIKE_RS_MEILI_TASK_TIMEOUT_SECS`
- `GO_RETRIEVAL_LEXICAL_GUARD_ENABLED`
- `GO_RETRIEVAL_LEXICAL_GUARD_MIN_COVERAGE`
- `GO_RETRIEVAL_LEXICAL_GUARD_MIN_RESULTS`
- `ORCH_RETRIEVAL_SYNC_ASYNC_MIN_FAST_RESULTS_BY_MODE` (JSON map, e.g. `{"fast":1,"balanced":2,"deep":3}`)
- `GO_RETRIEVAL_DISABLE_SYNC_SLOW_FALLBACK`
- `GO_RETRIEVAL_SLOW_SYNC_TIMEOUT_CAP_SECS`
- `GO_RETRIEVAL_RUST_LANE_PROMOTION_ENABLED`
- `GO_RETRIEVAL_TOPIC_PREFILTER_ENABLED`
V4 stack reference: docs/perf-candidate-notes/v4_stack_and_rust_exploration_plan_2026-03-16.md

- `USE_GO_ORCHESTRATOR`
- `CONTEXTLATTICE_ENGINE_MODE` (`embedded` or `service`)
- `CONTEXTLATTICE_ENGINE_URL`
- `CONTEXTLATTICE_GO_ORCHESTRATOR_URL`
- `MIGRATION_SHADOW_DUAL_RUN`
- `MIGRATION_CANARY_ENABLED`
Migration scaffolding:

- Rust crates: `crates/context_codec`, `crates/context_engine`, `crates/context_retrieval`
- Service contract: `proto/contextlattice_engine.proto`
- Go services: `services/orchestrator-go`, `services/gateway-go`
- API docs: `docs/engine-api.md`, `docs/migration-phase-status.md`
Default cutover toggles:

```
USE_RUST_CODEC=true
USE_RUST_MEMORY=true
USE_RUST_RETRIEVAL=true
USE_GO_ORCHESTRATOR=true
CONTEXTLATTICE_ENGINE_MODE=service
CONTEXTLATTICE_ENGINE_URL=http://contextlattice-orchestrator:8075
CONTEXTLATTICE_GO_ORCHESTRATOR_URL=http://orchestrator-go:8090
MIGRATION_SHADOW_DUAL_RUN=true
MIGRATION_CANARY_ENABLED=true
```

Rollback/legacy toggles (temporary fallback only):

```
USE_RUST_CODEC=false
USE_RUST_MEMORY=false
USE_RUST_RETRIEVAL=false
USE_GO_ORCHESTRATOR=false
```

Pathway cache backend modes:
- `ORCH_RETRIEVAL_PATHWAY_CACHE_BACKEND=memory` (in-memory only)
- `ORCH_RETRIEVAL_PATHWAY_CACHE_BACKEND=redis` (read/write Redis backend)
- `ORCH_RETRIEVAL_PATHWAY_CACHE_BACKEND=redis_mirror` (write-through mirror only; read path stays in-memory)

Dashboard retrieval observability:
The `contextlattice-dashboard` status page now includes a retrieval flow panel with:

- fast/deep mode selection
- returned/pending/warming/failed source chips
- continuation SSE event stream view
- rollup-first result ordering and raw evidence drill-down (`/v1/memory/get`)
## Model Runtime

- Ships with a sane local default (`qwen3.5:9b` via Ollama).
- Default task inference provider is `auto`:
  - on Apple Silicon (M-series macOS), auto selects `ollama/coreml`
  - on other hosts, auto selects standard `ollama`
- Public v3 keeps the ANE sidecar disabled by default.
- Any OpenAI-compatible endpoint can be used when preferred.
- BYO model runtimes supported through:
  - Ollama
  - LM Studio
  - llama.cpp-compatible servers
  - hosted OpenAI-compatible providers
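The `auto` provider selection described above amounts to a platform check; a minimal sketch (the real selection lives in the orchestrator, and the provider strings mirror this README):

```python
import platform

def select_provider(system=None, machine=None):
    """Pick the default task inference provider for `auto` mode."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":  # Apple Silicon (M-series)
        return "ollama/coreml"
    return "ollama"
```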
## Security defaults

- `SECRETS_STORAGE_MODE=redact` redacts secret-like material before memory persistence/fanout.
- `SECRETS_STORAGE_MODE=block` rejects writes containing secret-like material (422).
- `SECRETS_STORAGE_MODE=allow` stores write payloads as-is (operator opt-in).
- Compose host bindings default to loopback via `HOST_BIND_ADDRESS=127.0.0.1`.
- Production strict mode requires `CONTEXTLATTICE_ORCHESTRATOR_API_KEY`.
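The three `SECRETS_STORAGE_MODE` behaviors can be sketched as follows. The detector pattern here is deliberately simplistic and hypothetical — the shipped secret detection is broader:

```python
import re

# Hypothetical detector for the sketch; real patterns are broader.
SECRET_PATTERN = re.compile(r"(api[_-]?key|secret|token)\s*[:=]\s*\S+", re.IGNORECASE)

class SecretsBlockedError(Exception):
    """Raised in block mode for payloads with secret-like material (maps to 422)."""

def apply_secrets_policy(content, mode="redact"):
    """Apply allow / block / redact semantics to a write payload."""
    if mode == "allow":
        return content  # store as-is (operator opt-in)
    if mode == "block":
        if SECRET_PATTERN.search(content):
            raise SecretsBlockedError("write rejected: secret-like material")
        return content
    # default: redact secret-like spans before persistence/fanout
    return SECRET_PATTERN.sub("[REDACTED]", content)
```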
## Main branch release gate

Enforce PR-only merges on main with CODEOWNERS approval (`.github/CODEOWNERS` is `* @sheawinkler`):

```
scripts/enable_main_branch_protection.sh main 1
```

If GitHub returns "Upgrade to GitHub Pro or make this repository public", switch repo visibility or plan, then rerun the command.
## Web 3 Ready

- IronClaw can be enabled as an optional messaging surface without changing the core local-first deployment.
- OpenClaw/ZeroClaw surfaces now run with strict secret-leakage protections by default.
- IronClaw docs and architecture conventions are excellent references for operator-facing completeness.

```
# optional IronClaw bridge
IRONCLAW_INTEGRATION_ENABLED=true
IRONCLAW_DEFAULT_PROJECT=messaging
# strict secret guard for openclaw/zeroclaw/ironclaw messaging surfaces
MESSAGING_OPENCLAW_STRICT_SECURITY=true
```

Ingress endpoints:

- `POST /integrations/messaging/openclaw`
- `POST /integrations/messaging/ironclaw`
- `POST /integrations/messaging/command`
- `@ContextLattice task create|status|list|approve|replay|deadletter|runtime`
## API Surface (selected)

- `POST /memory/write`
- `POST /memory/search`
- `POST /memory/context-pack`
- `GET /memory/search/continuations/{token}/events`
- `POST /tools/feedback_submit`
- `POST /integrations/messaging/command`
- `POST /integrations/messaging/openclaw`
- `POST /integrations/messaging/ironclaw`
- `POST /integrations/telegram/webhook`
- `POST /integrations/slack/events`
- `POST /agents/tasks`
- `GET /agents/tasks`
- `GET /agents/tasks/runtime`
- `GET /agents/tasks/deadletter`
- `POST /agents/tasks/{task_id}/replay`
- `POST /agents/tasks/recover-leases`
- `GET /telemetry/memory`
- `GET /telemetry/fanout`
- `POST /telemetry/fanout/letta/auto-prune/run`
- `GET /telemetry/retention`
- `POST /telemetry/retention/run`
- `POST /maintenance/telemetry/purge`
## Agent Context Expansion Runtime

Task workers and generic agent runners now execute a context-expansion loop by default:

- Pre-inference `POST /memory/context-pack` preflight.
- Budgeted context layers:
  - `L0` factual snippets
  - `L1` topic rollups
  - `L2` raw file refs for detail dives
- Adaptive expansion:
  - one broadened scope pass (drop topic scope once)
  - deep async escalation when coverage is still low
- Tool-aware context slices exported via `TASK_TOOL_CONTEXT_SLICES`.
- Post-run checkpoint writeback to a stable topic path (`agent/checkpoints` fallback).
- Fail-open lifecycle reporting with pending-source visibility.
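The layer-budget idea can be illustrated as a greedy fill against per-layer token budgets (matching the `CONTEXT_EXPANSION_*_BUDGET_TOKENS` defaults). This is a sketch with invented item shapes, not the orchestrator's packer:

```python
def pack_context(items, budgets):
    """Greedily fill each layer (L0/L1/L2) up to its token budget.

    `items` maps layer -> list of (text, token_count), most valuable first.
    Returns (packed items, tokens spent per layer).
    """
    packed, spent = [], {}
    for layer, budget in budgets.items():
        used = 0
        for text, tokens in items.get(layer, []):
            if used + tokens > budget:
                continue  # skip items that would blow this layer's budget
            packed.append((layer, text))
            used += tokens
        spent[layer] = used
    return packed, spent
```

If coverage is still low after packing, the runtime's adaptive pass would broaden scope once and then escalate to a deep async read.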
Tune with:

```
CONTEXT_EXPANSION_ENABLED=true
CONTEXT_EXPANSION_L0_BUDGET_TOKENS=1200
CONTEXT_EXPANSION_L1_BUDGET_TOKENS=800
CONTEXT_EXPANSION_L2_BUDGET_TOKENS=400
CONTEXT_EXPANSION_DEEP_ESCALATION_ENABLED=true
```

## Docs Index
Release notes:

- docs/releases/v3.2.13.md (Glama-lite sqlite acceleration lane + capability detection)
- docs/releases/v3.2.3.md (final install/deployment docs alignment for staged runtime lanes)
- docs/releases/v3.2.2.md (README/website graphics + runtime ownership alignment)
- docs/releases/v3.2.1.md (config canonicalization + Python fallback audit)
- docs/releases/v3.2.0.md (public V3 Go-first cutover; Python removed from primary read path; includes A/B benchmark)
- docs/releases/v3.1.0.md (post-v3.0.0 public, non-V4 integration/runtime updates)
Audits:

- docs/audits/python_fallback_audit_v3.2.1.md (fallback-critical vs utility Python validation)
- Phase 0 performance baseline: docs/perf-baseline.md
- Migration plan: docs/migration-plan.md
- Migration interfaces (Phase 1 proposal): docs/migration-interfaces.md
- Benchmark harness docs: bench/README.md
- Public overview site source: docs/public_overview/README.md
- Legal and licensing: docs/legal/README.md
- Glama release compliance: docs/glama-release-compliance.md
Pre-submit verifier:

```
gmake submission-preflight
python3 scripts/submission_preflight.py --online
gmake launch-lock
gmake launch-lock-public
```

## Private/Public Sync Notes
This repository (sheawinkler/ContextLattice) is the primary codebase.
Public landing collateral publishes from sheawinkler/ContextLattice branch gh-pages.
- Source: docs/public_overview/
- Sync script: scripts/sync_public_overview.sh
- Primary URL: https://contextlattice.io/
- Fallback URL: https://sheawinkler.github.io/ContextLattice/
- Historical mirror repository sheawinkler/memmcp-overview is archived and not used for live hosting.
## License

Apache License 2.0. See LICENSE.

Commercial terms for hosted offerings and private enterprise agreements are documented in docs/legal/README.md.