Context Fabric

Local-first MCP memory for AI coding agents.

Your agent remembers decisions, patterns, project context, and what changed while you were away — across sessions, projects, and tools.


NOTE

Pre-1.0, but built for daily use. Context Fabric is actively used, tested, and released regularly. APIs and storage formats may still evolve before 1.0, so pin versions and review the CHANGELOG before upgrading.

Start Here

Why it exists

Coding agents are great in-session and forgetful between sessions. Important context disappears when the terminal closes: decisions, debugging discoveries, codebase conventions, partial work, and the answer to "what changed since I was last here?"

Context Fabric gives MCP-compatible coding agents a persistent memory layer that stays local, searchable, and useful.

Who it's for

  • Developers using MCP-capable coding tools like Claude Code, Cursor, Codex CLI, Gemini CLI, OpenCode, or Kimi

  • Teams that want persistent agent memory without sending code and context to a hosted memory service

  • Builders who want a lightweight local memory substrate instead of wiring up a separate vector database stack

Why Context Fabric

  • Local-first by design — SQLite storage, local embeddings, Docker/local deployment, zero cloud dependency.

  • Built for coding agents — remembers decisions, bug fixes, conventions, code patterns, and current project state.

  • MCP-native — works as a real MCP server with Tools, Resources, and Prompts.

  • Code-aware and time-aware — semantic code search, symbol indexing, and orientation around offline gaps.

  • Practical to adopt — no external vector database, no API key, no hosted control plane required.

What you get

Memory & retrieval

  • Three-layer memory — Working (L1), Project (L2), Semantic (L3). Memories auto-route to the right layer.

  • Hybrid search — FTS5 BM25 + vector cosine + Reciprocal Rank Fusion. Query-side instruction prefixes are applied automatically per embedder family (BGE, E5, MiniLM). Optional explanations expose component scores and boosts without changing the default ranking (see the sketch after this list).

  • Semantic recall — in-process vector embeddings via ONNX + fastembed (bge-small-en-v1.5 by default, with one-env-var swap to larger models or GPU). No API keys needed.

  • Bundled ANN — sqlite-vec ships as a regular dependency (since v0.13). KNN over the full corpus, graceful fallback if the loadable extension fails to attach.

  • Optional CUDA inference — set CONTEXT_FABRIC_EMBED_EP=cuda + run scripts/setup-gpu.sh for ~30× ingest throughput on NVIDIA hardware.

  • Local code indexing — scans source files, extracts symbols, stays fresh via file watching, and can inspect/repair stale or corrupted index state.

  • Time-aware orientation — "What changed while I was away?" with offline-gap detection and timezone support.

  • Ghost messages — relevant memories surface via context.getCurrent without cluttering the main workflow.

  • Public-benchmark harness — reproducible BEIR SciFact / FiQA and LongMemEval_S runners in benchmarks/public/.
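
To make the hybrid recall concrete: in its standard form, Reciprocal Rank Fusion scores each candidate as the sum of 1/(k + rank) across the BM25 and vector result lists (k is a small constant, commonly 60), so a memory that ranks well in either list surfaces near the top. Below is a hedged sketch of a recall call with explanations enabled; the query field mirrors the example later on this page, while the explain flag and the response fields are assumptions for illustration, not the documented API:

{ "query": "how do we validate inputs?", "explain": true }
// => top hit plus per-component diagnostics, e.g. BM25 rank, vector rank, and the fused RRF score
//    (field names hypothetical; see the Tools Reference for the real shapes)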

Memory intelligence

  • Provenance — structured citation blocks on memories (sessionId, eventId, filePath, commitSha, sourceUrl, and more).

  • Dedup-on-store — cosine near-duplicate detection for L3 with skip, merge, or allow strategies.

  • Bi-temporal memory — supersedes, validFrom, and validUntil support for "what was true at that time?" reasoning (see the sketch after this list).

  • Scoped fabric graph — temporal entities and relationships connect projects, sessions, files, symbols, memories, decisions, errors, and skills for lineage/path queries.
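
A hedged sketch of how provenance and bi-temporal fields might combine on a single stored decision; the field names (supersedes, validFrom, filePath, commitSha) come from the bullets above, but the exact nesting and the placeholder values are assumptions for illustration:

{
  "type": "decision",
  "content": "Switch API validation from Joi to Zod.",
  "supersedes": "<id of the earlier decision>",
  "validFrom": "2026-02-25T09:15:00-05:00",
  "citation": { "filePath": "src/schemas/user.ts", "commitSha": "<commit>" }
}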

Agent ergonomics

  • Skills — procedural memory with slugged, invokable instruction blocks and usage tracking.

  • MCP Resources — browseable memory://skills, memory://recent, memory://conventions, memory://decisions, and templated resource views.

  • MCP Prompts — slash-command workflows like cf-orient, cf-capture-decision, cf-review-session, cf-search-code, and cf-invoke-skill.

  • context.importDocs — one-shot seeding from common onboarding docs like README.md, CHANGELOG.md, CONTRIBUTING.md, and AGENTS.md (example after this list).

  • Recall-quality harness — benchmark recall@k and MRR with npm run bench:quality.
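
For instance, seeding a fresh project might be a single context.importDocs call; the tool name is real, but the parameter shown here is a hypothetical shape for illustration:

{ "files": ["README.md", "CHANGELOG.md", "CONTRIBUTING.md", "AGENTS.md"] }
// context.importDocs: the "files" parameter name is hypothetical; see the Tools Reference for the actual schema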

Operations & DX

  • 29 MCP tools — memory CRUD, recall/orientation, code search, code-index repair, graph query/import/export, docs import, backup/export/import, metrics/health, and 6 skill tools.

  • Graceful shutdown — drains in-flight calls, checkpoints WAL, and closes cleanly.

  • Data integrity — startup checks, explicit multi-row transactions, and online backups.

  • Observability — structured logging plus context.metrics and context.health.

  • Self-setup — context.setup can install Context Fabric into supported CLIs.

  • Docker-first — easy docker run --rm -i transport with persistent named-volume storage.

Quick Start

Get running in a few minutes:

# 1. Clone and build the Docker image

git clone https://github.com/Abaddollyon/context-fabric.git
cd context-fabric
docker build -t context-fabric .

# 2. Verify the server responds

echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' \
  | docker run --rm -i context-fabric

# 3. Add it to your CLI with the Docker transport:

docker run --rm -i -v context-fabric-data:/data/.context-fabric context-fabric
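
Most MCP-capable CLIs accept a JSON server entry along these lines (the mcpServers shape used by Claude Desktop and Cursor); the exact file location and top-level key vary by tool, so treat this as a sketch and use the per-CLI configs linked below:

{
  "mcpServers": {
    "context-fabric": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "-v", "context-fabric-data:/data/.context-fabric", "context-fabric"]
    }
  }
}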

See CLI Setup for copy-paste configs for all supported CLIs, or let your AI do it once Context Fabric is reachable:

"Install and configure Context Fabric for Cursor using Docker"

Alternatively, build and run locally without Docker (requires Node.js 22.5+):

git clone https://github.com/Abaddollyon/context-fabric.git
cd context-fabric
npm install
npm run build

The server is at dist/server.js. Point your CLI MCP config at node dist/server.js.
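
The equivalent MCP config entry for the local build just swaps the command; the absolute path below is a placeholder for wherever you cloned the repo:

{
  "mcpServers": {
    "context-fabric": {
      "command": "node",
      "args": ["/absolute/path/to/context-fabric/dist/server.js"]
    }
  }
}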

Supported CLIs

CLI              Setup                                     Docs
Claude Code      context.setup({ cli: "claude-code" })     Guide
Kimi             context.setup({ cli: "kimi" })            Guide
OpenCode         context.setup({ cli: "opencode" })        Guide
Codex CLI        context.setup({ cli: "codex" })           Guide
Gemini CLI       context.setup({ cli: "gemini" })          Guide
Cursor           context.setup({ cli: "cursor" })          Guide
Claude Desktop   context.setup({ cli: "claude" })          Guide

TIP

Once Context Fabric is running in one MCP-capable tool, it can usually install itself into the others through context.setup.

What it feels like

Start a session and the agent orients itself:

It is 9:15 AM on Wednesday, Feb 25 (America/New_York).
Project: /home/user/myapp.
Last session: 14 hours ago. 3 new memories were added while you were away.

Store a decision once:

{ "type": "decision", "content": "Use Zod for all API validation. Schemas live in src/schemas/." }

Recall it naturally later:

{ "query": "how do we validate inputs?" }
// => "Use Zod for all API validation. Schemas live in src/schemas/."

No cloud account. No hidden service dependency. No need to re-explain the codebase every session.

Performance

Numbers on a commodity dev box (Ryzen 7 5800H + RTX 3060 12 GB, warm run, 2026-04-28/29):

Retrieval quality — public benchmarks

Benchmark                             Metric       Context Fabric (v0.14 rerun, GPU)   v0.13 published   OpenAI text-embedding-3-small   bge-base-en-v1.5 (dense-only)
BEIR SciFact                          nDCG@10      0.7456                              0.7439            0.774                           0.740
BEIR SciFact                          Recall@100   0.9633                              0.9667            ~0.93
BEIR FiQA-2018                        nDCG@10      0.3809                              0.3801            0.397                           0.406
BEIR FiQA-2018                        Recall@100   0.7360                              0.7361            ~0.69
LongMemEval_S (500 q, 25K sessions)   Hit@5        0.9200                              0.9520
LongMemEval_S                         Recall@10    0.9210                              0.9472
How to read this: v0.14 keeps the v0.13 low-latency local retrieval path while adding explanation/artifact tooling for ranking diagnostics. BEIR top-k quality improved slightly in the rerun. LongMemEval's published v0.13 number did not reproduce under the current cached runtime/dataset environment, so docs/benchmarks.md documents both the published baseline and the v0.14 rerun, with artifact output for regression analysis.

Latency and throughput

Workload                                                      Result
BEIR SciFact query p50 (bge-base, GPU + sqlite-vec)           20 ms
BEIR FiQA query p50 (bge-base, GPU + sqlite-vec)              87 ms
LongMemEval_S query p50 (embedding-only, artifact-capable)    10.8 ms
L3 recall() @ 10K memories (FTS5 prefilter, CPU)              ~8 ms p50, <100 ms p99
Ingest throughput (bge-base, RTX 3060 CUDA EP)                ~170 docs/s (≈32× the CPU single-core baseline)
Full test suite                                               747 tests passing
Incremental tsc rebuild                                       ~0.8 s
Server cold start (with L3 warm)                              < 1 s

Benchmark scripts: benchmarks/recall-latency.ts (npm run bench), benchmarks/public/ (npm run bench:beir:scifact, npm run bench:beir:fiqa, npm run bench:longmemeval:s).

Architecture at a glance

CLI (Claude Code, Cursor, Codex, etc.)
  |
  | MCP protocol (stdio / Docker)
  v
Context Fabric Server
  |-- Smart Router -----> L1: Working Memory   (in-memory, session-scoped)
  |-- Time Service        L2: Project Memory   (SQLite, per-project)
  |-- Code Index          L3: Semantic Memory  (SQLite + embeddings, cross-project)

Memories auto-route to the right layer. Scratchpad notes go to L1. Decisions and bug fixes go to L2. Reusable patterns and conventions go to L3. See Architecture for the full deep dive.
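
As a rough illustration of that routing (the decision type appears elsewhere on this page; note and convention are assumed type names):

{ "type": "note", "content": "Left off mid-refactor of the retry logic" }
// => L1: working memory, session-scoped

{ "type": "decision", "content": "Use Zod for all API validation" }
// => L2: project memory for this repo

{ "type": "convention", "content": "Prefer named exports across src/" }
// => L3: semantic memory, cross-project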

Documentation

Resource            Description
Getting Started     Installation, first run, Docker and local setup
CLI Setup           Per-CLI configuration for all 7 supported CLIs
Tools Reference     Full docs for all 29 MCP tools
Skills              Procedural memory — create, invoke, and compose reusable skills
MCP Primitives      Resources (memory://...) and Prompts (cf-*)
Memory Types        Type system, layers, routing, decay, provenance, and dedup
Configuration       Storage paths, TTL, embedding notes, and environment variables
Agent Integration   System-prompt guidance for automatic tool usage
Architecture        Retrieval pipeline, internals, and performance design
Benchmarks          Public-benchmark results (BEIR SciFact / FiQA, LongMemEval_S) with reproduction commands
Changelog           Version history and upgrade notes
Wiki                Launch-friendly guides, FAQ, troubleshooting, and setup walkthroughs

Contributing

Contributions are welcome. See CONTRIBUTING.md for how to get started.

License

MIT


Stop re-explaining your codebase every session.

Get Started · Configure a CLI · Browse the Wiki · Report a Bug
