vMem

Virtual memory for LLM context. For Claude Code and every AI agent.

Your AI never forgets — no more "context compacted" interruptions.


English · 中文

One-line install via Claude Code:
/install-plugin github:soolaugust/vMem

The problem: context compaction kills your flow

If you use Claude Code, you know this pain:

⚠️ Auto-compact: conversation is approaching context limit...

Every time this happens, your AI loses track of decisions, constraints, and hard-won context. You re-explain. It re-learns. Hours of accumulated understanding — gone in one compaction event.

And if you run multiple agents? They can't share what they've learned. Each one starts from zero.

This isn't a model limitation. It's a missing infrastructure layer.


The solution: persistent context that survives compaction

vMem gives your AI agents persistent, retrievable context managed like virtual memory: the context window is the hot working set, and durable knowledge lives outside it until demand-paged back in.

The result: OS-managed context continuity. Your AI retains every decision, constraint, and lesson across sessions, across compactions, across agents.

How it works

You speak
  → vMem retrieves relevant memories → injects into context
  → AI responds with full context
  → Session ends → decisions and insights auto-extracted → persisted
  → Compaction happens? No problem — memories survive outside the window
  → Next session starts → working set restored automatically

The whole pipeline runs inside Claude Code hooks. There is no manual memory management.
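The loop above can be sketched in a few lines of Python. This is a minimal illustration, not vMem's actual code: the function names (`open_store`, `retrieve`, `build_context`, `end_session`), the one-table schema, the keyword-overlap ranking, and the "Decision:" prefix used as an extraction rule are all assumptions made for the example.

```python
import re
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical memory store with a single table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS memories "
        "(id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    return db


def tokens(text):
    """Lowercased alphanumeric words, used for naive overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(db, prompt, k=3):
    """Return up to k stored memories sharing at least one word with the prompt."""
    want = tokens(prompt)
    scored = []
    for (text,) in db.execute("SELECT text FROM memories ORDER BY id"):
        overlap = len(want & tokens(text))
        if overlap:
            scored.append((overlap, text))
    scored.sort(key=lambda pair: -pair[0])  # stable sort: ties keep insertion order
    return [text for _, text in scored[:k]]


def build_context(db, prompt):
    """Inject retrieved memories ahead of the user's prompt."""
    header = "\n".join(f"[memory] {m}" for m in retrieve(db, prompt))
    return f"{header}\n{prompt}" if header else prompt


def end_session(db, transcript):
    """Persist lines marked as decisions (a stand-in for real extraction)."""
    for line in transcript:
        if line.lower().startswith("decision:"):
            db.execute("INSERT INTO memories (text) VALUES (?)", (line,))
    db.commit()


if __name__ == "__main__":
    db = open_store()
    end_session(db, ["Decision: use SQLite in WAL mode for the store", "ok"])
    print(build_context(db, "Which mode should the SQLite store use?"))
```

Because the memories live in the database rather than in the prompt history, a later session (or a compacted one) can rebuild its working set by calling `build_context` again.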

Why "vMem"?

vMem is virtual memory for LLM context: instead of treating the context window as the whole world, it manages a working set with OS primitives.

What others see	What vMem does
"Context compacted"	Durable knowledge already lives outside the window
New session starts	Working set auto-restored in <100ms
Multiple agents running	All share one managed context substrate
Constraint decided 3 weeks ago	Pinned with `mlock`-style semantics

OS-managed context. Durable working sets. No repeated explanation.

Under the hood: OS context management for AI

The secret sauce? We didn't invent new algorithms. We borrowed what the Linux kernel has been doing for more than 30 years:

OS concept	vMem equivalent
RAM (working space)	Context window — what the AI sees right now
Disk (persistent storage)	Knowledge base — facts that survive across sessions
Demand paging	On-demand retrieval — fetch relevant memories at the right moment
`mlock`	Hard / soft pinning — guarantee a constraint is never evicted
kswapd watermarks	Capacity-aware eviction under pressure
CRIU checkpoint / restore	Session snapshots — pause and resume seamlessly
Process scheduling	Multi-agent coordination — many agents, one knowledge base
kworker thread pool	Async extraction — I/O off the critical path
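To make two rows of the table concrete, here is a rough sketch of watermark-driven eviction combined with pinning. It is an illustration under stated assumptions, not vMem's implementation: the `Entry` fields, the `reclaim` function, and the least-recently-used victim order are hypothetical. Note that the thresholds are expressed in terms of used budget, so reclaim starts above `high` and stops at `low`, which is the mirror image of how kswapd reasons about free pages.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    tokens: int            # size of this item in the context budget
    last_used: int         # logical clock of the most recent access
    pinned: bool = False   # mlock-style flag: never evicted


def reclaim(entries, high, low):
    """Evict least-recently-used unpinned entries once usage exceeds `high`,
    continuing until usage drops to `low` or only pinned entries remain.
    Returns (kept, evicted)."""
    used = sum(e.tokens for e in entries)
    if used <= high:
        return list(entries), []

    evicted = []
    victims = sorted((e for e in entries if not e.pinned),
                     key=lambda e: e.last_used)
    for victim in victims:
        if used <= low:
            break
        evicted.append(victim)
        used -= victim.tokens

    evicted_keys = {e.key for e in evicted}
    kept = [e for e in entries if e.key not in evicted_keys]
    return kept, evicted


if __name__ == "__main__":
    working_set = [
        Entry("never-log-secrets", 50, last_used=1, pinned=True),
        Entry("old-note", 40, last_used=2),
        Entry("recent-note", 40, last_used=3),
    ]
    kept, evicted = reclaim(working_set, high=100, low=90)
    print([e.key for e in kept], [e.key for e in evicted])
```

The gap between the two thresholds is the design point: reclaiming down to `low` rather than just below `high` avoids evicting one item on every single insertion.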

How is this different from mem0 / Letta / Zep?

	vMem	mem0	Letta (MemGPT)	Zep
Design metaphor	OS-managed context	Vector store	Agent runtime	Temporal graph
Context continuity	✅ pinned knowledge survives	❌	❌	❌
Multi-agent shared	✅ native, single store	⚠️ via API	✅	✅
MCP-native	✅ first-class	❌	❌	❌
Single-file deploy	✅ SQLite, no service	❌ needs server	❌ needs server	❌ needs server
Demand-paging retrieval	✅ explicit	implicit	implicit	implicit
Eviction policy	✅ kswapd + DAMON	TTL only	recency	recency + decay
Pin / mlock semantics	✅	❌	❌	❌

TL;DR. If you're tired of context compaction wiping your AI's memory, and you want a solution that installs with pip, runs as a sidecar on a laptop, is shared by several Claude Code / Cursor / custom agents, and never loses a pinned constraint, vMem is built for that.

Performance at a glance

Metric	Value
Retrieval latency (P50, hot path)	~0.1 ms (540x faster than the 54 ms subprocess baseline)
Recall@3 vs baseline	+147%
Cross-session recall	94.2%
Token cost per call	~44 tokens injected, +256 tokens net ROI (avoided re-explanation)
Test suite	3,500+ tests across retrieval, eviction, MCP, privacy filter

Quick start

One-line install (recommended).

/install-plugin github:soolaugust/vMem

Manual install.

git clone https://github.com/soolaugust/vMem
cd vMem
pip install -e .
mkdir -p ~/.claude/memory-os

Detailed Claude Code hook configuration, daemon management, and troubleshooting live in docs/SETUP.md.

Architecture

Three layers:

Hooks — sit at the Claude Code syscall boundary (SessionStart, UserPromptSubmit, Stop, PostToolUse) and call into the store.
Store — single SQLite file (WAL mode) with FTS5 full-text index, behind a unified VFS interface (memory_os.store.api / memory_os.store.vfs / memory_os.store.criu).
Daemons & IPC — persistent retriever daemon (Unix socket), async extractor pool (kworker-style), cross-agent notify bus.
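As a rough picture of the Store layer, the sketch below opens a single SQLite file in WAL mode and searches it through an FTS5 index using Python's standard `sqlite3` module. The table name, the column names, and the helper functions are assumptions for illustration; the real schema is documented in docs/ARCHITECTURE.md. It also assumes the local SQLite build was compiled with FTS5, which is common but not guaranteed.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open a single-file store in WAL mode with an FTS5 index.
    Table and column names are illustrative, not vMem's real schema."""
    db = sqlite3.connect(path)
    db.execute("PRAGMA journal_mode=WAL")
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(kind, body)"
    )
    return db


def remember(db, kind, body):
    """Insert one memory row and commit it."""
    db.execute("INSERT INTO memories (kind, body) VALUES (?, ?)", (kind, body))
    db.commit()


def search(db, words, limit=3):
    """Full-text search. Each word is double-quoted so that FTS5 treats it
    as a literal term rather than as query syntax."""
    if not words:
        return []
    query = " OR ".join('"' + w.replace('"', '""') + '"' for w in words)
    rows = db.execute(
        "SELECT body FROM memories WHERE memories MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [body for (body,) in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = open_store(os.path.join(tmp, "store.db"))
        remember(db, "decision", "retry budget is three attempts")
        print(search(db, ["retry"]))
        db.close()
```

WAL mode lets readers proceed while a single writer appends, which suits a store where hooks read on every prompt and an extractor writes in the background.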

For the full layered diagram, on-disk schema, and the rationale behind each subsystem, see docs/ARCHITECTURE.md. For the comprehensive OS-and-cognitive-science primitive mapping, see docs/DESIGN_PHILOSOPHY.md.

Roadmap

Distributed vMem — cgroup-style multi-agent quotas, network-replicated stores
Adaptive watermarks — eviction tuning that follows observed agent behavior
arXiv preprint — formal evaluation against mem0 / Letta / Zep
Per-chunk embedding routing — different models for code vs prose

What has already landed (1,051+ tuning iterations, eight major capability rounds) is summarized in CHANGELOG.md. The pain points resolved along the way are listed in docs/PROBLEMS_SOLVED.md.

Testing

# stable test subset
python3 -m pytest tests/test_agent_team.py tests/test_chaos.py -q

Coverage: per-session DB isolation, concurrent-write safety, cross-agent IPC delivery, extractor-pool queue semantics, CRIU checkpoint validation, goals-progress idempotency.

Dependencies

No GPU. No external API. Everything runs locally.

Dependency	Purpose
Python 3.12+	Core runtime
SQLite (built-in)	Store + FTS5 full-text index
`nc`, `flock`	Daemon socket + single-instance startup
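The table lists the `flock` command-line tool for single-instance startup. The same idea can be sketched in Python with the standard `fcntl` module. This is an analogy only, not how vMem starts its daemon: the function name `acquire_single_instance` and the lock-file path are hypothetical, and the code is Unix-only.

```python
import fcntl
import os
import tempfile


def acquire_single_instance(lock_path):
    """Try to take an exclusive, non-blocking flock on `lock_path`.

    Returns the open file descriptor on success, or None if another holder
    already has the lock. Keep the descriptor open for the daemon's lifetime;
    closing it (or exiting the process) releases the lock."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "daemon.lock")  # hypothetical lock file
        holder = acquire_single_instance(path)
        print("started" if holder is not None else "already running")
        if holder is not None:
            os.close(holder)
```

Because the kernel drops the lock when the holding process exits, a crashed daemon does not leave a stale lock behind, unlike a plain PID file.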

Paper

📄 Beyond Eviction: Full OS Context-Management Semantics for LLM Agent Persistence (PDF, 8 pages)

Technical paper describing the complete OS→agent-context mapping: demand paging, kswapd, DAMON, mlock, CRIU, kworker, and shared memory.

Citation

@software{su2026compactmem,
  title = {vMem: Full OS Memory Semantics for LLM Agent Persistence},
  author = {Su, Zhidao},
  year = {2026},
  url = {https://github.com/soolaugust/vMem}
}

Contributing

Each subsystem hides behind a clean VFS interface, so components are testable in isolation. Issues, design proposals, and pull requests are welcome — see the Discussions tab for design questions, and please run the test subset above before submitting a PR.

Context compaction is the #1 productivity killer in Claude Code. vMem makes it a non-event.
