AstraMemory Local
Ollama: provides locally run LLM and embedding models for memory distillation, extraction, and search, enabling offline operation with zero inference cost.
Azure OpenAI: provides cloud-based LLM and embedding capabilities for memory distillation, extraction, and search, with pay-per-token usage.
Local-first memory daemon for AI coding agents — wire-compatible with memory-plugin.
Why it exists
Claude Code sessions compact and terminate, taking context with them. AstraMemory Local captures every session transcript, distills typed memories (decisions, facts, lessons, commands, todos), and serves them back via hybrid search (BM25 + vector + importance + freshness). It runs entirely on your workstation — no cloud account, no data leaves your machine. The plugin's hooks post to the local daemon instead of the SaaS endpoint through a single environment variable swap.
Quick start (5 commands)
npm install -g @astragenie/astramemory-local
astra-memory init
# follow the wizard — picks Ollama or Azure, writes config.yaml + secrets.env
astra-memory service install
export MEMORY_API_URL=http://127.0.0.1:7777
export MEMORY_BEARER=$(astra-memory token print)
Restart Claude Code. All plugin hooks (PreCompact, SessionEnd, SubagentStop) now post to the local daemon. No other plugin changes needed.
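If the hooks do not seem to reach the daemon, a quick sanity check of the two exported variables can help. The sketch below is a hypothetical helper, not part of AstraMemory Local: `readDaemonEnv` and `DaemonEnv` are names invented for illustration, and the only facts taken from this README are the variable names MEMORY_API_URL and MEMORY_BEARER.

```typescript
// Hypothetical helper (not shipped with AstraMemory Local): validates the two
// environment variables from the quick start before a client uses them.
interface DaemonEnv {
  apiUrl: string; // base URL without a trailing slash
  bearer: string; // token printed by `astra-memory token print`
}

function readDaemonEnv(env: Record<string, string | undefined>): DaemonEnv {
  const rawUrl = (env["MEMORY_API_URL"] ?? "").trim();
  const bearer = (env["MEMORY_BEARER"] ?? "").trim();

  if (rawUrl === "") {
    throw new Error("MEMORY_API_URL is not set");
  }
  if (bearer === "") {
    throw new Error("MEMORY_BEARER is not set (try: astra-memory token print)");
  }
  if (!/^https?:\/\//.test(rawUrl)) {
    throw new Error("MEMORY_API_URL must start with http:// or https://");
  }

  // Strip trailing slashes so that `${apiUrl}/mcp` never contains "//mcp".
  return { apiUrl: rawUrl.replace(/\/+$/, ""), bearer };
}
```

In a Node script this could be called as `readDaemonEnv(process.env)`; it throws with a readable message when either variable is missing.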
Architecture
memory-plugin hooks (unchanged)
|
| POST /ingest/transcript
| Authorization: Bearer <token>
v
+------------------+ SQLite (memory.sqlite)
| HTTP daemon | ---> +-------------------+
| Fastify | | sessions |
| 127.0.0.1:7777 | | messages |
+------------------+ | transcripts |
| jobs (queue) |
| memories |
| memories_fts (FTS5)|
| memories_vec (vec0)|
| budget_spend |
+-------------------+
|
in-process worker loop
|
8-stage distillation
(cleanup -> normalize ->
chunk -> compact ->
extract -> reduce ->
memory-normalize ->
embed + index)
|
+--------------+--------------+
| | |
memories FTS5 index sqlite-vec
(rows) (BM25 search) (cosine ANN)
| | |
+--------------+--------------+
|
hybrid score fusion
a*BM25 + b*cosine +
c*importance + d*freshness
|
GET /search POST /recall
|
/recall in plugin slash commands
Single Node process. Workers run in-process on a polling loop. SQLite is the source of truth. Everything derived (vectors, FTS rows, compactions) can be rebuilt by replaying the jobs table.
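The fusion step in the diagram (a*BM25 + b*cosine + c*importance + d*freshness) can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual scoring code: the field names, the min-max normalisation, the exponential freshness decay, and the 30-day half-life are all assumptions made for the example. Only the weighted-sum shape comes from the diagram.

```typescript
// Illustrative sketch only: names, normalisation, and half-life are
// assumptions, not AstraMemory Local's actual implementation.
interface Candidate {
  id: number;
  bm25: number;        // assumed "higher is better" (SQLite FTS5's bm25() is lower-is-better, so negate it first)
  cosine: number;      // cosine similarity to the query embedding
  importance: number;  // assumed to lie in [0, 1]
  createdAtMs: number; // creation time in epoch milliseconds
}
interface Weights { a: number; b: number; c: number; d: number; }
interface Scored { id: number; score: number; }

const MS_PER_DAY = 86400000;

// Rescale a list of raw scores to [0, 1] so BM25 and cosine are comparable.
function minMax(values: number[]): number[] {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  return values.map((v) => (hi === lo ? 1 : (v - lo) / (hi - lo)));
}

// Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life.
function freshness(createdAtMs: number, nowMs: number, halfLifeDays: number): number {
  const ageDays = Math.max(0, nowMs - createdAtMs) / MS_PER_DAY;
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Weighted sum of the four signals, sorted best-first.
function fuse(cands: Candidate[], w: Weights, nowMs: number, halfLifeDays: number = 30): Scored[] {
  if (cands.length === 0) return [];
  const bm = minMax(cands.map((c) => c.bm25));
  const cs = minMax(cands.map((c) => c.cosine));
  const scored = cands.map((c, i) => ({
    id: c.id,
    score:
      w.a * (bm[i] ?? 0) +
      w.b * (cs[i] ?? 0) +
      w.c * c.importance +
      w.d * freshness(c.createdAtMs, nowMs, halfLifeDays),
  }));
  return scored.sort((x, y) => y.score - x.score);
}
```

Normalising within the candidate set keeps an unbounded BM25 score from swamping the bounded cosine term. A real implementation might prefer rank-based fusion instead, which is a different technique from the weighted sum shown here.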
Memory types
Type | Description | Example |
Decision | Architectural or design choice made during a session | "Use sqlite-vec for v1 vector storage" |
Fact | Objective project fact, configuration detail | "Port 7777 is the default daemon port" |
Lesson | Something that went wrong and how it was resolved | "sqlite-vec rowid must match memories rowid" |
Command | CLI command or script worth remembering | "npm run build && npm test -- migrate" |
Todo | Outstanding work item surfaced in conversation | "Add reembed job when provider changes" |
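One way to picture a typed memory is as a small record with a closed set of type tags. The sketch below is a guess at the shape, under stated assumptions: the tag strings are singular forms of the five types named in this README, the `importance` field and its [0, 1] range are invented for illustration, and the hand-rolled validator is a dependency-free stand-in for the Zod-validated extraction mentioned under Status.

```typescript
// Hypothetical shape for a distilled memory. The tag strings, the
// `importance` field, and its [0, 1] range are assumptions for illustration.
const MEMORY_TYPES = ["decision", "fact", "lesson", "command", "todo"] as const;
type MemoryType = (typeof MEMORY_TYPES)[number];

interface Memory {
  type: MemoryType;
  text: string;
  importance: number;
}

function isMemoryType(v: unknown): v is MemoryType {
  return typeof v === "string" && MEMORY_TYPES.some((t) => t === v);
}

// Validate one untrusted object (e.g. parsed LLM output). Returns null when
// the object cannot be interpreted as a memory.
function parseMemory(raw: unknown): Memory | null {
  if (typeof raw !== "object" || raw === null) return null;
  const record = raw as Record<string, unknown>;
  const type = record["type"];
  const text = record["text"];
  const importance = record["importance"];

  if (!isMemoryType(type)) return null;
  if (typeof text !== "string" || text.trim() === "") return null;

  // Clamp a supplied importance into [0, 1]; fall back to a neutral 0.5.
  const clamped =
    typeof importance === "number" && Number.isFinite(importance)
      ? Math.min(1, Math.max(0, importance))
      : 0.5;

  return { type, text: text.trim(), importance: clamped };
}
```

Rejecting unknown tags at the boundary keeps malformed model output from reaching the memories table.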
Provider matrix
Concern | Ollama (local, free) | Azure OpenAI (cloud) |
LLM compaction | qwen2.5-coder:7b (default) | gpt-4.1 or any deployment |
LLM extraction | qwen2.5-coder:7b (default) | gpt-4.1 or any deployment |
Embedding | nomic-embed-text-v2-moe (1024-dim) | text-embedding-3-small (1024 via dimensions) |
Cost | $0 (local inference) | ~$0.02/1K tokens + $0.0001/1K embed tokens |
Setup | | Azure portal + endpoint + deployment name |
Providers are configurable independently per stage. The embedding provider is system-wide: switching it requires astra-memory rebuild --reembed to re-index all memories in the new model's vector space.
See docs/providers.md for full setup instructions.
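Why a provider switch forces a re-embed: cosine similarity is only meaningful between vectors produced by the same model. The sketch below illustrates that constraint. `StoredVector`, the `model` tag, and `staleIds` are hypothetical names for this example and say nothing about how the daemon actually tracks vectors.

```typescript
// Illustrative sketch: the `model` tag on each stored vector is an assumption
// made for this example, not AstraMemory Local's storage format.
interface StoredVector {
  id: number;
  model: string;    // embedding model that produced `values`
  values: number[];
}

// Plain cosine similarity; refuses vectors of different dimensionality.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Ids of vectors that were produced by a model other than the active one
// and would therefore need re-embedding before they can be searched.
function staleIds(stored: StoredVector[], activeModel: string): number[] {
  return stored.filter((v) => v.model !== activeModel).map((v) => v.id);
}
```

Even when two models share a dimension count (both columns of the matrix list 1024), their coordinate systems are unrelated, so a matching length alone does not make old vectors comparable with new query embeddings.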
MCP tools (Claude Code auto-discovery)
The daemon exposes a Model Context Protocol (Streamable HTTP) endpoint at POST /mcp.
Claude Code discovers and calls the 4 tools below automatically when configured in .mcp.json.
Tool | Description | Maps to |
| Hybrid FTS + vector search with optional type/repo/project/since filters | |
| Top-K semantic recall (default k=5) | |
| Direct memory insert, bypasses distillation | |
| Daemon health probe: | |
Plugin .mcp.json wiring:
{
"mcpServers": {
"astramem": {
"type": "http",
"url": "${MEMORY_API_URL}/mcp",
"headers": { "Authorization": "Bearer ${MEMORY_BEARER}" }
}
}
}
Set MEMORY_API_URL=http://127.0.0.1:7777 and MEMORY_BEARER to your token (printed by astra-memory token print).
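For debugging outside Claude Code, it can help to see what a request to the /mcp endpoint looks like on the wire. The sketch below only builds the request. `buildMcpRequest` is a hypothetical helper. The JSON-RPC 2.0 envelope and the Accept header follow the general MCP Streamable HTTP transport, but whether this daemon needs an initialize handshake or a session header before `tools/list` is not stated in this README, so treat it as an assumption.

```typescript
// Hypothetical helper: builds (but does not send) a JSON-RPC 2.0 request for
// the daemon's POST /mcp endpoint. Session handling is out of scope here.
interface McpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildMcpRequest(
  apiUrl: string,
  bearer: string,
  rpcMethod: string,
  id: number,
  params?: Record<string, unknown>
): McpRequest {
  const payload: Record<string, unknown> = { jsonrpc: "2.0", id, method: rpcMethod };
  if (params !== undefined) {
    payload["params"] = params;
  }
  return {
    // Same composition as "${MEMORY_API_URL}/mcp" in .mcp.json.
    url: `${apiUrl.replace(/\/+$/, "")}/mcp`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${bearer}`,
      "Content-Type": "application/json",
      // Streamable HTTP clients advertise both plain JSON and SSE responses.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(payload),
  };
}
```

The result can be passed to `fetch(req.url, req)` in Node 18+ or translated to an equivalent curl call.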
Budget cap
The daily LLM spend cap (default: $10 USD) is enforced before each LLM call.
Ollama always reports $0 cost, so the cap only applies to Azure usage.
When the cap is reached, pending distillation jobs move to the paused state. Ingest continues to accept transcripts (no data loss). Distillation resumes automatically on the next UTC day.
Override: astra-memory budget --reset (logged).
Check current spend: astra-memory budget.
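The cap behaviour described above (check before each call, pause when reached, resume on the next UTC day) can be sketched as a small ledger keyed by UTC date. This is an illustration under assumptions: `BudgetLedger`, `recordSpend`, and `stateBeforeLlmCall` are invented names, and the in-memory map stands in for whatever the budget_spend table actually stores.

```typescript
// Illustrative sketch of a daily spend cap. All names are hypothetical; the
// real daemon persists spend in SQLite rather than an in-memory Map.
type JobState = "pending" | "paused";

interface BudgetLedger {
  capUsd: number;
  spendByUtcDay: Map<string, number>; // key: "YYYY-MM-DD" in UTC
}

// toISOString() is always UTC, so the first 10 characters are the UTC date.
function utcDayKey(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// Add the cost of a finished LLM call to today's total. A local provider
// that reports $0 never moves the total.
function recordSpend(ledger: BudgetLedger, now: Date, usd: number): void {
  const key = utcDayKey(now);
  ledger.spendByUtcDay.set(key, (ledger.spendByUtcDay.get(key) ?? 0) + usd);
}

// Checked before each LLM call: once today's spend reaches the cap the job
// is paused; a new UTC day starts from zero, so jobs resume on their own.
function stateBeforeLlmCall(ledger: BudgetLedger, now: Date): JobState {
  const spent = ledger.spendByUtcDay.get(utcDayKey(now)) ?? 0;
  return spent >= ledger.capUsd ? "paused" : "pending";
}
```

Keying spend by date, rather than resetting a single counter at midnight, means no timer is needed for the rollover: the first check after 00:00 UTC simply reads an empty bucket.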
Commands reference
Command | What it does |
astra-memory init | Interactive wizard: writes config + secrets, runs migrations, installs service |
| Start daemon in foreground (dev/debug) |
astra-memory service install | Register daemon as a user-scope OS service |
| Show service state |
| Start the service |
| Stop the service |
| Remove the service unit |
| Run all health checks, print table |
| Machine-readable health check output |
| Hybrid search, print results table |
| Filter by memory type |
| Top-5 semantic recall (alias for search k=5) |
| Direct insert, bypasses distillation pipeline |
| Show pending/failed jobs |
| Show only failed jobs |
astra-memory rebuild | Rebuild derived indexes; --reembed re-vectors all |
| List configured providers and their health |
| Ping provider, print latency + dim |
astra-memory budget | Show today and month spend vs cap |
astra-memory budget --reset | Clear today's spend counter (override, logged) |
astra-memory token print | Print the current Bearer token |
| Generate new token, invalidate the old one |
Further reading
docs/migration-from-saas.md — switch the plugin from remote SaaS to local daemon
docs/configuration.md — full config.yaml reference
docs/providers.md — Ollama and Azure OpenAI setup
docs/troubleshooting.md — common issues and fixes
docs/contracts.md — frozen type interfaces (for contributors)
CHANGELOG.md — release history
Status
v0.1.0 — Waves 1-4 of the implementation plan completed.
Wave 1: SQLite schema, migration runner, FTS5, sqlite-vec, ingest endpoint, Fastify server, CLI skeleton.
Wave 2: Job worker loop, hybrid search, service install adapters, Ollama + Azure providers.
Wave 3: 8-stage distillation pipeline, budget tracker, Zod-validated extraction.
Wave 4: Install wizard, cross-OS CI matrix, E2E plugin integration test, this documentation.
Spec: astramemory-plugin/docs/superpowers/specs/2026-06-27-astramemory-local-v1-design.md