VEKTOR MEMORY - Slipstream
Persistent memory for AI agents. Local-first. No cloud. No amnesia.
Documentation · Install · Quick Start · MCP Tools · Pricing
VEKTOR fixes the architecture. Not the prompt.
The problems are architectural, not instructional. You cannot prompt your way out of a stateless architecture.
Session starts ──► Reconstruct context from logs ──► 10,000–30,000 tokens burned
before a single line of work runs
Cron job fires ──► Agent has no memory of last run ──► Repeats completed work
Loops. Bills stack up.
Add more guardrails ──► Longer prompts ──► More tokens
──► More complexity ──► More failure surface
──► More maintenance ──► Less time saved

The control paradox: the more control you try to add through prompts, the more expensive and fragile the system becomes. You end up spending more time fixing the automation than the automation saves.
The Solution Stack
┌─────────────────────────────────────────────────────────────┐
│ │
│ DXT drag-and-drop install · 44 tools registered │
│ automatically · no JSON editing │
│ │
│ MCP stateless on-demand tool invocation │
│ no persistent process between runs │
│ agent wakes, works, terminates cleanly │
│ │
│ Skill ~150 tokens of scoped context injected │
│ Files only when relevant · unloaded when done │
│ 90% less context overhead per session │
│ │
│ VEKTOR ◄─ persistent memory graph · BM25 + vector RRF │
│ recall · self-organising intelligence layer │
│ state that actually survives between sessions │
│ │
└─────────────────────────────────────────────────────────────┘

Layer | Solves | Token impact |
DXT | Setup friction, misconfigured tools | Surfaces only relevant tools per task |
MCP | Persistent process requirement, cold starts | Stateless invocation on demand |
Skill Files | Monster prompts, competing instructions | 150 tokens vs 8,000–20,000 |
VEKTOR | Session amnesia, broken cron jobs, control paradox | 250–4,000 tokens regardless of DB size |
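VEKTOR's recall internals aren't shown in this document, but the dual-channel idea named in the table (BM25 keyword ranking fused with vector ranking via Reciprocal Rank Fusion) can be sketched in a few lines. This is a generic RRF implementation, not VEKTOR's code; `k = 60` is the conventional RRF constant and the ranked lists are hypothetical.

```javascript
// Reciprocal Rank Fusion: merge ranked lists of memory ids.
// score(d) = sum over lists of 1 / (k + rank(d)), using 1-based ranks.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Sort descending by fused score.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical example: BM25 and vector search disagree on order,
// but the memory ranked well by BOTH channels ("m2") wins after fusion.
const bm25 = ['m2', 'm7', 'm1'];
const vector = ['m5', 'm2', 'm7'];
console.log(rrfFuse([bm25, vector])); // ['m2', 'm7', 'm5', 'm1']
```

RRF needs no score calibration between channels, which is why it is a common choice for fusing lexical and semantic rankings.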
Install
npm install -g vektor-slipstream
npx vektor setup

Or drag vektor-slipstream.dxt directly into Claude Desktop. All 44 tools register automatically. No JSON editing. No path configuration.
Quick Start
const { createMemory } = require('vektor-slipstream');
const memory = await createMemory({
agentId: 'my-agent',
licenceKey: process.env.VEKTOR_LICENCE_KEY,
});
// Store a memory
await memory.remember('User prefers TypeScript. Deployed to prod on Friday.');
// Recall by semantic similarity -- sub-1ms, fully local
const results = await memory.recall('deployment preferences', 5);
// → [{ content, score, id, timestamp }]
// Traverse the associative memory graph
const graph = await memory.graph('TypeScript', { hops: 2 });
// What changed in the last 7 days?
const delta = await memory.delta('project decisions', 7);
// Morning briefing from recent memories
const brief = await memory.briefing();

Before vs After
 | Without VEKTOR | With VEKTOR |
Context cost per session | 15,000–50,000 tokens reconstructing history | 250–4,000 tokens for full semantic recall |
Cron jobs | Agent repeats completed work -- no memory of last run | Recalls previous run outcome in one call |
Configuration memory | Forgotten every session | Graph surfaces what worked last time automatically |
Autonomy vs control | Either full autonomy (dangerous) or manual gates (slow) | Agent learns from outcome history when to proceed vs escalate |
Between-session state | Persistent process required or state is lost | SQLite persists -- stateless invocation, stateful recall |
Embedding cost | Cloud API call on every store and recall | $0 -- fully local ONNX, no API key required |
Session Flow
Task triggered (cron / webhook / user action)
│
▼
Skill File injected based on task context ~150 tokens
│
▼
vektor_recall_rrf called ~800 tokens
Top-10 semantically relevant memories returned
│
▼
Agent classifies situation from memory history
│
┌────┴─────────────────────┐
▼ ▼
familiar pattern novel / previously failed
proceed autonomously surface for human review
│ │
└────────────┬─────────────┘
▼
Execute task via MCP tools
│
▼
Result stored via vektor_store
Memory graph updated with outcome
│
▼
Session ends · SQLite persists everything
│
▼
Next invocation: same startup cost · full outcome history available

Total context overhead for a routine task: under 2,000 tokens. The same task with a monolithic system prompt and history reconstruction: 15,000–50,000 tokens, with no retention of outcome.
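The classify step in the flow above is essentially a lookup over prior outcomes: proceed when the pattern is familiar and previously succeeded, escalate when it is novel or has failed before. A minimal illustrative version follows; the record shape and the `minRuns` threshold are assumptions for the sketch, not VEKTOR's actual schema.

```javascript
// Decide whether to proceed autonomously or escalate for human review,
// based on outcome records recalled from previous runs of this pattern.
function classify(priorOutcomes, { minRuns = 3 } = {}) {
  if (priorOutcomes.length < minRuns) return 'escalate'; // novel pattern
  const failed = priorOutcomes.some((o) => o.outcome === 'failure');
  return failed ? 'escalate' : 'proceed'; // previously failed → review
}

// Hypothetical outcome records, as they might come back from recall.
const history = [
  { task: 'rotate-logs', outcome: 'success' },
  { task: 'rotate-logs', outcome: 'success' },
  { task: 'rotate-logs', outcome: 'success' },
];
console.log(classify(history)); // 'proceed'
console.log(classify([{ task: 'migrate-db', outcome: 'failure' }])); // 'escalate'
```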
Performance
Metric | Value |
Recall latency | sub-1ms (local SQLite + ONNX) |
Embedding cost | $0 -- fully local ONNX |
Embedding latency | ~10ms GPU / ~25ms CPU |
LoCoMo benchmark | 66.9% adjusted judge accuracy |
Min tokens for full recall | 250 |
Max tokens regardless of DB size | 4,000 |
First run | ~2 min (downloads ~25MB model once) |
Subsequent boots | <100ms |
LoCoMo benchmark results
Category | Accuracy |
Multi-hop | 79.1% |
Adversarial | 70.4% |
Single-hop | 51.6% |
Temporal | 46.2% |
Adjusted total | 66.9% |
CLI Chat -- Persistent Memory Across Every Session
npx vektor chat # auto-detects Ollama
npx vektor chat --provider claude # Anthropic Claude
npx vektor chat --provider groq --model llama-3.3-70b-versatile
npx vektor chat --provider openai
npx vektor chat --provider gemini

Provider | Details |
ollama | Default -- free, local, no API key. Auto-detects best model. |
claude | Anthropic Claude -- requires an API key. |
openai | OpenAI GPT -- requires an API key. |
groq | Groq LLaMA -- requires an API key. |
gemini | Google Gemini -- requires an API key. |
In-chat commands:
Command | Action |
| Search memory mid-conversation |
| Node count, edges, pinned memories |
| Generate memory briefing inline |
| Exit (Ctrl+C also works) |
One-liner commands:
# Store facts
npx vektor remember "I prefer TypeScript over JavaScript"
npx vektor remember "deadline is Friday" --importance 5
cat meeting-notes.txt | npx vektor remember
# Query
npx vektor ask "what stack am I using?"
npx vektor ask "what did we decide about the database?"
# Autonomous agent
npx vektor agent "summarise everything I know about project Alpha"
npx vektor agent "research AI memory tools" --steps 15 --provider groq

Claude Desktop Extension (DXT)
Install the .dxt extension for zero-config persistent memory in every Claude Desktop session.
Install: drag vektor-slipstream.dxt onto the Claude Desktop Extensions page.
Once installed, Claude automatically:
Recalls relevant context at session start
Stores facts and decisions during conversation
Summarises and consolidates at session end
All 44 tools available. No configuration beyond your licence key.
Download: vektormemory.com/docs/dxt
MCP Tools -- All 44
Memory
Tool | Function |
| Semantic + BM25 + graph search across memory |
| BM25+RRF dual-channel recall with cross-encoder rerank |
| Store memory with importance score |
| Batch ingest conversation turns with session date |
| Traverse associative memory graph |
| See what changed on a topic over time |
| Generate briefing from recent memories |
| Memory DB stats -- node count, edges, entities |
| Query memories by date range |
Cloak -- Stealth Browser, SSH, Fetch
Tool | Function |
| Stealth headless browser fetch via Playwright |
| Checks llms.txt first, falls back to stealth browser |
| Full CSS/DOM layout sensor |
| Semantic diff of URL since last fetch |
| Structural diff between two text blobs |
| AES-256-GCM credential vault (get/set/delete/list) |
| Execute commands on remote server via SSH |
| Upload file to remote server via SFTP |
| Scan project directory into memory graph |
| Get cached file anatomy without rescanning |
| Token efficiency ROI calculator |
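The table above lists a token-efficiency ROI calculator. Its actual formula isn't documented here, but conceptually the saving is just the per-session token delta times session volume. An illustrative version, using the token figures quoted earlier in this document and a hypothetical price per million tokens:

```javascript
// Token-efficiency ROI: tokens (and dollars) a memory layer saves
// per month. All inputs are illustrative; plug in your own numbers.
function tokenRoi({ tokensWithout, tokensWith, sessionsPerDay, usdPerMTok }) {
  const savedPerSession = tokensWithout - tokensWith;
  const savedPerMonth = savedPerSession * sessionsPerDay * 30;
  return {
    savedPerSession,
    savedPerMonth,
    usdSavedPerMonth: (savedPerMonth / 1_000_000) * usdPerMTok,
  };
}

const roi = tokenRoi({
  tokensWithout: 20000, // reconstructing history every session
  tokensWith: 2000,     // semantic recall instead
  sessionsPerDay: 50,
  usdPerMTok: 3,        // hypothetical $ per 1M input tokens
});
console.log(roi);
// { savedPerSession: 18000, savedPerMonth: 27000000, usdSavedPerMonth: 81 }
```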
Identity + Behaviour (Anti-Bot Bypass)
Tool | Function |
| Create persistent browser fingerprint identity |
| Apply saved identity to a fetch call |
| List saved identities with trust summary |
| Human mouse/scroll injection for reCAPTCHA/Cloudflare |
| List available patterns and categories |
| Load custom recorded behaviour pattern |
| Self-improving pattern store tier breakdown |
| List patterns with scores and tier |
| Remove stale/low-scoring patterns |
| Seed store with built-in patterns |
CAPTCHA
Tool | Function |
| Detect CAPTCHA type and sitekey |
| Solve via vision AI (Claude/GPT-4o/2captcha) |
Compression
Tool | Function |
| PolarQuant vector compression (~75% smaller) |
| Compression ratio and savings stats |
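PolarQuant's scheme isn't detailed in this document, but the headline ~75% figure matches what any float32 to int8 scalar quantization achieves: 4 bytes per dimension down to 1, at the cost of a small precision loss. An illustrative round trip (not PolarQuant itself):

```javascript
// Scalar-quantize a float vector to int8: 4 bytes/dim → 1 byte/dim,
// i.e. a ~75% size reduction, with bounded precision loss.
function quantize(vec) {
  const maxAbs = Math.max(...vec.map(Math.abs)) || 1;
  const scale = maxAbs / 127; // map [-maxAbs, maxAbs] onto [-127, 127]
  const q = Int8Array.from(vec, (x) => Math.round(x / scale));
  return { q, scale };
}

function dequantize({ q, scale }) {
  return Float32Array.from(q, (x) => x * scale);
}

const v = [0.12, -0.98, 0.5, 0.031];
const packed = quantize(v);
const restored = dequantize(packed);

const bytesBefore = v.length * 4;   // float32 storage
const bytesAfter = packed.q.length; // int8 storage
console.log(1 - bytesAfter / bytesBefore); // 0.75 → 75% smaller
```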
Multimodal
Tool | Function |
| Text generation (OpenAI/Claude/Groq/Gemini/NVIDIA NIM) |
| Image generation (DALL-E, Stability, NVIDIA) |
| Image understanding and analysis |
| Text-to-speech and transcription |
| Web search with memory integration |
| List available providers and status |
Agent
Tool | Function |
| Autonomous goal executor with memory |
| Multi-agent swarm task |
| File system watcher -- auto-ingest on change |
All CLI Commands
npx vektor setup # First-run wizard -- licence, hardware, integrations
npx vektor activate # Activate licence key on this machine
npx vektor test # Test memory engine with progress bar
npx vektor status # System health check
npx vektor mcp # Start Claude Desktop MCP server
npx vektor rem # Run REM dream cycle (memory consolidation)
npx vektor chat # Persistent memory chat (all LLMs)
npx vektor remember # Store a fact
npx vektor ask # Query memory + LLM answer
npx vektor agent # Autonomous goal executor
npx vektor help             # All commands

Claude Code Setup
Add to .claude/settings.json in your project:
{
"mcpServers": {
"vektor": {
"command": "node",
"args": ["/path/to/node_modules/vektor-slipstream/index.js"],
"env": {
"VEKTOR_LICENCE_KEY": "your-licence-key",
"CLOAK_PROJECT_PATH": "/path/to/your/project"
}
}
}
}

What's Included
Memory Core (MAGMA)
4-layer associative graph -- semantic, causal, temporal, entity edges
bge-small-en-v1.5 bi-encoder + ms-marco cross-encoder reranker
BM25 + stemmed BM25 + RRF fusion -- keyword + semantic dual-channel recall
Persistent entity index -- guaranteed named-entity retrieval
Foresight extraction -- future-tense statements stored with temporal metadata
ADD-only contradiction detection -- full history preserved, no silent overwrites
REM dream cycle -- up to 50:1 memory compression
Sub-1ms recall -- local SQLite, no network required
Local ONNX embeddings -- $0 embedding cost, no API key required
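The 4-layer graph isn't specified beyond its edge types (semantic, causal, temporal, entity). As a sketch of the idea, here is a minimal typed-edge graph with a bounded-hop traversal, mirroring the shape of `memory.graph('TypeScript', { hops: 2 })` from the Quick Start. The node names and storage layout are assumptions, not VEKTOR's implementation (which persists to SQLite).

```javascript
// Minimal typed-edge memory graph: undirected edges labelled with one
// of the four layer types, plus a breadth-first traversal bounded by hops.
const EDGE_TYPES = ['semantic', 'causal', 'temporal', 'entity'];

class MemoryGraph {
  constructor() { this.edges = new Map(); } // node -> [{ to, type }]

  link(from, to, type) {
    if (!EDGE_TYPES.includes(type)) throw new Error(`unknown edge type: ${type}`);
    for (const [a, b] of [[from, to], [to, from]]) {
      if (!this.edges.has(a)) this.edges.set(a, []);
      this.edges.get(a).push({ to: b, type });
    }
  }

  // All nodes reachable from `start` within `hops` edges.
  neighbourhood(start, hops) {
    const seen = new Set([start]);
    let frontier = [start];
    for (let h = 0; h < hops; h++) {
      const next = [];
      for (const node of frontier) {
        for (const { to } of this.edges.get(node) ?? []) {
          if (!seen.has(to)) { seen.add(to); next.push(to); }
        }
      }
      frontier = next;
    }
    return [...seen];
  }
}

const g = new MemoryGraph();
g.link('TypeScript', 'deploy-friday', 'temporal');
g.link('deploy-friday', 'prod-outage', 'causal');
g.link('prod-outage', 'rollback', 'causal');

console.log(g.neighbourhood('TypeScript', 2));
// ['TypeScript', 'deploy-friday', 'prod-outage'] -- rollback is 3 hops away
```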
Intelligence Layer (runs automatically, no config)
Module | Function |
| Adjusts retrieval weights based on which memories produced correct outcomes |
| Scores memories by reliability across corroborating sources |
| Removes semantic duplicates, keeps the graph clean |
| Reorganises memory clusters as new information accumulates |
| Reinforcement signals surface higher-quality memories preferentially |
| Periodic summaries of memory activity |
Integrations
Claude Desktop -- DXT extension, 44 tools, auto-memory on every session
Claude Code -- MCP server, all 44 tools
CLI -- chat, remember, ask, agent commands
LangChain -- v1 + v2 adapter included
OpenAI Agents SDK -- drop-in integration
Groq · Gemini · Ollama · NVIDIA NIM -- provider agnostic
Hardware Auto-Detection
Zero config. VEKTOR detects and uses the best available accelerator:
NVIDIA CUDA -- GPU acceleration
Apple Silicon -- CoreML
CPU -- optimised fallback, works everywhere
Environment Variables
Variable | Default | Purpose |
 | | Enable LLM session summarisation on ingest |
 | | Enable batch triple extraction on ingest |
 | | Extract future-tense foresight signals |
 | | Enable temporal index and date boosting |
 | | Enable ADD-only contradiction detection |
 | -- | Enable verbose retrieval debug output |
 | | Swap embedding model |
 | | Enable cross-encoder reranking |
Research Foundation
Built on peer-reviewed research:
MAGMA (arxiv:2601.03236) -- Multi-Graph Agentic Memory Architecture
EverMemOS (arxiv:2601.02163) -- Self-Organizing Memory OS
HippoRAG (arxiv:2405.14831) -- Neurobiologically Inspired Long-Term Memory (NeurIPS 2024)
Mem0 (arxiv:2504.19413) -- Production-Ready Agent Memory
LoCoMo Benchmark (arxiv:2402.17753) -- Long-Context Conversational Memory
Pricing
Plan | Price | Licences |
Solo | $9/mo | 3 |
Team | $35/mo | 5 |
Studio | $59/mo | 10 |
Enterprise | $99/mo | 25 |
What's New in v1.5.0
Retrieval pipeline rebuilt from scratch:
bge-small-en-v1.5 bi-encoder + ms-marco cross-encoder reranker (spec-decode architecture)
BM25 + Porter-stemmed BM25 + named entity injection, fused via RRF
MAGMA graph layer -- co-occurrence and temporal edges between entities in SQLite
Persistent entity index (vektor_entities) for guaranteed named-entity recall
Foresight extraction -- future-tense statements stored for temporal queries
Question type classifier -- routes single-hop vs multi-hop to optimal retrieval path
ADD-only contradiction detection -- conflicting facts survive with timestamps
Agentic sufficiency check -- reformulates query if key entities missing from top results
vektormemory.com · Docs · hello@vektormemory.com
Stop prompting like it's 2024. Build agents that remember.