VEKTOR MEMORY - Slipstream

Persistent memory for AI agents. Local-first. No cloud. No amnesia.

Documentation · Install · Quick Start · MCP Tools · Pricing


VEKTOR fixes the architecture. Not the prompt.

The problems are architectural, not instructional. You cannot prompt your way out of a stateless architecture.

Session starts        ──►  Reconstruct context from logs   ──►  10,000–30,000 tokens burned
                                                                 before a single line of work runs

Cron job fires        ──►  Agent has no memory of last run  ──►  Repeats completed work
                                                                 Loops. Bills stack up.

Add more guardrails   ──►  Longer prompts                   ──►  More tokens
                      ──►  More complexity                  ──►  More failure surface
                      ──►  More maintenance                 ──►  Less time saved

The control paradox: the more control you try to add through prompts, the more expensive and fragile the system becomes. You end up spending more time fixing the automation than the automation saves.


The Solution Stack

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   DXT       drag-and-drop install · 44 tools registered     │
│             automatically · no JSON editing                 │
│                                                             │
│   MCP       stateless on-demand tool invocation             │
│             no persistent process between runs              │
│             agent wakes, works, terminates cleanly          │
│                                                             │
│   Skill     ~150 tokens of scoped context injected          │
│   Files     only when relevant · unloaded when done         │
│             90% less context overhead per session           │
│                                                             │
│   VEKTOR ◄─ persistent memory graph · BM25 + vector RRF     │
│             recall · self-organising intelligence layer     │
│             state that actually survives between sessions   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

| Layer | Solves | Token impact |
| --- | --- | --- |
| DXT | Setup friction, misconfigured tools | Surfaces only relevant tools per task |
| MCP | Persistent process requirement, cold starts | Stateless invocation on demand |
| Skill Files | Monster prompts, competing instructions | 150 tokens vs 8,000–20,000 |
| VEKTOR | Session amnesia, broken cron jobs, control paradox | 250–4,000 tokens regardless of DB size |


Install

npm install -g vektor-slipstream
npx vektor setup

Or drag vektor-slipstream.dxt directly into Claude Desktop. All 44 tools register automatically. No JSON editing. No path configuration.


Quick Start

const { createMemory } = require('vektor-slipstream');

// CommonJS has no top-level await, so the example runs inside an async function.
async function main() {
  const memory = await createMemory({
    agentId: 'my-agent',
    licenceKey: process.env.VEKTOR_LICENCE_KEY,
  });

  // Store a memory
  await memory.remember('User prefers TypeScript. Deployed to prod on Friday.');

  // Recall by semantic similarity -- sub-1ms, fully local
  const results = await memory.recall('deployment preferences', 5);
  // → [{ content, score, id, timestamp }]

  // Traverse the associative memory graph
  const graph = await memory.graph('TypeScript', { hops: 2 });

  // What changed in the last 7 days?
  const delta = await memory.delta('project decisions', 7);

  // Morning briefing from recent memories
  const brief = await memory.briefing();
}

main().catch(console.error);

Before vs After

|  | Without VEKTOR | With VEKTOR |
| --- | --- | --- |
| Context cost per session | 15,000–50,000 tokens reconstructing history | 250–4,000 tokens for full semantic recall |
| Cron jobs | Agent repeats completed work -- no memory of last run | Recalls previous run outcome in one call |
| Configuration memory | Forgotten every session | Graph surfaces what worked last time automatically |
| Autonomy vs control | Either full autonomy (dangerous) or manual gates (slow) | Agent learns from outcome history when to proceed vs escalate |
| Between-session state | Persistent process required or state is lost | SQLite persists -- stateless invocation, stateful recall |
| Embedding cost | Cloud API call on every store and recall | $0 -- fully local ONNX, no API key required |
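
The cron-job case can be sketched as a guard that consults memory before doing any work. This is a minimal sketch, assuming only the `memory.recall(query, limit)` and `memory.remember(text)` calls shown in the Quick Start. The `runOnce` helper, the `job:<id> run:<key>` text convention, and the in-memory stand-in are hypothetical and not part of the package.

```javascript
// Hypothetical helper: skip a scheduled job if memory already records it as done.
// `memory` is any object exposing recall(query, limit) and remember(text),
// as in the Quick Start; the "job:<id> run:<key>" convention is an assumption.
async function runOnce(memory, jobId, runKey, work) {
  const marker = `job:${jobId} run:${runKey} status:done`;
  const previous = await memory.recall(marker, 5);
  if (previous.some((m) => m.content.includes(marker))) {
    return { skipped: true };
  }
  const outcome = await work();
  await memory.remember(`${marker} outcome:${outcome}`);
  return { skipped: false, outcome };
}

// In-memory stand-in so the sketch runs without the package installed.
function createFakeMemory() {
  const items = [];
  return {
    async remember(text) {
      items.push({ content: text });
    },
    async recall(query, limit) {
      return items.filter((m) => m.content.includes(query)).slice(0, limit);
    },
  };
}
```

In real use the stand-in would be replaced by the object returned from `createMemory`, and the exact-substring check guards against a semantic match on a different run.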


Session Flow

Task triggered (cron / webhook / user action)
        │
        ▼
Skill File injected based on task context          ~150 tokens
        │
        ▼
vektor_recall_rrf called                           ~800 tokens
Top-10 semantically relevant memories returned
        │
        ▼
Agent classifies situation from memory history
        │
   ┌────┴─────────────────────┐
   ▼                          ▼
familiar pattern          novel / previously failed
proceed autonomously       surface for human review
   │                          │
   └────────────┬─────────────┘
                ▼
Execute task via MCP tools
        │
        ▼
Result stored via vektor_store
Memory graph updated with outcome
        │
        ▼
Session ends · SQLite persists everything
        │
        ▼
Next invocation: same startup cost · full outcome history available

Total context overhead for a routine task: under 2,000 tokens. The same task with a monolithic system prompt and history reconstruction: 15,000–50,000 tokens, with no retention of outcome.
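
The "classify situation" branch in the flow can be illustrated with a small decision rule. This is a sketch under stated assumptions, not the package's logic: it assumes each recalled memory carries an `outcome` field set to `'success'` or `'failure'`, and the `classifySituation` name and `minSuccesses` parameter are hypothetical.

```javascript
// Hypothetical decision rule for the "classify situation" step.
// Each recalled memory is assumed to carry an `outcome` field of
// 'success' or 'failure'; the real memory schema may differ.
function classifySituation(memories, minSuccesses = 2) {
  const failures = memories.filter((m) => m.outcome === 'failure').length;
  const successes = memories.filter((m) => m.outcome === 'success').length;
  if (failures > 0) return 'escalate'; // previously failed: human review
  if (successes < minSuccesses) return 'escalate'; // novel: too little history
  return 'proceed'; // familiar pattern: run autonomously
}
```

Defaulting to escalation when history is thin keeps the autonomous path reserved for tasks that have already succeeded more than once.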


Performance

| Metric | Value |
| --- | --- |
| Recall latency | sub-1ms (local SQLite + ONNX) |
| Embedding cost | $0 -- fully local ONNX |
| Embedding latency | ~10ms GPU / ~25ms CPU |
| LoCoMo benchmark | 66.9% adjusted judge accuracy |
| Min tokens for full recall | 250 |
| Max tokens regardless of DB size | 4,000 |
| First run | ~2 min (downloads ~25MB model once) |
| Subsequent boots | <100ms |
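
One way a recall result can stay under a fixed ceiling such as 4,000 tokens, whatever the database size, is to pack the best-scoring memories into a budget. The sketch below illustrates that idea only. How VEKTOR enforces its limit is not described here, and `estimateTokens`, `fitToBudget`, and the 4-characters-per-token heuristic are assumptions.

```javascript
// Rough token estimate: about 4 characters per token for English text.
// This is a common heuristic, not the tokenizer VEKTOR uses.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the highest-scoring memories that fit inside `maxTokens`.
function fitToBudget(memories, maxTokens = 4000) {
  const sorted = [...memories].sort((a, b) => b.score - a.score);
  const kept = [];
  let used = 0;
  for (const m of sorted) {
    const cost = estimateTokens(m.content);
    if (used + cost > maxTokens) continue; // skip items that would overflow
    kept.push(m);
    used += cost;
  }
  return { kept, used };
}
```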

LoCoMo benchmark results

| Category | Accuracy |
| --- | --- |
| Multi-hop | 79.1% |
| Adversarial | 70.4% |
| Single-hop | 51.6% |
| Temporal | 46.2% |
| Adjusted total | 66.9% |


CLI Chat -- Persistent Memory Across Every Session

npx vektor chat                                        # auto-detects Ollama
npx vektor chat --provider claude                      # Anthropic Claude
npx vektor chat --provider groq --model llama-3.3-70b-versatile
npx vektor chat --provider openai
npx vektor chat --provider gemini

| Provider | Details |
| --- | --- |
| ollama | Default -- free, local, no API key. Auto-detects best model. |
| claude | Anthropic Claude -- set ANTHROPIC_API_KEY |
| openai | OpenAI GPT -- set OPENAI_API_KEY |
| groq | Groq LLaMA -- set GROQ_API_KEY (free tier available) |
| gemini | Google Gemini -- set GEMINI_API_KEY |

In-chat commands:

| Command | Action |
| --- | --- |
| /recall <query> | Search memory mid-conversation |
| /stats | Node count, edges, pinned memories |
| /briefing | Generate memory briefing inline |
| /exit | Exit (Ctrl+C also works) |

One-liner commands:

# Store facts
npx vektor remember "I prefer TypeScript over JavaScript"
npx vektor remember "deadline is Friday" --importance 5
cat meeting-notes.txt | npx vektor remember

# Query
npx vektor ask "what stack am I using?"
npx vektor ask "what did we decide about the database?"

# Autonomous agent
npx vektor agent "summarise everything I know about project Alpha"
npx vektor agent "research AI memory tools" --steps 15 --provider groq

Claude Desktop Extension (DXT)

Install the .dxt extension for zero-config persistent memory in every Claude Desktop session.

Install: drag vektor-slipstream.dxt onto the Claude Desktop Extensions page.

Once installed, Claude automatically:

  • Recalls relevant context at session start

  • Stores facts and decisions during conversation

  • Summarises and consolidates at session end

All 44 tools available. No configuration beyond your licence key.

Download: vektormemory.com/docs/dxt


MCP Tools -- All 44

Memory

| Tool | Function |
| --- | --- |
| vektor_recall | Semantic + BM25 + graph search across memory |
| vektor_recall_rrf | BM25+RRF dual-channel recall with cross-encoder rerank |
| vektor_store | Store memory with importance score |
| vektor_ingest | Batch ingest conversation turns with session date |
| vektor_graph | Traverse associative memory graph |
| vektor_delta | See what changed on a topic over time |
| vektor_briefing | Generate briefing from recent memories |
| vektor_stats | Memory DB stats -- node count, edges, entities |
| vektor_timeline | Query memories by date range |
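
Graph traversal with a hop limit, as in `memory.graph('TypeScript', { hops: 2 })` from the Quick Start, amounts to a bounded breadth-first search. The sketch below shows that technique on a plain adjacency map. The `neighboursWithin` name and the data layout are hypothetical, and the package's own traversal over SQLite may work differently.

```javascript
// Breadth-first traversal limited to `hops` edges from the start node.
// `edges` is a plain adjacency map used for illustration only.
function neighboursWithin(edges, start, hops) {
  const seen = new Map([[start, 0]]);
  let frontier = [start];
  for (let depth = 1; depth <= hops; depth++) {
    const next = [];
    for (const node of frontier) {
      for (const neighbour of edges[node] || []) {
        if (!seen.has(neighbour)) {
          seen.set(neighbour, depth);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }
  return seen; // Map of node -> hop distance from `start`
}
```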

Cloak -- Stealth Browser, SSH, Fetch

| Tool | Function |
| --- | --- |
| cloak_fetch | Stealth headless browser fetch via Playwright |
| cloak_fetch_smart | Checks llms.txt first, falls back to stealth browser |
| cloak_render | Full CSS/DOM layout sensor |
| cloak_diff | Semantic diff of URL since last fetch |
| cloak_diff_text | Structural diff between two text blobs |
| cloak_passport | AES-256-GCM credential vault (get/set/delete/list) |
| cloak_ssh_exec | Execute commands on remote server via SSH |
| cloak_ssh_upload | Upload file to remote server via SFTP |
| cloak_cortex | Scan project directory into memory graph |
| cloak_cortex_anatomy | Get cached file anatomy without rescanning |
| tokens_saved | Token efficiency ROI calculator |
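
The "llms.txt first, browser second" behaviour described for `cloak_fetch_smart` can be pictured as a two-step fallback. This is a sketch of the pattern only, with both fetchers injected by the caller. `fetchSmart`, `fetchText`, and `renderWithBrowser` are hypothetical names and not the tool's implementation.

```javascript
// Sketch of "llms.txt first, browser second". Both fetchers are injected:
// `fetchText(url)` resolves to { ok, text }, and `renderWithBrowser(url)`
// stands in for a headless-browser fetch. Neither is the package's API.
async function fetchSmart(pageUrl, fetchText, renderWithBrowser) {
  // Resolve /llms.txt against the site root of the requested page.
  const llmsUrl = new URL('/llms.txt', pageUrl).toString();
  try {
    const res = await fetchText(llmsUrl);
    if (res.ok && res.text.trim().length > 0) {
      return { source: 'llms.txt', text: res.text };
    }
  } catch (err) {
    // Network error on llms.txt: fall through to the browser path.
  }
  return { source: 'browser', text: await renderWithBrowser(pageUrl) };
}
```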

Identity + Behaviour (Anti-Bot Bypass)

| Tool | Function |
| --- | --- |
| cloak_identity_create | Create persistent browser fingerprint identity |
| cloak_identity_use | Apply saved identity to a fetch call |
| cloak_identity_list | List saved identities with trust summary |
| cloak_inject_behaviour | Human mouse/scroll injection for reCAPTCHA/Cloudflare |
| cloak_behaviour_stats | List available patterns and categories |
| cloak_load_pattern | Load custom recorded behaviour pattern |
| cloak_pattern_stats | Self-improving pattern store tier breakdown |
| cloak_pattern_list | List patterns with scores and tier |
| cloak_pattern_prune | Remove stale/low-scoring patterns |
| cloak_pattern_seed | Seed store with built-in patterns |

CAPTCHA

| Tool | Function |
| --- | --- |
| cloak_detect_captcha | Detect CAPTCHA type and sitekey |
| cloak_solve_captcha | Solve via vision AI (Claude/GPT-4o/2captcha) |

Compression

| Tool | Function |
| --- | --- |
| turbo_quant_compress | PolarQuant vector compression (~75% smaller) |
| turbo_quant_stats | Compression ratio and savings stats |
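
For intuition on where a figure like ~75% can come from: storing each dimension as one byte instead of a four-byte float32 shrinks a vector by exactly three quarters, before any per-vector metadata. The sketch below uses plain symmetric int8 scalar quantisation as a generic stand-in. It is not the PolarQuant algorithm, and `quantize` and `dequantize` are hypothetical names.

```javascript
// Symmetric int8 scalar quantisation: one byte per dimension plus one scale.
// This is a generic stand-in, not the PolarQuant algorithm itself.
function quantize(vector) {
  let maxAbs = 0;
  for (const v of vector) maxAbs = Math.max(maxAbs, Math.abs(v));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const codes = new Int8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    codes[i] = Math.round(vector[i] / scale);
  }
  return { codes, scale };
}

// Reverse the mapping; values are approximate, not bit-exact.
function dequantize({ codes, scale }) {
  return Float32Array.from(codes, (c) => c * scale);
}

// 384 dimensions matches bge-small-en-v1.5 embeddings.
const original = new Float32Array(384).fill(0.5);
const packed = quantize(original);
const saving = 1 - packed.codes.byteLength / original.byteLength;
console.log(saving); // 0.75
```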

Multimodal

| Tool | Function |
| --- | --- |
| vektor_text | Text generation (OpenAI/Claude/Groq/Gemini/NVIDIA NIM) |
| vektor_image | Image generation (DALL-E, Stability, NVIDIA) |
| vektor_vision | Image understanding and analysis |
| vektor_speech | Text-to-speech and transcription |
| vektor_search | Web search with memory integration |
| vektor_providers | List available providers and status |

Agent

| Tool | Function |
| --- | --- |
| vektor_agent_run | Autonomous goal executor with memory |
| vektor_swarm | Multi-agent swarm task |
| vektor_watch | File system watcher -- auto-ingest on change |
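
MCP clients invoke any of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `vektor_recall_rrf`. The envelope follows the MCP specification, but the argument names `query` and `limit` are assumptions, so check the input schema returned by `tools/list` before relying on them. `buildToolCall` is a hypothetical helper.

```javascript
// Build a JSON-RPC 2.0 request for MCP's `tools/call` method.
// The argument names (`query`, `limit`) are assumptions; check the
// tool's input schema from `tools/list` before relying on them.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, 'vektor_recall_rrf', {
  query: 'deployment preferences',
  limit: 10,
});
console.log(JSON.stringify(request));
```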


All CLI Commands

npx vektor setup       # First-run wizard -- licence, hardware, integrations
npx vektor activate    # Activate licence key on this machine
npx vektor test        # Test memory engine with progress bar
npx vektor status      # System health check
npx vektor mcp         # Start Claude Desktop MCP server
npx vektor rem         # Run REM dream cycle (memory consolidation)
npx vektor chat        # Persistent memory chat (all LLMs)
npx vektor remember    # Store a fact
npx vektor ask         # Query memory + LLM answer
npx vektor agent       # Autonomous goal executor
npx vektor help        # All commands

Claude Code Setup

Add to .claude/settings.json in your project:

{
  "mcpServers": {
    "vektor": {
      "command": "node",
      "args": ["/path/to/node_modules/vektor-slipstream/index.js"],
      "env": {
        "VEKTOR_LICENCE_KEY": "your-licence-key",
        "CLOAK_PROJECT_PATH": "/path/to/your/project"
      }
    }
  }
}

What's Included

Memory Core (MAGMA)

  • 4-layer associative graph -- semantic, causal, temporal, entity edges

  • bge-small-en-v1.5 bi-encoder + ms-marco cross-encoder reranker

  • BM25 + stemmed BM25 + RRF fusion -- keyword + semantic dual-channel recall

  • Persistent entity index -- guaranteed named-entity retrieval

  • Foresight extraction -- future-tense statements stored with temporal metadata

  • ADD-only contradiction detection -- full history preserved, no silent overwrites

  • REM dream cycle -- up to 50:1 memory compression

  • Sub-1ms recall -- local SQLite, no network required

  • Local ONNX embeddings -- $0 embedding cost, no API key required
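
The RRF fusion step in the list above can be illustrated with the standard Reciprocal Rank Fusion formula, where each document scores the sum of `1 / (k + rank)` across the ranked lists it appears in. This is a minimal sketch of the general technique. The constant `k = 60` is the commonly used default and not a value confirmed for VEKTOR, and `rrfFuse` and the memory ids are hypothetical.

```javascript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// with rank starting at 1. k = 60 is the commonly used constant; the
// value VEKTOR uses is not stated in this README.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const bm25Ranking = ['m3', 'm1', 'm2']; // keyword channel, best first
const vectorRanking = ['m1', 'm4', 'm3']; // semantic channel, best first
console.log(rrfFuse([bm25Ranking, vectorRanking]));
// [ 'm1', 'm3', 'm4', 'm2' ]
```

Items ranked well by both channels (`m1`, `m3`) rise above items seen by only one, which is the point of fusing keyword and semantic results.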

Intelligence Layer (runs automatically, no config)

| Module | Function |
| --- | --- |
| recall-tune | Adjusts retrieval weights based on which memories produced correct outcomes |
| confidence | Scores memories by reliability across corroborating sources |
| dedup | Removes semantic duplicates, keeps the graph clean |
| selforg | Reorganises memory clusters as new information accumulates |
| rl-memory | Reinforcement signals surface higher-quality memories preferentially |
| briefing-scheduler | Periodic summaries of memory activity |
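
Semantic deduplication of the kind the `dedup` module describes is often done by comparing embeddings with cosine similarity and dropping near-identical entries. The sketch below shows that general approach, not the module's actual code. The `embedding` field, the `dedupe` name, and the 0.95 threshold are illustrative assumptions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep the first memory of each near-duplicate group.
// The 0.95 threshold is an illustrative assumption.
function dedupe(memories, threshold = 0.95) {
  const kept = [];
  for (const m of memories) {
    const duplicate = kept.some(
      (k) => cosine(k.embedding, m.embedding) >= threshold
    );
    if (!duplicate) kept.push(m);
  }
  return kept;
}
```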

Integrations

  • Claude Desktop -- DXT extension, 44 tools, auto-memory on every session

  • Claude Code -- MCP server, all 44 tools

  • CLI -- chat, remember, ask, agent commands

  • LangChain -- v1 + v2 adapter included

  • OpenAI Agents SDK -- drop-in integration

  • Groq · Gemini · Ollama · NVIDIA NIM -- provider agnostic


Hardware Auto-Detection

Zero config. VEKTOR detects and uses the best available accelerator:

  • NVIDIA CUDA -- GPU acceleration

  • Apple Silicon -- CoreML

  • CPU -- optimised fallback, works everywhere


Environment Variables

| Variable | Default | Purpose |
| --- | --- | --- |
| VEKTOR_SUMMARIZE | false | Enable LLM session summarisation on ingest |
| VEKTOR_TRIPLES | true | Enable batch triple extraction on ingest |
| VEKTOR_FORESIGHT | true | Extract future-tense foresight signals |
| VEKTOR_TEMPORAL | true | Enable temporal index and date boosting |
| VEKTOR_CONTRADICT | true | Enable ADD-only contradiction detection |
| VEKTOR_DEBUG | -- | Enable verbose retrieval debug output |
| VEKTOR_MODEL | Xenova/bge-small-en-v1.5 | Swap embedding model |
| VEKTOR_RERANK | true | Enable cross-encoder reranking |
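
To see which settings a given environment would produce, the defaults above can be mirrored in a small resolver. This is a sketch for inspection and testing only. How the package itself parses these variables is not documented here, so treating `true`/`1` as on and `false`/`0` as off is an assumption, and `readFlags` is a hypothetical helper.

```javascript
// Defaults copied from the environment variable table. How the package
// itself parses these values is not documented here; this sketch treats
// "true"/"1" as on and "false"/"0" as off, which is an assumption.
const DEFAULTS = {
  VEKTOR_SUMMARIZE: false,
  VEKTOR_TRIPLES: true,
  VEKTOR_FORESIGHT: true,
  VEKTOR_TEMPORAL: true,
  VEKTOR_CONTRADICT: true,
  VEKTOR_RERANK: true,
};

function readFlags(env = process.env) {
  const flags = {};
  for (const [name, fallback] of Object.entries(DEFAULTS)) {
    const raw = (env[name] || '').trim().toLowerCase();
    if (raw === 'true' || raw === '1') flags[name] = true;
    else if (raw === 'false' || raw === '0') flags[name] = false;
    else flags[name] = fallback; // unset or unrecognised: use the default
  }
  flags.VEKTOR_MODEL = env.VEKTOR_MODEL || 'Xenova/bge-small-en-v1.5';
  return flags;
}
```

For example, `readFlags({ VEKTOR_RERANK: 'false' })` turns reranking off and leaves every other flag at its default.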


Research Foundation

Built on peer-reviewed research:


Pricing

| Plan | Price | Licences |
| --- | --- | --- |
| Solo | $9/mo | 3 |
| Team | $35/mo | 5 |
| Studio | $59/mo | 10 |
| Enterprise | $99/mo | 25 |

Purchase at vektormemory.com


What's New in v1.5.0

Retrieval pipeline rebuilt from scratch:

  • bge-small-en-v1.5 bi-encoder + ms-marco cross-encoder reranker (spec-decode architecture)

  • BM25 + Porter-stemmed BM25 + named entity injection, fused via RRF

  • MAGMA graph layer -- co-occurrence and temporal edges between entities in SQLite

  • Persistent entity index (vektor_entities) for guaranteed named-entity recall

  • Foresight extraction -- future-tense statements stored for temporal queries

  • Question type classifier -- routes single-hop vs multi-hop to optimal retrieval path

  • ADD-only contradiction detection -- conflicting facts survive with timestamps

  • Agentic sufficiency check -- reformulates query if key entities missing from top results
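
ADD-only contradiction detection, as described in the list above, means a conflicting fact is appended and flagged instead of replacing the earlier one. The sketch below shows that idea with an append-only log. It is an illustration under assumptions: the `FactLog` class and its subject/attribute/value fields are hypothetical, and the package's own schema is not shown in this README.

```javascript
// Append-only fact log: a new value for the same subject and attribute
// never overwrites the old one. Both rows stay, each with a timestamp.
// Field names are hypothetical, not the package's schema.
class FactLog {
  constructor() {
    this.rows = [];
  }

  // Append a fact and report whether it conflicts with earlier rows.
  add(subject, attribute, value, timestamp = Date.now()) {
    const conflicts = this.rows.filter(
      (r) =>
        r.subject === subject &&
        r.attribute === attribute &&
        r.value !== value
    );
    this.rows.push({ subject, attribute, value, timestamp });
    return { contradicts: conflicts.length > 0 };
  }

  // Full history for one subject/attribute pair, oldest first.
  history(subject, attribute) {
    return this.rows
      .filter((r) => r.subject === subject && r.attribute === attribute)
      .sort((a, b) => a.timestamp - b.timestamp);
  }
}
```

Because nothing is deleted, a temporal query can still answer "what did the user prefer before the change?" from the older row.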


vektormemory.com · Docs · hello@vektormemory.com

Stop prompting like it's 2024. Build agents that remember.
