Turbo Quant Memory MCP Server

by Lexus2016

🧠 Turbo Quant Memory for AI Agents (v0.18.0)

The first self-installable, trilingual local-first memory & knowledge graph for AI coding agents. Save up to 60% of your token budget while giving your AI assistant a permanent, hyper-fast, and highly connected brain.

👋 What is this awesome tool? (For Humans)

Imagine you are working with an AI coding assistant (like Claude Code, Gemini CLI, Cursor, or Codex). Every time you restart a session, the AI forgets everything. It forgets your architectural decisions, custom styling rules, how you solved that tricky database bug, or even your coding preferences. You have to explain it all over again, or feed the AI huge files, which wastes your time and burns through your token budget (costing you real money).

Turbo Quant Memory solves this once and for all. It is a local-first Model Context Protocol (MCP) server that gives your AI agents a persistent brain. It stores:

🎯 Decisions & Lessons: Why things were built this way, so the AI doesn't break them.
💡 Patterns & Gotchas: Reusable tricks and hard-won bug fixes.
🕸️ Knowledge Graph Relations: Structured associations linking memory notes, source files, tasks, or bugs.
📦 Codebase Index: Compact Markdown block search so the AI understands your project structure instantly.

💰 Cost-Saving Magic

Instead of reading massive files every time, your AI agent uses Compact Retrieval to query its memory and fetch only highly-relevant 600-token summaries.

| Metric | Value | Benefit for You |
| --- | --- | --- |
| Context Savings | 📉 ~83.79% fewer bytes | Reduced API costs, longer context windows |
| Search Latency | ⚡ ~400 ms | Fast enough as the default retrieval path (incl. CPU query embedding) |
| Architectural Focus | 🎯 Dynamic Pruning | AI sees only what matters, ignoring session noise |
| Linked Knowledge | 🕸️ Knowledge Graph | AI understands relationships between code, tasks, and decisions |
| Self-Cleaning Graph | 🔄 Dynamic Lifecycle | Stale relationships are deprecated or unlinked automatically |

🚀 DON'T INSTALL THIS MANUALLY! (Let the AI Do It)

You don't need to type commands in the terminal or configure JSON files. Let your AI assistant handle the setup!

Simply copy the link to this repository: https://github.com/Lexus2016/turbo_quant_memory

And send this exact prompt to your AI assistant (Claude Code, Gemini CLI, Codex, etc.):

"Hey! Please install and configure the Turbo Quant Memory server for my workspace using this repository: https://github.com/Lexus2016/turbo_quant_memory. Read the README.md, follow the 'Instructions for AI Agents' at the bottom of the file to install it via uv tool, register the tqmemory MCP server, run health checks, index this project, and set up our persistent memory. Let me know when you're ready!"

Your AI agent will automatically clone, install, register, and index everything for you!

🛠️ Quick Start (If You Really Want to Do It Yourself)

If you prefer the manual way, run this 60-second flow:

Install the CLI Tool:

```
uv tool install git+https://github.com/Lexus2016/turbo_quant_memory@v0.18.0
```

Add tqmemory MCP Server to your client:

```
# Codex
codex mcp add tqmemory -- turbo-memory-mcp serve

# Gemini CLI
gemini mcp add tqmemory turbo-memory-mcp serve

# Claude Code (Project scope)
claude mcp add --scope project tqmemory -- turbo-memory-mcp serve
```

Restart your client and let the magic begin!

For custom integrations (Cursor, OpenCode, Antigravity, etc.), see CLIENT_INTEGRATIONS.md.

🌟 Advanced Features (Under the Hood)

1. Hybrid BM25 + Vector Search (vector-first gated)

Every query searches both a dense-vector space (semantic meaning) and a BM25 full-text index (exact terms such as function names, file paths, or IDs). The dense lane leads: when its top hit is confident, the dense result is returned directly; otherwise the BM25 lane is fused in via Reciprocal Rank Fusion (RRF, k=60) as a down-weighted rescue. On real multilingual corpora this vector-first gating measurably beat plain equal-weight RRF, because it stops a noisy keyword lane from dragging a confident semantic hit down. If the BM25 lane fails, search degrades gracefully to vector-only.
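
The gating logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the server's actual code: only k=60 comes from the text, while `CONFIDENCE_THRESHOLD`, `BM25_WEIGHT`, and the function names are hypothetical, and "returned directly" is assumed to mean the dense ranking is used as-is.

```python
from typing import Dict, List, Tuple

RRF_K = 60                   # k=60, as stated in the text above
CONFIDENCE_THRESHOLD = 0.80  # hypothetical gate value, not taken from the project
BM25_WEIGHT = 0.5            # hypothetical down-weight for the BM25 rescue lane


def rrf_fuse(dense_ids: List[str], bm25_ids: List[str]) -> List[str]:
    """Weighted Reciprocal Rank Fusion over two ranked lists of document IDs."""
    scores: Dict[str, float] = {}
    for weight, ranking in ((1.0, dense_ids), (BM25_WEIGHT, bm25_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (RRF_K + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def gated_search(dense_hits: List[Tuple[str, float]], bm25_ids: List[str]) -> List[str]:
    """Vector-first gating: trust a confident dense top hit, otherwise fuse both lanes."""
    dense_ids = [doc_id for doc_id, _ in dense_hits]
    if dense_hits and dense_hits[0][1] >= CONFIDENCE_THRESHOLD:
        return dense_ids
    return rrf_fuse(dense_ids, bm25_ids)
```

With a confident dense top hit the BM25 list is ignored entirely; with a weak one, a document that both lanes agree on can overtake the dense leader.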

2. Knowledge Graph Relations

You can build associations between notes, source files, issues, or tasks using directed relations. The memory server automatically enriches search and hydration results with these relations, letting AI agents browse associated context effortlessly.

🔄 Dynamic Relation Lifecycle (Core Strength):

Aging & Syncing: Relations are created with a created_at timestamp and dynamically inherit entity state. If a linked note grows stale and is deprecated via deprecate_note(), the entire connected graph path is smartly flagged as outdated for AI agents.
Flexible Decoupling (Unlinking): Any relation can be easily severed using the unlink_entities() tool. This gives the agent memory absolute flexibility to adapt to refactorings and design changes.
Auto-Diagnostics: When calling lint_knowledge_base(), the system automatically runs integrity checks on the graph, pinpointing "orphan" relations and helping prevent stale-context build-up.
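
The lifecycle above can be pictured with a toy in-memory model. This is a sketch under stated assumptions, not the server's implementation: the `Relation` and `RelationGraph` classes and their methods are hypothetical stand-ins for `link_entities`, `unlink_entities`, `deprecate_note`, and the orphan check in `lint_knowledge_base`, and only relations that directly touch a deprecated entity are flagged here (the real server may propagate further along the graph).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set


@dataclass
class Relation:
    source: str
    target: str
    relation_type: str
    # Every relation is stamped at creation time.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class RelationGraph:
    """Hypothetical in-memory graph; the real storage layer is not described here."""

    def __init__(self) -> None:
        self.entities: Set[str] = set()
        self.deprecated: Set[str] = set()
        self.relations: List[Relation] = []

    def add_entity(self, entity: str) -> None:
        self.entities.add(entity)

    def link(self, source: str, target: str, relation_type: str) -> None:
        self.relations.append(Relation(source, target, relation_type))

    def unlink(self, source: str, target: str) -> int:
        """Sever every relation between source and target; return how many were removed."""
        before = len(self.relations)
        self.relations = [
            r for r in self.relations if (r.source, r.target) != (source, target)
        ]
        return before - len(self.relations)

    def deprecate(self, entity: str) -> None:
        self.deprecated.add(entity)

    def is_outdated(self, relation: Relation) -> bool:
        # A relation inherits the state of its endpoints.
        return relation.source in self.deprecated or relation.target in self.deprecated

    def orphans(self) -> List[Relation]:
        """Relations whose source or target no longer exists."""
        return [
            r for r in self.relations
            if r.source not in self.entities or r.target not in self.entities
        ]
```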

📊 Visual Memory Architecture:

```mermaid
graph TD
    A[AI Agent / Query] -->|1. semantic_search| B[tqmemory Server]
    B -->|2. Vector Index| C[Dense Vector Search]
    B -->|2. Full-Text Index| D[BM25 FTS Search]
    C -->|3. RRF Fusion| E[Knowledge Candidates]
    D -->|3. RRF Fusion| E
    E -->|4. Graph Enrichment| F[Knowledge Graph / Associations]
    F -->|5. Enriched Context| A

    subgraph Relation Lifecycle
        G[Create Link: link_entities] -->|Knowledge Evolution| H[Deprecate Note: deprecate_note]
        H -->|Diagnosis: lint_knowledge_base| I[Sever Link: unlink_entities]
    end
```

3. Tiered Memory Architecture

Memory notes are separated into logical tiers:

durable: Decisions, architectural patterns, lessons.
episodic: Session handoffs, daily progress.
reference: Markdown blocks, file references.

Default searches return only durable + reference so session noise never drowns out critical architectural decisions!
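
As a rough sketch of the default filter, assuming each note carries a `tier` field: the tier names and the `tier_filter` parameter name come from this README, while the `filter_by_tier` helper and the note dictionaries are hypothetical. The kind-to-tier mapping is inferred from the tier descriptions above rather than confirmed against the code.

```python
from typing import Dict, Iterable, List, Optional

# Default search scope: episodic notes are left out unless asked for.
DEFAULT_TIERS = ("durable", "reference")

# Inferred mapping from note kind to tier (an assumption for illustration).
KIND_TO_TIER = {
    "decision": "durable",
    "lesson": "durable",
    "pattern": "durable",
    "handoff": "episodic",
}


def filter_by_tier(
    notes: Iterable[Dict[str, str]],
    tier_filter: Optional[List[str]] = None,
) -> List[Dict[str, str]]:
    """Keep only notes in the requested tiers, or in the default tiers if none are given."""
    allowed = set(tier_filter) if tier_filter else set(DEFAULT_TIERS)
    return [note for note in notes if note["tier"] in allowed]
```

Passing `tier_filter=["episodic"]` is how a session handoff would be surfaced in this model; without it, handoffs stay out of the results.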

4. Lightweight ONNX backend (low-RAM, opt-in)

By default the embedder runs on PyTorch. On a small machine (e.g. ~2 GB RAM) you can run the same multilingual model through ONNX Runtime instead — dropping the heavy PyTorch footprint for a much smaller resident size, with no quality change and no reindex (the embeddings stay compatible).

```
pip install 'turbo-memory-mcp[onnx]'
export TQMEMORY_EMBEDDING_BACKEND=fastembed   # default: sentence-transformers
```

The default install is unchanged — this is purely opt-in.

5. User-Flagged Memory (provenance)

Every note records who created it: human-explicit when you explicitly ask the agent to remember something ("remember this", "save this to my knowledge base"), or agent when the agent saves a lesson/decision on its own. Human-flagged notes are trusted more — they rank above agent-written notes of equal relevance (a deterministic tie-breaker plus a small score bonus). The field is optional and backward compatible: existing notes simply read as agent, so no migration is needed.
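
One way to picture the ranking rule is a sort key that applies a small bonus and then a deterministic tie-breaker. This is an illustrative sketch only: the `Hit` type, the `rank` function, and the `HUMAN_BONUS` value are hypothetical, and the real scoring may differ.

```python
from typing import List, NamedTuple

HUMAN_BONUS = 0.02  # hypothetical size of the "small score bonus"


class Hit(NamedTuple):
    note_id: str
    score: float
    provenance: str = "agent"  # notes without the field simply read as agent


def rank(hits: List[Hit]) -> List[Hit]:
    """Order hits by boosted score; human-flagged notes win ties."""
    def sort_key(hit: Hit):
        is_human = hit.provenance == "human-explicit"
        boosted = hit.score + (HUMAN_BONUS if is_human else 0.0)
        # Higher boosted score first, then human-flagged, then ID for determinism.
        return (-boosted, 0 if is_human else 1, hit.note_id)

    return sorted(hits, key=sort_key)
```

A clearly more relevant agent note still outranks a human-flagged one; the bonus only matters when relevance is close.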

6. Full-text search language (multilingual, opt-in)

The BM25 full-text lane tokenizes on Unicode word boundaries with lower-casing and accent folding, so Ukrainian, Russian and other non-English exact terms already match (case- and accent-insensitive) out of the box — Cyrillic is never mangled. What one index cannot do is stem more than one language at once. The default stems English; a Cyrillic-dominant deployment can switch the stemmer:

```
export TQMEMORY_FTS_LANGUAGE=Russian   # default: English
```

Russian stemming additionally matches inflected Cyrillic forms (документ ↔ документами, plus many shared Ukrainian suffixes) — at the cost of English stemming, since LanceDB applies one stemmer per index. Ukrainian has no dedicated Snowball stemmer, so Russian is the closest option; an unsupported value safely falls back to English with a warning. The change takes effect after the FTS index is rebuilt (a retrieval reset + reindex), like switching the embedding model — and inflected matching is in any case already covered semantically by the dense vector lane.
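
To make the tokenization and fallback behavior concrete, here is a plain-Python stand-in. It is not LanceDB's tokenizer: `fold`, `tokenize`, `resolve_fts_language`, and the `SUPPORTED_LANGUAGES` set are hypothetical names chosen for illustration, and stemming is deliberately left out.

```python
import re
import unicodedata
import warnings
from typing import List

# Hypothetical subset; the full list of supported stemmers is not given in this README.
SUPPORTED_LANGUAGES = {"English", "Russian"}


def fold(text: str) -> str:
    """Lower-case and strip combining accents so that 'Café' matches 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str) -> List[str]:
    """Split on Unicode word boundaries; Cyrillic letters count as word characters."""
    return re.findall(r"\w+", fold(text))


def resolve_fts_language(value: str) -> str:
    """Fall back to English, with a warning, when the requested stemmer is unsupported."""
    if value in SUPPORTED_LANGUAGES:
        return value
    warnings.warn(f"Unsupported FTS language {value!r}; falling back to English")
    return "English"
```

Because the same folding is applied at index time and at query time, a query for `документ` matches `ДОКУМЕНТ` exactly, while matching `документами` would need the Russian stemmer or the dense lane.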

🔐 Secrets Vault (NEW in v0.7.0)

Tired of pasting SSH keys, DB connection strings, or API tokens into every new chat session? The secrets vault solves that — without you giving up an inch of control over your data.

Why this exists

Agents kept asking you for the same prod-DB DSN, the same staging SSH host, the same bearer token, every session. Project memory wasn't the right home for those (anything indexed is at risk of leaking back into search results). So Phase 9 adds a separate, encrypted, strictly project-scoped vault next to your notes.

What changes in your install

Four new MCP tools: set_secret, get_secret, list_secrets, delete_secret. Tool count grew 14 → 18 (now 19 with the v0.12.0 recent_context bootstrap tool).
A one-time migration provisions an empty secrets/ directory under each existing project on first turbo-memory-mcp migrate --apply after upgrade.

What does NOT change (read this if you're nervous)

Your existing notes, markdown index, semantic_search, hydrate, and lint_knowledge_base behave byte-identically. The upgrade does not touch them.
The vault is opt-in. If you never call set_secret, the only thing on disk is an empty 28-byte encrypted blob per project. Zero impact.
If you would rather pretend the feature does not exist, you can ignore the four new tools forever and nothing breaks.

Where your secrets live (and where they don't)

On your machine, encrypted at rest: ~/.turbo-quant-memory/projects/<project_id>/secrets/vault.tqv, AES-256-GCM, per-project master key.
Never anywhere else: the src/ tree of this package contains zero outbound HTTP code — no requests, no httpx, no urllib.request, no raw sockets. We have nothing to send your secrets to, even if we wanted to. (Verify with grep -rE 'requests|httpx|urllib\.request|aiohttp' src/ — clean.)
Never in your retrieval index: the ingestion walker and the lint walker hard-refuse to traverse any secrets/ subdirectory. semantic_search cannot reach the vault by design.
Never in agent transcripts (when used right): get_secret returns the value in a dedicated secret_value field, separate from any descriptive text. Agents are instructed to pass it through programmatically, not echo it.
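
The retrieval-index isolation can be illustrated with a directory walker that prunes `secrets/` before descending. This is a minimal sketch of the idea, assuming a standard `os.walk` traversal; the function name and the `EXCLUDED_DIRS` constant are hypothetical and the project's real walker may be structured differently.

```python
import os
from typing import Iterator

EXCLUDED_DIRS = {"secrets"}  # directory names the walker must never enter


def iter_indexable_files(root: str) -> Iterator[str]:
    """Yield file paths under root, refusing to traverse any secrets/ subdirectory."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place: os.walk only descends into what remains in dirnames.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Pruning at the traversal level means the vault file is never opened or even listed, rather than being read and filtered out afterwards.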

How to use it

One-time master-key setup (pick one path):
```
# macOS (auto-uses Keychain after first set_secret if you skip this step).
# keyring prompts for the value; paste your <32-byte-base64> key at the prompt:
keyring set turbo-quant-memory secrets-master-<project_id>

# Headless / Linux / CI / Docker:
export TQMEMORY_SECRETS_PASSPHRASE='your-long-passphrase'   # add to shell rc
```
⚠️ TQMEMORY_SECRETS_PASSPHRASE is a passphrase, not the raw key. It is run through Argon2id to derive the master key. Do not paste the keyring base64 value into this env var — that derives a different key and a vault created via the keyring will fail to decrypt with a master_key_mismatch error. Pick one path: keyring or passphrase, and if you share one daemon across MCP clients, set the same passphrase on all of them or on none. The env var always wins over the keyring when both are set.
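
The following sketch shows why the two paths must not be mixed. It uses `hashlib.scrypt` from the standard library as a stand-in key-derivation function, not the Argon2id the vault actually uses, and the salt handling and function names are hypothetical. The point is only that decoding a base64 key and deriving a key from that same string give different bytes.

```python
import base64
import hashlib
import os


def key_from_keyring_value(b64_value: str) -> bytes:
    """Keyring path: the stored value is the raw key, base64-encoded."""
    return base64.b64decode(b64_value)


def key_from_passphrase(passphrase: str, salt: bytes) -> bytes:
    """Passphrase path: the key is derived. scrypt is a stand-in here, NOT Argon2id."""
    return hashlib.scrypt(
        passphrase.encode("utf-8"), salt=salt, n=2**12, r=8, p=1, dklen=32
    )


raw_key = os.urandom(32)                                  # what the keyring path uses
keyring_value = base64.b64encode(raw_key).decode("ascii")
salt = os.urandom(16)                                     # hypothetical salt handling

# Pasting the keyring base64 string into the passphrase variable derives a new key.
derived_key = key_from_passphrase(keyring_value, salt)
keys_match = derived_key == key_from_keyring_value(keyring_value)
```

Here `keys_match` is false, which is the situation that surfaces as a `master_key_mismatch` error when a keyring-created vault is opened with a passphrase-derived key.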
Save a secret once, reuse forever — two paths, picked by whether the value is already in the chat:
- Value NOT yet in the chat — use the CLI (prophylactic path):
  turbo-memory-mcp secret-set prod-db-dsn # prompts: Value for 'prod-db-dsn' (input hidden): ******
  The value is read via getpass — it never enters shell history, scrollback, or any chat transcript. Recommended when you're about to provision a fresh credential and want to keep it out of the conversation entirely.
- Value already in the chat — let the agent write it (reactive path):
  set_secret("prod-db-dsn", "postgresql://user:pass@host:5432/db")
  Use this whenever the value is already visible: you pasted it, or the agent generated it inside the conversation. The agent resolves the active project_id deterministically from cwd — better than asking the user to retype the value in a terminal where their cwd may not match the intended project. Once exposure has happened in chat, the CLI offers no additional secrecy; set_secret is the safer write path.

Agents fetch on demand:

get_secret("prod-db-dsn") → {"status": "ok", "secret_value": "postgresql://..."}
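
A hedged sketch of passing the value through programmatically instead of echoing it: the `get_secret` function below is a local stub that mimics the response shape shown above, and `run_with_secret` and the environment variable name are hypothetical.

```python
import os
import subprocess
from typing import Dict, List


def get_secret(name: str) -> Dict[str, str]:
    """Stub standing in for the MCP tool call; returns the documented response shape."""
    return {"status": "ok", "secret_value": "postgresql://user:pass@host:5432/db"}


def run_with_secret(name: str, env_var: str, argv: List[str]) -> subprocess.CompletedProcess:
    """Inject the secret into a child process environment without printing it."""
    response = get_secret(name)
    if response.get("status") != "ok":
        raise RuntimeError(f"get_secret({name!r}) returned status {response.get('status')!r}")
    # Copy the current environment and add the secret only for the child process.
    env = dict(os.environ, **{env_var: response["secret_value"]})
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
```

The value only ever lives in the child's environment, so it does not need to appear in summaries, logs, or notes.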

Threat model — what we protect, what we don't

We protect against (the realistic single-developer threats):

Accidental backup leaks (Time Machine, rsync, iCloud Desktop sync of plaintext files).
Share-screen / screenshot mishaps showing a stored credential.
Accidental git add of the wrong file under your home directory.

We do not protect against (and never claim to):

A compromised root user on your laptop.
A live attacker who has already taken over the running daemon process.
Hardware-level attacks, evil-maid attacks, cold-boot attacks.

If your threat model is bigger than ours, use a dedicated secret manager (1Password, HashiCorp Vault) and keep tqmemory for knowledge only.

FAQ

"Will my existing memory be affected?" No. Notes, markdown, search, hydrate, lint — all unchanged.
"Do I need to do anything to upgrade?" Only if you want to start using set_secret. Otherwise, ignore.
"Where exactly are secrets stored?" Per project, under ~/.turbo-quant-memory/projects/<project_id>/secrets/ (encrypted blob + minimal metadata + an audit log of access timestamps that never contains the values).
"Can my secrets leak into semantic_search?" No. Hard isolation enforced at the ingester, the linter, and verified by a regression test that plants a sentinel and proves search returns zero hits.
"What if I want this feature off?" It already is — the vault is empty until you populate it. There is nothing running, nothing transmitted, nothing scheduled.
"What if I delete a project?" Its vault, audit log, and per-project master key entry are all under that project's directory and go with it.

🤖 Instructions for AI Agents (System Directive)

IMPORTANT

ATTENTION AI AGENT (Claude, Gemini, Codex, or Antigravity): You are equipped with the tqmemory MCP server. This is your long-term memory. Read and obey these instructions to ensure seamless integration and zero-friction memory retrieval.

1. Pre-Flight Ritual (Every Session Start)

Check Migrations & Health: Call health() and server_info(). If migrations_pending (in health) or migrations.pending (in server_info) is true, surface the migrations_hint verbatim to the user so they can run the CLI upgrade. Do not run migrate --apply yourself as it requires closing active MCP clients.
Resume Where You Left Off: At session start — or right after a context compaction — call recent_context() first. It is a query-free bootstrap that returns your most recently updated notes (newest first), including session handoff notes that a plain semantic_search hides by default. This is the reliable "where did I leave off" entry point when you do not yet know what to query.
Retrieve Context: For a specific task, run semantic_search(query="<task_topic>", scope="hybrid") to retrieve existing architectural decisions, styling rules, lessons, or guidelines. To recover a session handoff by query, pass tier_filter=["episodic"].
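
The pre-flight order can be sketched as a small routine over an abstract client. The tool names and the `migrations_pending`, `migrations.pending`, and `migrations_hint` fields come from the instructions above; the `MemoryClient` protocol, the `preflight` function, and the report layout are hypothetical, and how a real agent invokes MCP tools depends on its client.

```python
from typing import Any, Dict, List, Protocol


class MemoryClient(Protocol):
    """Hypothetical wrapper around the MCP tool calls named above."""

    def health(self) -> Dict[str, Any]: ...
    def server_info(self) -> Dict[str, Any]: ...
    def recent_context(self) -> List[Dict[str, Any]]: ...
    def semantic_search(self, query: str, scope: str) -> List[Dict[str, Any]]: ...


def preflight(client: MemoryClient, task_topic: str) -> Dict[str, Any]:
    """Run the session-start ritual and collect what the agent should look at."""
    report: Dict[str, Any] = {"notice": None, "recent": [], "hits": []}

    health = client.health()
    info = client.server_info()
    pending = bool(health.get("migrations_pending")) or bool(
        info.get("migrations", {}).get("pending")
    )
    if pending:
        # Surface the hint verbatim; never run `migrate --apply` from here.
        report["notice"] = health.get("migrations_hint")

    report["recent"] = client.recent_context()  # query-free bootstrap first
    report["hits"] = client.semantic_search(query=task_topic, scope="hybrid")
    return report
```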

2. Memory Writing Discipline

When you learn something important, solve a complex bug, or make an architectural decision, immediately save it using remember_note(). Do not wait until the end of the session.

Lessons (kind="lesson"): Store hard-won patterns or bug fixes.
Decisions (kind="decision"): Store structural choices or tooling decisions.
Patterns (kind="pattern"): Store reusable templates or styling conventions.
Handoffs (kind="handoff"): Store episodic progress before pausing or ending a session (automatically assigned to episodic tier).

User-flagged notes: when the USER explicitly asks to remember something ("remember this", "save this to my knowledge base"), call remember_note(..., provenance="human-explicit"). Notes you write on your own initiative keep the default provenance="agent" — human-flagged notes rank above agent-written ones of equal relevance.

Format your notes in English. Keep them concise, technical, and actionable. Add semantic tags.

3. Knowledge Graph Linking

Use relations to connect ideas, files, and tasks:

When saving a note about a bug fix in src/auth.py, call link_entities(source="note:[note_id]", target="file:///absolute/path/to/src/auth.py", relation_type="fixes").
Link related notes: link_entities(source="note:[new_note]", target="note:[old_note]", relation_type="supersedes").
Link files to tasks: link_entities(source="file:///path/to/file", target="task:[task_id]", relation_type="implements").
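
Building the entity references consistently is easy to get wrong, so here is a small helper sketch. The `note:`, `task:`, and `file://` prefixes and the `source`, `target`, and `relation_type` parameters follow the examples above; the helper functions themselves are hypothetical, and POSIX-style absolute paths are assumed.

```python
from pathlib import PurePosixPath
from typing import Dict


def note_ref(note_id: str) -> str:
    return f"note:{note_id}"


def task_ref(task_id: str) -> str:
    return f"task:{task_id}"


def file_ref(absolute_path: str) -> str:
    """Turn an absolute POSIX path into a file:// URI; relative paths are rejected."""
    path = PurePosixPath(absolute_path)
    if not path.is_absolute():
        raise ValueError(f"file references need an absolute path, got {absolute_path!r}")
    return path.as_uri()


def link_args(source: str, target: str, relation_type: str) -> Dict[str, str]:
    """Assemble the keyword arguments for a link_entities call."""
    return {"source": source, "target": target, "relation_type": relation_type}
```

For example, `link_args(note_ref("n42"), file_ref("/repo/src/auth.py"), "fixes")` produces the arguments for the bug-fix link described in the first bullet, with `n42` and `/repo` as placeholder values.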

4. Zero Ambiguity & Memory Updates

Deprecate Outdated Notes: When a pattern or decision changes, write the new note, then call deprecate_note() on the old one to avoid search pollution.
No Smoke Notes: Do not write temporary or smoke test notes.
Provenance: Always preserve file paths and line numbers in your memory payloads.

5. Secrets Vault (v0.7.0+)

Discover, don't guess: Run semantic_search for the pattern-kind recipe note that documents the credential, and take the exact get_secret(name) call from it. Never fish names from chat history.
Fetch through the dedicated field: get_secret("name") returns the value in secret_value. Pass it programmatically (env var injection, subprocess argument). Do NOT echo it into summaries, logs, or remember_note.
Write what the user gives you: if the user pastes a credential into the chat (or you generated one inside the conversation), just call set_secret(name, value). You know the exact active project_id from cwd resolution; the user running the CLI from terminal may not. Do NOT push the user back to the CLI just to redo what is already done — the exposure happened when they typed it; friction won't undo it.
Reach for the CLI only as prevention: if the user is ABOUT to share a credential but hasn't pasted it yet, suggest turbo-memory-mcp secret-set NAME from a terminal, because getpass keeps the value out of the chat in the first place. Once the value is already in the chat, the CLI is friction with no payoff.
Surface master_key_unavailable errors verbatim: the response carries a setup_hint field with the exact export / keyring set commands the user needs. Print it, then stop — do not try to invent keys.

🛰️ Platform-Specific: Hermes Agent

Hermes runs MCP servers via a systemd-managed gateway — a different setup from Claude Code or Cursor.

Installation

```
uv tool install turbo-quant-memory
```

Add to ~/.hermes/config.yaml:

```
mcp_servers:
  tqmemory:
    command: turbo-memory-mcp
    args: ["serve"]
    enabled: true
```

Restart the gateway:

```
systemctl --user restart hermes-gateway
```

Troubleshooting MCP Timeouts

If MCP tools time out with "MCP call timed out after 120.0s", the daemon lock is likely stale from a previous crash or host sleep. Recovery:

```
# 1. Kill all daemon processes
pkill -f turbo-memory-mcp

# 2. Remove stale lock file
rm -f ~/.turbo-quant-memory/.daemon.lock

# 3. Check and apply pending migrations
turbo-memory-mcp migrate --status
turbo-memory-mcp migrate --apply

# 4. Quick health check
turbo-memory-mcp doctor

# 5. Restart gateway
systemctl --user restart hermes-gateway

# 6. Wait 30-60s for MCP reconnect
```

Auto-Migration on Startup

Set TQMEMORY_MIGRATE_ON_STARTUP=1 in the environment to have the server automatically apply pending schema migrations (with a rolling snapshot) when it starts as primary or standalone:

```
mcp_servers:
  tqmemory:
    command: turbo-memory-mcp
    args: ["serve"]
    enabled: true
    env:
      TQMEMORY_MIGRATE_ON_STARTUP: "1"
```

Auto-migration result is visible in the health() response under migration_auto_result.

Common Hermes Issues

| Symptom | Cause | Fix |
| --- | --- | --- |
| MCP timeout | Stale `.daemon.lock` | `rm -f ~/.turbo-quant-memory/.daemon.lock` |
| Multiple daemons | Crash left orphans | `pkill -f turbo-memory-mcp` |
| Tools return errors | Pending schema migration | `turbo-memory-mcp migrate --apply` |
| Gateway won't load MCP | Config syntax error | Validate `config.yaml` |
| Silent startup failure | No visibility into daemon role | Check stderr: `[tqmemory] role=...` |

🌍 Language Versions

This documentation is maintained in three synchronized languages.
