
🧠 CORTEX Memory MCP

Persistent semantic memory for AI agents. TypeScript · LangGraph.js · Qdrant · fastembed ONNX · Ollama. 100% local, zero mandatory cloud.



What is CORTEX?

CORTEX is a Model Context Protocol (MCP) server that gives AI agents a persistent, semantically searchable long-term memory. Unlike simple key-value stores, CORTEX scores how important each memory is, links related memories to each other, and tracks which memories are going stale over time.

Built for agents running in CPU-only environments: no GPU required, no cloud dependencies.



Architecture: 3 Memory Layers

┌─────────────────────────────────────────────────────────┐
│                    CORTEX v3.2                          │
│                                                         │
│  ① WORKING MEMORY    temp_memories (Qdrant)             │
│     └─ quick_observe → instant write, no LLM            │
│                                                         │
│  ② SEMANTIC MEMORY   cortex_<project> (Qdrant)          │
│     └─ dense (all-MiniLM-L6-v2) + sparse (SPLADE)       │
│     └─ scored by qwen3 · linked · decay-weighted        │
│                                                         │
│  ③ EPISODIC MEMORY   cortex_episodes_<project> (Qdrant) │
│     └─ sessions with timestamped events, no LLM         │
└─────────────────────────────────────────────────────────┘

Key Features

| Feature | Details |
|---|---|
| LLM scoring on ingest | qwen3 assigns importance (1-10), type, and tags to every memory |
| Hybrid search | Dense (all-MiniLM-L6-v2) + sparse (SPLADE_PP_en_v1) via Qdrant RRF fusion |
| Cross-encoder reranking | A single qwen3 call scores all (query, candidate) pairs in one batch |
| Dual decay engine | Bayesian (DECISION/FACT/ERROR/PATTERN) + FSRS-inspired (PREFERENCE/CONTEXT) |
| Contradiction detection | Automatically marks superseded memories on ingest |
| Episodic sessions | Zero-LLM session tracking with typed events |
| Operator Profile | Persistent coding preferences and work patterns |
| 20 MCP tools | Complete CRUD + search + analytics surface |
| 100% local | fastembed ONNX for embeddings, Ollama for the LLM; no API keys needed |
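Qdrant performs the dense + sparse fusion server-side. To show what Reciprocal Rank Fusion computes, here is a minimal sketch assuming the constant k = 60 from the original RRF formulation; it is not Qdrant's internal code, and the `rrfFuse` helper and the memory IDs are hypothetical.

```typescript
// Reciprocal Rank Fusion (RRF): every ranked list adds 1 / (k + rank) to each
// document it contains, with rank starting at 1. Documents that rank well in
// several lists accumulate the highest fused score.
// Hypothetical helper for illustration only; Qdrant performs fusion server-side.
interface Fused {
  id: string;
  score: number;
}

function rrfFuse(rankings: string[][], k = 60): Fused[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const contribution = 1 / (k + index + 1); // index is 0-based, rank is 1-based
      scores[id] = (scores[id] || 0) + contribution;
    });
  }
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] }))
    .sort((a, b) => b.score - a.score);
}

// Usage: fuse a dense ranking with a sparse (SPLADE) ranking of memory IDs.
const denseHits = ["m1", "m2", "m3"];
const sparseHits = ["m3", "m1", "m4"];
console.log(rrfFuse([denseHits, sparseHits]).map((r) => r.id)); // m1, m3, m2, m4
```

Because RRF uses only ranks, it needs no calibration between cosine scores from the dense model and term weights from the sparse model.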


Competitive Landscape (June 2026)

| Feature | CORTEX | mem0 | Graphiti | Basic Memory | MCP Official |
|---|---|---|---|---|---|
| LLM scoring on ingest | ✅ | ❌ | ❌ | ❌ | ❌ |
| Hybrid search (dense+sparse) | ✅ | ⚠️ cloud | ❌ | ❌ | ❌ |
| Cross-encoder reranking | ✅ | ⚠️ cloud | ❌ | ❌ | ❌ |
| Episodic layer (no LLM) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Dual decay engine | ✅ | ✅ | ✅ bi-temporal | ❌ | ❌ |
| Contradiction detection | ✅ | ❌ | ❌ | ❌ | ❌ |
| TypeScript + LangGraph.js | ✅ | ❌ Python | ❌ Python | ❌ Python | ✅ |
| 100% local | ✅ | ⚠️ MCP=cloud | ✅ | ✅ | ✅ |


Prerequisites

# 1. Qdrant (Docker)
docker run -d -p 6333:6333 --name qdrant qdrant/qdrant

# 2. Ollama + models
ollama pull qwen3:8b      # scoring, tagging, reranking, consolidation
# optional faster alternative:
# ollama pull qwen3:1.7b  # 3-5× faster on CPU, slightly lower accuracy

fastembed (embeddings) is bundled as an npm dependency, so no separate installation is needed. Models download automatically to .fastembed_cache/ on first use (~22 MB dense, ~110 MB sparse).


Installation

git clone git@github.com:alainrc2005/cortex_memory_mcp.git
cd cortex_memory_mcp
npm install
npm run build

Environment Configuration

Create a .env file in the project root:

QDRANT_URL=http://localhost:6333
FASTEMBED_CACHE_DIR=/absolute/path/to/cortex_memory_mcp/.fastembed_cache

# Optional: only needed if Qdrant has auth enabled
# QDRANT_API_KEY=your_key

# Optional: defaults to http://localhost:11434
# OLLAMA_URL=http://localhost:11434
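A typed view of these settings can catch a missing variable at startup. The sketch below is a hypothetical loader, not the project's code: the variable names and the OLLAMA_URL default come from this README, while treating QDRANT_URL and FASTEMBED_CACHE_DIR as required is an assumption.

```typescript
// Hypothetical typed loader for the .env values above. The variable names and the
// OLLAMA_URL default come from this README; treating QDRANT_URL and
// FASTEMBED_CACHE_DIR as required is an assumption.
interface CortexConfig {
  qdrantUrl: string;
  qdrantApiKey?: string;
  ollamaUrl: string;
  fastembedCacheDir: string;
}

type Env = Record<string, string | undefined>;

function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`${name} is required`);
  return value;
}

function loadConfig(env: Env): CortexConfig {
  return {
    qdrantUrl: requireVar(env, "QDRANT_URL"),
    fastembedCacheDir: requireVar(env, "FASTEMBED_CACHE_DIR"),
    // Only needed when Qdrant has auth enabled; empty strings count as unset.
    qdrantApiKey: env.QDRANT_API_KEY || undefined,
    // Documented default when OLLAMA_URL is not set.
    ollamaUrl: env.OLLAMA_URL || "http://localhost:11434",
  };
}

// Usage with an explicit object (in Node you would pass process.env instead).
const config = loadConfig({
  QDRANT_URL: "http://localhost:6333",
  FASTEMBED_CACHE_DIR: "/opt/cortex_memory_mcp/.fastembed_cache",
});
console.log(config.ollamaUrl);
```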

MCP Configuration

Add to your MCP client config (e.g. ~/.gemini/config/mcp_config.json):

{
  "mcpServers": {
    "langgraph-memory-mcp": {
      "command": "/absolute/path/to/cortex_memory_mcp/cortex-mcp.sh"
    }
  }
}

Tool Reference: 20 Tools

Semantic Memory (17 tools)

| Tool | Description | Trigger |
|---|---|---|
| observe | Store a memory through the full pipeline: score → embed → link → persist | Manual / session close |
| recall | Hybrid sparse (SPLADE) + dense search with LLM cross-encoder reranking | On demand |
| get_context_for | RAG-style context injection for a project + message | Auto (cold start) |
| consolidate | Merge duplicate/similar memories using the LLM | End of long session |
| detect_patterns | Extract operator behavior patterns and update the Operator Profile | Periodic |
| get_operator_profile | Read coding preferences and detected patterns | Auto (cold start) |
| cortex_status | System health: collections, engram counts, pending buffer | Diagnostic |
| delete_memory | Delete a single engram by ID | On demand |
| update_memory | Update engram content; recalculates embedding + score | On demand |
| get_all_memories | List all engrams for a project, sorted by decay score | Audit |
| delete_all_memories | ⚠️ Irreversible reset of a project (requires confirm: true) | Explicit only |
| batch_observe | Store up to 20 memories in one call | Bulk import |
| export_memories | Export a project as JSON (backup/migration) | On demand |
| quick_observe | Write to the working buffer instantly: no LLM, no embedding | Auto (post-turn hook) |
| list_pending | View working buffer contents by project | Diagnostic |
| index_temp | Promote buffer → semantic memory with ONNX embedding + LLM scoring | Auto (next cold start) |
| recall_hybrid | Search both the buffer (keyword) and indexed memories (semantic) simultaneously | On demand |

Episodic Memory (3 tools)

| Tool | Description |
|---|---|
| start_session | Open an episodic session for a project. Auto-closes any previous open session. Returns sessionId. |
| log_event | Record a typed event in the active session. Types: DECISION, ERROR, SOLUTION, INSIGHT, CONTEXT_CHANGE |
| recall_sessions | Semantic search over past session summaries using fastembed ONNX |
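The session semantics above (auto-close on start, typed events, no LLM) can be modeled in a few lines. This is a hypothetical in-memory sketch, not the actual episode.ts: it assumes "previous open session" means the previous session of the same project, keys events by project rather than by sessionId, and omits persistence to cortex_episodes_<project>.

```typescript
// Hypothetical in-memory model of the episodic API described above; the real
// service persists to Qdrant (cortex_episodes_<project>), which this sketch omits.
type EventType = "DECISION" | "ERROR" | "SOLUTION" | "INSIGHT" | "CONTEXT_CHANGE";

interface EpisodeEvent {
  type: EventType;
  content: string;
  timestamp: number;
}

interface Session {
  sessionId: string;
  project: string;
  events: EpisodeEvent[];
  closedAt?: number;
}

class EpisodeStore {
  private sessions: Session[] = [];
  private nextId = 1;

  // Opens a session; any session still open for the same project is closed first.
  startSession(project: string, now = Date.now()): string {
    for (const s of this.sessions) {
      if (s.project === project && s.closedAt === undefined) s.closedAt = now;
    }
    const sessionId = `${project}-${this.nextId++}`;
    this.sessions.push({ sessionId, project, events: [] });
    return sessionId;
  }

  // Appends a typed, timestamped event to the project's open session. No LLM involved.
  logEvent(project: string, type: EventType, content: string, now = Date.now()): void {
    const active = this.sessions.find((s) => s.project === project && s.closedAt === undefined);
    if (!active) throw new Error(`no open session for ${project}`);
    active.events.push({ type, content, timestamp: now });
  }

  get(sessionId: string): Session | undefined {
    return this.sessions.find((s) => s.sessionId === sessionId);
  }
}

// Usage: starting a second session closes the first one.
const store = new EpisodeStore();
const first = store.startSession("demo", 1000);
store.logEvent("demo", "DECISION", "Use Qdrant RRF for hybrid search", 1001);
const second = store.startSession("demo", 2000);
console.log(store.get(first)?.closedAt, store.get(second)?.closedAt);
```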


Indexing Pipeline

quick_observe(content)
      │
      ▼  (instant, no LLM, no embedding)
temp_memories  ◄──── working buffer (Qdrant, dummy vectors)
      │
      │  index_temp(): called at next session cold start
      ▼
  fastembed ONNX
  ├─ AllMiniLML6V2   → dense vector 384d
  └─ SpladePPEnV1    → learned sparse vector (SPLADE)
      │
  qwen3 (Ollama)
  ├─ importance: 1-10
  ├─ type: DECISION | FACT | ERROR | PATTERN | PREFERENCE | CONTEXT
  └─ tags: [keyword, ...]
      │
  Contradiction detection (qwen3)
  └─ marks superseded memories if similarity > 0.88
      │
      ▼
  Qdrant upsert (dense + sparse vectors)
  └─ bidirectional links to related engrams
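Two steps in this pipeline are easy to get wrong: trusting the LLM's scoring output, and choosing which existing engrams to compare for contradictions. The sketch below is hypothetical. The six memory types and the 1-10 range come from the diagram, while the JSON reply shape and reading the 0.88 threshold as cosine similarity between dense vectors are assumptions.

```typescript
// Hypothetical sketch of two checks from the pipeline above. The six memory types
// and the 1-10 range come from the diagram; the JSON reply shape and treating the
// 0.88 threshold as dense-vector cosine similarity are assumptions.
const MEMORY_TYPES = ["DECISION", "FACT", "ERROR", "PATTERN", "PREFERENCE", "CONTEXT"] as const;
type MemoryType = (typeof MEMORY_TYPES)[number];

interface Score {
  importance: number;
  type: MemoryType;
  tags: string[];
}

// Validates a reply such as {"importance": 7, "type": "DECISION", "tags": ["qdrant"]}.
function parseScore(raw: string): Score {
  const data = JSON.parse(raw);
  // Clamp to 1-10; a non-numeric importance becomes NaN and is rejected.
  const importance = Math.min(10, Math.max(1, Math.round(Number(data.importance))));
  if (isNaN(importance)) throw new Error("importance must be numeric");
  if ((MEMORY_TYPES as readonly string[]).indexOf(data.type) === -1) {
    throw new Error(`unknown memory type: ${data.type}`);
  }
  const tags: string[] = Array.isArray(data.tags)
    ? data.tags.filter((t: unknown) => typeof t === "string")
    : [];
  return { importance, type: data.type as MemoryType, tags };
}

// Cosine similarity of two equal-length dense vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Existing engrams similar enough to be sent to qwen3 for a contradiction check.
function contradictionCandidates(
  vector: number[],
  existing: { id: string; vector: number[] }[],
  threshold = 0.88
): string[] {
  return existing.filter((m) => cosine(vector, m.vector) > threshold).map((m) => m.id);
}

console.log(parseScore('{"importance": 7, "type": "DECISION", "tags": ["qdrant"]}'));
```

Gating on similarity first keeps the number of LLM calls proportional to the few close neighbours rather than to the whole collection.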

Decay Engine

CORTEX uses a dual decay model tuned per memory type:

Bayesian (DECISION · FACT · ERROR · PATTERN)

utility = alpha / (alpha + beta)
alpha += importance on access
beta  += 1 per day without access

FSRS-inspired (PREFERENCE · CONTEXT)

stability     = log1p(accessCount) × (importance / 5)
retrievability = exp(-daysSinceAccess / stability)

Memories accessed frequently become more stable. Stale, unaccessed memories decay toward zero and eventually become candidates for consolidation.
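The two formulas can be written directly as functions. This is a minimal sketch, not decay.ts: field names are hypothetical, the alpha = beta = 1 prior in the usage is an assumption, and so is the stability floor (without it, a never-accessed memory has log1p(0) = 0 stability and the retrievability formula divides by zero).

```typescript
// Minimal sketch of the dual decay formulas above. Field names are hypothetical.
// The stability floor is an added assumption: log1p(0) = 0 would otherwise divide
// by zero for a memory that has never been accessed.
type DecayType = "DECISION" | "FACT" | "ERROR" | "PATTERN" | "PREFERENCE" | "CONTEXT";

interface BayesianState {
  alpha: number;
  beta: number;
}

// Bayesian utility = alpha / (alpha + beta).
function bayesianUtility(s: BayesianState): number {
  return s.alpha / (s.alpha + s.beta);
}

// On access, alpha grows by the memory's importance (1-10).
function onAccess(s: BayesianState, importance: number): BayesianState {
  return { alpha: s.alpha + importance, beta: s.beta };
}

// Each day without access adds 1 to beta.
function onIdleDays(s: BayesianState, days: number): BayesianState {
  return { alpha: s.alpha, beta: s.beta + days };
}

const MIN_STABILITY = 0.1; // assumption, not from the README

// FSRS-inspired: stability = log1p(accessCount) * (importance / 5),
// retrievability = exp(-daysSinceAccess / stability). log1p(x) = ln(1 + x).
function retrievability(accessCount: number, importance: number, daysSinceAccess: number): number {
  const stability = Math.max(MIN_STABILITY, Math.log(1 + accessCount) * (importance / 5));
  return Math.exp(-daysSinceAccess / stability);
}

interface DecayInput extends BayesianState {
  type: DecayType;
  importance: number;
  accessCount: number;
  daysSinceAccess: number;
}

// PREFERENCE and CONTEXT use the FSRS-inspired curve; the other types use Bayesian utility.
function decayScore(m: DecayInput): number {
  return m.type === "PREFERENCE" || m.type === "CONTEXT"
    ? retrievability(m.accessCount, m.importance, m.daysSinceAccess)
    : bayesianUtility(m);
}

// Usage: from an assumed prior alpha = beta = 1, one access at importance 8,
// then three idle days.
const state = onIdleDays(onAccess({ alpha: 1, beta: 1 }, 8), 3);
console.log(bayesianUtility(state)); // 9 / 13 ≈ 0.692
```

Note the different shapes: Bayesian utility declines slowly (each idle day adds only 1 to beta), while the FSRS-inspired curve decays exponentially, with higher stability for memories accessed more often and rated more important.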


Project Structure

src/
├── server.ts                    # MCP server: 20 tools, ~1500 LOC
├── bootstrap.ts                 # Qdrant collection init on startup
├── graph/
│   ├── observe/
│   │   ├── workflow.ts          # LangGraph pipeline: score→embed→link→persist
│   │   ├── state.ts             # Graph state types
│   │   └── nodes/
│   │       ├── score.ts         # qwen3: importance + type + tags
│   │       ├── embed.ts         # fastembed ONNX: dense + sparse vectors
│   │       ├── link.ts          # Bidirectional links in Qdrant
│   │       └── persist.ts       # Final upsert
│   └── consolidate/
│       └── nodes.ts             # LLM merge of duplicate engrams
├── services/
│   ├── qdrant.ts                # Qdrant client, collections, hybrid search
│   ├── fastembed.ts             # ONNX embeddings: dense (AllMiniLM) + sparse (SPLADE)
│   ├── ollama.ts                # qwen3: scoring, reranking, contradiction detection
│   ├── decay.ts                 # Dual decay engine: Bayesian + FSRS-inspired
│   └── episode.ts               # Episodic session management
└── types/
    ├── engrama.ts               # Engram TypeScript types
    └── episode.ts               # Episode/event types

Running Tests

npm test
# Covers all 20 tools with valid, invalid, and connectivity test cases

Memory Lifecycle

Session N (active)
    Agent detects storable fact
         ↓
    quick_observe()  ← instant, near-zero CPU cost
         ↓
    Written to temp_memories

Session N+1 cold start
    get_context_for() + get_operator_profile()  ← parallel
         ↓
    Pending in temp_memories? → index_temp()
         ↓
    fastembed ONNX + qwen3 scoring applied
         ↓
    Promoted to cortex_<project> with full embeddings
         ↓
    Available for recall() and get_context_for()
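The cold-start half of this lifecycle can be sketched as a small orchestration function. It is hypothetical: the callbacks stand in for MCP tool calls, and their names and signatures are assumptions rather than the server's API. It follows the order shown above, so memories promoted by index_temp() serve later recall() and get_context_for() calls, not the context fetched in this same step.

```typescript
// Hypothetical orchestration of the cold start described above. Each callback
// stands in for an MCP tool call; the names and signatures are assumptions.
interface ColdStartTools {
  getContextFor(project: string, message: string): Promise<string>;
  getOperatorProfile(): Promise<string>;
  listPending(project: string): Promise<string[]>;
  indexTemp(project: string): Promise<number>; // returns how many entries were promoted
}

interface ColdStartResult {
  context: string;
  profile: string;
  promoted: number;
}

async function coldStart(
  tools: ColdStartTools,
  project: string,
  message: string
): Promise<ColdStartResult> {
  // Context and profile are independent reads, so they run in parallel.
  const [context, profile] = await Promise.all([
    tools.getContextFor(project, message),
    tools.getOperatorProfile(),
  ]);
  // Promote whatever the previous session left in temp_memories. The promoted
  // engrams become available to later recall() and get_context_for() calls.
  const pending = await tools.listPending(project);
  const promoted = pending.length > 0 ? await tools.indexTemp(project) : 0;
  return { context, profile, promoted };
}
```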

License

MIT © 2026. Built by Zeus with Antigravity
