
SMRITI Memory

A neuro-inspired long-term memory architecture for AI agents.

SMRITI combines a capacity-bounded Working Memory, a graph-based Semantic Palace, and asynchronous background consolidation to give LLM agents persistent, scalable memory without blocking real-time interactions.

📄 Paper: SMRITI: A Scalable, Neuro-Inspired Architecture for Long-Term Event Memory in LLM Agents. Shivam Tyagi, 2025. DOI: 10.13140/RG.2.2.25477.82407

PyPI | Python 3.9+ | License: MIT


Architecture

                           ┌─────────────────────────────────┐
                           │    Asynchronous Consolidation   │
                           │      (8 Background Processes)   │
                           │  • Chunking      • Cross-Ref.   │
                           │  • Conflict Res. • Skill Ext.   │
                           │  • Forgetting    • Spaced Rep.  │
                           │  • Reflection    • Defragment.  │
                           └────────────────┬────────────────┘
                                            │ background
  ┌──────────┐   ┌───────────┐   ┌──────────▼──────────┐   ┌──────────┐
  │  Input   │──▶│ Attention │──▶│   Episode Buffer    │──▶│ Semantic │
  │  Text    │   │   Gate    │   │  (append-only log)  │   │  Palace  │
  └──────────┘   │ (salience │   └─────────────────────┘   │  Graph   │
                 │  filter)  │                             │ G=(V,E)  │
                 └───────────┘                             └────┬─────┘
                                                                │
  ┌──────────┐   ┌──────────┐   ┌───────────────────┐           │
  │  Query   │──▶│ Retrieval│──▶│  Working Memory   │◀──────────┘
  │          │   │  Engine  │   │   (7 ± 2 slots)   │
  └──────────┘   │ Q(v) =   │   └───────────────────┘
                 │ β₁cos +  │
                 │ β₂decay+ │   ┌───────────────────┐
                 │ β₃freq + │──▶│    Meta-Memory    │
                 │ β₄sal    │   │ (confidence map)  │
                 └──────────┘   └───────────────────┘

Core idea: Inspired by human Dual-Process Theory (Daniel Kahneman's Thinking, Fast and Slow), SMRITI decouples memory operations into two pathways:

  • System 1 (Fast & Heuristic): Real-time ingestion. Routes interactions to the short-term Episode Buffer in milliseconds without blocking the agent.

  • System 2 (Slow & Analytical): Background consolidation. Uses LLM reasoning to chunk, organize, and abstract semantic knowledge asynchronously while the agent is idle.
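The split between the two pathways can be pictured as a producer/consumer pair. The sketch below is not SMRITI's implementation: `ingest`, `consolidate_worker`, `episode_buffer`, and `palace` are hypothetical names, and the "consolidation" step simply groups episodes by their first word where a real system would call an LLM.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical stand-ins for the Episode Buffer and the Semantic Palace.
episode_buffer = queue.Queue()   # fast path appends here
palace = defaultdict(list)       # slow path organizes episodes into "rooms"
_STOP = object()                 # sentinel that tells the worker to exit


def ingest(text):
    """System 1: enqueue and return immediately; no analysis on the hot path."""
    episode_buffer.put(text)


def consolidate_worker():
    """System 2: drain the buffer in the background and organize episodes.

    ASSUMPTION: grouping by the lowercased first word replaces the LLM-based
    chunking described above, purely for illustration.
    """
    while True:
        item = episode_buffer.get()
        if item is _STOP:
            break
        words = item.split()
        topic = words[0].lower() if words else "misc"
        palace[topic].append(item)


worker = threading.Thread(target=consolidate_worker, daemon=True)
worker.start()

for line in ["Python is preferred for backend work.",
             "Shellfish allergy noted.",
             "Python 3.9+ is required."]:
    ingest(line)                 # returns without waiting for consolidation

episode_buffer.put(_STOP)        # ask the worker to finish
worker.join()                    # wait only so the demo can print results

print(dict(palace))
```

The caller of `ingest` never waits on the worker, which is the property the fast path is meant to guarantee; the `join()` exists only so the demo has something to print.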



Quick Start: Claude, Gemini & Codex (MCP)

SMRITI can be used as a unified, global persistent memory layer across Claude Code, Claude Desktop, Gemini (Antigravity), and Codex (Antigravity-IDE).

Choose one of the two installation methods:

Method A: Via install script

Run the install script directly from your terminal:

bash <(curl -s https://raw.githubusercontent.com/smriti-memcore/smriti-memcore/main/install_smriti_mcp.sh)

Method B: Via PyPI

Install the package and run the setup CLI:

pip3 install smriti-memcore
smriti_install

What the installer does:

  • Creates a dedicated virtual environment at ~/.smriti/venv.

  • Installs smriti-memcore[mcp] and its dependencies.

  • Prompts for your LLM consolidation choice (local Ollama or cloud models) and API keys.

  • Automatically registers the MCP server in Claude Code (~/.claude.json), Claude Desktop, Gemini (~/.gemini/config/mcp_config.json), and Codex (~/.gemini/antigravity-ide/mcp_config.json).

  • Appends global agent rules and configures automatic prompt recall/encode hooks.

Then restart your editor or agent session. You can verify the server connection in Claude Code by running /mcp (the server will show up as smriti).

Available tools (19: 13 native + 6 AMP v1.0 aliases):

| Tool | Description |
| --- | --- |
| smriti_encode | Store information in long-term memory (private=True keeps it out of team sync) |
| smriti_recall | Retrieve memories by natural-language query |
| smriti_get_context | Inject working memory into the current prompt |
| smriti_how_well_do_i_know | Confidence check on a topic |
| smriti_knowledge_gaps | List topics SMRITI knows it doesn't know |
| smriti_pin | Mark a memory as permanent (never decayed) |
| smriti_forget | Archive a memory |
| smriti_consolidate | Run a consolidation cycle |
| smriti_stats | System-wide statistics (includes private/shared memory counts) |
| smriti_get_suggestions | Proactive insights from background consolidation |
| smriti_create_private_room | Create a private semantic room; memories in it are excluded from team consolidation sync |
| smriti_open_ui | Launch the visual Memory Browser in the default web browser |
| smriti_sync_obsidian | Export the Semantic Palace to an Obsidian vault |

AMP v1.0 aliases (interoperable with any AMP-conformant agent framework):

| AMP Tool | Maps to |
| --- | --- |
| amp.encode | smriti_encode, with agent_id + force + private params and the AMP response schema |
| amp.recall | smriti_recall; returns {results: [{id, content, score, timestamp, status}]} |
| amp.forget | smriti_forget; returns {status: "forgotten" or "not_found"} |
| amp.stats | smriti_stats; returns {memory_count, ...} |
| amp.pin | smriti_pin; returns {status: "pinned" or "not_found"} |
| amp.consolidate | smriti_consolidate; returns {status: "ok", memories_processed: int} |

smriti-memcore is single-tenant: agent_id is accepted on all AMP verbs but ignored. Isolation is at the storage-path level.

LLM options (set during install or via environment variables):

| Model | Provider | Requires |
| --- | --- | --- |
| mistral (default) | Local Ollama | ollama pull mistral |
| claude-* | Anthropic | SMRITI_LLM_API_KEY |
| gpt-* | OpenAI | SMRITI_LLM_API_KEY |
| gemini* | Google | SMRITI_LLM_API_KEY |


Installation (Python Library)

pip install smriti-memcore

With optional FAISS accelerated vector search:

pip install smriti-memcore[faiss]

Or install from source:

git clone https://github.com/smriti-memcore/smriti-memcore.git
cd smriti-memcore
pip install -e .

Prerequisites

SMRITI uses an LLM for reasoning tasks (consolidation, reflection, skill extraction). By default it connects to a local Ollama instance:

ollama pull mistral

Alternatively, you can use OpenAI, Anthropic, or Google Gemini; see Using Cloud LLM Providers below.


Using Cloud LLM Providers

SMRITI is provider-agnostic. Just change the llm_model and pass your API key:

from smriti import SMRITI, SmritiConfig

# Pick ONE of the configurations below; each assignment replaces the previous one.

# ── OpenAI ──────────────────────────────────────────────
config = SmritiConfig(
    llm_model="gpt-4o",
    openai_api_key="sk-...",
)

# ── Anthropic ───────────────────────────────────────────
config = SmritiConfig(
    llm_model="claude-3-5-sonnet-20241022",
    anthropic_api_key="sk-ant-...",
)

# ── Google Gemini ───────────────────────────────────────
config = SmritiConfig(
    llm_model="gemini-1.5-flash",
    gemini_api_key="AIza...",
)

# ── Local Ollama (default) ──────────────────────────────
config = SmritiConfig(
    llm_model="mistral",  # or llama3, codellama, phi3, etc.
)

memory = SMRITI(config=config)

Routing is automatic based on the model name prefix: gpt-* → OpenAI, claude* → Anthropic, gemini* → Gemini, everything else → Ollama.
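The prefix rule can be written as a few `startswith` checks. This is a minimal sketch of the rule as stated, not the code in llm_interface.py; the function name `route_provider` and the returned provider labels are assumptions made for illustration.

```python
def route_provider(llm_model: str) -> str:
    """Return a provider label implied by the model name prefix.

    ASSUMPTION: matching is case-sensitive and uses exactly the prefixes
    listed above; the real connector may normalize names differently.
    """
    if llm_model.startswith("gpt-"):
        return "openai"
    if llm_model.startswith("claude"):
        return "anthropic"
    if llm_model.startswith("gemini"):
        return "gemini"
    return "ollama"  # fallback: anything unrecognized is treated as a local model


for model in ["gpt-4o", "claude-3-5-sonnet-20241022", "gemini-1.5-flash", "mistral"]:
    print(f"{model} -> {route_provider(model)}")
```

One consequence of a fallback rule like this is that a mistyped cloud model name (for example one missing the `gpt-` hyphen) would be sent to the local Ollama instance instead of raising an error.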


Quick Start

from smriti import SMRITI, SmritiConfig

# Initialize
config = SmritiConfig(
    storage_path="./my_agent_memory",
    llm_model="mistral",
)
memory = SMRITI(config=config)

# Encode information
memory.encode("User prefers Python for backend development.")
memory.encode("User is allergic to shellfish.", context="medical")

# Recall by natural-language query
results = memory.recall("What language does the user prefer?")
for mem in results:
    print(f"  [{mem.strength:.2f}] {mem.content}")

# Check what you know (and don't know)
confidence = memory.how_well_do_i_know("programming languages")
print(f"Confidence: {confidence.overall:.0%}")

# Run background consolidation
memory.consolidate()

# Persist to disk
memory.save()

Framework Integrations

SMRITI can be used natively inside standard agent frameworks.

LangChain

Use SmritiLangChainMemory to replace ConversationBufferMemory. This gives your agent the cost savings of a capacity-bounded Working Memory while asynchronously archiving the conversation into the Semantic Palace.

from langchain.chains import ConversationChain
from smriti.integrations.langchain_memory import SmritiLangChainMemory
from smriti import SMRITI

# 1. Initialize SMRITI
smriti_engine = SMRITI(storage_path="./langchain_smriti_db")

# 2. Wrap it for LangChain
smriti_memory = SmritiLangChainMemory(smriti_client=smriti_engine, top_k=3)

# 3. Plug it into standard chains (my_llm is any LangChain LLM instance you have already created)
conversation = ConversationChain(
    llm=my_llm,
    memory=smriti_memory,
)

conversation.predict(input="I prefer using PyTorch.")

See examples/langchain_agent.py or examples/quickstart.py for complete working code.

Claude Code (MCP Server)

See Quick Start: Claude, Gemini & Codex (MCP) above for one-command setup.

Memory Browser UI

SMRITI ships with a native, zero-dependency visualizer for traversing the Semantic Palace graph.

smriti_ui --storage ~/.smriti/global --port 7799

Features:

  • Zero dependencies: Built entirely with Python's standard http.server and D3.js; no Node.js/npm needed.

  • Backwards Compatible: Instantly works with your existing palace.json created by older versions of SMRITI. Just point --storage to your existing directory.

  • Interactive Graph: Navigate the Semantic Palace using a force-directed network view or clustered room topology.

  • Searchable Dashboard: Instantly filter your stored knowledge by content, room, and system state.

  • Real-time Statistics: Track average memory strength, composite salience, and architectural distribution.

(If using without pip installation, run python -m smriti_memcore.ui from the source root).

Obsidian Vault Integration

Export the Semantic Palace to an Obsidian vault so its graph view mirrors your memory graph.

How it maps:

| Semantic Palace | Obsidian |
| --- | --- |
| Room | Palace/<topic-slug>.md note |
| Memory | Section inside room note (with strength/salience metadata) |
| Room ↔ Room edge | [[wikilink]] between room notes |
| Palace/_index.md | Overview table of all rooms and connections |

Via MCP tool (Claude Code): After setting SMRITI_OBSIDIAN_PATH in your MCP server config, call the tool directly (no Bash needed):

smriti_sync_obsidian()
# or with an explicit path:
smriti_sync_obsidian(vault_path="~/path/to/your-vault/Palace")

Add to your MCP server env in ~/.claude.json:

"SMRITI_OBSIDIAN_PATH": "~/path/to/your-vault/Palace"

Via CLI (non-MCP / scripting):

smriti_palace_to_obsidian --vault ~/path/to/your-vault/Palace

Workflow: Re-run after each smriti_consolidate call to keep the vault in sync with updated rooms and connections. The Palace/ folder is fully regenerated each run; do not edit those files manually.
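To make the room-to-note mapping concrete, here is a hedged sketch of how one room could be rendered. The slug rule, the section headings, and the helper names `slugify` and `render_room_note` are all assumptions; the exporter's actual note layout is not specified in this README.

```python
import re


def slugify(topic: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")


def render_room_note(topic, memories, linked_topics):
    """Render one room as a note: a section per memory, wikilinks to neighbors.

    memories: list of (content, strength, salience) tuples.
    linked_topics: topics of rooms connected to this one by an edge.
    Returns (relative_path, note_text).
    """
    lines = [f"# {topic}", ""]
    for i, (content, strength, salience) in enumerate(memories, start=1):
        lines.append(f"## Memory {i}")
        lines.append(f"strength: {strength:.2f}, salience: {salience:.2f}")
        lines.append("")
        lines.append(content)
        lines.append("")
    if linked_topics:
        lines.append("## Connected rooms")
        lines.extend(f"- [[{slugify(t)}]]" for t in linked_topics)
    return f"Palace/{slugify(topic)}.md", "\n".join(lines) + "\n"


path, note = render_room_note(
    "Backend Development",
    [("User prefers Python for backend development.", 0.82, 0.60)],
    ["Programming Languages"],
)
print(path)
print(note)
```

Because each wikilink targets another room's slug, Obsidian's graph view would draw an edge wherever the Semantic Palace has one, which is the mirroring behavior described above.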

(If using without pip installation, run python -m smriti_memcore.palace_to_obsidian from the source root).


Key API

| Method | Description |
| --- | --- |
| encode(content, context, source) | Ingest new information through the Attention Gate |
| recall(query, top_k) | Retrieve relevant memories via graph traversal |
| how_well_do_i_know(topic) | Meta-memory confidence check |
| consolidate(depth) | Run background consolidation ("full", "light", "defer") |
| save() | Persist all state to disk |
| pin(memory_id) | Mark a memory as permanent |
| forget(memory_id) | Gracefully forget a memory (leaves a tombstone) |
| stats() | System-wide statistics |


Configuration

All parameters are optional and have sensible defaults:

from smriti import SmritiConfig

config = SmritiConfig(
    # Working Memory
    working_memory_slots=7,          # Miller's Law: 7 ± 2

    # Retrieval scoring weights
    recency_weight=0.2,
    relevance_weight=0.4,
    strength_weight=0.2,
    salience_weight=0.2,

    # Forgetting
    decay_rate=0.99,                 # per-day temporal decay
    strength_hard_threshold=0.05,    # below this → forget

    # Palace graph
    room_merge_threshold=0.85,       # similarity to auto-merge rooms

    # LLM provider (pick one)
    llm_model="mistral",                     # Ollama (default)
    # llm_model="gpt-4o",                    # OpenAI
    # llm_model="claude-3-5-sonnet-20241022",# Anthropic
    # llm_model="gemini-1.5-flash",          # Google
    ollama_base_url="http://localhost:11434",

    # Storage
    storage_path="./smriti_data",
)
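The four weights suggest a weighted-sum score of the kind shown as Q(v) in the architecture diagram. The sketch below is one plausible reading, not the code in retrieval.py: it assumes relevance is cosine similarity, recency is `decay_rate ** age_days`, and strength and salience are already in [0, 1]. The function names are hypothetical.

```python
import math

# Values copied from the configuration example above.
WEIGHTS = {"relevance": 0.4, "recency": 0.2, "strength": 0.2, "salience": 0.2}
DECAY_RATE = 0.99                # per-day temporal decay
STRENGTH_HARD_THRESHOLD = 0.05   # below this, a memory is forgotten


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(query_vec, mem_vec, age_days, strength, salience):
    """Weighted sum of four signals. ASSUMPTION: recency = DECAY_RATE ** age_days."""
    signals = {
        "relevance": cosine(query_vec, mem_vec),
        "recency": DECAY_RATE ** age_days,
        "strength": strength,
        "salience": salience,
    }
    return sum(WEIGHTS[k] * signals[k] for k in WEIGHTS)


def days_until_forgotten(initial_strength=1.0):
    """Days for pure exponential decay to fall below the hard threshold."""
    return math.log(STRENGTH_HARD_THRESHOLD / initial_strength) / math.log(DECAY_RATE)


fresh = score([1.0, 0.0], [1.0, 0.0], age_days=0, strength=1.0, salience=1.0)
stale = score([1.0, 0.0], [1.0, 0.0], age_days=100, strength=1.0, salience=1.0)
print(f"fresh={fresh:.3f} stale={stale:.3f}")
print(f"forgotten after ~{days_until_forgotten():.0f} days without reinforcement")
```

Since the default weights sum to 1.0, the score stays in [0, 1] when every signal does. Under the exponential-decay assumption, a memory starting at full strength would cross the 0.05 threshold after roughly 298 days if it were never reinforced or pinned.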

What's New in v1.3.0

  • Private rooms: smriti_create_private_room(topic) creates a semantic room whose memories are excluded from team consolidation sync

  • private=True on encode: smriti_encode and amp.encode now accept private=True; Claude uses this when you say "remember this privately"

  • Visibility field on memories and rooms: "private" | "shared"; default is "shared". Private memories are still recalled by the owner; privacy only controls team sync eligibility

  • AMP spec updated: visibility field added to MemoryResult, private param added to amp.encode, visibility filter added to amp.recall filters schema

  • palace.json schema v3: automatic migration; all existing memories and rooms default to "shared", and embeddings are stripped on save to reduce on-disk storage size by ~10x

  • Encoding discipline guidance: baked directly into MCP server instructions and tool docstrings to guide consumer LLMs to label hypotheses, cite evidence, and prune stale/wrong memories

What's New in v1.2.0

  • AMP v1.0 Full conformance: MCP server now exposes all 6 AMP verbs (amp.encode, amp.recall, amp.forget, amp.stats, amp.pin, amp.consolidate) alongside the existing smriti_* tools. Passes all 25 AMP compliance tests (Core + Full).

  • Zero breaking changes: all existing smriti_* tool calls continue to work unchanged. AMP tools are additive aliases.

What's New in v1.0.0

  • Consolidation robustness overhaul: fixed a critical bug where singleton episodes leaked in the buffer indefinitely, causing consolidation to report "no significant memories" even when important facts were present

  • Smarter salience scoring: the heuristic scorer now differentiates content types (personal facts, knowledge updates, instructions) instead of scoring everything the same

  • Better contradiction detection: Mistral no longer incorrectly discards memories that agree with existing ones

  • Validated across 4 models: benchmarked with gpt-4o-mini, Mistral 7B, CodeLlama 7B, and Llama 3.2 3B

See CHANGELOG.md for full details.


Benchmarks

LoCoMo (Multi-System Comparison)

SMRITI was benchmarked against four baseline architectures on the LoCoMo long-sequence dataset (28 dialog turns, 15 evaluation questions, consolidation enabled):

| System | F1 Score | Latency (ms) | Tokens/Query | Consolidation |
| --- | --- | --- | --- | --- |
| FullContext | 0.345 | 1147 | 550 | n/a |
| MemGPT-style | 0.334 | 1397 | 478 | n/a |
| NaiveRAG | 0.312 | 1387 | 145 | n/a |
| SMRITI v2 | 0.279 | 1317 | 146 | 41.2 s (async) |
| Mem0-style | 0.235 | 1088 | 106 | n/a |

Results with GPT-4o-mini. SMRITI consolidation runs asynchronously and does not block queries.

Local Model Comparison (v1.0.0)

All runs use the fixed consolidation pipeline with heuristic scoring:

| Model | F1 Score | Exact Match | Latency (ms) | Best Category (F1) |
| --- | --- | --- | --- | --- |
| CodeLlama 7B | 0.317 | 0.200 | 5634 | Temporal (0.682) |
| Mistral 7B | 0.284 | 0.067 | 3181 | Knowledge Update (0.516) |
| gpt-4o-mini | 0.262 | 0.000 | 1271 | Single-hop (0.350) |
| Llama 3.2 3B | 0.184 | 0.067 | 1446 | Multi-hop (0.134) |

Key finding: CodeLlama 7B outperforms all other models on temporal reasoning (F1=0.682) and achieves the highest exact-match rate (20%), but has the highest latency (5634 ms). Mistral 7B remains the best all-rounder with strong knowledge-update handling.

LongMemEval (Long-Term Interactive Memory)

SMRITI integrates an evaluation harness for the LongMemEval benchmark to test retrieval over 50+ chat sessions:

| System Configuration | Exact Match Accuracy | Average Query Latency |
| --- | --- | --- |
| Baseline (Full Context) | 100.0% | 11.98 s |
| SMRITI Dual-Process | 80.0% | 0.98 s |

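SMRITI restricts the LLM context to the 5 most relevant memories, which cuts average query latency by more than 12× (0.98 s vs. 11.98 s) compared to context-stuffing, at the cost of 20 percentage points of exact-match accuracy in this run.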

Vector Search Backend

SMRITI supports two vector search backends. FAISS is auto-detected when installed:

| Backend | 1K vectors | 10K vectors | 100K vectors | Memory (100K) |
| --- | --- | --- | --- | --- |
| NumPy | 22 µs | 179 µs | 2.75 ms | 146.5 MB |
| FAISS | 28 µs | 200 µs | 2.24 ms | 979 B |

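At 100K vectors, FAISS is about 1.2× faster (2.24 ms vs. 2.75 ms) and reports roughly 150,000× less memory (979 B vs. 146.5 MB); at 1K and 10K vectors, NumPy is slightly faster.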
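The two ratios can be checked directly from the 100K-vector figures in the table. The last calculation is an assumption-laden sanity check: it supposes float32 embeddings of dimension 384, which this README does not state, to see whether a dense array of that shape would account for the NumPy memory figure.

```python
# Figures copied from the vector backend table (100K-vector column).
numpy_ms, faiss_ms = 2.75, 2.24
numpy_mem_mb, faiss_mem_bytes = 146.5, 979

speedup = numpy_ms / faiss_ms

# The table does not say whether "MB" means 10**6 or 2**20 bytes, so compute both.
mem_ratio_decimal = numpy_mem_mb * 10**6 / faiss_mem_bytes
mem_ratio_binary = numpy_mem_mb * 2**20 / faiss_mem_bytes

# ASSUMPTION: 100,000 float32 vectors of dimension 384 (4 bytes per value).
dense_mib = 100_000 * 384 * 4 / 2**20

print(f"speedup:            {speedup:.2f}x")
print(f"mem ratio (10**6):  {mem_ratio_decimal:,.0f}x")
print(f"mem ratio (2**20):  {mem_ratio_binary:,.0f}x")
print(f"100K x 384 float32: {dense_mib:.1f} MiB")
```

Under the 384-dimension assumption, the dense array alone comes to about 146.5 MiB, which matches the NumPy figure. A FAISS index holding the same vectors cannot fit them in 979 bytes, so that figure may reflect only the Python-side wrapper object while the index data lives in native memory; the memory ratio is best read with that caveat in mind.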

Reproducing Benchmarks

pip install -e ".[benchmarks]"

# Multi-system comparison (requires API key)
python benchmarks/run_benchmark.py --model gpt-4o-mini --systems smriti --consolidate --dataset locomo

# Local model comparison (requires Ollama)
python benchmarks/run_benchmark.py --model mistral --systems smriti --consolidate --dataset locomo
python benchmarks/run_benchmark.py --model codellama --systems smriti --consolidate --dataset locomo

# Vector backend comparison
python benchmarks/vector_benchmark.py

Project Structure

smriti-memcore/
├── smriti/                 # Core library
│   ├── __init__.py
│   ├── core.py            # SMRITI orchestrator
│   ├── models.py          # Data models & SmritiConfig
│   ├── palace.py          # Semantic Palace graph
│   ├── episode_buffer.py  # Append-only temporal log
│   ├── working_memory.py  # Capacity-bounded priority queue
│   ├── attention_gate.py  # Salience filter
│   ├── retrieval.py       # Multi-factor retrieval engine
│   ├── consolidation.py   # Async background processes
│   ├── meta_memory.py     # Confidence mapping
│   ├── vector_store.py    # Vector persistence
│   ├── llm_interface.py   # Multi-provider LLM connector (Ollama/OpenAI/Anthropic/Gemini)
│   ├── metrics.py         # Observability: counters, gauges, histograms, Prometheus export
│   └── integrations/      # Framework adapters
│       ├── langchain_memory.py  # LangChain BaseMemory component
│       └── mcp_server.py        # Claude Code MCP server (19 tools: 13 smriti_* + 6 AMP aliases)
├── install_smriti_mcp.sh   # One-command Claude Code setup
├── tests/                 # 246 tests across 15 files
├── baselines/             # Baseline implementations for comparison
├── benchmarks/            # Benchmark harness & scripts
├── examples/              # Usage examples
├── paper/                 # IEEE research paper (LaTeX + Markdown)
│   └── figures/           # Benchmark charts and UI diagrams
├── pyproject.toml
├── CHANGELOG.md
├── LICENSE
└── README.md

Citation

If you use SMRITI in your research, please cite:

@article{tyagi2025smriti,
  title={SMRITI: A Scalable, Neuro-Inspired Architecture for Long-Term Event Memory in LLM Agents},
  author={Tyagi, Shivam},
  year={2025},
  doi={10.13140/RG.2.2.25477.82407}
}

License

MIT. See LICENSE for details.
