Mnemosyne
A persistent memory layer for LLM applications. Stores, retrieves, and injects relevant context from past conversations so your AI assistant remembers who you are and what you care about.
Exposes a Model Context Protocol (MCP) server that any MCP-compatible client (Claude Desktop, Cursor, etc.) can connect to.
How it works
User message
│
▼
┌─────────────────────────────────────────────────────┐
│ Session graph │
│ Load working memory (Redis) │
│ → Retrieve episodic memories (pgvector) │
│ → Retrieve long-term facts (pgvector) │
│ → Inject memory block into system prompt │
│ → Generate LLM response with memory context │
└─────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ Ingestion graph │
│ LLM extracts key facts from conversation │
│ → Auto-select chunk strategy │
│ → Embed chunks (OpenAI text-embedding-3) │
│ → Store in pgvector + extracted facts as │
│ long-term memories │
└─────────────────────────────────────────────────────┘
Memory tiers
| Tier | Storage | TTL | Purpose |
| Working | Redis | 4 hours | In-session context (current conversation) |
| Episodic | Postgres | Decays over time | Specific events and interactions |
| Long-term | Postgres | Persistent | Stable facts, preferences, habits |
| | Postgres | Persistent | General knowledge extracted across episodes |
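The "Decays over time" TTL can be pictured as a score that shrinks with age. The following is a minimal sketch, assuming exponential decay with a 30-day half-life; the function name, the half-life, and the use of last-access time are all illustrative assumptions, not Mnemosyne's actual decay rule.

```python
import math
from datetime import datetime, timedelta, timezone


def decayed_score(
    score: float,
    last_accessed: datetime,
    now: datetime,
    half_life_days: float = 30.0,  # assumed value, not taken from the project
) -> float:
    """Halve the score for every `half_life_days` elapsed since last access."""
    age_days = max((now - last_accessed).total_seconds() / 86400.0, 0.0)
    return score * math.pow(0.5, age_days / half_life_days)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
fresh = decayed_score(1.0, now, now)
month_old = decayed_score(1.0, now - timedelta(days=30), now)
print(f"{fresh:.2f} {month_old:.2f}")  # → 1.00 0.50
```

A rule of this shape lets recently used memories outrank stale ones without deleting anything outright.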
Retrieval pipeline
Every query runs through hybrid search (dense vector + BM25 keyword), Reciprocal Rank Fusion, optional Cohere reranking, and token-budget trimming before memories are returned.
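Two of these stages, Reciprocal Rank Fusion and token-budget trimming, are easy to illustrate in isolation. The sketch below is a generic version under stated assumptions: `k = 60` is the constant commonly used for RRF, the whitespace word count stands in for a real tokenizer, and both function names are hypothetical rather than taken from the `retrieval/` package.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def trim_to_budget(memories: list[str], max_tokens: int) -> list[str]:
    """Keep memories in ranked order until the (approximate) token budget is spent."""
    kept: list[str] = []
    used = 0
    for text in memories:
        cost = len(text.split())  # crude estimate: one token per whitespace-separated word
        if used + cost > max_tokens:
            break
        kept.append(text)
        used += cost
    return kept


dense = ["m2", "m1", "m3"]  # hypothetical vector-search ranking
bm25 = ["m1", "m4", "m2"]   # hypothetical keyword-search ranking
fused = reciprocal_rank_fusion([dense, bm25])
print([memory_id for memory_id, _ in fused])  # → ['m1', 'm2', 'm4', 'm3']

ranked_texts = ["prefers dark mode", "uses VS Code daily", "old job was at Acme"]
print(trim_to_budget(ranked_texts, max_tokens=7))
# → ['prefers dark mode', 'uses VS Code daily']
```

RRF needs only rank positions, so dense and BM25 scores never have to be normalized onto a common scale before fusing.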
Prerequisites
Docker and Docker Compose
Python 3.11+
pip or uv
Setup
1. Clone and enter the directory
git clone <repo-url>
cd mnemosyne
2. Configure environment
cp .env.example .env
Open .env and fill in the required values:
# Required: OpenAI key for embeddings (always needed)
OPENAI_API_KEY=sk-...
# Required: choose one LLM provider for generation
LLM_PROVIDER=gemini # anthropic | openai | gemini | local
GEMINI_API_KEY=your-key # or ANTHROPIC_API_KEY / OPENAI_API_KEY
# Optional but recommended: Cohere for reranking
COHERE_API_KEY=
# Optional: Langfuse for observability (get keys at cloud.langfuse.com)
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
# Optional: encryption at rest for stored memories
ENCRYPT_KEY=
LLM provider options:
| Provider | Default model | Free tier |
| gemini | | Yes |
| anthropic | | No |
| openai | | No |
| local | | Yes (requires Ollama running locally) |
To override the model, set LLM_MODEL=gemini-2.0-flash (or any model ID).
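Before starting the server, it can help to check which of these variables are still empty. The helper below is a hypothetical pre-flight check, not part of Mnemosyne: it assumes only the variable names shown in the .env example above and treats the `local` provider as needing no provider key.

```python
# Maps each hosted provider to the API key variable named in the .env example.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    missing: list[str] = []
    # Embeddings always use OpenAI, whichever generation provider is chosen.
    if not env.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    key_name = PROVIDER_KEYS.get(provider)  # None for "local" or an unset provider
    if key_name and not env.get(key_name) and key_name not in missing:
        missing.append(key_name)
    return missing


example_env = {"OPENAI_API_KEY": "sk-...", "LLM_PROVIDER": "gemini"}
print(missing_settings(example_env))  # → ['GEMINI_API_KEY']
```

In a real shell session the same function could be called as `missing_settings(dict(os.environ))` after loading the .env file.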
3. Start infrastructure
make up
Starts Postgres (with pgvector) on port 5432 and Redis on port 6379 via Docker.
4. Install Python dependencies
pip install -e ".[dev]"
# or with uv:
uv pip install -e ".[dev]"
5. Run database migrations
make migrate
6. Start the MCP server
python mcp/server.py
Using with Claude Desktop
Add Mnemosyne as an MCP server so Claude Desktop automatically gains persistent memory.
Open ~/Library/Application Support/Claude/claude_desktop_config.json and add:
{
  "mcpServers": {
    "mnemosyne": {
      "command": "python",
      "args": ["/absolute/path/to/mnemosyne/mcp/server.py"],
      "env": {
        "PYTHONPATH": "/absolute/path/to/mnemosyne"
      }
    }
  }
}
Restart Claude Desktop. A hammer icon will appear in the input bar with 4 memory tools available.
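If the config file already lists other MCP servers, the Mnemosyne entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge using only the standard library; `add_mnemosyne_server` is a hypothetical helper, and the demo writes to a temporary directory instead of the real Claude Desktop path.

```python
import json
import tempfile
from pathlib import Path


def add_mnemosyne_server(config_path: Path, repo_path: str) -> dict:
    """Insert or overwrite the "mnemosyne" entry, leaving other servers untouched."""
    text = config_path.read_text().strip() if config_path.exists() else ""
    config = json.loads(text) if text else {}
    servers = config.setdefault("mcpServers", {})
    servers["mnemosyne"] = {
        "command": "python",
        "args": [f"{repo_path}/mcp/server.py"],
        "env": {"PYTHONPATH": repo_path},
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file that already contains another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    path.write_text('{"mcpServers": {"other": {"command": "node"}}}')
    merged = add_mnemosyne_server(path, "/absolute/path/to/mnemosyne")
    print(sorted(merged["mcpServers"]))  # → ['mnemosyne', 'other']
```

Pointing `config_path` at the real file would apply the same change in place; backing it up first is a sensible precaution.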
What Claude can now do
Claude will call memory tools automatically. You can also prompt it directly:
"Remember that I prefer dark mode in all my projects."
"What do you know about my coding preferences?"
"Forget everything about my old job."
"What did we discuss last week about the auth system?"
Available MCP tools
| Tool | What it does |
| | Save information from the current conversation |
| | Search past memories by topic or query |
| | Load all relevant memories at the start of a conversation |
| | Delete a specific memory by ID |
Using with Cursor
Add to ~/.cursor/mcp.json (global) or .cursor/mcp.json (per project):
{
  "mcpServers": {
    "mnemosyne": {
      "command": "python",
      "args": ["/absolute/path/to/mnemosyne/mcp/server.py"]
    }
  }
}
Using the Python API directly
import asyncio

from mnemosyne.agents.router import Intent, route


async def main():
    # Store a memory
    await route(
        Intent.INGEST,
        user_id="user-123",
        content="The user prefers Python over JavaScript and uses VS Code.",
        conversation_id="conv-abc",
        tier="long_term",
    )

    # Ask a question; memory is retrieved and injected automatically
    result = await route(
        Intent.CHAT,
        user_id="user-123",
        conversation_id="conv-abc",
        user_message="What editor should I set up for this project?",
    )
    print(result.response)
    # → "Based on your preference for VS Code, here's how to set it up..."

    # Retrieve raw memories without generating a response
    result = await route(
        Intent.RETRIEVE,
        user_id="user-123",
        query="programming language preferences",
    )
    for memory in result.final_memories:
        print(f"[{memory.tier.value}] {memory.content} (score: {memory.score:.3f})")


asyncio.run(main())
Local LLM with Ollama (no API keys)
Run generation entirely offline:
# Install Ollama from https://ollama.com, then pull a model
ollama pull llama3
Set in .env:
LLM_PROVIDER=local
LOCAL_LLM_BASE_URL=http://localhost:11434
LOCAL_LLM_MODEL=llama3
Note: you still need OPENAI_API_KEY for embeddings. To remove that dependency, swap the embedding provider to a local model in embedding/tiered.py using the LocalEmbedder class.
Background jobs
Run these on a schedule (cron or similar) to keep memories healthy:
# Decay old memories — reduces score of stale episodic/long-term memories
# Recommended: run nightly
python jobs/decay_sweep.py
# Rebuild BM25 keyword search index
# Recommended: run after bulk ingestion
python jobs/index_rebuild.py
Development
make test # run test suite
make lint # ruff + mypy
make down # stop Docker services
Project structure
mnemosyne/
├── agents/ # LangGraph pipelines (ingestion, retrieval, session)
├── chunking/ # Auto-selecting text chunking strategies
├── config/ # Settings (pydantic-settings) and structured logging
├── embedding/ # OpenAI and local embedding providers, tiered routing
├── evals/ # Evaluation harness and golden set
├── infra/ # Dockerfile
├── jobs/ # Decay sweep and index rebuild cron jobs
├── llm/ # LLM providers: Anthropic, OpenAI, Gemini, Local (Ollama)
├── mcp/ # MCP server and tool definitions
├── memory/ # Memory types, decay, and access tracking
├── observability/ # Langfuse tracing, metrics, health check
├── prompts/ # YAML prompt templates
├── retrieval/ # Hybrid search, BM25, reranker, RRF, assembler
└── storage/ # Postgres (pgvector), Redis, S3, encryption, RLS
Architecture notes
No LLM required to run: if LLM_PROVIDER is unset or the API key is missing, ingestion still stores raw chunks and session_init still returns the memory block. You can pass the memory block to your own LLM.
Multi-user: each call takes a user_id. Memories are isolated per user via Postgres row-level security.
Pluggable reranking: Cohere reranking is optional. Without a COHERE_API_KEY the system falls back to RRF ordering.
Chunking is automatic: the ingestion pipeline inspects content and picks the best strategy: fixed, recursive, sentence-based, or structural (for Markdown/code).
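To make the automatic chunking note concrete, here is one way a strategy picker could look. This is purely an illustrative heuristic: the rules, the 1000-character threshold, and the function name are assumptions, and the actual logic in `chunking/` may inspect content quite differently.

```python
import re

# Markdown headings or Python definitions at the start of a line hint at structured content.
STRUCTURE_PATTERN = re.compile(r"^(#{1,6} |def |class )", flags=re.MULTILINE)


def pick_chunk_strategy(text: str, max_chars: int = 1000) -> str:
    """Return one of: "structural", "fixed", "recursive", "sentence"."""
    if STRUCTURE_PATTERN.search(text):
        return "structural"  # split along headings or definitions
    if len(text) <= max_chars:
        return "fixed"       # short enough to keep as a single fixed-size chunk
    if "\n\n" in text:
        return "recursive"   # long text with paragraph breaks to recurse on
    return "sentence"        # long unbroken prose: fall back to sentence boundaries


print(pick_chunk_strategy("# Notes\n\nUser prefers dark mode."))  # → structural
print(pick_chunk_strategy("User prefers dark mode."))             # → fixed
print(pick_chunk_strategy("a" * 600 + "\n\n" + "b" * 600))        # → recursive
print(pick_chunk_strategy("One more sentence. " * 100))           # → sentence
```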