Mnemo

Extended memory for AI assistants via Gemini context caching.

Mnemo (Greek: memory) gives AI assistants like Claude access to large codebases, documentation sites, PDFs, and more by leveraging Gemini's 1M token context window and context caching features.

Why Mnemo?

Instead of complex RAG pipelines with embeddings and retrieval, Mnemo takes a simpler approach:

  • Load your entire codebase into Gemini's context cache

  • Query it with natural language

  • Let Claude orchestrate while Gemini holds the context

This gives you:

  • Perfect recall - no chunking or retrieval means no lost context

  • Lower latency - the cached context is not re-sent or re-processed with every query

  • Cost savings - cached tokens cost 75-90% less than regular input tokens (plus an hourly storage fee while the cache exists)

  • Simplicity - no vector databases, embeddings, or complex retrieval logic

What Can Mnemo Load?

  • GitHub repos (public)

  • GitHub repos (private)

  • Any URL (docs, articles)

  • PDF documents

  • JSON APIs

  • Local files/directories (local server only)

  • Multi-page crawls (unlimited on the local server, 40 pages max on the Worker)

Deployment Options

Mnemo can be deployed in three ways depending on your needs.

Option 1: Local Server (Development & Full Features)

Best for development and when you need to load local files.

# Clone and install
git clone https://github.com/logos-flux/mnemo
cd mnemo
bun install

# Set your Gemini API key
export GEMINI_API_KEY=your_key_here

# Start the server
bun run dev

Claude Code MCP config:

{
  "mcpServers": {
    "mnemo": {
      "type": "http",
      "url": "http://localhost:8080/mcp"
    }
  }
}

Option 2: Self-Hosted Cloudflare Worker

Deploy to your own Cloudflare account. You control your data and costs.

Prerequisites: a Cloudflare account and Bun (Wrangler is run through bunx).

# Clone and install
git clone https://github.com/logos-flux/mnemo
cd mnemo/packages/cf-worker

# Configure secrets
bunx wrangler secret put GEMINI_API_KEY
bunx wrangler secret put MNEMO_AUTH_TOKEN  # Optional but recommended

# Create D1 database
# (copy the database_id it prints into the D1 binding of your Wrangler config)
bunx wrangler d1 create mnemo-cache

# Deploy
bunx wrangler deploy

Claude.ai MCP config:

{
  "mcpServers": {
    "mnemo": {
      "type": "http",
      "url": "https://mnemo.<your-subdomain>.workers.dev/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_AUTH_TOKEN"
      }
    }
  }
}

Why use this? Claude.ai can't connect to localhost. The Worker gives you an external endpoint that Claude.ai can reach.
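Both MCP configs above share one shape: a `mnemo` entry under `mcpServers` with `type: "http"`, the `/mcp` URL and, for the Worker, a Bearer `Authorization` header. A minimal TypeScript sketch that generates either variant; the helper name `mnemoMcpConfig` and the `example` Worker subdomain are hypothetical, not part of Mnemo.

```typescript
// Shape of the MCP config entries shown above (hypothetical helper types).
interface McpServerEntry {
  type: "http";
  url: string;
  headers?: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Build a config pointing at a Mnemo /mcp endpoint. The Authorization
// header is only added when a token is given, matching the two examples.
function mnemoMcpConfig(baseUrl: string, authToken?: string): McpConfig {
  const entry: McpServerEntry = {
    type: "http",
    // Strip trailing slashes so "https://host/" and "https://host" both work.
    url: `${baseUrl.replace(/\/+$/, "")}/mcp`,
  };
  if (authToken) {
    entry.headers = { Authorization: `Bearer ${authToken}` };
  }
  return { mcpServers: { mnemo: entry } };
}

// Local server (Option 1) and self-hosted Worker (Option 2).
const local = mnemoMcpConfig("http://localhost:8080");
const worker = mnemoMcpConfig("https://mnemo.example.workers.dev", "YOUR_AUTH_TOKEN");
console.log(JSON.stringify(local, null, 2));
console.log(JSON.stringify(worker, null, 2));
```

Generating the config from one function keeps the URL suffix and header format consistent if you switch between the local server and the Worker.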


Option 3: Managed Hosting (VIP)

Don't want to manage infrastructure? We offer fully managed Mnemo hosting for select clients.

Includes:

  • Dedicated Worker deployment

  • Priority support

  • Custom domain

  • Usage monitoring

Contact: lf@logosflux.io for pricing and availability.


Usage Examples

# Load a GitHub repo
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"source": "https://github.com/honojs/hono", "alias": "hono"}'

# Load a documentation site (crawls up to token target)
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"source": "https://hono.dev/docs", "alias": "hono-docs"}'

# Load a PDF
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"source": "https://arxiv.org/pdf/2303.08774.pdf", "alias": "gpt4-paper"}'

# Load a private repo (with GitHub token)
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"source": "https://github.com/owner/private-repo", "alias": "private", "githubToken": "ghp_xxx"}'

# Load multiple sources into one cache
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"sources": ["https://github.com/owner/repo", "https://docs.example.com"], "alias": "combined"}'

# Query the cache
curl -X POST http://localhost:8080/tools/context_query \
  -H "Content-Type: application/json" \
  -d '{"alias": "hono", "query": "How do I add middleware?"}'

# List active caches
curl -X POST http://localhost:8080/tools/context_list \
  -H "Content-Type: application/json" -d '{}'

# Get usage stats with cost tracking
curl -X POST http://localhost:8080/tools/context_stats \
  -H "Content-Type: application/json" -d '{}'

# Evict when done
curl -X POST http://localhost:8080/tools/context_evict \
  -H "Content-Type: application/json" \
  -d '{"alias": "hono"}'
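Every curl example above follows one pattern: a POST to `/tools/<tool>` with a JSON body of tool arguments. A minimal TypeScript sketch of that pattern, assuming only what the examples show (the helper `toolRequest` is hypothetical, and the optional Bearer header applies when `MNEMO_AUTH_TOKEN` is set on the server):

```typescript
// A request description that can be passed to fetch(url, { method, headers, body }).
interface ToolRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Tools exposed over the /tools/* endpoints.
type ToolName =
  | "context_load"
  | "context_query"
  | "context_list"
  | "context_evict"
  | "context_stats"
  | "context_refresh";

// Build the equivalent of: curl -X POST <base>/tools/<tool> -d '<json>'
function toolRequest(
  baseUrl: string,
  tool: ToolName,
  args: Record<string, unknown> = {},
  authToken?: string,
): ToolRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Only needed when the server has MNEMO_AUTH_TOKEN configured.
  if (authToken) headers["Authorization"] = `Bearer ${authToken}`;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/tools/${tool}`,
    method: "POST",
    headers,
    body: JSON.stringify(args),
  };
}

const load = toolRequest("http://localhost:8080", "context_load", {
  source: "https://github.com/honojs/hono",
  alias: "hono",
});
const query = toolRequest("http://localhost:8080", "context_query", {
  alias: "hono",
  query: "How do I add middleware?",
});
const list = toolRequest("http://localhost:8080", "context_list");
console.log(load.url, load.body);
```

With a server running, `await fetch(load.url, load)` should send the same request as the first curl command. The response format is not documented here, so parse it defensively.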

CLI

# Start server
mnemo serve

# Start MCP stdio transport (for Claude Desktop)
mnemo stdio

# Load a project
mnemo load ./my-project my-proj

# Query
mnemo query my-proj "What's the main entry point?"

# List caches
mnemo list

# Remove cache
mnemo evict my-proj

MCP Tools

  • context_load - Load GitHub repos, URLs, PDFs, or local dirs into Gemini cache

  • context_query - Query a cached context with natural language

  • context_list - List all active caches with token counts and expiry

  • context_evict - Remove a cache

  • context_stats - Get usage statistics with cost tracking

  • context_refresh - Reload a cache with fresh content
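Over the `/mcp` endpoint, an MCP client reaches these tools with JSON-RPC 2.0 messages: `tools/list` to discover them and `tools/call` with a `name` and `arguments` to invoke one. The sketch below only builds those message bodies; a real client also performs the MCP `initialize` handshake first and handles the transport's headers and session details, which are omitted here. The helper names are hypothetical.

```typescript
// JSON-RPC 2.0 request as used by the Model Context Protocol.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Each request needs an id so the response can be matched to it.
let nextId = 1;

// Ask the server which tools it exposes (should include the six tools above).
function listToolsMessage(): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/list" };
}

// Invoke one tool by name with its arguments.
function callToolMessage(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const listMsg = listToolsMessage();
const queryMsg = callToolMessage("context_query", {
  alias: "hono",
  query: "How do I add middleware?",
});
console.log(JSON.stringify(queryMsg));
```

In practice Claude Code or Claude.ai sends these messages for you; building them by hand is mainly useful when debugging a deployment.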

context_load Parameters

  • source - Single source: GitHub URL, any URL, or local path

  • sources - Multiple sources to combine into one cache

  • alias - Friendly name for this cache (1-64 chars)

  • ttl - Time to live in seconds (60-86400, default 3600)

  • githubToken - GitHub token for private repos

  • systemInstruction - Custom system prompt for queries
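The documented limits (alias of 1-64 characters, ttl between 60 and 86400 seconds with a default of 3600) can be checked client-side before calling `context_load`. A minimal sketch; the function `checkLoadParams` is hypothetical, and requiring exactly one of `source` or `sources`, as well as an integer ttl, are assumptions rather than documented Mnemo behaviour.

```typescript
interface ContextLoadParams {
  source?: string;
  sources?: string[];
  alias: string;
  ttl?: number;
  githubToken?: string;
  systemInstruction?: string;
}

// Limits taken from the parameter list above.
const ALIAS_MAX = 64;
const TTL_MIN = 60;
const TTL_MAX = 86400;
const TTL_DEFAULT = 3600;

// Returns the problems found (empty when valid) and the effective TTL.
function checkLoadParams(p: ContextLoadParams): { errors: string[]; ttl: number } {
  const errors: string[] = [];
  const hasSource = typeof p.source === "string" && p.source.length > 0;
  const hasSources = Array.isArray(p.sources) && p.sources.length > 0;
  // Assumption: exactly one of source / sources must be given.
  if (hasSource === hasSources) errors.push("provide exactly one of source or sources");
  if (p.alias.length < 1 || p.alias.length > ALIAS_MAX) {
    errors.push(`alias must be 1-${ALIAS_MAX} chars`);
  }
  const ttl = p.ttl ?? TTL_DEFAULT;
  if (!Number.isInteger(ttl) || ttl < TTL_MIN || ttl > TTL_MAX) {
    errors.push(`ttl must be an integer between ${TTL_MIN} and ${TTL_MAX} seconds`);
  }
  return { errors, ttl };
}

const ok = checkLoadParams({ source: "https://hono.dev/docs", alias: "hono-docs" });
const bad = checkLoadParams({ source: "a", sources: ["b"], alias: "", ttl: 30 });
console.log(ok, bad);
```

The server remains the authority on what it accepts; a local check like this just fails faster and with clearer messages.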

Configuration

  • GEMINI_API_KEY - Your Gemini API key (required)

  • MNEMO_PORT - Server port, local only (default: 8080)

  • MNEMO_DIR - Data directory, local only (default: ~/.mnemo)

  • MNEMO_AUTH_TOKEN - Auth token for protected endpoints (default: none)
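A minimal sketch of resolving these variables with their documented defaults. The `loadConfig` function is hypothetical, not Mnemo's own loader; it takes an env-like map so it can be fed `process.env` under Bun or Node, or a Worker's `env` bindings.

```typescript
interface MnemoConfig {
  geminiApiKey: string;
  port: number;
  dataDir: string;
  authToken?: string;
}

// Resolve settings from an env-like map using the defaults listed above.
function loadConfig(env: Record<string, string | undefined>, homeDir = "~"): MnemoConfig {
  const geminiApiKey = env["GEMINI_API_KEY"];
  if (!geminiApiKey) throw new Error("GEMINI_API_KEY is required");

  const port = Number(env["MNEMO_PORT"] ?? "8080");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid MNEMO_PORT: ${env["MNEMO_PORT"]}`);
  }

  return {
    geminiApiKey,
    port,
    dataDir: env["MNEMO_DIR"] ?? `${homeDir}/.mnemo`,
    // No default: when unset, /mcp and /tools/* do not require a token.
    authToken: env["MNEMO_AUTH_TOKEN"] || undefined,
  };
}

const cfg = loadConfig({ GEMINI_API_KEY: "your_key_here" });
console.log(cfg.port, cfg.dataDir);
```

Failing fast on a missing `GEMINI_API_KEY` surfaces the one required setting at startup instead of on the first cache load.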

Authentication

When MNEMO_AUTH_TOKEN is configured, the /mcp and /tools/* endpoints require authentication:

# Set auth token (Workers)
bunx wrangler secret put MNEMO_AUTH_TOKEN

# Requests must include the header:
Authorization: Bearer your-token-here

Public endpoints (no auth required):

  • GET /health - Health check

  • GET / - Service info

  • GET /tools - List available tools
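The rules above reduce to a small gate: the three public GET routes are always open, and `/mcp` plus `/tools/*` require a matching Bearer token only when `MNEMO_AUTH_TOKEN` is configured. A minimal sketch of that logic; `isAllowed` is a hypothetical helper, and Mnemo's actual route matching may differ.

```typescript
// Routes listed above as public.
const PUBLIC_ROUTES = new Set(["GET /health", "GET /", "GET /tools"]);

// Routes that require a token when one is configured.
function isProtected(path: string): boolean {
  return path === "/mcp" || path.startsWith("/tools/");
}

// Decide whether a request may proceed.
function isAllowed(
  method: string,
  path: string,
  authHeader: string | null,
  configuredToken?: string,
): boolean {
  if (PUBLIC_ROUTES.has(`${method.toUpperCase()} ${path}`)) return true;
  // No token configured, or a route that is not protected: let it through.
  if (!configuredToken || !isProtected(path)) return true;
  return authHeader === `Bearer ${configuredToken}`;
}

console.log(isAllowed("POST", "/tools/context_query", "Bearer secret", "secret"));
```

A production server would compare tokens in constant time rather than with `===`, to avoid leaking information through timing.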

Costs

You always pay for Gemini API usage, regardless of deployment option. Mnemo uses Gemini's context caching: cached input tokens are billed well below the standard input rate, but you also pay an hourly storage fee for as long as a cache exists:

  • Cache storage - ~$4.50 per 1M tokens per hour

  • Cached input - 75-90% discount vs regular input

  • Regular input - ~$0.075 per 1M tokens (Flash)

Example: 100K token codebase cached for 1 hour with 10 queries ≈ $0.47
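The example figure can be reproduced from the rates above. This sketch assumes that the initial load is billed once at the regular input rate, that each query reads the whole cache at the discounted rate, and that query prompt and output tokens are negligible; those assumptions and the `estimateCost` helper are illustrative, not Mnemo's cost tracker. Rates vary by Gemini model, so check current pricing before relying on them.

```typescript
interface Rates {
  storagePerMTokHour: number; // cache storage, $ per 1M tokens per hour
  inputPerMTok: number; // regular input, $ per 1M tokens
  cachedDiscount: number; // fraction off regular input for cached tokens
}

// Figures from the list above, using the low end (75%) of the discount.
const TABLE_RATES: Rates = { storagePerMTokHour: 4.5, inputPerMTok: 0.075, cachedDiscount: 0.75 };

// One-off load + hourly storage + each query reading the full cache.
function estimateCost(cacheTokens: number, hours: number, queries: number, r: Rates) {
  const mTok = cacheTokens / 1e6;
  const load = mTok * r.inputPerMTok;
  const storage = mTok * r.storagePerMTokHour * hours;
  const cachedReads = queries * mTok * r.inputPerMTok * (1 - r.cachedDiscount);
  return { load, storage, cachedReads, total: load + storage + cachedReads };
}

// 100K-token codebase, cached for 1 hour, 10 queries.
const est = estimateCost(100000, 1, 10, TABLE_RATES);
console.log(est);
```

Storage accounts for about $0.45 of the total, which is why a long-lived cache costs far more than the queries against it; evicting caches you no longer need matters more than reducing the number of queries.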

Cloudflare costs (self-hosted):

  • Workers: Free tier includes 100K requests/day

  • D1: Free tier includes 5M rows read/day

  • Likely $0 for moderate usage

Architecture

┌─────────────────────────────────────────────────────────────┐
│                            Mnemo                            │
├─────────────────────────────────────────────────────────────┤
│  MCP Tools                                                  │
│  • context_load    - Load into Gemini cache                 │
│  • context_query   - Query cached context                   │
│  • context_list    - Show active caches                     │
│  • context_evict   - Remove cache                           │
│  • context_stats   - Token usage, costs                     │
│  • context_refresh - Reload cache                           │
├─────────────────────────────────────────────────────────────┤
│  Adapters (v0.2)                                            │
│  • GitHub repos (via API)                                   │
│  • URL loading (HTML, PDF, JSON, text)                      │
│  • Token-targeted crawling                                  │
│  • robots.txt compliance                                    │
├─────────────────────────────────────────────────────────────┤
│  Packages                                                   │
│  • @mnemo/core       - Gemini client, loaders, adapters     │
│  • @mnemo/mcp-server - MCP protocol handling                │
│  • @mnemo/cf-worker  - Cloudflare Workers deployment        │
│  • @mnemo/local      - Bun-based local server               │
└─────────────────────────────────────────────────────────────┘
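The adapter layer implies that each `context_load` source must first be routed to the right loader. A hypothetical sketch of such routing, based only on the source kinds this README lists (GitHub URLs, other URLs, local paths); the names `classifySource` and `githubRepo` are illustrative, and Mnemo's real detection rules may differ (for example, telling PDF from HTML is more likely decided from the fetched response than from the URL).

```typescript
type SourceKind = "github" | "url" | "local";

// Route a context_load source to one of the adapter families.
function classifySource(source: string): SourceKind {
  if (/^https?:\/\/(www\.)?github\.com\/[^\/]+\/[^\/]+/i.test(source)) return "github";
  if (/^https?:\/\//i.test(source)) return "url";
  // Anything that is not an http(s) URL is treated as a local path.
  return "local";
}

// Extract owner/repo from a GitHub URL, dropping a trailing ".git".
function githubRepo(source: string): { owner: string; repo: string } | null {
  const m = /^https?:\/\/(?:www\.)?github\.com\/([^\/]+)\/([^\/#?]+)/i.exec(source);
  if (!m) return null;
  return { owner: m[1], repo: m[2].replace(/\.git$/, "") };
}

for (const s of ["https://github.com/honojs/hono", "https://hono.dev/docs", "./my-project"]) {
  console.log(s, classifySource(s));
}
```

Keeping classification separate from loading means a new source type needs only a new branch here and a new adapter, which matches the separation between adapters and the MCP tools in the diagram.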

License

MIT

Credits

Built by Logos Flux | Voltage Labs
