Loads and queries arXiv papers and other PDF documents by caching their content into Gemini's context for natural language interrogation.
Provides deployment option for hosting Mnemo as a Cloudflare Worker with D1 database integration for cache management and external endpoint access.
Loads and queries public and private GitHub repositories by caching entire codebases into Gemini's context, supporting natural language queries and multiple repository combinations.
Mnemo
Extended memory for AI assistants via Gemini context caching.
Mnemo (Greek: memory) gives AI assistants like Claude access to large codebases, documentation sites, PDFs, and more by leveraging Gemini's 1M token context window and context caching features.
Why Mnemo?
Instead of complex RAG pipelines with embeddings and retrieval, Mnemo takes a simpler approach:
Load your entire codebase into Gemini's context cache
Query it with natural language
Let Claude orchestrate while Gemini holds the context
This gives you:
Perfect recall - no chunking or retrieval means no lost context
Lower latency - cached context is served quickly
Cost savings - cached tokens cost 75-90% less than regular input tokens
Simplicity - no vector databases, embeddings, or complex retrieval logic
What Can Mnemo Load?
Source | Local Server | Worker |
GitHub repos (public) | ✅ | ✅ |
GitHub repos (private) | ✅ | ✅ |
Any URL (docs, articles) | ✅ | ✅ |
PDF documents | ✅ | ✅ |
JSON APIs | ✅ | ✅ |
Local files/directories | ✅ | ❌ |
Multi-page crawls | ✅ unlimited | ✅ 40 pages max |
Deployment Options
Mnemo can be deployed in three ways depending on your needs.
Option 1: Local Server (Development & Full Features)
Best for development and when you need to load local files.
Claude Code MCP config:
Option 2: Self-Hosted Cloudflare Worker (Recommended for Claude.ai)
Deploy to your own Cloudflare account. You control your data and costs.
Prerequisites:
Cloudflare account (free tier works)
Claude.ai MCP config:
Why use this? Claude.ai can't connect to localhost. The Worker gives you an external endpoint that Claude.ai can reach.
Option 3: Managed Hosting (VIP)
Don't want to manage infrastructure? We offer fully managed Mnemo hosting for select clients.
Includes:
Dedicated Worker deployment
Priority support
Custom domain
Usage monitoring
Contact: lf@logosflux.io for pricing and availability.
Usage Examples
CLI
MCP Tools
Tool | Description |
| Load GitHub repos, URLs, PDFs, or local dirs into Gemini cache |
| Query a cached context with natural language |
| List all active caches with token counts and expiry |
| Remove a cache |
| Get usage statistics with cost tracking |
| Reload a cache with fresh content |
context_load Parameters
Parameter | Description |
| Single source: GitHub URL, any URL, or local path |
| Multiple sources to combine into one cache |
| Friendly name for this cache (1-64 chars) |
| Time to live in seconds (60-86400, default 3600) |
| GitHub token for private repos |
| Custom system prompt for queries |
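The limits in the table above (alias length of 1-64 characters, TTL of 60-86400 seconds with a default of 3600) can be checked before a load request is sent. The following is a minimal sketch, not Mnemo's own validation code: the `LoadRequest` class, its field names, and the rule that at least one source must be given are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits taken from the parameter table above.
ALIAS_MIN_LEN, ALIAS_MAX_LEN = 1, 64
TTL_MIN_SECONDS, TTL_MAX_SECONDS = 60, 86400
TTL_DEFAULT_SECONDS = 3600


@dataclass
class LoadRequest:
    """Hypothetical container for context_load arguments (field names are assumed)."""
    alias: str
    source: Optional[str] = None          # single GitHub URL, URL, or local path
    sources: Optional[List[str]] = None   # several sources combined into one cache
    ttl: int = TTL_DEFAULT_SECONDS        # time to live in seconds


def validate_load_request(req: LoadRequest) -> List[str]:
    """Return a list of human-readable problems; an empty list means the request looks valid."""
    errors = []
    if not ALIAS_MIN_LEN <= len(req.alias) <= ALIAS_MAX_LEN:
        errors.append(f"alias must be {ALIAS_MIN_LEN}-{ALIAS_MAX_LEN} characters")
    if not TTL_MIN_SECONDS <= req.ttl <= TTL_MAX_SECONDS:
        errors.append(f"ttl must be {TTL_MIN_SECONDS}-{TTL_MAX_SECONDS} seconds")
    # Assumption: a request needs either a single source or a list of sources.
    if not req.source and not req.sources:
        errors.append("provide a single source or a list of sources")
    return errors
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid field at once.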
Configuration
Variable | Description | Default |
| Your Gemini API key | Required |
| Server port (local only) | 8080 |
| Data directory (local only) | ~/.mnemo |
| Auth token for protected endpoints | None |
Authentication
When MNEMO_AUTH_TOKEN is configured, the /mcp and /tools/* endpoints require authentication:
Public endpoints (no auth required):
GET /health - Health check
GET / - Service info
GET /tools - List available tools
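The split between protected and public endpoints can be sketched on the client side as follows. This is an illustration under stated assumptions rather than Mnemo's documented client: the `Authorization: Bearer` header scheme, the helper names, the example host, and the `/tools/context_load` path are all hypothetical. Only the rule itself (the `/mcp` and `/tools/*` endpoints need a token when `MNEMO_AUTH_TOKEN` is set) comes from the text above.

```python
from typing import Optional
from urllib.request import Request


def requires_auth(path: str, auth_token_configured: bool) -> bool:
    """Apply the rule described above: /mcp and /tools/* are protected when a token is set."""
    if not auth_token_configured:
        return False
    # "/tools" itself (the tool listing) is public; only paths below it are protected.
    return path == "/mcp" or path.startswith("/tools/")


def build_request(base_url: str, path: str, token: Optional[str] = None) -> Request:
    """Build (but do not send) a request, attaching the token only where it is needed."""
    headers = {}
    if token and requires_auth(path, auth_token_configured=True):
        # ASSUMPTION: bearer-token scheme; check your deployment for the actual header.
        headers["Authorization"] = f"Bearer {token}"
    return Request(base_url.rstrip("/") + path, headers=headers)


# Hypothetical host and tool path, for illustration only.
protected = build_request("https://mnemo.example.workers.dev", "/tools/context_load", "secret")
public = build_request("https://mnemo.example.workers.dev", "/health", "secret")
```

Attaching the token only to protected paths keeps it out of requests to the public health and info endpoints.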
Costs
You always pay for Gemini API usage, regardless of deployment option. Mnemo uses Gemini's context caching, which is significantly cheaper than standard input:
Resource | Cost |
Cache storage | ~$4.50 per 1M tokens per hour |
Cached input | 75-90% discount vs regular input |
Regular input | ~$0.075 per 1M tokens (Flash) |
Example: a 100K-token codebase cached for 1 hour and queried 10 times costs ≈ $0.47
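The arithmetic behind that estimate can be reproduced from the rates in the table. The sketch below assumes the cost is made up of cache storage plus cached input for each query, uses the lower end of the 75-90% discount range, and ignores output tokens and the tokens in each query prompt. The function name and its parameters are hypothetical, and the rates are the approximate figures quoted above, not live pricing.

```python
def estimate_cache_cost(
    tokens: int,
    hours: float,
    queries: int,
    storage_per_million_hour: float = 4.50,  # ~$ per 1M tokens per hour (table above)
    input_per_million: float = 0.075,        # ~$ per 1M regular input tokens (Flash)
    cached_discount: float = 0.75,           # lower end of the 75-90% range
) -> float:
    """Rough cost in dollars: cache storage plus cached input for every query."""
    millions = tokens / 1_000_000
    storage_cost = millions * hours * storage_per_million_hour
    cached_rate = input_per_million * (1 - cached_discount)
    query_cost = queries * millions * cached_rate
    return storage_cost + query_cost


# 100K tokens, cached for 1 hour, 10 queries:
# storage = 0.1 * 1 * 4.50 = 0.45, queries = 10 * 0.1 * 0.01875 = 0.01875
example = estimate_cache_cost(tokens=100_000, hours=1, queries=10)
print(f"${example:.2f}")
```

Under these assumptions storage dominates the total, so shortening the TTL has a larger effect on cost than reducing the number of queries.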
Cloudflare costs (self-hosted):
Workers: Free tier includes 100K requests/day
D1: Free tier includes 5M reads/day
Likely $0 for moderate usage
Architecture
License
MIT
Credits
Built by Logos Flux | Voltage Labs