cachly MCP Server
Manage your cachly.dev cache instances directly from GitHub Copilot, Claude, Cursor, Windsurf and any other MCP-compatible AI assistant.
Zero-Touch Setup: One Command
Stop your AI from re-reading your entire codebase every time. One command enables context memory and configures all your editors automatically:
```shell
CACHLY_JWT=your-jwt npx @cachly-dev/mcp-server setup
```

The interactive wizard will:
- Authenticate with your cachly account (or prompt for a JWT)
- Let you pick which cache instance to use as your AI Brain
- Auto-detect Cursor, Windsurf, VS Code, Claude Code, and Continue.dev
- Write the correct MCP config for every detected editor
- Create/update `CLAUDE.md` (idempotent; safe to re-run)
Result: 60% fewer file reads, instant context across sessions, zero re-discovery.
Non-interactive (CI / scripted setup)
```shell
CACHLY_JWT=your-jwt npx @cachly-dev/mcp-server init \
  --instance-id your-instance-id \
  --editor vscode
```

What you can do
Once connected, just talk to your AI assistant:
"Create a free cachly instance called my-app-cache"
"List all my cache instances"
"Get the connection string for instance abc-123"
"Delete my test-cache instance"Available Tools
AI Brain: Session & Memory
| Tool | Description |
| --- | --- |
| `session_start` | Single call returning a full briefing: last session, relevant lessons, open failures, brain health. Call at the start of every session. |
| | Save the session summary, files changed, and duration. Call at the end of every session. |
| | Store structured lessons after any bug fix or deploy. Supports severity, file_paths, commands, and tags. Deduplicates by topic. |
| | Retrieve the best known solution for a topic (increments the recall count). |
| `remember_context` | Cache any analysis or architecture finding for future sessions. |
| | Retrieve cached context by exact key (supports glob patterns). |
| `smart_recall` | Semantic search across all cached context by meaning/keywords. |
| | List all cached context entries. |
| | Delete stale context. |
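As a rough illustration of how glob-style key recall over cached context behaves, here is a minimal in-memory sketch in Python. `ContextStore` and its methods are hypothetical stand-ins for illustration only, not the real MCP tools:

```python
from fnmatch import fnmatch


class ContextStore:
    """In-memory stand-in for cachly's cached-context store (illustrative)."""

    def __init__(self):
        self._entries = {}

    def remember(self, key, value):
        # remember_context(key, value): overwrite any previous entry
        self._entries[key] = value

    def recall(self, pattern):
        # Exact keys match directly; '*' and '?' act as glob wildcards.
        return {k: v for k, v in self._entries.items() if fnmatch(k, pattern)}


store = ContextStore()
store.remember("auth_architecture", "Keycloak with JWT tokens")
store.remember("auth_flow", "NextAuth redirect to Keycloak")
store.remember("db_schema", "Postgres, 12 tables")

print(sorted(store.recall("auth_*")))  # both auth_* entries match
```

An exact key is just the degenerate glob with no wildcards, so one lookup path serves both cases.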
Instance Management
| Tool | Description |
| --- | --- |
| | List all your cache instances |
| `create_instance` | Create a new instance (free or paid tier) |
| | Get details for a specific instance |
| `get_connection_string` | Get the connection string for an instance |
| | Permanently delete an instance |
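Once you have a connection string (see the example session later in this README), splitting it into host, port, and password is a one-liner with the standard library. A small sketch, using the sample host format from this README:

```python
from urllib.parse import urlparse


def parse_redis_url(url: str) -> dict:
    """Split a redis:// connection string into its parts."""
    u = urlparse(url)
    return {
        "host": u.hostname,
        "port": u.port,
        "password": u.password,
    }


parts = parse_redis_url("redis://:s3cret@my-node.cachly.dev:30101")
print(parts)  # host, port, and password extracted from the URL
```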
Cache Operations
| Tool | Description |
| --- | --- |
| | Live cache operations |
| | Key inspection |
| | Memory, hit rate, ops/sec |
| | Bulk pipeline operations |
| | Distributed locks (Redlock-lite) |
| | LLM token streaming cache |
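The "Redlock-lite" row follows the classic Redis SET NX PX pattern: a lock is a key holding a unique token with a TTL, and only the holder of that token may release it. A minimal in-memory sketch of the pattern (not cachly's actual implementation):

```python
import time
import uuid


class LockSketch:
    """In-memory sketch of the SET NX PX pattern behind Redlock-style locks."""

    def __init__(self):
        self._locks = {}  # name -> (token, expires_at)

    def acquire(self, name, ttl_ms):
        now = time.monotonic()
        held = self._locks.get(name)
        if held and held[1] > now:
            return None  # someone else holds a live (unexpired) lock
        token = uuid.uuid4().hex  # unique token identifies the owner
        self._locks[name] = (token, now + ttl_ms / 1000)
        return token

    def release(self, name, token):
        held = self._locks.get(name)
        if held and held[0] == token:  # only the owner may release
            del self._locks[name]
            return True
        return False


locks = LockSketch()
t = locks.acquire("deploy", 5000)  # returns a token on success, None if held
```

The TTL guards against a crashed holder leaving the lock stuck forever; the token check guards against releasing a lock you no longer own.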
Semantic & AI
| Tool | Description |
| --- | --- |
| | Vector similarity search (Speed/Business tier) |
| | Auto-classify a prompt into a semantic namespace |
| | Pre-warm the semantic cache with known Q&A pairs |
| | Index local source files for AI semantic search |
| | Check API health + JWT auth info |
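Vector similarity search typically scores candidates by cosine similarity and serves a cached answer when the score clears a threshold. A sketch of that decision, using the 0.85 threshold and 0.80-0.84 near-miss band from the Cache Debugging example later in this README (the function names are illustrative):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


THRESHOLD = 0.85  # example value from the debugging walkthrough, not a fixed default


def classify(score, threshold=THRESHOLD):
    """hit = serve from cache; near-miss = just under the bar (tunable band)."""
    if score >= threshold:
        return "hit"
    if score >= threshold - 0.05:
        return "near-miss"  # the band the analytics example flags
    return "miss"


print(classify(0.82))  # falls in the near-miss band
```

This is why lowering the threshold slightly (0.85 to 0.82 in the example) converts a spike of near-misses into hits.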
Setup
Recommended: Zero-Touch via npx
```shell
CACHLY_JWT=your-jwt npx @cachly-dev/mcp-server setup
```

No install, no build step. The wizard auto-detects your editors and writes all config files.
Manual configuration
Get your JWT token at cachly.dev/settings → API Tokens.
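If the server rejects your token, a common cause is expiry. You can inspect a JWT's claims (without verifying the signature) by base64url-decoding its payload segment. A Python sketch using a dummy token built inline for demonstration:

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the (unverified) payload of a JWT to inspect claims like exp."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


# Build a dummy token (header.payload.signature) purely for demonstration:
claims = {"iss": "keycloak", "exp": 1900000000}
body = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
demo = f"eyJhbGciOiJSUzI1NiJ9.{body}.sig"

print(jwt_claims(demo))  # the claims round-trip back out of the token
```

Note this only reads the claims; it does not validate the token, which the cachly API does server-side against Keycloak.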
Claude Code / Claude Desktop
```json
{
  "mcpServers": {
    "cachly": {
      "command": "npx",
      "args": ["-y", "@cachly-dev/mcp-server"],
      "env": { "CACHLY_JWT": "your-jwt-token-here" }
    }
  }
}
```

Claude Code: add to `.claude/mcp.json` in your project.
Claude Desktop (macOS): `~/Library/Application Support/Claude/claude_desktop_config.json`
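Conceptually, adding this entry to an existing config file is a small JSON merge, which is presumably what the setup wizard automates. A hedged Python sketch (`add_cachly_server` is illustrative; the real wizard may behave differently):

```python
import json


def add_cachly_server(config: dict, jwt: str) -> dict:
    """Merge a cachly entry into an mcpServers config without touching others."""
    servers = config.setdefault("mcpServers", {})
    servers["cachly"] = {
        "command": "npx",
        "args": ["-y", "@cachly-dev/mcp-server"],
        "env": {"CACHLY_JWT": jwt},
    }
    return config


existing = {"mcpServers": {"other": {"command": "other-server"}}}
merged = add_cachly_server(existing, "your-jwt-token-here")
print(json.dumps(merged, indent=2))  # existing servers preserved, cachly added
```

Because the entry is keyed by name, running it twice simply rewrites the same entry, which is what makes the wizard safe to re-run.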
GitHub Copilot (VS Code)
Add to .vscode/mcp.json:
```json
{
  "servers": {
    "cachly": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cachly-dev/mcp-server"],
      "env": { "CACHLY_JWT": "your-jwt-token-here" }
    }
  }
}
```

Then: Ctrl/Cmd+Shift+P → "MCP: List Servers" → start cachly.
Cursor
Add to .cursor/mcp.json:
```json
{
  "mcpServers": {
    "cachly": {
      "command": "npx",
      "args": ["-y", "@cachly-dev/mcp-server"],
      "env": { "CACHLY_JWT": "your-jwt-token-here" }
    }
  }
}
```

Windsurf / Continue.dev
Same stdio/mcpServers format: add to their respective MCP config file.
Environment Variables
| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `CACHLY_JWT` | Yes | – | Your Keycloak JWT from cachly.dev/settings |
| `CACHLY_API_URL` | No | | Override for local dev |
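A minimal sketch of how a client might read these variables; `load_config` is illustrative, not part of the package:

```python
import os


def load_config(env=os.environ):
    """Read the two variables from the table above; CACHLY_JWT is mandatory."""
    jwt = env.get("CACHLY_JWT")
    if not jwt:
        raise RuntimeError("CACHLY_JWT is required (see cachly.dev/settings)")
    # CACHLY_API_URL is only set when overriding the API for local development.
    return {"jwt": jwt, "api_url": env.get("CACHLY_API_URL")}


cfg = load_config({"CACHLY_JWT": "abc", "CACHLY_API_URL": "http://localhost:3001"})
print(cfg)  # jwt plus the local-dev API override
```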
Example Session
```
User: Create a free cache instance for my OpenAI project

Copilot: I'll create a free cachly instance for you.
[calls create_instance(name="openai-cache", tier="free")]

Instance **openai-cache** (FREE) created and provisioning started!
ID: `a1b2c3d4-...`
Status: provisioning
Use `get_connection_string` to get your Redis URL in ~30 seconds.

User: Get the connection string

Copilot: [calls get_connection_string(instance_id="a1b2c3d4-...")]
Connection string for openai-cache:
redis://:password@my-node.cachly.dev:30101

Environment variable:
REDIS_URL="redis://:password@my-node.cachly.dev:30101"
```

Local Development
```shell
# Run against local API
CACHLY_JWT=your-token CACHLY_API_URL=http://localhost:3001 npm run dev
```

Real-World Use Cases
π§ Stop Re-Reading Your Codebase β "Thinking Cache"
The Problem: Every time you ask Copilot about your codebase, it runs "Getting an overview of the codebase" ("Ich verschaffe mir einen Überblick über die Codebasis") and re-reads hundreds of files.
The Solution: Cache your AI's "thinking" results:
You: "Analyze the authentication architecture"
AI: [reads 47 files, takes 30 seconds]
The auth uses Keycloak with JWT tokens. The flow is:
1. User hits /sign-in β NextAuth redirect
2. Keycloak validates credentials
3. JWT returned, stored in session
[calls remember_context("auth_architecture", "The auth uses Keycloak...")]
π§ Context saved for future sessions.
--- Next day, new session ---
You: "How does auth work?"
AI: [calls smart_recall("auth")]
π§ Found cached context:
The auth uses Keycloak with JWT tokens...
(No file reading needed β instant answer!)The setup wizard (see top of this README) writes the session instructions automatically into your CLAUDE.md / .github/copilot-instructions.md β no manual editing needed. Just run npx @cachly-dev/mcp-server setup once and your AI assistant calls session_start at the start of every session automatically.
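The flow above is classic cache-aside applied to analysis results: check for a remembered answer first, and only re-read the codebase on a miss. A Python sketch where a plain dict stands in for the remember_context/smart_recall pair:

```python
analysis_cache = {}  # stand-in for cachly's remembered context


def analyze_with_cache(topic, expensive_analysis):
    """Cache-aside: reuse a prior analysis instead of re-reading the codebase."""
    if topic in analysis_cache:
        return analysis_cache[topic], "cached"   # smart_recall hit
    result = expensive_analysis()                # e.g. reading 47 files
    analysis_cache[topic] = result               # remember_context(topic, result)
    return result, "computed"


first = analyze_with_cache("auth_architecture", lambda: "Keycloak with JWT tokens")
second = analyze_with_cache("auth_architecture", lambda: "Keycloak with JWT tokens")
print(first, second)  # first run computes, second run hits the cache
```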
1. AI-Assisted Development: "Which caches am I using?"
You're building a microservice that uses multiple cachly instances. Ask your AI assistant directly:
You: "List all my cachly instances and their status"
AI: Using cachly_list_instances...
You have 3 instances:
1. prod-api (Pro, running) β redis://:***@prod.cachly.dev:30101
2. staging (Dev, running) β redis://:***@staging.cachly.dev:30102
3. ml-pipeline (Speed, running) β redis://:***@ml.cachly.dev:30103
Total MRR: β¬72/month2. Cache Debugging β "Why is my hit rate dropping?"
Your semantic cache hit rate dropped overnight. Debug it without leaving your editor:
You: "Check the analytics for my prod-api instance"
AI: Using cachly_semantic_stats...
β οΈ Hit rate dropped 23% in the last 24h (82% β 63%)
Anomaly detected: near-miss spike (+140%)
β 47 queries are hitting similarity 0.80-0.84 (just below your 0.85 threshold)
Recommendation: Lower threshold to 0.82 to capture these near-misses.
You: "Do it"
AI: Using cachly_set_threshold... β
Threshold set to 0.82 for namespace cachly:sem3. Deployment Cache Warmup β "Pre-fill the cache for the new release"
After deploying a new version, warm the semantic cache with common queries:
You: "Warm up the staging cache with our top 50 support questions"
AI: Using cachly_batch_index with your FAQ embeddings...
β
Indexed 50 entries in 340ms (batch pipeline)
Namespace: cachly:sem:qa
Your cache is ready β first users will get instant responses.4. Cost Monitoring β "How much am I saving?"
Track your LLM cost savings directly in your IDE:
You: "How much has cachly saved me this month?"
AI: Using cachly_semantic_stats...
π This month's savings:
- Total cache hits: 12,847
- Estimated savings: $384.21 (vs. direct LLM calls)
- Efficiency score: 84/100 (Grade: A)
- Best namespace: cachly:sem:qa (94% hit rate)5. CI Pipeline Integration β "Check cache health before deploy"
Add cache health checks to your deployment workflow:
You: "Check if any cache anomalies would block a deploy"
AI: Using cachly_analytics_anomalies...
β
No critical anomalies detected.
1 info-level notice: stale cache in namespace "translations"
(12 near-misses/24h, 0 new entries)
Recommendation: Run warmup after deploy for translations namespace.
Deploy is safe to proceed.License
MIT © cachly.dev