Which integrations are available for this server?

Provides tools for OpenAI Codex CLI to search and retrieve relevant code chunks from an indexed codebase, reducing token usage.

How do I use Code Context Engine?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Code Context Engine search for the UserService class definition".

That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

Code Context Engine

Official

by elara-labs

Python

Local

Use cases

	Use case	How CCE helps
💰	Reduce Claude Code costs	Up to 94% fewer input tokens (FastAPI benchmark, full-file baseline)
🔒	Keep code private	Everything local, no cloud indexing
🔄	Multi-editor teams	One index across Claude Code, Cursor, VS Code, Gemini CLI
🧠	Cross-session memory	Decisions and context survive restarts
⚡	Faster responses	Less context = faster Claude replies
📊	Track actual savings	Measured token counts with dollar estimates

Quick start

One command. 30 seconds.

uvx --from "code-context-engine[local]" cce init    # install + index + configure, one shot

Or if you prefer a persistent install:

uv tool install "code-context-engine[local]"    # or: pipx install "code-context-engine[local]"
cd /path/to/your/project
cce init

Restart your editor. Done. Every question now hits the index instead of re-reading files.

Already have Ollama? Skip [local] and use uv tool install code-context-engine instead. CCE auto-detects Ollama at localhost:11434 and uses nomic-embed-text.

Requires Python 3.11+ and a C compiler (for tree-sitter grammars).

Platform	Setup
macOS	`xcode-select --install`
Ubuntu/Debian	`sudo apt install build-essential cmake`
Fedora/RHEL	`sudo dnf install gcc gcc-c++ cmake`
Windows	Visual Studio Build Tools (C++ workload) + CMake

Tested on macOS, Linux, Windows with Python 3.11/3.12/3.13.

cce init auto-detects your editor and writes the right config. To target a specific agent, use --agent claude, --agent codex, --agent copilot, --agent pi, or --agent all.

Editor	Config written	Instructions
Claude Code	`.mcp.json`	`CLAUDE.md`
VS Code / Copilot	`.vscode/mcp.json`	`.github/copilot-instructions.md`
Cursor	`.cursor/mcp.json`	`.cursorrules`
Gemini CLI	`.gemini/settings.json`	`GEMINI.md`
OpenAI Codex	`~/.codex/config.toml` (user-global, per-project section)	`AGENTS.md`
OpenCode	`opencode.json`
Tabnine	`.tabnine/agent/settings.json`	`TABNINE.md`
Pi	`.mcp.json`	`AGENTS.md`

Multiple editors in the same project? All get configured in one command.

Codex note: Codex CLI reads MCP servers from ~/.codex/config.toml only — it has no per-project config. cce init adds one [mcp_servers.cce-<project>-<hash>] section per project so multiple projects coexist; cce uninstall removes only the section for the current project.

Pi note: Pi does not support MCP natively. To use CCE with Pi, you need a pi MCP adapter extension (e.g. pi-mcp-adapter) that consumes the .mcp.json config and exposes CCE's tools to the Pi agent. cce init sets up both .mcp.json and AGENTS.md — Pi loads the latter automatically for startup instructions.

  my-project · 38 queries · last query 5m ago

  ⛁ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶  88% tokens saved

  Input savings   1.9M  tokens   $27.78
  Output savings  4.8k  tokens   $0.36
  ──────────────────────────────────────────
  Total saved   1.9M  tokens   $28.15

  Breakdown:
    retrieval              84%  ▰▰▰▰▰▰▰▰▰▰    1.8M   $26.76 · 12 calls
    chunk compression       3%  ▰▱▱▱▱▱▱▱▱▱   68.5k    $1.03 · 12 calls
    output compression*    <1%  ▰▱▱▱▱▱▱▱▱▱    4.8k    $0.36 · 12 calls

  Cost estimate based on Opus pricing (input $15.0/1M, output $75.0/1M)

Supports Anthropic, OpenAI, and Google model pricing. Configure via pricing.model in ~/.cce/config.yaml.

Why this matters

Input tokens are 85-95% of your Claude Code bill. CCE cuts them by up to 94% (benchmarked on FastAPI against a full-file baseline).

Without CCE:    Claude reads payments.py + shipping.py   = 45,000 tokens
With CCE:       context_search "payment flow"            =    800 tokens

	Without CCE	With CCE
Session startup	Re-reads files every time	Queries the index
Finding a function	Read entire 800-line file	Get the 40-line function
Cross-session memory	None	Decisions + code areas persisted
Token cost (Sonnet, medium project)	~$0.14/session	~$0.04/session

Benchmark: FastAPI (reproducible)

We benchmarked CCE against FastAPI (53 source files, 180K tokens) with 20 real coding questions. No cherry-picking, no synthetic queries.

Methodology: For each query, "without CCE" means reading the full content of every file the query touches. "With CCE" means the relevant chunks after compression.

Important baseline note: The 94% number is measured against full-file reads, not against what Claude Code actually does. In practice, Claude Code already uses grep, partial file reads, and targeted tools, so the real-world savings compared to normal Claude Code behavior will be lower than 94%. We use full-file as the baseline because it's reproducible and deterministic (no agent behavior variability). The benchmark measures CCE's retrieval efficiency, not a head-to-head comparison with Claude Code's built-in exploration.

Metric	Result
Retrieval savings	94% (83,681 → 4,927 tokens/query)
Compression (additional, on retrieved chunks)	89% (4,927 → 523 tokens/query)
Recall@10 (found the right files)	0.90
Latency p50	0.4ms
Queries tested	20
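The percentages in the table follow directly from the per-query token counts. A minimal sketch of that arithmetic (the `savings_pct` helper is illustrative, not part of CCE; the combined figure is only arithmetic on the two rows above, not a separately reported result):

```python
def savings_pct(before_tokens: int, after_tokens: int) -> float:
    """Percentage of tokens removed when going from before_tokens to after_tokens."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return 100.0 * (1 - after_tokens / before_tokens)


# Token counts per query, taken from the benchmark table above.
retrieval = savings_pct(83_681, 4_927)    # full files -> retrieved chunks
compression = savings_pct(4_927, 523)     # retrieved chunks -> compressed chunks
combined = savings_pct(83_681, 523)       # both layers applied in sequence

print(f"retrieval {retrieval:.1f}%  compression {compression:.1f}%  combined {combined:.1f}%")
```

Because compression is measured on the already-retrieved chunks, the two percentages compound rather than add.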

Per-Layer Savings (each measured independently)

Layer	What it does	Savings	Method
Retrieval	Full files → relevant code chunks	94%	measured
Chunk Compression	Raw chunks → signatures + docstrings	89%	measured
Grammar	Drops articles/fillers from memory text	13%	measured

Output compression (reducing Claude's reply length) provides additional savings (~65% estimated) but is not included in the headline number above.

Multi-language benchmarks

Repo	Language	Files	Retrieval savings	Recall@10
FastAPI	Python	53	94%	0.90
chi	Go	94	76%	0.67
fiber	Go (monorepo)	396	93%	0.07

Go's shorter files reduce the retrieval headroom (smaller baseline). Monorepos dilute recall at top-10 (fiber). Middleware queries with one-feature-per-file hit R=1.00 consistently.
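For readers unfamiliar with the metric, Recall@10 is conventionally the fraction of relevant files that appear in the top 10 results. The sketch below uses that standard definition; the benchmark's exact scoring lives in benchmarks/ and may differ in detail, and the file names are made up for illustration.

```python
def recall_at_k(ranked_files: list[str], relevant_files: set[str], k: int = 10) -> float:
    """Fraction of relevant files that appear among the first k ranked results."""
    if not relevant_files:
        raise ValueError("relevant_files must not be empty")
    top_k = set(ranked_files[:k])
    return len(top_k & relevant_files) / len(relevant_files)


# Hypothetical query: two files are relevant, only one is retrieved.
ranked = ["middleware/cors.go", "router.go", "README.md"]
relevant = {"middleware/cors.go", "middleware/logger.go"}
print(recall_at_k(ranked, relevant))
```

A large monorepo has many more candidate files competing for the same 10 slots, which is one plausible reading of the low fiber score.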

Reproduce it yourself:

pip install code-context-engine
python benchmarks/run_benchmark.py --repo https://github.com/fastapi/fastapi.git --source-dir fastapi
python benchmarks/run_benchmark.py --repo https://github.com/go-chi/chi.git --source-dir .

Full results in benchmarks/results/. Queries and methodology in benchmarks/.

What you get

9 MCP tools that Claude uses automatically:

Tool	What it does
`context_search`	Hybrid vector + BM25 search with graph expansion
`expand_chunk`	Full source for a compressed result
`related_context`	Find code via graph edges (calls, imports)
`session_recall`	Recall decisions from past sessions
`record_decision`	Save a decision for future sessions
`record_code_area`	Record which files were worked in
`index_status`	Check index freshness
`reindex`	Re-index a file or the full project
`set_output_compression`	Adjust response verbosity (`off` / `lite` / `standard` / `max`)

Live dashboard with donut charts, file health, and session history:

cce dashboard

Dollar estimates with multi-provider pricing (Anthropic, OpenAI, Google):

cce savings --all    # see savings across all projects

How it works

Index: Tree-sitter parses your code into semantic chunks (functions, classes, modules). Stored as vector embeddings locally.
Search: Claude calls context_search. Hybrid vector + BM25 retrieval finds the right chunks. Code graph adds related files automatically.
Compress: Chunks are truncated to signatures + docstrings (or LLM-summarized if Ollama is running).
Remember: Decisions and code areas persist across sessions via session_recall.
Track: Every query is logged. cce savings shows exactly how much you saved.
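The Index and Compress steps can be illustrated with a small sketch. It uses Python's built-in `ast` module as a stand-in for tree-sitter, so it only handles Python and is not CCE's implementation; the chunk dictionary layout and function names are assumptions made for the example.

```python
import ast

SOURCE = '''
def charge(amount, currency="USD"):
    """Charge the customer and return a receipt id."""
    receipt = gateway.submit(amount, currency)
    return receipt.id

class Cart:
    """Holds line items for one checkout."""
    def total(self):
        return sum(item.price for item in self.items)
'''


def chunk_source(source: str) -> list[dict]:
    """Split a module into one chunk per top-level function or class."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "text": ast.get_source_segment(source, node),
                "docstring": ast.get_docstring(node),
            })
    return chunks


def compress_chunk(chunk: dict) -> str:
    """Keep the first line of the chunk (a one-line signature) plus its docstring."""
    signature = chunk["text"].splitlines()[0]
    doc = chunk["docstring"]
    return signature if doc is None else f'{signature}\n    """{doc}"""'


chunks = chunk_source(SOURCE)
for chunk in chunks:
    print(compress_chunk(chunk))
```

The function bodies are dropped from the compressed form, which is where the token reduction comes from; a tool like `expand_chunk` would then return the full `text` on demand.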

Re-indexing after edits takes under 1 second (96% embedding cache hit rate). Git hooks keep the index current automatically.

What makes CCE different

It saves where the money is

Output compression tools (like Caveman) save 20-75% on output tokens. Output is 5-15% of your bill, so net savings top out around 11% (75% of a 15% share).

CCE saves on input tokens (94% retrieval savings on FastAPI, reproducibly benchmarked). Input is 85-95% of your bill.
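The comparison is simple arithmetic: the reduction on one side of the bill multiplied by that side's share of the bill. A rough sketch using the ranges quoted above (the helper name is illustrative; the 94% figure is measured against full-file reads, so real-world input savings will be lower):

```python
def net_bill_savings(share_of_bill: float, reduction: float) -> float:
    """Fraction of the total bill saved by cutting one side of it."""
    return share_of_bill * reduction


# Output-only compression, best case: 75% reduction on a 15% share.
output_best_case = net_bill_savings(0.15, 0.75)

# Input retrieval savings at both ends of the 85-95% input share.
input_low = net_bill_savings(0.85, 0.94)
input_high = net_bill_savings(0.95, 0.94)

print(f"output best case: {output_best_case:.1%}")
print(f"input range:      {input_low:.1%} to {input_high:.1%}")
```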

It actually understands your code

Not a text search. Tree-sitter AST parsing creates semantic chunks. Hybrid retrieval merges vector similarity with BM25 keyword matching via Reciprocal Rank Fusion. A confidence scorer blends similarity (50%), keyword match (30%), and recency (20%). Graph expansion walks CALLS/IMPORTS edges to pull in related code.
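To make the fusion and scoring steps concrete, here is a minimal sketch of Reciprocal Rank Fusion and the weighted confidence blend. The constant `k=60` is the value commonly used in the RRF literature and is an assumption here, as is the premise that each signal is already normalized to the range 0 to 1; the chunk ids are invented.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def confidence(similarity: float, keyword_match: float, recency: float) -> float:
    """Blend the three signals with the 50/30/20 weights described above."""
    return 0.5 * similarity + 0.3 * keyword_match + 0.2 * recency


vector_hits = ["auth.login", "auth.refresh", "users.get"]
bm25_hits = ["auth.refresh", "billing.charge", "auth.login"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
for chunk_id, score in fused:
    print(f"{chunk_id:16s} {score:.4f}")
```

RRF only looks at ranks, not raw scores, so the vector and BM25 scores never need to be put on a common scale before merging.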

It remembers

record_decision("use JWT for auth", reason="session tokens flagged by legal") is stored in SQLite and surfaces via session_recall in the next session. No re-explaining your architecture.

It tracks real savings

Not estimates. Actual tokens served vs full-file baseline, broken down by buckets (retrieval, compression, output, memory, grammar). Dollar costs fetched from Anthropic's pricing page. Savings summary shown at every session start.

It is secure by default

Secret files (.env, *.pem, credentials.json) are never indexed. Content is scanned for AWS keys, GitHub tokens, Slack tokens, Stripe keys, JWTs, and generic credentials. PII (emails, IPs, SSNs, credit cards) is scrubbed from memory writes. All MCP file paths are validated against path traversal.
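The three checks can be sketched with the standard library alone. The patterns below cover only two well-known token formats (AWS access key ids and classic GitHub personal access tokens) and are assumptions for illustration; CCE's real rule set is broader and may be implemented differently.

```python
import fnmatch
import re
from pathlib import Path

# File names that should never be indexed (taken from the list above).
SECRET_FILE_PATTERNS = [".env", "*.pem", "credentials.json"]

# Illustrative content patterns; not CCE's actual rules.
SECRET_CONTENT_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def is_secret_file(path: str) -> bool:
    """True when the file name matches one of the blocked patterns."""
    name = Path(path).name
    return any(fnmatch.fnmatch(name, pattern) for pattern in SECRET_FILE_PATTERNS)


def find_secrets(text: str) -> list[str]:
    """Return the labels of every secret pattern found in text."""
    return [label for label, pattern in SECRET_CONTENT_PATTERNS.items() if pattern.search(text)]


def resolve_inside(root: str, requested: str) -> Path:
    """Resolve requested relative to root and reject anything that escapes it."""
    root_path = Path(root).resolve()
    candidate = (root_path / requested).resolve()
    if not candidate.is_relative_to(root_path):
        raise ValueError(f"path escapes project root: {requested}")
    return candidate


print(is_secret_file("deploy/server.pem"))
print(find_secrets('key = "AKIAIOSFODNN7EXAMPLE"'))
```

Resolving the path before comparing it against the root is what defeats `../` sequences and absolute paths alike.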

Under the hood

SHA-256 fingerprint per chunk, salted with model name. Re-index skips unchanged code. Binary float32 storage (10x smaller than JSON). Typical re-index: 96% cache hit, under 1 second.
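A minimal sketch of the two ideas, fingerprinting and binary float32 storage, assuming the salt is simply the model name prepended to the chunk text and that vectors are stored little-endian; the function names and byte layout are illustrative, not taken from CCE's source.

```python
import hashlib
import struct


def chunk_fingerprint(chunk_text: str, model_name: str) -> str:
    """SHA-256 of the chunk, salted with the embedding model name."""
    digest = hashlib.sha256()
    digest.update(model_name.encode("utf-8"))
    digest.update(b"\x00")  # separator so "ab" + "c" differs from "a" + "bc"
    digest.update(chunk_text.encode("utf-8"))
    return digest.hexdigest()


def pack_embedding(vector: list[float]) -> bytes:
    """Store a vector as little-endian float32: 4 bytes per dimension."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob: bytes) -> list[float]:
    """Inverse of pack_embedding."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


cache: dict[str, bytes] = {}
key = chunk_fingerprint("def total(self): ...", "nomic-embed-text")
if key not in cache:
    # A real indexer would call the embedding model here; this is a placeholder vector.
    cache[key] = pack_embedding([0.5, -1.0, 2.0])
print(len(cache[key]), unpack_embedding(cache[key]))
```

Salting with the model name means switching embedding models invalidates every cached vector automatically, while an unchanged chunk under the same model is a cache hit.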

Replaced LanceDB with sqlite-vec. Same cosine-distance quality, 99% smaller install. WAL mode + PRAGMA synchronous=NORMAL for 80% write speedup. Vectors, FTS5, code graph, and compression cache all live in three SQLite files.
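Write-ahead logging and the relaxed sync setting are both plain SQLite pragmas, so they can be tried with Python's built-in `sqlite3` module. The sketch below shows only those two settings; the file name is arbitrary and the sqlite-vec extension is not loaded.

```python
import os
import sqlite3
import tempfile


def open_index(path: str) -> sqlite3.Connection:
    """Open a database file with WAL journaling and synchronous=NORMAL."""
    conn = sqlite3.connect(path)
    # journal_mode is persistent: it is stored in the database file itself.
    conn.execute("PRAGMA journal_mode=WAL")
    # synchronous is per-connection and must be set each time the file is opened.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_index(os.path.join(tmp, "index.db"))
    print(conn.execute("PRAGMA journal_mode").fetchone()[0])
    conn.close()
```

WAL needs a real file; an in-memory database reports `memory` as its journal mode instead.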

Memory entries compressed without LLM calls. Drops articles, fillers, pronouns. Three levels (lite/full/ultra, 20-60% savings). Code, paths, URLs preserved byte-for-byte. Same input always yields same output.
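A toy version of this idea fits in a few lines. The word lists below are short, hypothetical samples (pronouns are left alone here), and the only code that is explicitly protected is text inside backticks; paths and URLs survive because they never match a stop word as a whole token.

```python
import re

ARTICLES = {"a", "an", "the"}
FILLERS = {"just", "really", "basically", "actually", "simply"}
DROP = ARTICLES | FILLERS

# Backtick-delimited code spans are captured so they can be passed through untouched.
CODE_SPAN = re.compile(r"(`[^`]*`)")


def compress_memory(text: str) -> str:
    """Drop articles and fillers from prose, leaving code spans byte-for-byte intact."""
    kept: list[str] = []
    for index, part in enumerate(CODE_SPAN.split(text)):
        if index % 2 == 1:
            kept.append(part)  # odd indexes are the captured code spans
        else:
            kept.extend(word for word in part.split() if word.lower() not in DROP)
    return " ".join(kept)


note = (
    "We basically decided to use the JWT flow in src/auth/tokens.py, "
    "see https://example.com/the/spec and `the_token = a + b`"
)
print(compress_memory(note))
```

There is no randomness and no model call, so the same input always produces the same output. This sketch does normalize whitespace in prose, which a production version would likely handle more carefully.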

5 Claude Code lifecycle hooks capture session context. Every hook runs curl ... || true, so a crashed server never blocks the user. SessionStart injects bootstrap context; others capture silently.

Dollar estimates in cce savings support 15+ models across Anthropic, OpenAI, and Google. Static pricing ships with CCE, live Anthropic pricing is fetched and cached 7 days. Configure pricing.model (e.g. gpt-4o, gemini-2.5-pro, sonnet) or override with pricing.input / pricing.output for custom rates.

7 buckets track every token saved: retrieval, chunk compression, output compression, memory recall, grammar, turn summarization, progressive disclosure. Survives restarts. Powers CLI and dashboard analytics.

CLI at a glance

cce init                    # Index + install hooks + register MCP
cce                         # Status banner
cce savings                 # Token savings with dollar estimates
cce savings --all           # All projects
cce dashboard               # Web dashboard with live charts
cce search "auth flow"      # Test a query
cce status                  # Index health + config
cce services                # Ollama + dashboard + MCP status
cce commands add-rule '...' # Project rules for Claude
cce uninstall               # Clean removal of all CCE artifacts

Run cce list for the full command reference.

Configuration

Zero-config by default. Override what you need in ~/.cce/config.yaml or .context-engine.yaml:

compression:
  level: standard          # minimal | standard | full
  output: standard         # off | lite | standard | max
  ollama_url: http://localhost:11434   # point at a remote Ollama if desired

retrieval:
  top_k: 20
  confidence_threshold: 0.5

pricing:
  model: opus              # opus | sonnet | haiku | gpt-4o | gemini-2.5-pro | ...
  # input: 15.0            # override $/1M input tokens
  # output: 75.0           # override $/1M output tokens

Remote Ollama: If you run Ollama on another machine in your network, set compression.ollama_url (e.g. http://nas.local:11434) or export CCE_OLLAMA_URL — the env var wins. CCE probes the endpoint and falls back to truncation-only compression when it's unreachable, so a flaky link won't break indexing.

Output Compression

CCE also compresses Claude's responses (same concept as Caveman):

Level	Style	Savings
`off`	Full output	0%
`lite`	No filler or hedging	~30%
`standard`	Fragments, drop articles	~65%
`max`	Telegraphic	~75%

Tell Claude: "switch to max compression" or "turn off compression". Code blocks and commands are never compressed.

Disk Footprint

Component	Size
Core install (Ollama backend)	~17 MB
With `[local]` extra (fastembed + ONNX)	~189 MB
Embedding model (one-time download)	~60 MB (fastembed) or managed by Ollama
Index per project (small/medium/large)	5-60 MB

No GPU required. With Ollama, embeddings are handled by the Ollama server. With the [local] extra, the embedding model runs on CPU via ONNX Runtime.

Supported Languages

AST-aware chunking (tree-sitter parsed, 10 extensions):

Language	Extensions
Python	`.py`
JavaScript	`.js`, `.jsx`
TypeScript	`.ts`, `.tsx`
PHP	`.php`
Go	`.go`
Rust	`.rs`
Java	`.java`
C#	`.cs`

Language-aware fallback chunking (40+ extensions):

Category	Languages
Web	HTML, CSS, SCSS, LESS, Vue, Svelte
Systems	C, C++, Zig, Nim
Mobile	Swift, Kotlin, Dart
Functional	Haskell, Scala, Clojure, Elixir, Erlang, F#
Scripting	Ruby, Perl, Lua, R, Bash/Zsh
Data/Config	JSON, YAML, TOML, XML, SQL, GraphQL, Protobuf
DevOps	Terraform, HCL, Dockerfile
Docs	Markdown

All other text files are chunked by line range. Binary files are skipped.
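Line-range chunking is the simplest of the three strategies. A sketch under stated assumptions: the 50-line window and the NUL-byte check for binary files are illustrative choices, not CCE's documented behavior.

```python
def looks_binary(data: bytes, sample_size: int = 8192) -> bool:
    """Common heuristic: a NUL byte in the first few KB suggests a binary file."""
    return b"\x00" in data[:sample_size]


def chunk_by_lines(text: str, max_lines: int = 50) -> list[dict]:
    """Split text into consecutive chunks of at most max_lines lines each."""
    if max_lines <= 0:
        raise ValueError("max_lines must be positive")
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        chunks.append({
            "start_line": start + 1,            # 1-indexed, inclusive
            "end_line": start + len(block),
            "text": "\n".join(block),
        })
    return chunks


sample = "\n".join(f"line {n}" for n in range(1, 121))
for chunk in chunk_by_lines(sample):
    print(chunk["start_line"], chunk["end_line"])
```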

Documentation

Page	Content
How Much Are You Spending on AI Coding Tokens?	The math on input vs output tokens
What is CCE? (Complete Guide)	Setup, tools, how it works, FAQ
How to Save Claude Code Tokens	Cost breakdown and savings guide
Benchmark Deep Dive	Full FastAPI benchmark methodology
Comparison with Alternatives	CCE vs Cursor, Aider, Continue, Greptile
Examples	Real conversations with Claude
How It Works	Full 9-stage pipeline
CLI Reference	Every command with output
Configuration	All config options

FAQ

Does CCE affect response quality?

No. Quality stays the same or slightly improves.

CCE replaces "dump the entire file" with "search for the relevant function." The model still gets the code it needs (0.90 Recall@10 in benchmarks). Less irrelevant context means less noise competing for attention, which can improve the model's focus on your actual question.

How does output token savings work?

CCE writes output compression rules directly into your agent's instruction files (CLAUDE.md, AGENTS.md, .cursorrules, etc.) during cce init. These rules apply to the entire session, not just CCE tool responses, so every reply from the agent follows them.

Set the level in ~/.cce/config.yaml or .context-engine.yaml:

compression:
  output: max       # off | lite | standard | max

Then re-run cce init to update instruction files. Or change at runtime:

set_output_level output_level=max

Level	Savings	What it does
`off`	0%	No compression
`lite`	~25%	Removes filler/hedging/pleasantries + diff-only for code changes
`standard`	~70%	Drops articles, fragments, short synonyms + diff-only for code
`max`	~80%	Telegraphic style + diff-only for code

Default is standard. All levels include code output rules that tell the model to show only changed lines (not full file rewrites), which is where most output tokens go in coding sessions. The max level produces very terse prose (similar to "caveman mode"). Code blocks, paths, and commands are never compressed regardless of level.

Where do the savings come from?

Most savings are input tokens (what goes into the model):

Layer	Type	Typical savings
Retrieval	Input	94% (full files → relevant chunks)
Chunk compression	Input	89% (chunks → signatures)
Grammar compression	Input	13% (article/filler removal)
Turn summarization	Input	varies (session history)
Progressive disclosure	Input	varies (tool payloads)
Output compression	Output	25-80% (depends on level)

Output tokens cost 5x more per token (e.g. Opus: $15/1M input vs $75/1M output), so even a small output reduction has outsized cost impact.

Roadmap

Multi-repo benchmarks (FastAPI, chi, fiber)
More benchmarks (Django, Express)
Tree-sitter support for C, C++, Ruby, Swift, Kotlin
Docker support for remote mode

See CHANGELOG.md for shipped features.

Contributing

Contributions welcome. See https://github.com/elara-labs/code-context-engine/blob/main/CONTRIBUTING.md for setup.

License

MIT. See LICENSE.

Acknowledgments

Claude Code · MCP · sqlite-vec · Tree-sitter · fastembed · Ollama
