entroly-context-engine
This server is a local AI context control plane that optimizes, verifies, and manages context for AI coding agents — reducing token usage while improving answer quality and trustworthiness.
Context Optimization & Memory
Store, deduplicate, and score context fragments automatically
Select the optimal subset of fragments for a token budget using multi-dimensional scoring (recency, frequency, semantic similarity, entropy)
Semantic search over stored fragments with feedback boosting
Predict and pre-load context likely needed next
Checkpoint/resume for crash recovery
Context Receipts (Audit Trail)
Generate auditable receipts documenting what context was selected, omitted, and why
Recover exact omitted content with integrity verification
Render human-readable Markdown reports from receipts
Hallucination Detection & Verification
Verify AI responses against context using a 4-signal fusion cascade (WITNESS, EICV, BIPT, Spectral) — 100% local
Verify, annotate, or suppress hallucinated claims in LLM output
Check generated code identifiers are grounded in real context and suggest alternatives
Security Scanning
SAST analysis with 55+ rules (SQL injection, hardcoded secrets, XSS, path traversal, etc.)
Detect prompt injection, Unicode steganography, base64 payloads, and role-spoofing
Codebase health analysis: clone detection, dead code, god files, architecture violations, naming conventions
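Unicode steganography usually means invisible code points (zero-width characters, bidirectional overrides, or the Unicode "tag" block) that hide instructions from a human reviewer. The sketch below shows the general idea of scanning for them. It is not Entroly's scanner, and the exact code-point list is an assumption.

```python
import unicodedata

# Assumed list of invisible code points commonly abused to smuggle text.
SUSPICIOUS = {
    "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff",  # zero-width chars / BOM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # bidi embeddings/overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # bidi isolates
}


def find_hidden_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each suspicious invisible character."""
    hits = []
    for i, ch in enumerate(text):
        # The tag block U+E0000..U+E007F can encode a hidden ASCII message.
        if ch in SUSPICIOUS or 0xE0000 <= ord(ch) <= 0xE007F:
            hits.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


print(find_hidden_chars("ignore\u200bprevious instructions"))
```

A real scanner would also normalize the text and decode payloads (for example base64) before rule matching. This sketch only flags the raw characters.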
CogOps Knowledge Vault
Maintain a durable belief system about your codebase with confidence scoring
Auto-extract entities from source code and docs into searchable belief artifacts
Verify beliefs for staleness and contradictions; find coverage gaps
Export beliefs as JSONL for LLM fine-tuning
Change-Driven & Epistemic Routing
Route queries through 5 canonical flows: Fast Answer, Verify, Compile, Change-Driven, Self-Improvement
Process code diffs with intent classification, code review, and blast radius analysis
Sync workspace file changes into belief and verification layers continuously
Learning & Adaptation
Record outcomes (test results, CI verdicts, edit acceptance, command exit codes) as reinforcement signals
Improve future context relevance scoring via PRISM/RAVS learning
Create, benchmark, and promote reusable skills from repeated capability gaps
Multimodal Ingestion
Ingest diagrams (Mermaid, PlantUML, DOT), voice/meeting transcripts, and git diffs as structured context
Cost Intelligence
Track token savings, compression ratios, cache alignment, and estimated cost savings via a live dashboard
What it does
Entroly is an auditable context control plane for AI agents. It decides what context to send, records what it left out, and produces a receipt you can inspect before trusting a hard multi-file answer.
Most compression tools shrink whatever text the agent already chose. Entroly starts one step earlier: it chooses the highest-value evidence first, compresses only after selection, keeps originals recoverable, then verifies the answer against the evidence.
Receipts - every selection run can explain selected chunks, omitted nearby evidence, dependency links, fingerprints, token ratio, and residual risks.
Select - ranks your repo or document set, then sends the answer-relevant context under a token budget.
Verify - WITNESS checks the model's answer against the evidence it was given and flags unsupported claims. $0, ~3 ms, no extra API call.
Route - sends easy, repeated tasks to a cheaper model and keeps the flagship for hard ones (opt-in, fail-closed).
Cache-align - keeps the injected prefix byte-stable so provider prefix caches can keep hitting where terms and API shape allow it.
Learn - improves which files it picks for your workflow from local feedback. No embeddings API, no training job.
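Provider prompt caches match on exact prefixes, so a cache hit requires the injected context to be byte-identical across requests. Here is a minimal sketch of one way to get that property. It assumes the selected fragments arrive as a path-to-text dict. `build_prefix` and `prefix_fingerprint` are hypothetical names, not Entroly's API.

```python
import hashlib


def build_prefix(fragments: dict[str, str]) -> str:
    """Render selected fragments into a deterministic, byte-stable prefix.

    Sorting by path and using a fixed layout means the same selection always
    produces the same bytes, whatever order the fragments were selected in.
    """
    parts = []
    for path in sorted(fragments):
        parts.append(f"### {path}\n{fragments[path]}\n")
    return "".join(parts)


def prefix_fingerprint(prefix: str) -> str:
    """SHA-256 of the prefix; equal fingerprints mean a cacheable prefix."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


a = build_prefix({"b.py": "x = 1", "a.py": "y = 2"})
b = build_prefix({"a.py": "y = 2", "b.py": "x = 1"})
print(prefix_fingerprint(a) == prefix_fingerprint(b))
```

Volatile content such as the user's question or timestamps belongs after this prefix, never inside it.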
Use it however you work: wrap your agent, run it as a proxy, plug it in as an MCP server, or import the library.
Why teams care
What usually breaks AI coding at scale | What Entroly adds |
Context windows fill with logs, duplicate files, and irrelevant chunks | Budgeted selection that favors answer-critical files, dependency links, failures, and anomalies |
Token savings look good but quality silently drops | Accuracy-retention benchmarks, receipts, and WITNESS verification |
Agents lose the exact line, stack trace, or omitted file they later need | Reversible compressed fragments and retrieval handles |
First-time setup depends on one IDE or one provider | CLI, SDK, MCP, proxy, npm, PyPI, Docker, and local simulation paths |
Enterprise teams need proof, not screenshots | Committed JSON artifacts, local self-tests, and reproducible commands |
Product surface
Entroly ships as a full local runtime, not one proxy command:
Surface | What users get |
CLI | |
SDK | |
MCP server | Context optimization, exact retrieval, receipts, recovery, feedback, security scans, codebase health, smart reads, belief verification, response verification |
Proxy | Anthropic/OpenAI-compatible local optimization path for API-key users and custom apps |
Node/WASM | |
Trust layer | WITNESS, EICV, STAVE, receipt proofs, provenance checks, prompt-injection scanning, and local verification reports |
Memory/session intelligence | Memory OS, Memory Fabric, long-term memory, session digests, checkpoint relevance, cache-retention forecasting, and lifetime value tracking |
Multimodal intake | Diff, diagram, voice, image, and structured-context ingestion with provider-aware image token estimates and compliance-gated optimization |
Gateway/accounting | Provider capability planning, failover policy, redaction receipts, usage ledger, cache routing, spend math, and budget harnesses |
Knowledge vault/CogOps | Belief compilation, vault search, workspace change sync, epistemic routing, verification engines, and flow orchestration |
Framework/event gateways | LangChain helpers, AgentSkills export, Hermes, Slack, Discord, and Telegram gateway hooks for teams that want operational feedback loops |
Self-improvement | PRISM/RAVS feedback, autotune, skill crystallization, promotion gates, evolution logging, and budget-gated skill synthesis |
Observability | Dashboard, daemon supervisor, control plane, health reports, value tracker, release-surface checks, and local JSON proof reports |
Under the hood, the Python control plane is backed by a Rust/WASM engine with BM25, entropy scoring, SimHash dedup, dependency graphs, knapsack/IOS selection, EGSC caching, PRISM learning, SAST, QCCR, EICV, witness checks, CogOps, cache economics, and memory primitives.
See the full code-derived map in docs/product-surface.md.
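SimHash dedup is on the engine list above. The general technique (Charikar-style SimHash) gives near-identical texts fingerprints that differ in only a few bits. This is a minimal sketch of that technique, not the Rust implementation. The blake2b token hash, unit token weights, and the 3-bit threshold are all assumptions.

```python
import hashlib
import re

BITS = 64  # fingerprint width; matches the 8-byte blake2b digest below


def _token_hash(token: str) -> int:
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(text: str) -> int:
    """Charikar-style SimHash over lowercase word tokens (each token weight 1)."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        h = _token_hash(token)
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:  # bit i is set when most tokens vote for it
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, max_distance: int = 3) -> bool:
    return hamming(simhash(a), simhash(b)) <= max_distance
```

Because fingerprints are compared by Hamming distance, reformatted copies of the same chunk (case or whitespace changes) collapse to one entry.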
How it works (30 seconds)
your agent ──► Entroly (local) ──► LLM provider
│
├─ rank the repo (BM25 + entropy + dep-graph)
├─ select under budget (knapsack, reversible)
├─ emit receipt (included, omitted, risks)
├─ cache-align prefix (keep provider cache hot)
└─ verify the reply (WITNESS hallucination guard)
Critical files go in full. Supporting files become signatures. Everything else becomes a reference you can expand on demand, so the model gets a broader view of your codebase in a smaller prompt. Nothing is lost: every compressed fragment is fully retrievable.
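The full/signature/reference split can be sketched as a budgeted assignment. This version is greedy: fragments are sorted by score, and each one gets the richest form that still fits. It stands in for the knapsack selection and is not equivalent to it. The `Fragment` fields, precomputed scores, and the 8-token reference cost are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    path: str
    score: float               # relevance from ranking (assumed precomputed)
    full_tokens: int           # cost of sending the whole file
    signature_tokens: int      # cost of sending only signatures
    reference_tokens: int = 8  # cost of a one-line expandable reference (assumed)


def plan_resolutions(fragments: list[Fragment], budget: int) -> dict[str, str]:
    """Greedy sketch: highest-scoring fragments get the richest form that fits."""
    plan: dict[str, str] = {}
    remaining = budget
    for frag in sorted(fragments, key=lambda f: f.score, reverse=True):
        for level, cost in (("full", frag.full_tokens),
                            ("signature", frag.signature_tokens),
                            ("reference", frag.reference_tokens)):
            if cost <= remaining:
                plan[frag.path] = level
                remaining -= cost
                break
        else:
            plan[frag.path] = "omitted"  # still recorded, e.g. for a receipt
    return plan


frags = [
    Fragment("utils.py", 0.2, 400, 300),
    Fragment("auth.py", 0.9, 600, 80),
    Fragment("db.py", 0.6, 900, 120),
]
print(plan_resolutions(frags, 1000))
```

With a 1000-token budget, auth.py goes in full (600), db.py falls back to signatures (120), and utils.py only fits as a reference.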
Get started (60 seconds)
The best first run is local and proof-driven. It should work before you connect an API key, proxy, paid model, or enterprise setup.
pip install -U entroly # or: npm i -g entroly · brew install juyterman1000/entroly/entroly
1. Prove the package works on your machine:
entroly verify-claims # SDK import, indexing, optimization, exact recovery, engine mode
entroly simulate # local no-LLM savings estimate on your current repo
2. Pick one integration path:
You are using | Run this | Why |
Claude Code subscription | | Adds Entroly tools without proxy/API-key assumptions |
Cursor, VS Code, Windsurf, or another MCP client | | Local MCP tools for context, receipts, recovery, and feedback |
Pay-as-you-go API keys or a custom app | | Transparent Anthropic/OpenAI-compatible optimization path |
Python app | | Direct SDK control |
Node/npm workflow | | WASM runtime without a Python-first setup |
CI or release gate | | Enforce prompt budgets before merge |
3. Best setup for Claude Code subscription users:
claude mcp add entroly -- entroly
Claude Code stays your client. Entroly adds local tools for compression, retrieval, receipts, and savings reports.
4. One command that auto-detects your IDE, wraps your agent, and opens the dashboard:
cd /your/repo && entroly go
5. Or wrap a specific agent:
entroly wrap claude # Claude Code
entroly wrap cursor # Cursor
entroly wrap codex # Codex CLI
entroly wrap aider # Aider
6. Or run the proxy (best for pay-as-you-go API keys and custom apps):
entroly proxy # http://localhost:9377
ANTHROPIC_BASE_URL=http://localhost:9377 your-app
OPENAI_BASE_URL=http://localhost:9377/v1 your-app
7. Or measure it on your own repo first:
entroly demo # before/after token + cost estimate
entroly simulate # local no-LLM savings estimate
entroly perf # local no-LLM savings + optimizer latency
entroly verify-claims # runs the packaged self-test, writes a JSON report
Local-first: your code is indexed and selected on-device, never sent anywhere for analysis. Apache-2.0. No outbound analytics by default.
First-run success contract
Entroly should feel useful before you connect a paid model key:
entroly verify-claims proves SDK import, local indexing, optimization, exact recovery, and native/pure-Python engine mode.
entroly simulate shows the likely token reduction on your repo without making an LLM call.
MCP setup works for Claude Code subscription users who do not want proxy/API-key mode.
Proxy mode is available when you control the provider key and want transparent request optimization.
npm/WASM is available for Node-first users, but Python remains the fullest CLI/SDK path.
If your repo is tiny or already under budget, Entroly should say so and pass through rather than invent fake savings.
Context Receipts
Entroly gives every AI answer a context receipt: what was used, what was omitted, why, and what risks remain. This is built for hard multi-document work such as contracts, policies, addenda, code reviews, and audit evidence where "top-k chunks" is not enough.
entroly ingest ./docs
entroly select --query "Does this contract have a change-of-control clause?" --budget 8000
entroly receipt .entroly/receipts/cr_example.json
entroly explain --why-omitted chk_example --receipt .entroly/receipts/cr_example.json
The receipt JSON includes selected chunks, omitted relevant chunks, ranking reasons, dependency links, source fingerprints, token ratio, warnings, and a reproducibility hash. The Markdown report is designed for human review before a compressed context is trusted.
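A reproducibility hash only helps if it is computed over a canonical form of everything that determined the run. This minimal sketch hashes the query, the budget, and per-chunk content fingerprints as canonical JSON. It also shows how a recovered chunk can be checked against its fingerprint. The field layout and function names are hypothetical; the actual receipt schema is not described here.

```python
import hashlib
import json


def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def reproducibility_hash(query: str, budget: int, selected: dict[str, str]) -> str:
    """Hash the inputs of a selection run. `selected` maps chunk id -> chunk text.

    sort_keys applies to nested dicts too, so chunk order does not change the hash.
    """
    payload = {
        "query": query,
        "budget": budget,
        "chunks": {cid: fingerprint(text) for cid, text in selected.items()},
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return fingerprint(canonical)


def verify_recovered(text: str, expected_fingerprint: str) -> bool:
    """Integrity check for content recovered from an omitted-span store."""
    return fingerprint(text) == expected_fingerprint
```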
Implementation notes:
Rust core (entroly-core/src/context_receipts.rs) handles deterministic ingestion, BM25-style ranking, dependency scans, selection, and hashes when the native wheel is available.
Python control plane (entroly/context_receipts/) provides CLI wiring and a pure-Python fallback for source checkouts.
The semantic/vector scorer and reranker are explicit extension points; the local MVP ships with lexical scoring and dependency heuristics, not a legal-accuracy guarantee.
Proof
Every number below is reproducible and backed by a committed JSON artifact you can audit — not a screenshot.
Token savings (this repo, entroly verify-claims, local, no API):
Budget | Token reduction |
8K | 99.1% |
32K | 96.7% |
average across workloads | 87.0% |
Accuracy retention — does compression hurt answers? Measured with gpt-4o-mini; intervals are Wilson 95% CIs. Each row links its raw result file.
Benchmark | n | Budget | Baseline | With Entroly | Retention | Token savings |
| 20 | 2K | 100% | 100% | 100% | 99.5% |
| 50 | 2K | 64% | 66% | 103% | 85.3% |
| 50 | 500 | 100% | 100% | 100% | 79.3% |
| 50 | 100 | 80% | 72% | 90% | 43.8% |
| 20 | 50K | 85% | 85% | 100% | pass-through* |
*pass-through: context already fit the budget, so Entroly left it unchanged. Reproduce: python benchmarks/run_readme_benchmarks.py (needs OPENAI_API_KEY). Full table + MMLU/TruthfulQA in DETAILS.
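The table cites Wilson 95% intervals but does not print them. With n this small the intervals are wide, so they are worth computing. This is the standard Wilson score formula, plus the retention ratio as the table's rows use it (accuracy with Entroly divided by baseline accuracy, e.g. 66% / 64% ≈ 103%). It is a sketch, not the benchmark script.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def retention(baseline_acc: float, entroly_acc: float) -> float:
    """Accuracy with Entroly relative to the uncompressed baseline."""
    return entroly_acc / baseline_acc


# 64% on n = 50 corresponds to 32 correct answers.
print(wilson_interval(32, 50))
print(retention(0.64, 0.66))
```

Even a perfect 20/20 score has a lower bound of roughly 0.84, which is why small-n rows should be read together with their intervals.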
Hallucination guard — HaluEval-QA, standard protocol, GPT-judge baseline on identical data:
System | Accuracy | AUROC | Cost / latency |
WITNESS + STAVE (default) | 85.8% | 0.844 | $0, ~3 ms/decision |
gpt-4o-mini (grounded judge) | 86.3% | — | LLM call |
gpt-3.5-turbo (HaluEval paper) | 62.6% | — | LLM call |
WITNESS + STAVE is a $0, zero-network verifier that statistically ties a strong LLM judge on this benchmark. Reproduce: python benchmarks/halueval_qa_faithful.py. Proof JSON.
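Accuracy depends on the decision threshold. AUROC instead measures how well the verifier's scores rank hallucinated answers above supported ones, across all thresholds. This sketch uses the pairwise (Mann-Whitney) definition of AUROC. It is not the benchmark's scoring code.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count 0.5.

    labels: 1 = hallucinated, 0 = supported. O(P * N), fine for small samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```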
Works with your stack
entroly wrap <agent> picks the best integration for each tool — proxy env-wrap for CLIs, auto-merged mcp.json for MCP-aware IDEs, or a copy-paste endpoint hint.
Wrap in one command: claude · cursor · codex · aider · gemini · windsurf · vscode · zed · cline · continue and 28 more.
Type | Agents |
CLI (env-wrap + exec) | Claude Code, Codex CLI, Aider, Gemini CLI, Qwen Code, OpenCode, Charm CRUSH, Hermes, Pi, Ollama |
MCP IDEs (auto-merge) | Cursor, Windsurf, VS Code, Claude Desktop, Claude Code (MCP), Zed |
Copy-paste endpoint | Cline, Roo Code, Continue, Cody, Amp, Kiro, Qoder, Trae, Antigravity, Amazon Q, Verdent, JetBrains AI, Helix, Tabby, Twinny, Sublime, Emacs, Neovim, Fitten, Tabnine, Supermaven |
Any tool that supports a custom OPENAI_BASE_URL / ANTHROPIC_BASE_URL works via the proxy. Run entroly wrap (no agent) for the full grouped list. Use wrappers only with tools whose terms permit local proxies / custom endpoints.
As a library (LangChain, LlamaIndex, your own code):
from entroly import compress, compress_messages, optimize

# api_response, messages, and fragments are your own data
compressed = compress(api_response, budget=2000) # query-agnostic
messages = compress_messages(messages, budget=30000) # whole conversation
context = optimize(fragments, budget=8000, query="fix the login bug") # task-conditioned
In CI, fail the build if a prompt blows the token budget:
- run: pip install entroly && entroly batch --budget 8000 --fail-over-budget
When to use it · when to skip
Great fit
Large repos where the agent only sees a few files at a time
Chatty, multi-turn agents (cache alignment compounds the savings)
Anywhere you want answers checked against evidence before you trust them
Teams trying to cut a real, growing AI bill
Skip it (it'll just pass through)
Tiny repos or short prompts that already fit the budget
Judgment-heavy tasks where you want the full flagship model every time
What's inside
Most people install Entroly for input-token compression. It actually ships 19 local cost-saving mechanisms across input, inference, output, verification, and learning — each one readable in the source with a committed benchmark where applicable.
# | Lever | Win | Source |
1 | Context compression (knapsack + 9 compressors + dep-graph) | 39–99% input tokens | |
2 | WITNESS + STAVE hallucination gateway | AUROC 0.844, $0 | |
3 | Cache Aligner | up to 90% off cached calls | |
4 | Escalation cascade (conformally calibrated) | avoids most flagship calls | |
5 | Conformal cascade | proven cost/coverage tradeoff | |
6 | RAVS Bayesian router | routes easy tasks to cheaper models | |
7 | Fast-path crystallized skills | 100% LLM cost saved on cache hits | |
8 | Adaptive compression budget | right-sizes budget per query | |
9 | Entropic conversation pruning | flattens history-growth cost | |
10 | Shell-output compression | 60–95% on tool output | |
11 | Response distillation | fewer output tokens billed | |
12 | Local DeBERTa NLI (opt-in) | $0 offline NLI | |
13 | EICV suppressor | stops bad info propagating | |
14 | PRISM 5D adaptive weights | quality improves with use | |
15 | Federation (opt-in) | amortized cold-start | |
16 | Entropic Shell Codec | universal tool-output fallback | |
17 | Semantic Resolution Protocol | 40–70% fewer tokens on file reads | |
18 | Adversarial Context Firewall | blocks prompt-injection / poisoning | |
19 | Witness-Verified Handoff | filters hallucinations between agents | |
Most levers are multiplicative: input compression × cache alignment × cheaper-model routing × output distillation can leave well under 1% of the original input-token spend on the bill. Per-lever contribution shows up in the dashboard's Cost Intelligence panel. Full math and proofs in docs/DETAILS.md.
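Under the multiplicative model, each lever's reduction applies to whatever spend the previous levers left. The remaining fraction is therefore the product of (1 − reduction) terms, not 1 minus their sum. The per-lever numbers below are hypothetical, chosen only to show the arithmetic. The model also assumes the levers act independently, which overlapping levers may not.

```python
from math import prod


def remaining_fraction(reductions: dict[str, float]) -> float:
    """Fraction of original spend left if each lever's reduction applies
    to what the earlier levers left (the multiplicative model)."""
    return prod(1.0 - r for r in reductions.values())


# Hypothetical reductions: 90% compression, 80% cache discount, 70% routing saving.
frac = remaining_fraction({"compression": 0.90, "cache": 0.80, "routing": 0.70})
print(f"{frac:.1%} of the original spend remains")
```

Here 0.1 × 0.2 × 0.3 = 0.006, so 0.6% of the original spend remains. Adding the same percentages would wrongly suggest savings above 100%.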
Python is the reference runtime; the Rust core (via PyO3) does the heavy compute at 50–100× Python speed, and the same engine ships to Node via WASM.
pip install entroly # core: MCP server + Python engine
pip install entroly[proxy] # + HTTP proxy
pip install entroly[native] # + Rust engine
pip install entroly[full] # everything
npm install -g entroly # WASM runtime, no Python needed
docker pull ghcr.io/juyterman1000/entroly:latest
Single binary, no Python: a standalone Rust proxy that auto-detects Anthropic/OpenAI/Gemini and stays cache-aligned:
cd entroly/entroly-core && cargo build --release --bin entroly-rs --features proxy
./target/release/entroly-rs proxy --upstream https://api.anthropic.com
WITNESS: check answers before you trust them
entroly witness --context-file evidence.txt --output-file answer.txt --mode strict
entroly proxy --witness strict --witness-profile rag # suppress unsupported claims inline
Profiles tune false-positive behavior per workload (rag, qa, and code fail closed; chat and summary warn). Every non-streaming response gets a proof certificate; the dashboard shows flagged claims, evidence snippets, and suppression counts. Optional offline DeBERTa NLI (ENTROLY_LOCAL_NLI=1) raises accuracy further at $0.
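To make "checking an answer against evidence" concrete, here is a deliberately crude sketch. It splits the answer into sentences and flags any sentence whose content words mostly do not appear in the evidence. This is a lexical-overlap toy, not WITNESS or STAVE (whose signals are not described here). The stopword list and the 0.6 threshold are assumptions.

```python
import re

# Assumed minimal stopword list; a real checker would use a proper one.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "in", "and", "or",
             "it", "that", "this", "for", "on", "with", "by", "as", "be"}


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if w not in STOPWORDS}


def flag_unsupported(answer: str, evidence: str, min_overlap: float = 0.6) -> list[str]:
    """Return answer sentences whose content words are mostly absent from evidence."""
    evidence_words = content_words(evidence)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        overlap = len(words & evidence_words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged


evidence = "The login handler calls verify_password before issuing a session token."
answer = "The login handler calls verify_password. It also logs every password in plaintext."
print(flag_unsupported(answer, evidence))
```

Lexical overlap misses paraphrases and negations, which is why verifiers such as NLI models exist.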
Why Entroly is different
The winning product is not the one that makes the prompt smallest. It is the one that helps the model do the best work for the fewest tokens.
Entroly is built around that trust contract: select the right evidence, compress supporting material, keep originals recoverable, emit a receipt, and verify the answer against the retained evidence.
Layer | Entroly answer |
Context engine | BM25 + entropy + dependency graph + knapsack/IOS selection under budget |
Compression/recovery | Evidence-Locked Compression, exact CCR handles, omitted-span retrieval store |
Trust | Context Receipts, WITNESS, EICV, STAVE, provenance, receipt proofs |
Gateway | Provider adapters, cache-aware routing, usage ledger, cost cortex, harness budgets |
Memory/session | Memory OS, Memory Fabric, long-term memory, checkpoint relevance, session digests, value tracking |
Multimodal | Diff, diagram, voice, image, and structured-context ingestion with provider-aware token estimates |
CogOps/vault | Belief compiler, vault search, epistemic router, flow orchestrator, verification engine, workspace change sync |
Learning | Feedback, PRISM/RAVS, archetype adaptation, cache and routing signals |
Self-improvement | Autotune, dreaming loops, reward crystallization, skill synthesis, promotion gates, rollback, optional federation |
Security | SAST, prompt-injection scanning, redaction policy, path containment |
Observability | Dashboard, daemon, control plane, health reports, usage accounting, local proof JSON |
Runtime | Python SDK/CLI/MCP plus Rust native engine and Node/WASM runtime |
The goal is same-quality or better model work at materially lower token cost.
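The context-engine row starts with BM25. For readers unfamiliar with it, this is a compact Okapi BM25 scorer. It uses the Lucene-style idf, log(1 + (N − df + 0.5) / (df + 0.5)), and the common defaults k1 = 1.5 and b = 0.75. Entroly's exact variant and parameters are not stated, so treat these as assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0  # guard all-empty docs
    df = Counter(term for t in tokenized for term in set(t))  # document frequency
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) with document-length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores


docs = ["def login(user, password): check password", "render dashboard chart", "login page template"]
print(bm25_scores("login password", docs))
```

Rare terms (here `password`) carry more weight than common ones, and repeated occurrences saturate instead of growing linearly.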
Self-improving local runtime
Entroly has a guarded self-improvement loop. It is designed to learn from real outcomes without letting adaptation run wild.
Loop | What it does |
Feedback | |
PRISM/RAVS | Online Bayesian weights and honest-outcome correction move selection toward what actually passes tests, CI, and user acceptance |
Autotune/dreaming | Idle/offline loops test weight perturbations against benchmark cases before promotion |
Reward crystallization | Repeated high-reward query families become reusable skills with statistical lower-bound checks |
Skill synthesis | Structural synthesis tries local, deterministic skill generation before any LLM fallback |
Promotion gate | Shadow policies must be non-inferior before promotion; rollback triggers on repair/retry/success regression |
Budget guardrail | Evolution is intended to stay token-negative by spending only a bounded fraction of measured lifetime savings |
Optional federation | Weight contributions can be shared only when explicitly enabled |
This is the important distinction: Entroly does not just remember context. It can learn which context-selection strategies, routes, and skills actually produce successful work.
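RAVS internals are not described here. To illustrate "learn from outcomes, route cheaply, fail closed", this hypothetical sketch keeps a Beta-Bernoulli posterior of cheap-model success per task family. It routes a family to the cheap model only once a conservative estimate clears a threshold. Posterior mean minus two standard deviations is a rough stand-in for a proper posterior lower bound. `BetaRouter` and its parameters are invented for illustration.

```python
import math
from collections import defaultdict


class BetaRouter:
    """Per-task-family Beta-Bernoulli estimate of cheap-model success.

    Fails closed: unseen or uncertain families stay on the flagship model.
    """

    def __init__(self, threshold: float = 0.9, prior: tuple[float, float] = (1.0, 1.0)):
        self.threshold = threshold
        self.counts = defaultdict(lambda: list(prior))  # family -> [alpha, beta]

    def record(self, family: str, success: bool) -> None:
        """Update with a real outcome (tests passed, edit accepted, ...)."""
        self.counts[family][0 if success else 1] += 1

    def conservative_estimate(self, family: str) -> float:
        a, b = self.counts[family]
        mean = a / (a + b)
        var = a * b / ((a + b) ** 2 * (a + b + 1))  # variance of Beta(a, b)
        return mean - 2 * math.sqrt(var)

    def route(self, family: str) -> str:
        return "cheap" if self.conservative_estimate(family) >= self.threshold else "flagship"


router = BetaRouter()
for _ in range(50):
    router.record("rename-symbol", True)
print(router.route("rename-symbol"), router.route("refactor-module"))
```

Ten straight successes are not enough to switch (the estimate is still about 0.76). Fifty are, and a run of failures moves the family back to the flagship.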
Compared to
| Entroly | Compression tools | Top-K / RAG | Raw truncation |
Approach | Rank → select → compress | Compress whatever's given | Embedding retrieval | Cut off |
Token savings | 70–95% (large repos) | 50–70% | 30–50% | 0% |
Quality loss | None measured | 2–5% | Variable | High |
Needs embeddings API | No | Varies | Yes | No |
Reversible | Yes | Varies | Yes | No |
Learns over time | Yes (PRISM) | No | No | No |
Verifies the answer | Yes (WITNESS) | No | No | No |
Compressing a bad selection is still a bad selection. Entroly ranks first, then compresses — so the model gets structure, not just fewer tokens.
Docs & community
Command | What it does |
| One shot: detect IDE, wrap your agent, open the dashboard |
| Wrap a specific coding agent (38 supported) |
| Start the HTTP proxy on |
| Start the MCP server |
| Supervise proxy + dashboard + MCP + file watcher |
| Open the live metrics dashboard |
| Before/after token + cost estimate on your repo |
| Ingest documents into a local Context Receipt index |
| Select context under budget and write a Context Receipt |
| Render a Context Receipt as a Markdown report |
| Explain why a chunk was selected or omitted |
| Local no-LLM savings estimate with an explicit baseline |
| Local no-LLM savings and optimizer latency |
| Local comparison: Entroly vs raw context vs top-K |
| Codebase health grade (A–F) |
| Persistent cross-session cache stats |
| Model-routing cost-savings report |
| Check an answer against supplied evidence |
| Run the packaged self-test → JSON report |
Architecture & full spec — Rust modules, 3-resolution compression, provenance, RAG comparison, SDK, LangChain.
Product surface map — CLI, SDK, MCP, proxy, npm/WASM, verification, memory, and security surfaces.
First-run trust guide — exactly what a new user should run before wiring a paid model.
For teams — ROI, security, deployment one-pager.
Limitations — where Entroly helps, where it passes through, and what it does not guarantee.
Cookbook — copy-paste recipes for common workflows.