The MCP Memory Gateway is a context engineering server that captures agent feedback, enforces pre-action gates to block known mistakes, and injects relevant past context into AI coding agent sessions for improved reliability and continuity.
Feedback & Memory
Capture feedback (capture_feedback, capture_memory_feedback): Record up/down signals with context, reasoning, and rubric scores; vague feedback is rejected with a clarification prompt
Recall past context (recall, commerce_recall): Vector-search relevant past feedback, memories, and prevention rules for the current task
View summaries & analytics (feedback_summary, feedback_stats, dashboard): Approval rate trends, gate enforcement stats, and prevention impact overviews
Generate prevention rules (prevention_rules, get_reliability_rules): Auto-generate blocking rules from repeated failure patterns
Pre-Action Gates & Safety
Satisfy gates (satisfy_gate): Record evidence that a gate condition is met (e.g., PR threads checked) with a 5-minute TTL
Gate statistics (gate_stats): See blocked/warned counts and top triggered gates
Session Continuity
Session handoff (session_handoff): Write a primer capturing git state, last task, next step, and blockers
Session primer (session_primer): Restore context at session start from the most recent handoff
Workflow Planning & Diagnosis
List & plan intents (list_intents, plan_intent): View available workflows and generate checkpointed execution plans with policy gates
Diagnose failures (diagnose_failure): Root-cause analysis for failed or suspect workflow steps
Bootstrap agents (bootstrap_internal_agent): Normalize GitHub/Slack/Linear triggers into startup context with recall packs and worktree sandboxes
Delegation handoffs (start_handoff, complete_handoff): Manage sequential agent delegation with verification outcomes
Context Engineering
Context packs (construct_context_pack, evaluate_context_pack): Build and evaluate bounded context packs for large projects, closing the retrieval learning loop
Context provenance (context_provenance): Audit trail of recent context and retrieval decisions
Estimate uncertainty (estimate_uncertainty): Bayesian uncertainty estimates for risky tags before acting
Business Metrics
Business metrics (get_business_metrics): Retrieve Revenue, Conversion, and Customer metrics from the Semantic Layer
Semantic entity descriptions (describe_semantic_entity, describe_reliability_entity): Canonical definitions and state of Customer, Revenue, or Funnel entities
Export & Fine-Tuning
Export DPO pairs (export_dpo_pairs): Build preference pairs from promoted memories for model fine-tuning
Export Databricks bundle (export_databricks_bundle): Export RLHF logs and proof artifacts as a Databricks-ready analytics bundle
Generate skills (generate_skill): Auto-generate Claude skill files (SKILL.md) from clustered failure patterns
MCP Memory Gateway
Pre-action gates that physically block AI coding agents from repeating known mistakes. Dual-memory recall (MemAlign-inspired principles + episodic context). Captures feedback, auto-promotes failures into prevention rules, and enforces them via PreToolUse hooks. Works with Claude Code, Codex, Gemini, Amp, Cursor.
Honest disclaimer: This is a context injection system, not RLHF. LLM weights are not updated by thumbs-up/down signals. What actually happens: feedback is validated, promoted to searchable memory, and recalled at session start so agents have project history they'd otherwise lose. That's genuinely valuable, but it's context engineering, not reinforcement learning.
Works with any MCP-compatible agent: Claude, Codex, Gemini, Amp, Cursor, OpenCode.
Verification evidence for shipped features lives in docs/VERIFICATION_EVIDENCE.md.
Repo-local operator guides:
MCP Memory Gateway keeps one sharp agent on task. Continuity tools help you resume work; the gateway keeps the resumed session sharp by layering recall, reliability rules, pre-action gates, session handoff primers, and verification on top of that continuity workflow, without adding another planner or swarm.
Claude Workflow Hardening
If you are selling or deploying Claude-first delivery, the cleanest commercial wedge is not "AI employee" hype. It is a Workflow Hardening Sprint for one workflow with enough memory, gates, and proof to ship safely.
Use that motion when a buyer already has:
one workflow owner
one repeated failure pattern or rollout blocker
one buyer who needs proof before broader rollout
That maps cleanly to three offers:
Workflow Hardening Sprint for one production workflow with business value
code modernization guardrails for long-running migration and refactor sessions
hosted Pro at $49 one-time when the team only needs synced memory, gates, and usage analytics
Use these assets in sales and partner conversations:
Claude Desktop Extensions
This repo already ships a Claude Desktop extension lane:
Claude metadata: .claude-plugin/plugin.json
Claude marketplace metadata: .claude-plugin/marketplace.json
Claude extension install and support guide: .claude-plugin/README.md
Claude Desktop bundle builder: npm run build:claude-mcpb
Claude Desktop bundle launcher: .claude-plugin/bundle/server/index.js
Claude Desktop bundle icon: .claude-plugin/bundle/icon.png
Internal submission packet: docs/CLAUDE_DESKTOP_EXTENSION.md
Install locally today with:
claude mcp add rlhf -- npx -y mcp-memory-gateway serve
Build a submission-ready .mcpb locally with:
npm run build:claude-mcpb
Treat Anthropic directory inclusion as a discoverability and trust lane, not as revenue proof or partner proof.
For paired phone + desktop workflows, keep Dispatch in a constrained remote-ops lane:
RLHF_MCP_PROFILE=dispatch claude mcp add rlhf -- npx -y mcp-memory-gateway serve
npx mcp-memory-gateway dispatch
That profile stays read-only: metrics, gates, diagnostics, planning, and recall. Use a dedicated worktree plus RLHF_MCP_PROFILE=default when the task graduates into code edits or memory writes. Guide: docs/guides/dispatch-ops.md.
Cursor Marketplace
This repo now ships a submission-ready Cursor plugin bundle:
Root marketplace manifest: .cursor-plugin/marketplace.json
Plugin directory: plugins/cursor-marketplace/
Plugin MCP config: plugins/cursor-marketplace/.mcp.json
Use MCP Memory Gateway as the display name in Cursor Marketplace and Cursor Directory forms. Keep mcp-memory-gateway as the plugin slug and npm package name.
That package keeps the Cursor review surface intentionally small: one MCP server bundle that leads with Pre-Action Gates and keeps runtime enforcement close to the agent loop. The runtime launcher now targets mcp-memory-gateway@latest, so npm releases can flow into the plugin runtime without editing the config. Marketplace metadata, screenshots, and directory copy still require an explicit plugin refresh. Until the public listing is approved, Cursor users can still install locally with npx mcp-memory-gateway init.
Operational guidance for Cursor releases and promotion lives in docs/CURSOR_PLUGIN_OPERATIONS.md.
Visual Demo: Experience the Magic
Stop imagining and see the MCP Memory Gateway in action. This is the difference between an agent that repeats mistakes and one that actually improves.
1. The "Repeat Mistake" Cycle (Without Gateway)
Agent: I'll fix the bug and push directly to main.
User: No, you forgot to check the PR review thread again!
Agent: Sorry, I'll remember next time. (It won't).
2. The "Agentic Memory" Cycle (With Gateway)
Watch how the Pre-Action Gates and Reasoning Traces physically block the failure:
User: Fix the bug and push.
Agent: I'll apply the fix... [Applying Edit]
Agent: Now I'll push to main... [Executing: git push]
GATE BLOCKED: push-without-thread-check
──────────────────────────────────────────────────
Reason : Rule promoted from 3+ previous failures.
Condition : No 'gh pr view' or thread check detected in current session.
Action : Blocked. Please check review threads first.
──────────────────────────────────────────────────
Agent: My apologies. I see that I am blocked because I haven't checked
the PR threads. I'll do that now... [Executing: gh pr view]
Success! Agent finds a blocker in the thread, fixes it, and then pushes.
3. Deep Troubleshooting with Reasoning Traces
Every captured signal now includes a Reasoning Trace, making "black-box" failures transparent:
# Capture feedback with the new --reasoning flag
npx mcp-memory-gateway capture --feedback=down \
--context="Agent skipped unit tests" \
--reasoning="The agent assumed the change was too small to break anything, but it regressed the auth flow." \
--tags="testing,regression"
Now, when the agent starts its next session, it doesn't just see "Don't skip tests." It sees the logic that led to the failure, preventing the same cognitive trap.
Capture: capture_feedback MCP tool accepts signals with structured context (vague "thumbs down" is rejected)
Validate: Rubric engine gates promotion and requires specific failure descriptions, not vibes
Screen: Memory-ingress firewall blocks secret-bearing or hostile feedback before any JSONL write (local scanner by default, ShieldCortex when installed)
Remember: Promoted memories stored in local JSONL and kept searchable through the MCP layer
Distill: Principle extraction distills natural-language feedback into reusable semantic principles (MemAlign-inspired)
Reject: Vague or invalid signals are logged to the Rejection Ledger (rejection-ledger.jsonl) with the reason and a revival condition so you know exactly how to re-submit
Prevent: Repeated failures auto-generate prevention rules (the actual value: agents follow these when loaded)
Gate: Pre-action blocking via PreToolUse hooks physically prevents known mistakes before they happen
Recall: recall tool injects relevant past context into the current session (this is the mechanism that works)
Matrix: enforcement_matrix tool exposes the full pipeline state: feedback counts, promotion rate, active gates, and top rejection reasons
Session Handoff: session_handoff captures git state, last task, next step, and blockers; session_primer restores it at next session start
Export: DPO/KTO pairs for optional downstream fine-tuning (separate from runtime behavior)
Bridge: JSONL file watcher auto-ingests signals from external sources (Amp plugins, hooks, scripts)
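The Validate and Reject steps above can be pictured roughly as follows. This is a minimal sketch, not the gateway's rubric engine: validateSignal, the 20-character cutoff, and the result field names are all hypothetical and exist only to show how a vague signal could be turned into a ledger-style rejection with a revival condition.

```javascript
// Hypothetical sketch of the Validate/Reject steps.
// The function name, the cutoff, and the result fields are assumptions,
// not the gateway's actual rubric engine or ledger schema.
const MIN_CONTEXT_LENGTH = 20; // assumed threshold for "specific enough"

function validateSignal(entry) {
  const context = (entry.context || '').trim();
  if (context.length < MIN_CONTEXT_LENGTH) {
    // Shape loosely modelled on "reason + revival condition" from the list above.
    return {
      accepted: false,
      reason: 'context is too vague to act on',
      revivalCondition: 'describe what happened and what should change',
    };
  }
  return { accepted: true };
}

// A bare thumbs-down is rejected; a specific description passes.
console.log(validateSignal({ signal: 'down', context: 'bad' }));
console.log(validateSignal({ signal: 'down', context: 'Agent skipped unit tests before pushing' }));
```

A real rubric would likely score more than length, but the split into an accept path and a reject-with-reason path is the part this sketch is meant to show.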
Optional ingress hardening:
RLHF_MEMORY_FIREWALL_PROVIDER=auto prefers ShieldCortex when the optional package is installed, then falls back to the local secret scanner.
RLHF_MEMORY_FIREWALL_PROVIDER=shieldcortex forces the ShieldCortex path and degrades to the local scanner only if the package is unavailable.
RLHF_MEMORY_FIREWALL_MODE=strict|balanced|permissive controls the ShieldCortex defence mode.
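The provider rules above amount to a small decision table. The sketch below is one way to express it, assuming the local scanner is used when the variable is unset (as the Screen step describes). resolveFirewallConfig and the returned fields are illustrative names, not the gateway's API, and the default mode is left undefined because the source does not state one.

```javascript
// Hypothetical sketch of the provider-selection rules described above.
// `resolveFirewallConfig` and its return shape are illustrative only.
const FIREWALL_MODES = ['strict', 'balanced', 'permissive'];

function resolveFirewallConfig(env, shieldCortexInstalled) {
  const requested = (env.RLHF_MEMORY_FIREWALL_PROVIDER || '').toLowerCase();
  const wantsShieldCortex = requested === 'auto' || requested === 'shieldcortex';

  // ShieldCortex is only usable when the optional package is present;
  // every other case ends up on the local secret scanner.
  const provider = wantsShieldCortex && shieldCortexInstalled ? 'shieldcortex' : 'local';

  // "degraded" marks the case where ShieldCortex was forced but is unavailable.
  const degraded = requested === 'shieldcortex' && !shieldCortexInstalled;

  // The default mode is not documented here, so an unset or unknown value stays undefined.
  const mode = FIREWALL_MODES.includes(env.RLHF_MEMORY_FIREWALL_MODE)
    ? env.RLHF_MEMORY_FIREWALL_MODE
    : undefined;

  return { provider, degraded, mode };
}

console.log(resolveFirewallConfig({ RLHF_MEMORY_FIREWALL_PROVIDER: 'auto' }, true));
console.log(resolveFirewallConfig({ RLHF_MEMORY_FIREWALL_PROVIDER: 'shieldcortex' }, false));
```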
What Works vs. What Doesn't
Actually works | Does not work |
| Thumbs up/down changing agent behavior mid-session |
| LLM weight updates from feedback signals |
| Agents magically knowing what happened last session |
Prevention rules: followed when loaded at session start | Feedback stats improving agent performance automatically |
Pre-action gates: physically block known mistakes | "Learning curve" implying the agent itself learns |
Auto-promotion: 3+ same-tag failures become a warn gate, 5+ a blocking gate | Agents self-correcting without context injection |
Rejection Ledger: tracks why feedback was rejected and how to fix it | Vague signals silently disappearing |
Enforcement Matrix: one-call view of pipeline, gates, and rejections | Guessing whether the system is actually enforcing |
Quick Start
# Recommended: register the gateway with your agent (prefix with RLHF_MCP_PROFILE=essential for the lean essential profile)
claude mcp add rlhf -- npx -y mcp-memory-gateway serve
codex mcp add rlhf -- npx -y mcp-memory-gateway serve
amp mcp add rlhf -- npx -y mcp-memory-gateway serve
gemini mcp add rlhf "npx -y mcp-memory-gateway serve"
# Or auto-detect all installed platforms
npx mcp-memory-gateway init
# Auto-wire PreToolUse hooks (blocks known mistakes before they happen)
npx mcp-memory-gateway init --agent claude-code
npx mcp-memory-gateway init --agent codex
npx mcp-memory-gateway init --agent gemini
# Audit readiness before a long-running workflow
npx mcp-memory-gateway doctor
Profiles: Set RLHF_MCP_PROFILE=essential for the lean 9-tool setup, RLHF_MCP_PROFILE=dispatch for phone-safe remote ops, or leave unset for the full policy + observability surface. See MCP Tools for details.
Pair It With Continuity Tools
Project continuity and agent reliability are complementary, not interchangeable.
Use your editor, assistant, or resume workflow to regain context quickly.
Use MCP Memory Gateway as the reliability layer for recall, gates, and proof.
If an external tool can append structured JSONL entries with a source field, the built-in watcher can ingest them through the normal feedback pipeline:
{"source":"editor-brief","signal":"down","context":"Agent resumed without reading the migration notes","whatWentWrong":"Skipped the resume brief and edited the wrong table","whatToChange":"Read the project brief before schema changes","tags":["continuity","resume","database"]}
npx mcp-memory-gateway watch --source editor-brief
That routes the event through validation, memory promotion, vector indexing, and export eligibility without adding a second integration stack.
Guide: docs/guides/continuity-tools-integration.md
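For an external tool written in Node, producing such an entry is a one-line append. The sketch below writes to a temporary file so it is safe to run as-is; appendFeedbackEntry is a hypothetical helper, and the real log location depends on your setup, so treat the path as an assumption.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper (not shipped by the gateway): append one feedback
// entry as a single JSON line so a JSONL watcher can pick it up.
function appendFeedbackEntry(logPath, entry) {
  if (!entry.source) {
    throw new Error('entry needs a "source" field');
  }
  fs.appendFileSync(logPath, JSON.stringify(entry) + '\n', 'utf8');
}

// Demo against a temporary file; point logPath at your real feedback log instead.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'rlhf-demo-'));
const demoLog = path.join(demoDir, 'feedback-log.jsonl');

appendFeedbackEntry(demoLog, {
  source: 'editor-brief',
  signal: 'down',
  context: 'Agent resumed without reading the migration notes',
  whatWentWrong: 'Skipped the resume brief and edited the wrong table',
  whatToChange: 'Read the project brief before schema changes',
  tags: ['continuity', 'resume', 'database'],
});

console.log(fs.readFileSync(demoLog, 'utf8'));
```

Keeping each entry on one line with a trailing newline matters: a line-oriented watcher can only process entries that are complete.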
Pre-Action Gates
Gates are the enforcement layer. They physically block tool calls that match known failure patterns; no agent cooperation is required.
Agent tries git push → PreToolUse hook fires → gates-engine checks rules → BLOCKED (no PR thread check)
How it works
init --agent claude-code auto-wires a PreToolUse hook into your agent settings
The hook pipes every Bash command through gates-engine.js
Gates match tool calls against regex patterns and block/warn
Auto-promotion: 3+ same-tag failures auto-create a warn gate; 5+ upgrade it to block.
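The matching and auto-promotion steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in gates-engine.js: evaluateCommand and promotedAction are hypothetical names, gate objects are assumed to have an { id, pattern, action, message } shape, and letting block win over warn when several gates match is a design guess.

```javascript
// Hypothetical sketch of regex gate matching and threshold-based promotion.
// Names and the block-over-warn precedence are assumptions.
function evaluateCommand(command, gates) {
  const matches = gates.filter((gate) => new RegExp(gate.pattern).test(command));
  // Prefer a blocking gate when both a warn and a block gate match.
  const hit = matches.find((gate) => gate.action === 'block') || matches[0];
  return hit
    ? { gateId: hit.id, action: hit.action, message: hit.message }
    : { gateId: null, action: 'allow', message: '' };
}

// 3+ same-tag failures create a warn gate; 5+ upgrade it to block.
function promotedAction(sameTagFailures) {
  if (sameTagFailures >= 5) return 'block';
  if (sameTagFailures >= 3) return 'warn';
  return null; // not enough evidence to create a gate
}

const gates = [
  {
    id: 'no-npm-audit-fix',
    pattern: 'npm audit fix --force',
    action: 'block',
    message: 'npm audit fix --force can break dependencies. Review manually.',
  },
];

console.log(evaluateCommand('npm audit fix --force', gates));
console.log(evaluateCommand('npm test', gates));
console.log([2, 3, 5].map(promotedAction));
```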
Built-in gates
Gate | Action | What it blocks |
| block | |
| block | |
| block | |
| block | Direct push to develop/main/master |
| warn | Editing |
Custom gates
Define your own in config/gates/custom.json:
{
"version": 1,
"gates": [
{
"id": "no-npm-audit-fix",
"pattern": "npm audit fix --force",
"action": "block",
"message": "npm audit fix --force can break dependencies. Review manually."
}
]
}
Gate satisfaction
Some gates have unless conditions. To satisfy a gate before pushing:
# Via MCP tool
satisfy_gate(gateId: "push-without-thread-check", evidence: "0/42 unresolved")
# Via CLI
node scripts/gate-satisfy.js --gate push-without-thread-check --evidence "0 unresolved"
Evidence expires after 5 minutes (configurable TTL).
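Time-limited evidence like this is usually modelled as a record with an expiry timestamp. The sketch below is a minimal in-memory version: GateEvidenceStore and its methods are hypothetical, and only the 5-minute default comes from the text above. The now parameter is there so the expiry logic can be exercised without waiting.

```javascript
// Hypothetical in-memory evidence store with a time-to-live.
// Class and method names are illustrative, not the gateway's implementation.
const DEFAULT_TTL_MS = 5 * 60 * 1000; // 5 minutes, the documented default

class GateEvidenceStore {
  constructor(ttlMs = DEFAULT_TTL_MS) {
    this.ttlMs = ttlMs;
    this.records = new Map();
  }

  // Record evidence for a gate; it stays valid until now + ttlMs.
  satisfy(gateId, evidence, now = Date.now()) {
    this.records.set(gateId, { evidence, expiresAt: now + this.ttlMs });
  }

  // A gate counts as satisfied only while its evidence has not expired.
  isSatisfied(gateId, now = Date.now()) {
    const record = this.records.get(gateId);
    if (!record) return false;
    if (now >= record.expiresAt) {
      this.records.delete(gateId); // drop stale evidence
      return false;
    }
    return true;
  }
}

const store = new GateEvidenceStore();
store.satisfy('push-without-thread-check', '0 unresolved');
console.log(store.isSatisfied('push-without-thread-check'));
```

The short TTL is the point of the design: evidence that PR threads were clean a few minutes ago says little about their state an hour later.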
Dashboard
npx mcp-memory-gateway dashboard
RLHF Dashboard
──────────────────────────────────────────────
Approval Rate : 26% → 45% (7-day trend ↑)
Total Signals : 190 (15 positive, 43 negative)
Gate Enforcement
Active Gates : 7 (4 manual, 3 auto-promoted)
Actions Blocked : 12 this week
Actions Warned : 8 this week
Top Blocked : push-without-thread-check (5×)
Prevention Impact
Estimated Saves : 3.2 hours
Rules Active : 5 prevention rules
Last Promotion : pr-review (2 days ago)
MCP Tools
Essential (high-ROI, start here)
These 9 tools deliver the fastest path to feedback, recall, lesson search, and prevention. Use the essential profile for a lean setup:
RLHF_MCP_PROFILE=essential claude mcp add rlhf -- npx -y mcp-memory-gateway serve
Tool | Description |
| Accept up/down signal + context, validate, promote to memory |
| Vector-search past feedback and prevention rules for current task |
| Search promoted lessons and inspect the corrective action, prevention rules, and gates linked to each result |
| Search raw feedback logs, ContextFS memory, and prevention rules across local RLHF state |
| Generate prevention rules from repeated mistakes |
| Full pipeline state: feedback counts, promotion rate, active gates, rejection ledger |
| Approval rate, per-skill/tag breakdown, trend analysis |
| Human-readable recent feedback summary |
| Bayesian uncertainty estimate for risky tags before acting |
Free and self-hosted users can invoke search_lessons directly through MCP to search their RLHF memory and see what corrective action the system took in response to each lesson.
For broader retrieval across local feedback logs, ContextFS memory, and prevention rules, free and self-hosted users can also invoke search_rlhf through MCP or the authenticated GET /v1/search API surface.
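The Bayesian uncertainty estimate listed in the table above is not specified further here. As an illustration only, one common way to get such an estimate from up/down counts is a Beta-Bernoulli model. The sketch below assumes a uniform Beta(1, 1) prior; the function name and return fields are hypothetical and this is not a description of what estimate_uncertainty actually computes.

```javascript
// Hypothetical sketch: Beta-Bernoulli estimate of a tag's failure rate.
// The uniform Beta(1, 1) prior and all names here are assumptions.
function estimateTagUncertainty(failures, successes) {
  const alpha = 1 + failures;  // prior pseudo-count + observed failures
  const beta = 1 + successes;  // prior pseudo-count + observed successes
  const total = alpha + beta;

  // Mean and variance of a Beta(alpha, beta) distribution.
  const mean = alpha / total;
  const variance = (alpha * beta) / (total * total * (total + 1));

  return { meanFailureRate: mean, stdDev: Math.sqrt(variance) };
}

// Similar failure ratio, but more observations give a tighter estimate.
console.log(estimateTagUncertainty(3, 1));
console.log(estimateTagUncertainty(30, 10));
```

The useful property is that a tag with only a handful of signals reports a wide spread, which is a reason to act cautiously even when the observed failure rate looks low.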
Dispatch (remote ops, phone-safe)
Use the dispatch profile when Claude Dispatch or another remote desktop lane needs live business metrics, failure diagnosis, and sprint planning without code or memory mutations:
RLHF_MCP_PROFILE=dispatch claude mcp add rlhf -- npx -y mcp-memory-gateway serve
Tool | Description | When you need it |
| Recall relevant past failures and prevention rules | Remote planning before a desk session |
| Summarize recent feedback and operator notes | Quick remote review |
| Search raw RLHF state across feedback, ContextFS, and prevention rules | Cross-check local lessons before acting |
| Approval trend and failure-domain summary | Health checks from the phone |
| Root-cause report for blocked or failed runs | Incident triage away from the desk |
| Available workflow plans and approval requirements | Choose the next workflow safely |
| Generate a checkpointed plan without executing it | Prepare the next worktree session |
| Inspect recent context-pack and evidence decisions | Retrieval debugging |
| Gate enforcement statistics | Review what Pre-Action Gates are catching |
| Full RLHF dashboard | One-command system snapshot |
| Revenue, conversion, and customer metrics | Remote commercial readout |
| Explain Customer, Revenue, or Funnel state | Metrics interpretation |
| Read active prevention rules and success patterns | Review the current rule set |
| Alias for semantic entity definitions | Compatibility surface |
Full pipeline (advanced)
These highlighted tools support the broader local-first builder workflow. Use the default profile to enable the complete policy, context, and observability surface:
Tool | Description | When you need it |
| Build DPO preference pairs from promoted memories | Fine-tuning a model on your feedback |
| Export RLHF logs and proof artifacts as a Databricks-ready analytics bundle | Warehousing local feedback, attribution, and proof data for Databricks / Genie Code analysis |
| Bounded context pack from contextfs | Custom retrieval for large projects |
| Record context pack outcome (closes learning loop) | Measuring retrieval quality |
| Available action plan templates | Policy-gated workflows |
| Generate execution plan with policy checkpoints | Policy-gated workflows |
| Audit trail of context decisions | Debugging retrieval decisions |
| Record evidence that a gate condition is met | Unblocking gated actions (e.g., PR thread check) |
| Gate enforcement statistics (blocked/warned counts) | Monitoring gate effectiveness |
| Full RLHF dashboard (approval rate, gates, prevention) | Overview of system health |
| Compile workflow, gate, approval, and MCP-tool constraints into a root-cause report | Systematic debugging for failed or suspect agent runs |
| Write session primer with git state, last task, next step, blockers | Seamless context continuity across sessions |
| Read the most recent session handoff primer | Restoring context at session start |
CLI
npx mcp-memory-gateway init # Scaffold .rlhf/ + configure MCP
npx mcp-memory-gateway init --agent X # + auto-wire PreToolUse hooks (claude-code/codex/gemini)
npx mcp-memory-gateway init --wire-hooks # Wire hooks only (auto-detect agent)
npx mcp-memory-gateway serve # Start MCP server (stdio) + watcher
npx mcp-memory-gateway doctor # Audit runtime isolation, bootstrap context, and MCP permission tier
npx mcp-memory-gateway dispatch # Dispatch-safe remote ops brief
npx mcp-memory-gateway dashboard # Full RLHF dashboard with gate stats
npx mcp-memory-gateway north-star # North Star progress: proof-backed workflow runs
npx mcp-memory-gateway gate-stats # Gate enforcement statistics
npx mcp-memory-gateway status # Learning curve dashboard
npx mcp-memory-gateway watch # Watch .rlhf/ for external signals
npx mcp-memory-gateway capture # Capture feedback via CLI
npx mcp-memory-gateway lessons # Search lessons + linked corrective actions
npx mcp-memory-gateway stats # Analytics + Revenue-at-Risk
npx mcp-memory-gateway rules # Generate prevention rules
npx mcp-memory-gateway export-dpo # Export DPO training pairs
npx mcp-memory-gateway export-databricks # Export Databricks-ready analytics bundle
npx mcp-memory-gateway risk # Train/query boosted risk scorer
npx mcp-memory-gateway self-heal # Run self-healing diagnostics
Hosted growth tracking
The landing page ships first-party telemetry plus optional GA4 and Google Search Console hooks.
export RLHF_PUBLIC_APP_ORIGIN='https://rlhf-feedback-loop-production.up.railway.app'
export RLHF_BILLING_API_BASE_URL='https://rlhf-feedback-loop-production.up.railway.app'
export RLHF_FEEDBACK_DIR='/data/feedback'
export RLHF_GA_MEASUREMENT_ID='G-XXXXXXXXXX' # optional
export RLHF_GOOGLE_SITE_VERIFICATION='token-value' # optional
Plausible stays on by default for lightweight page analytics.
GA4 is only injected when RLHF_GA_MEASUREMENT_ID is set.
Search Console verification meta is only injected when RLHF_GOOGLE_SITE_VERIFICATION is set.
Hosted deployments should set RLHF_FEEDBACK_DIR=/data/feedback (or another durable path) so telemetry, billing ledgers, and proof-backed workflow-run evidence survive restarts.
npx mcp-memory-gateway dashboard now shows whether traffic, SEO, funnel, and revenue instrumentation are actually configured and receiving events.
JSONL File Watcher
The serve command automatically starts a background watcher that monitors feedback-log.jsonl for entries written by external sources (Amp plugins, shell hooks, CI scripts). These entries are routed through the full captureFeedback() pipeline: validation, memory promotion, vector indexing, and DPO eligibility.
# Standalone watcher
npx mcp-memory-gateway watch --source amp-plugin-bridge
# Process pending entries once and exit
npx mcp-memory-gateway watch --once
External sources write entries with a source field:
{"signal":"positive","context":"Agent fixed bug on first try","source":"amp-plugin-bridge","tags":["amp-ui-bridge"]}
The watcher tracks its position via .rlhf/.watcher-offset for crash-safe, idempotent processing.
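Offset tracking of this kind can be sketched as follows. This is a simplified illustration, not the gateway's watcher: processNewEntries is a hypothetical name, the offset is assumed to be a byte count stored as plain text, and the demo uses temporary files rather than the real .rlhf/ directory.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical sketch: process only complete lines appended since the saved
// byte offset, then persist the new offset. Not the gateway's implementation.
function processNewEntries(logPath, offsetPath, handleEntry) {
  const offset = fs.existsSync(offsetPath)
    ? Number(fs.readFileSync(offsetPath, 'utf8')) || 0
    : 0;

  const pending = fs.readFileSync(logPath).subarray(offset);
  const lastNewline = pending.lastIndexOf(0x0a);
  if (lastNewline === -1) return 0; // no complete line yet

  // Ignore a trailing partial line; it will be picked up on the next pass.
  const complete = pending.subarray(0, lastNewline + 1).toString('utf8');
  let processed = 0;
  for (const line of complete.split('\n')) {
    if (!line.trim()) continue;
    handleEntry(JSON.parse(line));
    processed += 1;
  }

  // Persist the offset only after the batch has been handled.
  fs.writeFileSync(offsetPath, String(offset + lastNewline + 1));
  return processed;
}

// Demo with temporary files.
const watchDir = fs.mkdtempSync(path.join(os.tmpdir(), 'watcher-demo-'));
const watchLog = path.join(watchDir, 'feedback-log.jsonl');
const watchOffset = path.join(watchDir, '.watcher-offset');
fs.writeFileSync(watchLog, '{"signal":"positive","source":"amp-plugin-bridge"}\n');

console.log(processNewEntries(watchLog, watchOffset, (entry) => console.log(entry.source)));
console.log(processNewEntries(watchLog, watchOffset, () => {}));
```

Writing the offset after handling means a crash mid-batch replays that batch on restart, so in this sketch the handler itself would need to tolerate duplicates for processing to be truly idempotent.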
Architecture
Value tiers
Tier | Components | Impact |
Core (use now) | | Captures mistakes, prevents repeats, constrains behavior |
Gates (use now) | Pre-action gates + auto-promotion + | Physically blocks known mistakes before they happen |
Analytics (use now) | | Measures whether the agent is actually improving |
Fine-tuning (future) | DPO/KTO export, Thompson Sampling, context packs | Infrastructure for model fine-tuning, valuable when you have a training pipeline |
~30% of the codebase delivers ~80% of the runtime value. The rest is forward-looking infrastructure for teams that export training data.
Pipeline
Seven-phase pipeline: Capture → Validate → Remember → Distill → Prevent → Gate → Export


Agent (Claude/Codex/Amp/Gemini)
│
├── MCP tool call ──→ captureFeedback()
├── REST API ────────→ captureFeedback()
├── CLI ─────────────→ captureFeedback()
└── External write ──→ JSONL ──→ Watcher ──→ captureFeedback()
         │
         ▼
┌─────────────────┐
│  Full Pipeline  │
│  • Schema valid │
│  • Rubric gate  │
│  • Memory promo │
│  • Vector index │
│  • Risk scoring │
│  • RLAIF audit  │
│  • DPO eligible │
└─────────────────┘
Agent Runner Contract
WORKFLOW.md: scope, proof-of-work, hard stops, and done criteria for isolated agent runs
.github/ISSUE_TEMPLATE/ready-for-agent.yml: bounded intake template for "Ready for Agent" tickets
.github/pull_request_template.md: proof-first handoff format for PRs
Pro Pack: Production Context Engineering Configs
Curated configuration pack for teams that want a faster production setup without inventing their own guardrails from scratch.
What You Get | Description |
Prevention Rules | 10 curated rules covering PR workflow, git hygiene, tool misuse, memory management |
Thompson Sampling Presets | 4 pre-tuned profiles: Conservative, Exploratory, Balanced, Strict |
Extended Constraints | 10 RLAIF self-audit constraints (vs 6 in free tier) |
Hook Templates | Ready-to-install Stop, UserPromptSubmit, PostToolUse hooks |
Reminder Templates | 8 production reminder templates with priority levels |
Current pricing and traction policy: Commercial Truth
Support the Project
If MCP Memory Gateway saves you time, consider supporting development:
Star the repo
Sponsor on GitHub
Buy Me a Coffee
License
MIT. See LICENSE.