MCP Gateway

ARCHITECTURE.md•19 KiB

# Architecture `mcp-gateway` v2.4.0 -- Rust, 807 tests, ~178 tools across backends. ## System Overview The gateway sits between LLM clients (Claude Code, Cursor, etc.) and multiple MCP tool servers. Instead of loading all ~178 tool definitions into context (thousands of tokens), clients connect to the gateway and use a small set of **meta-tools** to discover and invoke tools on demand. This yields ~95% context token savings. ``` +-----------------------+ LLM Client ──POST /mcp──▶ mcp-gateway │ (Claude Code) │ │ │ MetaMcp │ │ ├─ SearchRanker │ │ ├─ KillSwitch │ │ ├─ IdempotencyCache │ │ ├─ ResponseCache │ │ └─ TransitionTracker│ │ │ │ BackendRegistry │ │ ├─ stdio backends │──▶ MCP servers (processes) │ ├─ HTTP backends │──▶ MCP servers (remote) │ └─ CapabilityBackend│──▶ REST APIs (direct) +-----------------------+ ``` The gateway exposes a single HTTP endpoint (`/mcp`) that speaks MCP JSON-RPC. Clients discover backends via `gateway_search_tools`, then invoke them via `gateway_invoke`. The gateway handles routing, caching, failsafes, and lifecycle management internally. ## Meta-Tools The gateway advertises up to 9 meta-tools to connecting clients. The base set of 4 is always present; the rest are conditional on configuration. | Tool | Always | Purpose | |------|--------|---------| | `gateway_list_servers` | yes | List all registered backends with status and circuit state | | `gateway_list_tools` | yes | List tools from one backend (or all). Uses cached tool lists | | `gateway_search_tools` | yes | Keyword search across all backends with ranked results | | `gateway_invoke` | yes | Call any tool on any backend. Handles caching, idempotency, kill switch | | `gateway_get_stats` | if stats enabled | Usage stats: invocations, cache hits, token savings, top tools, cost | | `gateway_webhook_status` | if webhooks enabled | List webhook endpoints and delivery stats | | `gateway_run_playbook` | yes | Execute a multi-step playbook as a single call | | `gateway_kill_server` | yes | Operator kill switch: immediately disable routing to a backend | | `gateway_revive_server` | yes | Re-enable a killed backend and reset its error budget | Defined in `src/gateway/meta_mcp_helpers.rs`, function `build_meta_tools()` (line 347). ## Tool Discovery Resolution Order When `gateway_search_tools` receives a query, the full pipeline is: ```mermaid flowchart TD A["Client query: 'batch research'"] --> B["Lowercase + split whitespace"] B --> C["For each backend: collect all tools"] C --> D["tool_matches_query() per tool"] D --> E["Match: name substring OR description substring"] E --> F["Synonym expansion (expand_synonyms)"] F --> G["Collect ALL matches across ALL backends"] G --> H["SearchRanker::rank()"] H --> I["score_text_relevance() per match"] I --> J["Usage boost: score * (1 + log2(uses+1) * 0.15)"] J --> K["Sort descending by final score"] K --> L["Truncate to limit (default 10)"] L --> M["annotate_differential()"] M --> N["Return results + suggestions if empty"] ``` ### Matching (`tool_matches_query`, `meta_mcp_helpers.rs:367`) A tool matches if **any** query word (or any of its synonyms) appears case-insensitively in the tool name or description. Multi-word queries use OR semantics for matching, but AND semantics score higher during ranking. ### Scoring tiers (`score_text_relevance`, `ranking.rs:175`) Text relevance is scored in discrete tiers: | Score | Condition | |-------|-----------| | 15 | All query words found in tool name | | 10+2N | All N words found in name+description combined | | 10 | Exact single-word name match | | 6+2N | N words match `[keywords: ...]` tags | | 4+2N | N words match `[schema: ...]` field names | | 3+2M | M of N words found (partial) | | 6 | Single word matches a schema field | | 5 | Name contains query as substring | | 2 | Description contains query as substring | Synonym-expanded matches apply a 0.8x multiplier. Schema matches are never synonym-discounted. ### Usage boost (`SearchRanker::rank`, `ranking.rs:383`) Final score = `text_relevance * (1 + log2(usage_count + 1) * 0.15)` Usage is multiplicative: it amplifies good matches but cannot promote irrelevant tools. ### Keyword auto-tagging (`autotag.rs`) MCP backend tools lack the `[keywords: ...]` tags that capability tools carry. `enrich_description()` auto-extracts up to 7 keywords from a tool's description: tokenize on non-alphabetic boundaries, lowercase, filter stopwords and short words, sort by descending length, truncate. Idempotent -- already-tagged descriptions pass through unchanged. ### Differential descriptions (`gateway/differential.rs`) When search results contain multiple tools from the same "family" (same server + same `snake_case` prefix, e.g. `gmail_search`, `gmail_send`, `gmail_batch_modify`), the `annotate_differential()` function computes what makes each tool unique by removing words shared across all family members. The `differential_description` field is appended alongside the original `description`. ### Zero-result suggestions (`meta_mcp_helpers.rs:396`) When no tools match, `build_suggestions()` scans all keyword tags for prefix or substring overlap with the query and returns up to 5 sorted suggestions. ## Server Lifecycle State Machine ```mermaid stateDiagram-v2 [*] --> Registered: Gateway::new() registers backend Registered --> WarmStarting: tokio::spawn warm-start task WarmStarting --> Active: backend.start() + get_tools() succeeds WarmStarting --> Failed: start or prefetch fails Failed --> Active: manual retry via gateway_list_tools(server=X) Active --> CircuitOpen: failure_threshold exceeded CircuitOpen --> HalfOpen: reset_timeout elapses HalfOpen --> Active: success_threshold reached HalfOpen --> CircuitOpen: failure in half-open Active --> Killed: KillSwitch::kill() or error budget auto-kill Killed --> Active: KillSwitch::revive() (resets error budget) Active --> HealthCheckFailed: ping fails HealthCheckFailed --> Active: next ping succeeds ``` ### Registration (`gateway/server.rs:48`) `Gateway::new()` iterates `config.enabled_backends()`, creates `Backend` instances with transport config and failsafe settings, registers them in `BackendRegistry`. ### Warm-start (`gateway/server.rs:348`) A spawned task connects each backend and prefetches its tool list into cache. If `config.meta_mcp.warm_start` is empty, ALL backends are warmed. ### Health check (`gateway/server.rs:397`) A periodic task sends `ping` to every running backend at `health_config.interval`. Failures are logged but do not directly kill the backend (the circuit breaker and error budget handle that). ### Circuit breaker (`failsafe/circuit_breaker.rs`) Per-backend `CircuitBreaker` with three states: `Closed` (allowing), `Open` (blocking), `HalfOpen` (testing). Configured via `failure_threshold`, `success_threshold`, `reset_timeout`. The `Failsafe` wrapper (`failsafe/mod.rs`) combines circuit breaker, rate limiter, retry policy, and health tracker. ### Kill switch (`kill_switch.rs`) `KillSwitch` provides two mechanisms sharing a single `DashSet<String>` of killed server names: 1. **Operator control** -- `kill(server)` / `revive(server)`, exposed as meta-tools. 2. **Error budget** -- sliding-window error-rate tracker (`BudgetWindow`). When a backend's error rate exceeds `ErrorBudgetConfig::threshold` (default 0.5) within the window (default 100 calls / 5 minutes), it is auto-killed. An 80% warning is logged before the threshold is reached. `revive()` resets the window. ## Request Flow ```mermaid sequenceDiagram participant C as LLM Client participant R as Router participant M as MetaMcp participant KS as KillSwitch participant IC as IdempotencyCache participant RC as ResponseCache participant B as Backend C->>R: POST /mcp {tools/call: gateway_invoke} R->>R: Auth middleware + input sanitization R->>R: Acquire inflight semaphore permit R->>M: handle_tools_call("gateway_invoke", args, session_id) M->>M: Parse server, tool, arguments M->>KS: is_killed(server)? alt server killed KS-->>M: true M-->>C: Error: "disabled by operator kill switch" end M->>IC: enforce(idempotency_key) alt cached result IC-->>M: GuardOutcome::CachedResult M-->>C: Cached response else in-flight IC-->>M: Error 409 M-->>C: Duplicate request error else proceed IC-->>M: GuardOutcome::Proceed (marked in-flight) end M->>RC: cache.get(server:tool:args_hash)? alt cache hit RC-->>M: cached value M-->>C: Cached response + predictions end M->>M: Record stats + ranker usage M->>B: dispatch_to_backend(server, tool, arguments) B-->>M: result or error M->>KS: record_success/record_failure (error budget) alt failure triggers auto-kill KS->>KS: killed.insert(server) end M->>RC: cache.set(key, result, ttl) M->>IC: mark_completed(key, result) M->>M: record_and_predict(session_id, tool_key) M-->>C: Result + predicted_next[] ``` ### Session tracking Each SSE connection gets a `Mcp-Session-Id` header. The `session_id` is forwarded to `invoke_tool` for per-session transition tracking. ### Idempotency (`idempotency.rs`) Three-state machine: `InFlight(Instant)` -> `Completed(Value, Instant)`, or removed on failure. Key derivation: `SHA-256(server:tool || \0 || canonical_json(arguments))`. TTLs: in-flight 5 minutes, completed 24 hours. Background cleanup task evicts stale entries. ### Transition tracking (`transition.rs`) `TransitionTracker` learns `tool_A -> tool_B` transition frequencies across sessions. After each successful invocation, `predict_next(tool, 0.30, 3)` returns candidates with >= 30% confidence and >= 3 observations. Predictions are appended as `predicted_next` in the response JSON. ## Failure Cascade Behavior ```mermaid flowchart LR F["Backend call fails"] --> CB["Circuit breaker records failure"] CB --> EB["Error budget records failure"] EB --> C1{"error_rate >= threshold?"} C1 -- "no" --> W{"error_rate >= 80% of threshold?"} W -- "yes" --> WL["WARN: approaching auto-kill"] W -- "no" --> Done["Return error to caller"] C1 -- "yes" --> AK["Auto-kill: server added to killed set"] AK --> LOG["WARN: error budget exhausted"] AK --> Block["All subsequent gateway_invoke calls rejected"] Block --> OP["Operator must call gateway_revive_server"] OP --> Reset["Error budget window cleared, server re-enabled"] ``` The circuit breaker and error budget operate at different granularities: - **Circuit breaker** (`failsafe/circuit_breaker.rs`): per-backend, configured per-backend via `FailsafeConfig`. Operates at the transport level -- blocks requests to a backend whose connection is failing. Resets automatically after `reset_timeout`. - **Error budget** (`kill_switch.rs`): per-backend sliding window at the gateway level. Tracks business-logic failures (tool errors, not just transport). Requires manual `gateway_revive_server` to recover. On dispatch failure, the idempotency entry is removed (`idem_cache.remove(key)`) so the call becomes retryable. ## Module Map | Module | Path | Description | |--------|------|-------------| | `autotag` | `src/autotag.rs` | Auto-extracts keyword tags from MCP tool descriptions for search indexing | | `backend` | `src/backend/` | Backend connection management, tool/resource/prompt caching, transport dispatch | | `cache` | `src/cache.rs` | TTL-based response cache with SHA-256 keying and max-size eviction | | `capability` | `src/capability/` | YAML-defined REST API tools: definition, parsing, execution, schema validation, hot-reload | | `cli` | `src/cli.rs` | Clap-based CLI: subcommands for serve, validate, discover, capability management | | `config` | `src/config.rs` | Figment-based configuration from YAML + env vars | | `discovery` | `src/discovery/` | Auto-discovery of MCP servers from common config locations and running processes | | `error` | `src/error.rs` | Unified error types with JSON-RPC error code mapping | | `failsafe` | `src/failsafe/` | Circuit breaker, rate limiter, retry policy, health tracker per backend | | `gateway` | `src/gateway/` | HTTP server, router, Meta-MCP handler, streaming, webhooks, auth, differential | | `idempotency` | `src/idempotency.rs` | Prevents duplicate side effects on LLM retries via SHA-256 keyed state machine | | `kill_switch` | `src/kill_switch.rs` | Operator kill switch + per-backend sliding-window error budget with auto-kill | | `oauth` | `src/oauth/` | OAuth 2.0 Authorization Code + PKCE flow for backend authentication | | `playbook` | `src/playbook.rs`, `src/playbook/` | Multi-step tool chains: YAML-defined, variable interpolation, error strategies | | `protocol` | `src/protocol/` | MCP JSON-RPC types for versions 2024-10-07 through 2025-11-25 | | `ranking` | `src/ranking.rs` | Search scoring: text relevance tiers, synonym expansion, usage-based boost, persistence | | `registry` | `src/registry.rs` | Community capability registry: discovery and installation from local/GitHub sources | | `secrets` | `src/secrets.rs` | Credential resolution: `{keychain.SERVICE}`, `{env.VAR}`, `{oauth:provider}` | | `security` | `src/security/` | Input sanitization (XSS), SSRF URL validation, tool access policy enforcement | | `stats` | `src/stats.rs` | Usage statistics: invocations, cache hits, per-tool counts, token savings calculation | | `transform` | `src/transform.rs`, `src/transform/` | Response transform pipeline: field selection, stripping noise from API responses | | `transition` | `src/transition.rs` | Session-scoped tool sequence learning and predictive next-tool suggestions | | `transport` | `src/transport/` | MCP transport implementations: stdio (process spawn), HTTP (Streamable HTTP + SSE) | | `validator` | `src/validator/` | MCP server design validator: checks agent-UX compliance against best practices | ## Key Data Structures ### `MetaMcp` (`gateway/meta_mcp.rs:51`) Central request handler. Holds `Arc` references to all subsystems: ```rust pub struct MetaMcp { backends: Arc<BackendRegistry>, // MCP server connections capabilities: RwLock<Option<Arc<CapabilityBackend>>>, // Direct REST API tools cache: Option<Arc<ResponseCache>>, // Response cache for invoke idempotency_cache: Option<Arc<IdempotencyCache>>, // Dedup LLM retries stats: Option<Arc<UsageStats>>, // Usage tracking ranker: Option<Arc<SearchRanker>>, // Usage-weighted search ranking transition_tracker: RwLock<Option<Arc<TransitionTracker>>>, // Predictive prefetch playbook_engine: RwLock<PlaybookEngine>, // Multi-step tool chains kill_switch: Arc<KillSwitch>, // Operator disable + error budget error_budget_config: RwLock<ErrorBudgetConfig>, // Auto-kill thresholds webhook_registry: RwLock<Option<Arc<..>>>, // Inbound webhook routing log_level: RwLock<LoggingLevel>, // Gateway-wide log level } ``` ### `KillSwitch` (`kill_switch.rs:32`) Lock-free concurrent reads via `DashSet` (checked on every `gateway_invoke`). Per-backend `BudgetWindow` tracks success/failure in a `VecDeque<(Instant, bool)>` with size cap and time-based eviction. ```rust pub struct KillSwitch { killed: DashSet<String>, // O(1) is_killed() check budgets: DashMap<String, Arc<Mutex<BudgetWindow>>>, // Sliding window per backend } ``` ### `IdempotencyCache` (`idempotency.rs:91`) `DashMap<String, IdempotencyState>` where state is `InFlight(Instant)` or `Completed(Value, Instant)`. Key derivation via `SHA-256(tool_name + \0 + canonical_json(args))`. Background cleanup task via `spawn_cleanup_task()`. ### `TransitionTracker` (`transition.rs:54`) Double-keyed `DashMap`: outer key = "from_tool", inner key = "to_tool", value = `AtomicU64` count. Per-session last-tool tracked separately. Predictions require both minimum count (3) and minimum confidence (30%). Persisted to `~/.mcp-gateway/transitions.json` on shutdown. ### `SearchRanker` (`ranking.rs:328`) `DashMap<String, AtomicU64>` of usage counts keyed by `"server:tool"`. Persisted to `~/.mcp-gateway/usage.json`. Loaded on startup, saved on graceful shutdown. ### `SchemaValidator` (`capability/schema_validator.rs`) Validates LLM-supplied arguments against capability YAML schemas before HTTP dispatch. Five-step pipeline: required params, unknown params, type validation with coercion (string-to-int, string-to-bool), enum values, string/numeric constraints. Error messages are structured for LLM consumption. ### `ResponseCache` (`cache.rs`) `DashMap<String, CachedResponse>` with per-entry TTL. Keys computed from `SHA-256(server:tool:canonical_args)`. Supports bounded `max_entries` with eviction. Atomic hit/miss/eviction counters. ## Persistence | Data | File | Loaded | Saved | |------|------|--------|-------| | Search ranking usage counts | `~/.mcp-gateway/usage.json` | Startup | Graceful shutdown | | Transition frequencies | `~/.mcp-gateway/transitions.json` | Startup | Graceful shutdown | Both use JSON serialization. The transition tracker's `load()` merges with in-memory state, allowing hot-reload without losing in-flight session data. ## Configuration Loaded via Figment (`config.rs`): YAML file + environment variable overrides. Key sections: - `server` -- host, port, shutdown timeout - `backends` -- map of backend name to transport config (stdio command or HTTP URL) - `meta_mcp` -- enabled, warm_start list, cache TTL - `failsafe` -- circuit breaker, rate limit, retry, health check settings - `capabilities` -- enabled, directories, backend name - `playbooks` -- enabled, directories - `webhooks` -- enabled, base path - `auth` -- bearer token, API keys - `security` -- sanitize_input, ssrf_protection, tool_policy - `cache` -- enabled, default_ttl, max_entries - `streaming` -- enabled, auto_subscribe backends ## Graceful Shutdown 1. SIGINT/SIGTERM received 2. Broadcast shutdown signal to health check and idle checker tasks 3. Save ranker usage data and transition tracking data to disk 4. Drain in-flight requests: acquire all 10,000 semaphore permits (with timeout from `config.server.shutdown_timeout`) 5. Stop all backends (`BackendRegistry::stop_all()`) Implemented in `gateway/server.rs:441-491`. ## Dependencies Runtime: `axum` (HTTP), `tokio` (async), `reqwest` (HTTP client), `tower` (middleware), `serde_json` (serialization), `dashmap` (concurrent maps), `parking_lot` (fast locks), `governor` (rate limiting), `notify` (file watching for hot-reload), `figment` (config), `sha2`/`hmac` (hashing/signatures). No `unsafe` code (enforced via `#![forbid(unsafe_code)]`).

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/MikkoParkkola/mcp-gateway'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

ARCHITECTURE.md•19 KiB