AI MCP Gateway
Cost-Optimized Multi-Model Orchestrator with Stateless Architecture
An intelligent Model Context Protocol (MCP) server and HTTP API that orchestrates multiple AI models (free and paid) with dynamic N-layer routing, cross-checking, cost optimization, and stateless context management via Redis + PostgreSQL.
Features
Core Features
Smart Routing: Dynamic N-layer routing based on task complexity and quality requirements
Cost Optimization: Prioritizes free/cheap models, escalates only when necessary
Cross-Checking: Multiple models review each other's work for higher quality
Code Agent: Specialized AI agent for coding tasks with a TODO-driven workflow
Test Integration: Built-in Vitest and Playwright test runners
Metrics & Logging: Track costs, tokens, and performance
Self-Improvement: Documents patterns, bugs, and routing heuristics
Extensible: Easy to add new models, providers, and tools
NEW: Stateless Architecture
Redis Cache Layer: Hot storage for LLM responses, context summaries, routing hints
PostgreSQL Database: Cold storage for conversations, messages, LLM calls, analytics
HTTP API Mode: Stateless REST API with /v1/route, /v1/code-agent, /v1/chat endpoints
Context Management: Two-tier context with hot (Redis) + cold (DB) layers
Handoff Packages: Optimized inter-layer communication for model escalation
TODO Tracking: Persistent GitHub Copilot-style TODO lists with Redis/DB storage
Quick Start
Prerequisites
Node.js >= 20.0.0
npm or pnpm (recommended)
API keys for desired providers (OpenRouter, Anthropic, OpenAI)
Optional: Redis (for caching)
Optional: PostgreSQL (for persistence)
Installation
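The original installation snippet was not preserved. Assuming the standard npm workflow for a Node.js >= 20 project (pnpm works equally; the repository URL and directory name are placeholders):

```shell
# Clone the repository and install dependencies
git clone <repository-url>
cd ai-mcp-gateway
npm install

# Copy the environment template and add your API keys
cp .env.example .env
```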
Build
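Assuming a conventional TypeScript build script (the script name is an assumption):

```shell
npm run build
```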
Architecture
Stateless Design
The AI MCP Gateway is designed as a stateless application with external state management:
Two-Tier Context Management
Hot Layer (Redis)
Context summaries (conv:summary:{conversationId})
Recent messages cache (conv:messages:{conversationId})
LLM response cache (llm:cache:{model}:{hash})
TODO lists (todo:list:{conversationId})
TTL: 30-60 minutes
Cold Layer (PostgreSQL)
Full conversation history
All messages with metadata
Context summaries (versioned)
LLM call logs (tokens, cost, duration)
Routing rules and analytics
Persistent storage
Dual Mode Operation
The gateway supports two modes:
1. MCP Mode (stdio)
Standard Model Context Protocol server for desktop clients.
Configure in Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
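The original config block is missing. A typical entry might look like the following; the server name, path, and environment variable are placeholders, and the actual command depends on where the gateway is installed:

```json
{
  "mcpServers": {
    "ai-mcp-gateway": {
      "command": "node",
      "args": ["/path/to/ai-mcp-gateway/dist/index.js"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}
```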
2. HTTP API Mode
Stateless REST API for web services and integrations.
API runs on http://localhost:3000 (configurable via API_PORT).
HTTP API Usage
Endpoints
POST /v1/route
Intelligent model selection and routing.
Response:
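The request and response bodies were not preserved here. As an illustration only, they might be shaped like this; every field name below is an assumption based on the features described above, not the gateway's actual schema:

```typescript
// Hypothetical request body for POST /v1/route (field names assumed).
const routeRequest = {
  conversationId: "conv-123",
  task: "Refactor the auth middleware",
  quality: "high", // drives layer selection
};

// Hypothetical response: which model/layer was chosen and at what cost.
const routeResponse = {
  model: "gpt-4o-mini",
  layer: "L1",
  tokens: { input: 1200, output: 450 },
  costUsd: 0.0007,
};
```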
POST /v1/code-agent
Specialized coding assistant.
POST /v1/chat
General chat endpoint with context.
GET /v1/context/:conversationId
Retrieve conversation context.
GET /health
Health check endpoint.
Response:
Architecture
High-Level Overview
Key Components
1. MCP Server (src/mcp/)
Handles MCP protocol communication
Registers and dispatches tools
Manages request/response lifecycle
2. Routing Engine (src/routing/)
Classifies tasks by type, complexity, quality
Selects optimal model layer
Orchestrates cross-checking between models
Auto-escalates when needed
3. LLM Clients (src/tools/llm/)
Unified interface for multiple providers
Handles API calls, token counting, cost calculation
Supports: OpenRouter, Anthropic, OpenAI, local models
4. Tools (src/tools/)
Code Agent: Main AI coding assistant
Testing: Vitest and Playwright runners
File System: Read/write/list operations
Git: Diff and status operations
5. Logging & Metrics (src/logging/)
Winston-based structured logging
Cost tracking and alerts
Performance metrics
Available MCP Tools
The gateway exposes 14 MCP tools for various operations:
Code & Development Tools
| Description | Key Parameters |
| --- | --- |
| AI coding assistant with TODO tracking | task, context, quality |
Testing Tools
| Description | Key Parameters |
| --- | --- |
| Execute Vitest unit/integration tests | testPath (optional) |
| Execute Playwright E2E tests | testPath (optional) |
File System Tools
| Description | Key Parameters |
| --- | --- |
| Read file contents | path |
| Write file contents | path, content |
| List directory contents | path |
Git Tools
| Description | Key Parameters |
| --- | --- |
| Show git diff | path (optional), staged (bool) |
| Show git status | - |
NEW: Cache Tools (Redis)
| Description |
| --- |
| Get value from Redis cache |
| Set value in Redis cache |
| Delete key from Redis cache |
NEW: Database Tools (PostgreSQL)
| Description |
| --- |
| Execute SQL query |
| Insert row into table |
| Update rows in table |
Tool Usage Examples
Using Redis cache:
Querying database:
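The original example blocks were not preserved. As an illustrative sketch only, MCP tools/call requests for the two cases above might look like the following; the tool names (cache_get, db_query) and argument names are assumptions, not the gateway's actual schema:

```json
{
  "method": "tools/call",
  "params": {
    "name": "cache_get",
    "arguments": { "key": "conv:summary:conv-123" }
  }
}

{
  "method": "tools/call",
  "params": {
    "name": "db_query",
    "arguments": { "sql": "SELECT model, cost_usd FROM llm_calls LIMIT 10" }
  }
}
```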
Context Management
How Context Works
Conversation Initialization
Client sends conversationId with each request
Gateway checks Redis for an existing context summary
Falls back to the DB on a Redis miss
Creates a new conversation if none exists
Context Storage
Summary: Compressed project context (stack, architecture, decisions)
Messages: Recent messages (last 50 in Redis, all in DB)
TODO Lists: Persistent task tracking
Metadata: User, project, timestamps
Context Compression
When context grows large (>50 messages):
System generates new summary
Keeps only recent 5-10 messages in detail
Older messages summarized into context
Reduces token usage while maintaining relevance
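The steps above can be sketched as pure logic. The 50-message threshold and the 5-10 recent messages come from the text; the function and field names are illustrative, and the real gateway would use an LLM where the summarize callback appears:

```typescript
interface Message { role: string; content: string }

interface CompressedContext {
  summary: string;   // compressed project context
  recent: Message[]; // last few messages kept verbatim
}

// Hypothetical summarizer: the real system generates this with an LLM.
type Summarize = (older: Message[], previousSummary: string) => string;

// Compress once history exceeds 50 messages, keeping the last 10 in detail.
function compressContext(
  messages: Message[],
  previousSummary: string,
  summarize: Summarize,
  threshold = 50,
  keepRecent = 10,
): CompressedContext {
  if (messages.length <= threshold) {
    return { summary: previousSummary, recent: messages };
  }
  // Fold older messages into the summary; keep only the tail verbatim.
  const older = messages.slice(0, messages.length - keepRecent);
  return {
    summary: summarize(older, previousSummary),
    recent: messages.slice(-keepRecent),
  };
}
```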
Context Handoff
When escalating between layers:
Creates handoff package with:
Context summary
Current task
Previous attempts
Known issues
Request to higher layer
Optimized for minimal tokens
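As a sketch, the handoff contents listed above map naturally onto a small interface. The field names are illustrative, and the token estimate uses a rough 4-characters-per-token rule of thumb rather than a real tokenizer:

```typescript
// Fields mirror the handoff package contents described above (names assumed).
interface HandoffPackage {
  contextSummary: string;     // compressed context, not full history
  currentTask: string;
  previousAttempts: string[]; // what lower layers already tried
  knownIssues: string[];
  request: string;            // the ask to the higher layer
}

// Rough token budgeting before escalating: keep the package small.
function approxTokens(pkg: HandoffPackage): number {
  const text = [
    pkg.contextSummary,
    pkg.currentTask,
    ...pkg.previousAttempts,
    ...pkg.knownIssues,
    pkg.request,
  ].join("\n");
  return Math.ceil(text.length / 4); // ~4 chars/token heuristic, not exact
}
```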
Database Schema
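The schema itself is not reproduced here. Based on the cold-layer contents described above (conversations, messages, LLM call logs with tokens/cost/duration), it plausibly includes tables along these lines; all table and column names are assumptions:

```sql
-- Hypothetical sketch; the actual schema may differ.
CREATE TABLE conversations (
  id TEXT PRIMARY KEY,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE messages (
  id BIGSERIAL PRIMARY KEY,
  conversation_id TEXT REFERENCES conversations(id),
  role TEXT NOT NULL,
  content TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE llm_calls (
  id BIGSERIAL PRIMARY KEY,
  conversation_id TEXT REFERENCES conversations(id),
  model TEXT NOT NULL,
  input_tokens INT,
  output_tokens INT,
  cost_usd NUMERIC(10, 6),
  duration_ms INT
);
```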
Configuration
Environment Variables
Create a .env file (use .env.example as template):
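A minimal .env might contain entries like these. Apart from API_PORT, which the README mentions, the variable names are assumptions; set only the keys for the providers you use:

```shell
# Provider API keys (names assumed; set only the ones you need)
OPENROUTER_API_KEY=...
ANTHROPIC_API_KEY=...
OPENAI_API_KEY=...

# HTTP API mode
API_PORT=3000

# Optional external state
REDIS_URL=redis://localhost:6379
DATABASE_URL=postgresql://user:pass@localhost:5432/gateway
```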
Model Configuration
Edit src/config/models.ts to:
Add/remove models
Adjust layer assignments
Update pricing
Enable/disable models
Example:
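The original example block is missing. Based on the config shape shown in the "Adding a New Model" section (id, provider, further config), an entry plausibly looks like this; every field beyond id and provider is an assumption:

```typescript
// Hypothetical model entry for src/config/models.ts.
// Only `id` and `provider` appear in the README; the rest is assumed.
const gpt4oMini = {
  id: "openai/gpt-4o-mini",
  provider: "openrouter",
  layer: "L1",                                      // see "Model Layers"
  pricing: { inputPer1M: 0.15, outputPer1M: 0.6 },  // USD per 1M tokens
  enabled: true,
};
```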
Usage
Using the Code Agent
The Code Agent is the primary tool for coding tasks:
Response includes:
Generated code
Routing summary (which models were used)
Token usage and cost
Quality assessment
Running Tests
File Operations
Git Operations
Available Tools
| Description | Input |
| --- | --- |
| AI coding assistant with multi-model routing | task, context, quality |
| Run Vitest unit/integration tests | testPath (optional) |
| Run Playwright E2E tests | testPath (optional) |
| Read file contents | path |
| Write file contents | path, content |
| List directory contents | path |
| Get git diff | path (optional), staged (bool) |
| Get git status | none |
Model Layers
Layer L0 - Free/Cheapest
Models: Mistral 7B Free, Qwen 2 7B Free, OSS Local
Cost: $0
Use for: Simple tasks, drafts, code review
Capabilities: Basic code, general knowledge
Layer L1 - Low Cost
Models: Gemini Flash 1.5, GPT-4o Mini
Cost: ~$0.08-0.75 per 1M tokens
Use for: Standard coding tasks, refactoring
Capabilities: Code, reasoning, vision
Layer L2 - Mid-tier
Models: Claude 3 Haiku, GPT-4o
Cost: ~$1.38-12.5 per 1M tokens
Use for: Complex tasks, high-quality requirements
Capabilities: Advanced code, reasoning, vision
Layer L3 - Premium
Models: Claude 3.5 Sonnet, OpenAI o1
Cost: ~$18-60 per 1M tokens
Use for: Critical tasks, architecture design
Capabilities: SOTA performance, deep reasoning
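The routing policy described above (prefer cheap layers, escalate only on complexity or quality demands) can be illustrated with a toy selector. The thresholds and type names here are illustrative, not the gateway's actual heuristics:

```typescript
type Layer = "L0" | "L1" | "L2" | "L3";

interface TaskSignal {
  complexity: number; // 0..1, from task classification (scale assumed)
  quality: "draft" | "standard" | "high" | "critical";
}

// Toy escalation policy: start cheap, move up only when the task demands it.
function pickLayer(t: TaskSignal): Layer {
  if (t.quality === "critical") return "L3"; // premium only for critical work
  if (t.quality === "high" || t.complexity > 0.7) return "L2";
  if (t.complexity > 0.3) return "L1";
  return "L0"; // free models for simple tasks and drafts
}
```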
Development
Project Structure
Scripts
Testing
Unit Tests
Integration Tests
Integration tests verify interactions between components:
Regression Tests
Regression tests prevent previously fixed bugs from reoccurring:
E2E Tests
End-to-end tests using Playwright:
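The original command blocks for these four test suites were not preserved. Assuming conventional npm script names (all of which are assumptions):

```shell
npm test                      # unit tests (Vitest)
npm run test:integration      # integration tests
npm run test:regression       # regression tests
npm run test:e2e              # end-to-end tests (Playwright)
```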
Self-Improvement
The gateway includes a self-improvement system:
1. Bug Tracking (docs/ai-common-bugs-and-fixes.md)
Documents encountered bugs
Includes root causes and fixes
Links to regression tests
2. Pattern Learning (docs/ai-orchestrator-notes.md)
Tracks successful patterns
Records optimization opportunities
Documents lessons learned
3. Routing Refinement (docs/ai-routing-heuristics.md)
Defines routing rules
Documents when to escalate
Model capability matrix
Adding to Self-Improvement Docs
When you discover a bug or pattern:
Document it in the appropriate file
Create a regression test in tests/regression/
Update routing heuristics if needed
Run tests to verify the fix
Contributing
Contributions are welcome! Please:
Fork the repository
Create a feature branch
Make your changes with tests
Update documentation
Submit a pull request
Adding a New Model
Update src/config/models.ts:

```typescript
{
  id: 'new-model-id',
  provider: 'provider-name',
  // ... config
}
```

Add a provider client if needed in src/tools/llm/
Update docs/ai-routing-heuristics.md
Adding a New Tool
Create the tool in src/tools/yourtool/index.ts:

```typescript
export const yourTool = {
  name: 'your_tool',
  description: '...',
  inputSchema: { /* ... */ },
  handler: async (args) => { /* ... */ },
};
```

Register it in src/mcp/server.ts
Add tests in tests/unit/
License
MIT License - see LICENSE file for details
Acknowledgments
Model Context Protocol by Anthropic
OpenRouter for unified LLM access
All the amazing open-source LLM providers
Support
Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: Wiki
Roadmap
Token usage analytics dashboard
Caching layer for repeated queries
More LLM providers (Google AI, Cohere, etc.)
Streaming response support
Web UI for configuration and monitoring
Batch processing optimizations
Advanced prompt templates
A/B testing framework
Made with ❤️ for efficient AI orchestration