AI MCP Gateway
Intelligent Multi-Model Orchestrator with Cost Optimization & Admin Dashboard
A production-ready Model Context Protocol (MCP) server and HTTP API Gateway that orchestrates multiple AI models with intelligent N-layer routing, budget tracking, task-specific model selection, real-time monitoring, and comprehensive admin dashboard.
✨ Key Features
🎯 Intelligent Routing
Dynamic N-Layer Routing: Automatically routes requests to the appropriate model tier (L0-L3) based on complexity
Task-Specific Models: Dedicated model configurations for chat, code, analysis, and project creation
Escalation Control: Manual confirmation required for paid model escalation (configurable)
OpenRouter Fallback: Automatically fetches top-ranked free models when L0 is unconfigured
💰 Cost Optimization
Budget Tracking: Set per-project budgets with automatic enforcement
Free-First Strategy: Prioritizes free models (L0), escalates only when necessary
Real-time Cost Monitoring: Live tracking and alerts via the dashboard and the `/health` endpoint
Layer Limits: Configure the maximum escalation tier per project
📊 Admin Dashboard (NEW)
Real-time Monitoring: Live metrics for requests, costs, tokens, and latency
Analytics Dashboard: Time-series charts, model usage breakdown, cost analysis
Provider Management: Enable/disable providers, configure API keys, health monitoring
Model Management: Add/remove models, enable/disable layers dynamically
Alert System: Custom alerts with multi-channel notifications (Email, Slack, Webhook)
Token Management: Create, view, and manage API gateway tokens
Docker Logs: Real-time log viewer with filtering and search
Settings Panel: Comprehensive system configuration interface
🔧 Advanced Capabilities
Cross-Checking: Multiple models validate each other's outputs for critical tasks
Provider Health Monitoring: Automatic failover when providers are unavailable
Context Management: Redis + PostgreSQL for efficient state management
Code Agent: Specialized AI for coding with TODO-driven workflow
Test Integration: Built-in Vitest and Playwright test runners
🌐 Multi-Client Architecture
HTTP API Gateway: RESTful endpoints for any client (CLI, Web, Telegram, CI/CD)
Admin Dashboard: Modern React-based web UI for system management (port 5173)
MCP Server Mode: Native support for MCP clients (Claude Desktop, VSCode)
CLI Tool: Powerful command-line interface with project scaffolding
Docker Ready: Full containerization with docker-compose
📋 Table of Contents
📚 Complete Documentation Index - Navigate all guides and references
🚀 Quick Start
Option 1: Docker (Recommended) 🐳
The fastest way to get started with the full stack (Gateway + Dashboard + Redis + PostgreSQL):
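A minimal sketch of the typical flow (repository URL omitted; adjust paths to your checkout):

```bash
cp .env.docker.example .env.docker   # add your provider API keys
docker compose up -d                 # starts gateway, dashboard, redis, postgres
```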
Or using Makefile:
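Assuming the Makefile wraps the same compose commands (target names may differ):

```bash
make up
```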
Services included:
- `ai-mcp-gateway` - API Gateway (port 3000)
- `ai-mcp-dashboard` - Admin Dashboard (port 5173)
- `ai-mcp-postgres` - PostgreSQL 15 (port 5432)
- `ai-mcp-redis` - Redis 7 (port 6379)
See DOCKER-QUICKSTART.md for details.
📊 Admin Dashboard
Modern web-based admin interface for monitoring and managing the AI Gateway.
Access Dashboard
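Once the stack is running, open the dashboard on the port listed above:

```bash
open http://localhost:5173   # macOS; on Linux use xdg-open, or just visit the URL
```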
Features
8 Main Pages:
📊 Dashboard - Real-time system monitoring
Total requests, costs, tokens, latency
Layer status (L0-L3)
Service health (Database, Redis, Providers)
Auto-refresh every 5 seconds
📈 Analytics - Deep insights and trends
Time-series charts (1h/24h/7d/30d)
Model usage breakdown
Cost analysis by layer
Error tracking
Performance metrics with trends
🔑 Gateway Tokens - API token management
Create/delete tokens
Show/hide token values
Copy to clipboard
Usage examples
📦 Docker Logs - Real-time log viewer
Container filtering
Search and filter
Pause/resume streaming
Download logs
Color-coded log levels
🔌 Providers - Provider management
Enable/disable providers
Configure API keys
Set base URLs
Health monitoring
Save configurations
🤖 Models - Layer and model management
Enable/disable layers (L0-L3)
Add/remove models dynamically
Edit mode with inline forms
Real-time feedback
🔔 Alerts - Alert system
Create custom alerts (cost, latency, errors, uptime)
Multi-channel notifications (Email, Slack, Webhook)
Enable/disable alerts
Flexible conditions
⚙️ Settings - System configuration
General settings (log level, default layer)
Routing features (cross-check, auto-escalate)
Cost management
Layer control
Task-specific models
Tech Stack
React 19.2.0 + TypeScript 5.9.3
Vite 7.2.5 (Rolldown)
Tailwind CSS 3.4.0
React Router 7
Axios + Lucide React
See admin-dashboard/FEATURES.md for complete documentation.
Option 2: Local Development
Prerequisites:
Node.js >= 20.0.0
npm or pnpm (recommended)
API keys for desired providers (OpenRouter, Anthropic, OpenAI)
Optional: Redis (for caching)
Optional: PostgreSQL (for persistence)
Installation:
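A typical flow, sketched under the assumption that a local `.env` is derived from the Docker template:

```bash
git clone <repo-url> && cd ai-mcp-gateway
npm install                    # or: pnpm install
cp .env.docker.example .env    # then fill in provider API keys
```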
Build:
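Assuming the usual npm script name:

```bash
npm run build   # compiles TypeScript to dist/
```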
🏗️ Architecture
Stateless Design
The AI MCP Gateway is designed as a stateless application with external state management:
Two-Tier Context Management
Hot Layer (Redis)
- Context summaries (`conv:summary:{conversationId}`)
- Recent messages cache (`conv:messages:{conversationId}`)
- LLM response cache (`llm:cache:{model}:{hash}`)
- TODO lists (`todo:list:{conversationId}`)
- TTL: 30-60 minutes
Cold Layer (PostgreSQL)
Full conversation history
All messages with metadata
Context summaries (versioned)
LLM call logs (tokens, cost, duration)
Routing rules and analytics
Persistent storage
🔄 Dual Mode Operation
The gateway supports two modes:
1. MCP Mode (stdio)
Standard Model Context Protocol server for desktop clients.
Configure in Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
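A sketch of the expected entry (the server key name is an assumption; adjust the path to your checkout):

```json
{
  "mcpServers": {
    "ai-mcp-gateway": {
      "command": "node",
      "args": ["/path/to/ai-mcp-gateway/dist/index.js"]
    }
  }
}
```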
2. HTTP API Mode
Stateless REST API for web services and integrations.
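To run it locally (the script name is an assumption; the Docker stack starts this mode by default):

```bash
npm run start:http   # hypothetical script name; check package.json
```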
Access API at http://localhost:3000.
🖥️ CLI Tool
A powerful command-line interface for interacting with the MCP Gateway, inspired by Claude CLI.
Installation
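A sketch, assuming the CLI is installed from the cli/ directory (see cli/README.md for the real instructions):

```bash
cd cli && npm install && npm link   # makes the CLI available globally
```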
Quick Start
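Hypothetical invocations, assuming the binary is called `ai-mcp` (see cli/README.md for the actual command names); the three modes match the `/v1/mcp-cli` endpoint documented below:

```bash
ai-mcp chat                          # interactive conversation
cat src/app.ts | ai-mcp code         # pipe a file in for review
ai-mcp diff "rename foo to bar"      # generate a unified diff patch
```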
Features
💬 Interactive Chat - Real-time conversation with AI
🔍 Code Analysis - Expert code reviews and suggestions
🔧 Diff Generation - Unified patches for code changes
🎨 Syntax Highlighting - Colored terminal output
🔗 Pipe Support - Works with Unix pipes
📁 Context Aware - Includes git status and workspace files
See cli/README.md and cli/QUICKSTART.md for complete documentation.
API runs on http://localhost:3000 (configurable via API_PORT).
🌐 HTTP API Usage
Endpoints
POST /v1/route
Intelligent model selection and routing.
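A representative request; the field names are assumptions based on the features described in this README (task routing, conversation context, layer limits):

```bash
curl -X POST http://localhost:3000/v1/route \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GATEWAY_TOKEN" \
  -d '{
        "task": "Refactor this function for readability",
        "conversationId": "conv-123",
        "maxLayer": "L1"
      }'
```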
Response:
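A sketch of the response shape (fields inferred from the routing summary and cost tracking features; the actual schema may differ):

```json
{
  "result": "...",
  "routing": { "layer": "L0", "model": "llama-3.3-70b" },
  "usage": { "inputTokens": 512, "outputTokens": 256, "costUsd": 0 }
}
```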
POST /v1/code-agent
Specialized coding assistant.
POST /v1/chat
General chat endpoint with context.
GET /v1/context/:conversationId
Retrieve conversation context.
GET /health
Health check endpoint.
Response:
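Roughly, reflecting the service health and budget tracking described above (exact fields may differ):

```json
{
  "status": "ok",
  "services": { "database": "up", "redis": "up", "providers": "up" },
  "budget": { "spentUsd": 0.42, "limitUsd": 10.0 }
}
```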
GET /v1/server-stats
Real-time server statistics.
Response:
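A sketch based on the dashboard metrics (requests, costs, tokens, latency); field names are assumptions:

```json
{
  "totalRequests": 1523,
  "totalCostUsd": 1.87,
  "totalTokens": 4200000,
  "avgLatencyMs": 840
}
```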
See SERVER-STATS-GUIDE.md for detailed monitoring guide.
POST /v1/mcp-cli
Handle CLI tool requests (chat, code, diff modes).
Response:
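A sketch (field names are assumptions; the `mode` values are documented below):

```json
{
  "mode": "chat",
  "content": "...",
  "usage": { "tokens": 1024, "costUsd": 0 }
}
```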
Modes:
- `chat` - Interactive conversation
- `code` - Code analysis/review
- `diff` - Generate unified diff patches
See cli/README.md for CLI tool documentation.
🐳 Docker Deployment
The project includes complete Docker support for easy deployment:
Quick Deploy
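For example:

```bash
cp .env.docker.example .env.docker    # add provider API keys
docker compose up -d                  # full stack
docker compose logs -f ai-mcp-gateway # follow gateway logs
```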
Documentation
DOCKER-QUICKSTART.md - Quick reference guide
DOCKER-DEPLOYMENT.md - Comprehensive deployment guide with:
Multi-stage builds
Production best practices
Environment configuration
Scaling and monitoring
Backup/restore procedures
Troubleshooting tips
Docker Files
- `Dockerfile` - Multi-stage build (optimized for production)
- `docker-compose.yml` - Full stack (Gateway + Redis + PostgreSQL + Ollama)
- `docker-compose.dev.yml` - Simplified development setup
- `.env.docker.example` - Environment variable template
- `Makefile` - Convenience commands for Docker operations
🏗️ Architecture
High-Level Overview
Key Components
1. MCP Server (src/mcp/)
Handles MCP protocol communication
Registers and dispatches tools
Manages request/response lifecycle
2. Routing Engine (src/routing/)
Classifies tasks by type, complexity, quality
Selects optimal model layer
Orchestrates cross-checking between models
Auto-escalates when needed
3. LLM Clients (src/tools/llm/)
Unified interface for multiple providers
Handles API calls, token counting, cost calculation
Supports: OpenRouter, Anthropic, OpenAI, local models
4. Tools (src/tools/)
Code Agent: Main AI coding assistant
Testing: Vitest and Playwright runners
File System: Read/write/list operations
Git: Diff and status operations
5. Logging & Metrics (src/logging/)
Winston-based structured logging
Cost tracking and alerts
Performance metrics
🛠️ Available MCP Tools
The gateway exposes 14 MCP tools for various operations:
Code & Development Tools
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `code_agent` | AI coding assistant with TODO tracking | `task`, `context`, `quality` |
Testing Tools
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `run_vitest` | Execute Vitest unit/integration tests | `testPath`, `watch` |
| `run_playwright` | Execute Playwright E2E tests | `testPath` |
File System Tools
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `read_file` | Read file contents | `path`, `encoding` |
| `write_file` | Write file contents | `path`, `content` |
| `list_directory` | List directory contents | `path`, `recursive` |
Git Tools
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `git_diff` | Show git diff | `path`, `staged` |
| `git_status` | Show git status | - |
NEW: Cache Tools (Redis)
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `cache_get` | Get value from Redis cache | `key` |
| `cache_set` | Set value in Redis cache | `key`, `value`, `ttl` |
| `cache_delete` | Delete key from Redis cache | `key` |
NEW: Database Tools (PostgreSQL)
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `db_query` | Execute SQL query | `query`, `params` |
| `db_insert` | Insert row into table | `table`, `data` |
| `db_update` | Update rows in table | `table`, `data`, `where` |
Tool Usage Examples
Using Redis cache:
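A sketch of a tool call, using the tool and parameter names reconstructed in the tables above and the Redis key pattern from the hot layer section:

```json
{
  "tool": "cache_set",
  "arguments": {
    "key": "todo:list:conv-123",
    "value": "[{\"task\": \"add tests\", \"done\": false}]",
    "ttl": 3600
  }
}
```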
Querying database:
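Likewise for the database tool (the `llm_calls` table name is an assumption, matching the schema sketch below):

```json
{
  "tool": "db_query",
  "arguments": {
    "query": "SELECT model, SUM(cost_usd) FROM llm_calls GROUP BY model",
    "params": []
  }
}
```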
📦 Context Management
How Context Works
Conversation Initialization
1. Client sends a `conversationId` with each request
2. Gateway checks Redis for an existing context summary
3. Falls back to the DB on a Redis miss
4. Creates a new conversation if none exists
Context Storage
Summary: Compressed project context (stack, architecture, decisions)
Messages: Recent messages (last 50 in Redis, all in DB)
TODO Lists: Persistent task tracking
Metadata: User, project, timestamps
Context Compression
When context grows large (>50 messages):
System generates new summary
Keeps only recent 5-10 messages in detail
Older messages summarized into context
Reduces token usage while maintaining relevance
Context Handoff
When escalating between layers:
Creates handoff package with:
Context summary
Current task
Previous attempts
Known issues
Request to higher layer
Optimized for minimal tokens
Database Schema
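The authoritative schema lives in the project's migrations; the sketch below only illustrates the core tables implied by the cold layer description (table and column names are assumptions):

```sql
CREATE TABLE conversations (
  id         UUID PRIMARY KEY,
  user_id    TEXT,
  project    TEXT,
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE messages (
  id              UUID PRIMARY KEY,
  conversation_id UUID REFERENCES conversations(id),
  role            TEXT,
  content         TEXT,
  created_at      TIMESTAMPTZ DEFAULT now()
);

-- Versioned context summaries, per the "Context Compression" section
CREATE TABLE context_summaries (
  conversation_id UUID REFERENCES conversations(id),
  version         INT,
  summary         TEXT,
  PRIMARY KEY (conversation_id, version)
);

-- LLM call logs with token/cost/duration tracking
CREATE TABLE llm_calls (
  id              UUID PRIMARY KEY,
  conversation_id UUID,
  model           TEXT,
  input_tokens    INT,
  output_tokens   INT,
  cost_usd        NUMERIC,
  duration_ms     INT,
  created_at      TIMESTAMPTZ DEFAULT now()
);
```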
⚙️ Configuration
Environment Variables
Create a .env.docker file (use .env.docker.example as template):
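A minimal example; `ENABLE_AUTO_ESCALATE` and `API_PORT` are named elsewhere in this README, the remaining variable names are assumptions:

```bash
# Provider keys (set only the ones you use)
OPENROUTER_API_KEY=sk-or-...
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

# Infrastructure
REDIS_URL=redis://ai-mcp-redis:6379
DATABASE_URL=postgres://postgres:postgres@ai-mcp-postgres:5432/ai_mcp

# Routing
ENABLE_AUTO_ESCALATE=false
API_PORT=3000
```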
Model Layers
The gateway organizes models into 4 tiers:
| Layer | Cost | Use Case | Examples |
|-------|------|----------|----------|
| L0 | Free | Simple tasks, drafts, complexity detection | Llama 3.3 70B, Qwen 2.5 Coder, DeepSeek Coder (all free) |
| L1 | Cheap | Standard coding, chat, reviews | GPT-4o-mini, Claude Haiku |
| L2 | Mid | Complex logic, architecture, debugging | GPT-4o, Claude Sonnet |
| L3 | Premium | Critical systems, production code | o1-preview, Claude Opus |
Key Features
1. Task-Specific Models (NEW!)
Define preferred models for different task types:
- `CHAT_MODELS`: General conversation and questions
- `CODE_MODELS`: Code generation and refactoring
- `ANALYZE_MODELS`: Code analysis and debugging
- `CREATE_PROJECT_MODELS`: Project scaffolding
The router automatically selects the best model from your preferred list based on the task type.
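For example (model IDs are illustrative):

```bash
CHAT_MODELS=openai/gpt-4o-mini,anthropic/claude-3-haiku
CODE_MODELS=qwen/qwen-2.5-coder-32b-instruct:free,openai/gpt-4o
```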
2. OpenRouter Fallback (NEW!)
When L0 has no configured models, the system automatically:
Fetches available free models from OpenRouter API
Ranks them by context window size and capabilities
Uses top 5 for routing
Logs the fallback operation
3. Escalation Control (NEW!)
When ENABLE_AUTO_ESCALATE=false:
System detects when a higher tier is needed
Prompts user for confirmation before using paid models
Shows reason for escalation
Allows manual approval/rejection
4. Budget Tracking (CLI Feature)
When creating projects via CLI:
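A hypothetical invocation (the flag name is an assumption; see cli/README.md):

```bash
ai-mcp create-project "todo app with auth" --budget 0.50
```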
The CLI tracks cumulative costs and stops generation if the budget is exceeded.
Response includes (sketched below):
Generated code
Routing summary (which models were used)
Token usage and cost
Quality assessment
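A sketch of that shape (field names are assumptions):

```json
{
  "code": "...",
  "routingSummary": ["L0: qwen-2.5-coder (draft)", "L1: gpt-4o-mini (review)"],
  "usage": { "tokens": 3200, "costUsd": 0.004 },
  "quality": "pass"
}
```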
Running Tests
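An MCP tool-call sketch, using the reconstructed tool names from the tables above:

```json
{
  "tool": "run_vitest",
  "arguments": { "testPath": "tests/unit/routing.test.ts", "watch": false }
}
```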
File Operations
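For example (same caveat on names):

```json
{ "tool": "read_file", "arguments": { "path": "src/config/models.ts" } }
```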
Git Operations
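For example:

```json
{ "tool": "git_diff", "arguments": { "staged": true } }
```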
🛠️ Available Tools
| Tool Name | Description | Input |
|-----------|-------------|-------|
| `code_agent` | AI coding assistant with multi-model routing | `task`, `context`, `quality` |
| `run_vitest` | Run Vitest unit/integration tests | `testPath` (optional) |
| `run_playwright` | Run Playwright E2E tests | `testPath` (optional) |
| `read_file` | Read file contents | `path` |
| `write_file` | Write file contents | `path`, `content` |
| `list_directory` | List directory contents | `path` |
| `git_diff` | Get git diff | `path` (optional), `staged` (bool) |
| `git_status` | Get git status | none |
🏗️ Model Layers
Layer L0 - Free/Cheapest
Models: Mistral 7B Free, Qwen 2 7B Free, OSS Local
Cost: $0
Use for: Simple tasks, drafts, code review
Capabilities: Basic code, general knowledge
Layer L1 - Low Cost
Models: Gemini Flash 1.5, GPT-4o Mini
Cost: ~$0.08-0.75 per 1M tokens
Use for: Standard coding tasks, refactoring
Capabilities: Code, reasoning, vision
Layer L2 - Mid-tier
Models: Claude 3 Haiku, GPT-4o
Cost: ~$1.38-12.5 per 1M tokens
Use for: Complex tasks, high-quality requirements
Capabilities: Advanced code, reasoning, vision
Layer L3 - Premium
Models: Claude 3.5 Sonnet, OpenAI o1
Cost: ~$18-60 per 1M tokens
Use for: Critical tasks, architecture design
Capabilities: SOTA performance, deep reasoning
💻 Development
Project Structure
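A partial sketch assembled from the directories referenced throughout this README (not exhaustive):

```text
src/
  config/           # model/layer configuration (models.ts)
  mcp/              # MCP protocol server and tool registry
  routing/          # task classification and layer selection
  tools/
    llm/            # provider clients (OpenRouter, Anthropic, OpenAI, local)
  logging/          # Winston logging, cost tracking, metrics
cli/                # command-line client
admin-dashboard/    # React admin UI
tests/
  unit/
  regression/
docs/               # self-improvement notes and routing heuristics
```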
Scripts
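Script names other than `build` and `test` are assumptions; check package.json:

```bash
npm run build   # compile TypeScript to dist/
npm test        # run the Vitest suite
npm run dev     # hypothetical: local watch mode
```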
🧪 Testing
Unit Tests
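For example (Vitest accepts a path filter):

```bash
npm test -- tests/unit
```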
Integration Tests
Integration tests verify interactions between components:
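For example (assuming the suite lives in tests/integration/):

```bash
npm test -- tests/integration
```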
Regression Tests
Regression tests prevent previously fixed bugs from reoccurring:
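For example:

```bash
npm test -- tests/regression
```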
E2E Tests
End-to-end tests using Playwright:
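Using the standard Playwright runner:

```bash
npx playwright test
```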
📈 Self-Improvement
The gateway includes a self-improvement system:
1. Bug Tracking (docs/ai-common-bugs-and-fixes.md)
Documents encountered bugs
Includes root causes and fixes
Links to regression tests
2. Pattern Learning (docs/ai-orchestrator-notes.md)
Tracks successful patterns
Records optimization opportunities
Documents lessons learned
3. Routing Refinement (docs/ai-routing-heuristics.md)
Defines routing rules
Documents when to escalate
Model capability matrix
Adding to Self-Improvement Docs
When you discover a bug or pattern:
Document it in the appropriate file
Create a regression test in `tests/regression/`
Update routing heuristics if needed
Run tests to verify the fix
🤝 Contributing
Contributions are welcome! Please:
Fork the repository
Create a feature branch
Make your changes with tests
Update documentation
Submit a pull request
Adding a New Model
1. Update `src/config/models.ts`:

```typescript
{
  id: 'new-model-id',
  provider: 'provider-name',
  // ... config
}
```

2. Add a provider client if needed in `src/tools/llm/`
3. Update `docs/ai-routing-heuristics.md`
Adding a New Tool
1. Create the tool in `src/tools/yourtool/index.ts`:

```typescript
export const yourTool = {
  name: 'your_tool',
  description: '...',
  inputSchema: { ... },
  handler: async (args) => { ... },
};
```

2. Register it in `src/mcp/server.ts`
3. Add tests in `tests/unit/`
📄 License
MIT License - see LICENSE file for details
🙏 Acknowledgments
Model Context Protocol by Anthropic
OpenRouter for unified LLM access
All the amazing open-source LLM providers
📞 Support
Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: See comprehensive guides in this repository
🗺️ Roadmap
✅ Redis caching layer (implemented)
✅ PostgreSQL persistence (implemented)
✅ HTTP API mode (implemented)
✅ CLI tool (implemented)
✅ Docker deployment (implemented)
Token usage analytics dashboard
More LLM providers (Google AI, Cohere, etc.)
Streaming response support
Web UI for configuration and monitoring
Advanced prompt templates
A/B testing framework for routing strategies
Made with ❤️ for efficient AI orchestration