# AI MCP Gateway
**Cost-Optimized Multi-Model Orchestrator with Stateless Architecture**
An intelligent Model Context Protocol (MCP) server and HTTP API that orchestrates multiple AI models (free and paid) with dynamic N-layer routing, cross-checking, cost optimization, and stateless context management via Redis + PostgreSQL.
[TypeScript](https://www.typescriptlang.org/) · [Node.js](https://nodejs.org/) · [Model Context Protocol](https://modelcontextprotocol.io/) · [MIT License](LICENSE)
---
## ✨ Features
### Core Features
- 🎯 **Smart Routing**: Dynamic N-layer routing based on task complexity and quality requirements
- 💰 **Cost Optimization**: Prioritizes free/cheap models, escalates only when necessary
- ✅ **Cross-Checking**: Multiple models review each other's work for higher quality
- 🔧 **Code Agent**: Specialized AI agent for coding tasks with a TODO-driven workflow
- 🧪 **Test Integration**: Built-in Vitest and Playwright test runners
- 📊 **Metrics & Logging**: Track costs, tokens, and performance
- 📚 **Self-Improvement**: Documents patterns, bugs, and routing heuristics
- 🛠️ **Extensible**: Easy to add new models, providers, and tools
### NEW: Stateless Architecture
- 🗄️ **Redis Cache Layer**: Hot storage for LLM responses, context summaries, and routing hints
- 💾 **PostgreSQL Database**: Cold storage for conversations, messages, LLM calls, and analytics
- 🌐 **HTTP API Mode**: Stateless REST API with `/v1/route`, `/v1/code-agent`, and `/v1/chat` endpoints
- 📦 **Context Management**: Two-tier context with hot (Redis) and cold (DB) layers
- 🔄 **Handoff Packages**: Optimized inter-layer communication for model escalation
- 📝 **TODO Tracking**: Persistent GitHub Copilot-style TODO lists with Redis/DB storage
---
## 📋 Table of Contents
- [Quick Start](#quick-start)
- [Architecture](#architecture)
- [Dual Mode Operation](#dual-mode-operation)
- [Configuration](#configuration)
- [HTTP API Usage](#http-api-usage)
- [Available Tools](#available-tools)
- [Model Layers](#model-layers)
- [Context Management](#context-management)
- [Development](#development)
- [Testing](#testing)
- [Contributing](#contributing)
---
## 🚀 Quick Start
### Prerequisites
- Node.js >= 20.0.0
- npm or pnpm (pnpm recommended)
- API keys for desired providers (OpenRouter, Anthropic, OpenAI)
- **Optional**: Redis (for caching)
- **Optional**: PostgreSQL (for persistence)
### Installation
```bash
# Clone the repository
git clone https://github.com/yourusername/ai-mcp-gateway.git
cd ai-mcp-gateway
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
# Edit .env and add your API keys and database settings
nano .env
```
### Build
```bash
# Build the project
npm run build
# Or run in development mode
npm run dev
```
---
## 🏗️ Architecture
### Stateless Design
The AI MCP Gateway is designed as a **stateless application** with external state management:
```
┌────────────────────────────────────────────────┐
│           AI MCP Gateway (Stateless)           │
│  ┌────────────┐      ┌────────────┐            │
│  │ MCP Server │      │  HTTP API  │            │
│  │  (stdio)   │      │   (REST)   │            │
│  └─────┬──────┘      └─────┬──────┘            │
│        │                   │                   │
│        └─────────┬─────────┘                   │
│                  │                             │
│        ┌─────────▼──────────┐                  │
│        │   Routing Engine   │                  │
│        │  Context Manager   │                  │
│        └─────────┬──────────┘                  │
└──────────────────┼─────────────────────────────┘
                   │
       ┌───────────┼───────────┐
       │           │           │
  ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
  │  Redis  │ │   DB    │ │  LLMs   │
  │  (Hot)  │ │ (Cold)  │ │         │
  └─────────┘ └─────────┘ └─────────┘
```
### Two-Tier Context Management
1. **Hot Layer (Redis)**
   - Context summaries (`conv:summary:{conversationId}`)
   - Recent messages cache (`conv:messages:{conversationId}`)
   - LLM response cache (`llm:cache:{model}:{hash}`)
   - TODO lists (`todo:list:{conversationId}`)
   - TTL: 30-60 minutes
2. **Cold Layer (PostgreSQL)**
   - Full conversation history
   - All messages with metadata
   - Context summaries (versioned)
   - LLM call logs (tokens, cost, duration)
   - Routing rules and analytics
   - Persistent storage
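As an illustration of this two-tier read path, here is a minimal cache-aside sketch in TypeScript. The store interfaces and helper names are hypothetical (the real context manager lives in the gateway's source); only the key layout follows the patterns listed above.

```typescript
import { createHash } from "node:crypto";

// Key builders matching the Redis layout described above.
const summaryKey = (conversationId: string) => `conv:summary:${conversationId}`;
const llmCacheKey = (model: string, prompt: string) =>
  `llm:cache:${model}:${createHash("sha256").update(prompt).digest("hex")}`;

// Minimal store interfaces so the lookup logic stays testable
// without a live Redis or PostgreSQL instance.
interface HotStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}
interface ColdStore {
  loadSummary(conversationId: string): Promise<string | null>;
}

// Cache-aside read: try Redis first, fall back to the DB, then re-warm Redis.
async function getContextSummary(
  conversationId: string,
  hot: HotStore,
  cold: ColdStore
): Promise<string | null> {
  const key = summaryKey(conversationId);
  const cached = await hot.get(key);
  if (cached !== null) return cached;

  const fromDb = await cold.loadSummary(conversationId);
  if (fromDb !== null) {
    await hot.set(key, fromDb, 30 * 60); // re-warm with a 30-minute TTL
  }
  return fromDb;
}
```

Because the stores are injected, the same logic works against `ioredis` and `pg` in production and against in-memory maps in tests.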
---
## 🔄 Dual Mode Operation
The gateway supports two modes:
### 1. MCP Mode (stdio)
Standard Model Context Protocol server for desktop clients.
```bash
npm run start:mcp
# or
npm start
```
Configure in Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
"mcpServers": {
"ai-mcp-gateway": {
"command": "node",
"args": ["/path/to/ai-mcp-gateway/dist/index.js"]
}
}
}
```
### 2. HTTP API Mode
Stateless REST API for web services and integrations.
```bash
npm run start:api
# or
MODE=api npm start
```
API runs on `http://localhost:3000` (configurable via `API_PORT`).
---
## 🌐 HTTP API Usage
### Endpoints
#### POST /v1/route
Intelligent model selection and routing.
```bash
curl -X POST http://localhost:3000/v1/route \
-H "Content-Type: application/json" \
-d '{
"conversationId": "conv-123",
"message": "Explain async/await in JavaScript",
"userId": "user-1",
"qualityLevel": "normal"
}'
```
Response:
```json
{
"result": {
"response": "Async/await is...",
"model": "anthropic/claude-sonnet-4",
"provider": "anthropic"
},
"routing": {
"summary": "L0 -> primary model",
"fromCache": false
},
"context": {
"conversationId": "conv-123"
},
"performance": {
"durationMs": 1234,
"tokens": { "input": 50, "output": 200 },
"cost": 0.002
}
}
```
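For programmatic access, the endpoint can be wrapped in a small TypeScript client. This is a sketch based on the request/response shapes above; the `qualityLevel` values other than `"normal"` and the default base URL are assumptions taken from the documented defaults.

```typescript
interface RouteRequest {
  conversationId: string;
  message: string;
  userId?: string;
  qualityLevel?: "low" | "normal" | "high"; // assumed value set
}

// Build the JSON body for POST /v1/route, dropping undefined fields
// so optional parameters never appear as explicit nulls.
function buildRouteBody(req: RouteRequest): string {
  return JSON.stringify(
    Object.fromEntries(Object.entries(req).filter(([, v]) => v !== undefined))
  );
}

async function route(req: RouteRequest, baseUrl = "http://localhost:3000") {
  const res = await fetch(`${baseUrl}/v1/route`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildRouteBody(req),
  });
  if (!res.ok) throw new Error(`route failed: ${res.status}`);
  return res.json();
}
```

`fetch` is global in Node.js 20, so no HTTP library is needed.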
#### POST /v1/code-agent
Specialized coding assistant.
```bash
curl -X POST http://localhost:3000/v1/code-agent \
-H "Content-Type: application/json" \
-d '{
"conversationId": "conv-123",
"task": "Create a React component for user profile",
"files": ["src/components/UserProfile.tsx"]
}'
```
#### POST /v1/chat
General chat endpoint with context.
```bash
curl -X POST http://localhost:3000/v1/chat \
-H "Content-Type: application/json" \
-d '{
"conversationId": "conv-123",
"message": "What did we discuss earlier?"
}'
```
#### GET /v1/context/:conversationId
Retrieve conversation context.
```bash
curl http://localhost:3000/v1/context/conv-123
```
#### GET /health
Health check endpoint.
```bash
curl http://localhost:3000/health
```
Response:
```json
{
"status": "ok",
"redis": true,
"database": true,
"timestamp": "2025-11-22T06:42:00.000Z"
}
```
```
---
## 🏗️ MCP Server Architecture
### High-Level Overview
```
┌────────────────────────────────────────────────────────┐
│                       MCP Client                       │
│            (Claude Desktop, VS Code, etc.)             │
└───────────────────────────┬────────────────────────────┘
                            │  MCP Protocol
┌───────────────────────────▼────────────────────────────┐
│                 AI MCP Gateway Server                  │
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │                  Tools Registry                  │  │
│  │  • code_agent        • run_vitest                │  │
│  │  • run_playwright    • fs_read/write             │  │
│  │  • git_diff          • git_status                │  │
│  └────────────────────────┬─────────────────────────┘  │
│                           │                            │
│  ┌────────────────────────▼─────────────────────────┐  │
│  │                  Routing Engine                  │  │
│  │  • Task classification                           │  │
│  │  • Layer selection (L0 → L1 → L2 → L3)           │  │
│  │  • Cross-check orchestration                     │  │
│  │  • Auto-escalation                               │  │
│  └────────────────────────┬─────────────────────────┘  │
│                           │                            │
│  ┌────────────────────────▼─────────────────────────┐  │
│  │                   LLM Clients                    │  │
│  │  • OpenRouter        • Anthropic                 │  │
│  │  • OpenAI            • OSS Local                 │  │
│  └────────────────────────┬─────────────────────────┘  │
└───────────────────────────┼────────────────────────────┘
                            │
          ┌─────────────────┼─────────────────┐
          │                 │                 │
  ┌───────▼──────┐  ┌───────▼──────┐  ┌───────▼──────┐
  │ Free Models  │  │ Paid Models  │  │ Local Models │
  │  (Layer L0)  │  │(Layer L1-L3) │  │  (Layer L0)  │
  └──────────────┘  └──────────────┘  └──────────────┘
```
### Key Components
#### 1. **MCP Server** (`src/mcp/`)
- Handles MCP protocol communication
- Registers and dispatches tools
- Manages request/response lifecycle
#### 2. **Routing Engine** (`src/routing/`)
- Classifies tasks by type, complexity, quality
- Selects optimal model layer
- Orchestrates cross-checking between models
- Auto-escalates when needed
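The cross-check step can be sketched as two model calls with a reviewer verdict. This is a simplified illustration: the model callers and the APPROVE/REJECT convention are hypothetical, not the engine's actual protocol.

```typescript
// One primary answer plus one independent review; callers escalate on REJECT.
type ModelFn = (prompt: string) => Promise<string>;

interface CrossCheckResult {
  answer: string;
  approved: boolean;
  review: string;
}

async function crossCheck(
  prompt: string,
  primary: ModelFn,
  reviewer: ModelFn
): Promise<CrossCheckResult> {
  const answer = await primary(prompt);
  const review = await reviewer(
    `Review the following answer for correctness. ` +
      `Reply APPROVE or REJECT with a reason.\n\n${answer}`
  );
  const approved = review.trimStart().toUpperCase().startsWith("APPROVE");
  return { answer, approved, review };
}
```

In the real engine, a REJECT verdict would feed the auto-escalation path described above; here the caller simply inspects `approved`.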
#### 3. **LLM Clients** (`src/tools/llm/`)
- Unified interface for multiple providers
- Handles API calls, token counting, cost calculation
- Supports: OpenRouter, Anthropic, OpenAI, local models
#### 4. **Tools** (`src/tools/`)
- **Code Agent**: Main AI coding assistant
- **Testing**: Vitest and Playwright runners
- **File System**: Read/write/list operations
- **Git**: Diff and status operations
#### 5. **Logging & Metrics** (`src/logging/`)
- Winston-based structured logging
- Cost tracking and alerts
- Performance metrics
---
## 🛠️ Available MCP Tools
The gateway exposes 14 MCP tools for various operations:
### Code & Development Tools
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `code_agent` | AI coding assistant with TODO tracking | `task`, `context`, `quality` |
### Testing Tools
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `run_vitest` | Execute Vitest unit/integration tests | `testPath`, `watch` |
| `run_playwright` | Execute Playwright E2E tests | `testPath` |
### File System Tools
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `fs_read` | Read file contents | `path`, `encoding` |
| `fs_write` | Write file contents | `path`, `content` |
| `fs_list` | List directory contents | `path`, `recursive` |
### Git Tools
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `git_diff` | Show git diff | `staged` |
| `git_status` | Show git status | - |
### **NEW: Cache Tools (Redis)**
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `redis_get` | Get value from Redis cache | `key` |
| `redis_set` | Set value in Redis cache | `key`, `value`, `ttl` |
| `redis_del` | Delete key from Redis cache | `key` |
### **NEW: Database Tools (PostgreSQL)**
| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `db_query` | Execute SQL query | `sql`, `params` |
| `db_insert` | Insert row into table | `table`, `data` |
| `db_update` | Update rows in table | `table`, `where`, `data` |
### Tool Usage Examples
**Using Redis cache:**
```json
{
"tool": "redis_set",
"arguments": {
"key": "user:profile:123",
"value": {"name": "John", "role": "admin"},
"ttl": 3600
}
}
```
**Querying database:**
```json
{
"tool": "db_query",
"arguments": {
"sql": "SELECT * FROM conversations WHERE user_id = $1 LIMIT 10",
"params": ["user-123"]
}
}
```
---
## 📦 Context Management
### How Context Works
1. **Conversation Initialization**
   - Client sends `conversationId` with each request
   - Gateway checks Redis for an existing context summary
   - Falls back to the DB on a Redis miss
   - Creates a new conversation if none exists
2. **Context Storage**
   - **Summary**: Compressed project context (stack, architecture, decisions)
   - **Messages**: Recent messages (last 50 in Redis, all in the DB)
   - **TODO Lists**: Persistent task tracking
   - **Metadata**: User, project, timestamps
3. **Context Compression**
   - When the context grows large (>50 messages):
     - The system generates a new summary
     - Only the most recent 5-10 messages are kept in detail
     - Older messages are folded into the summary
   - Reduces token usage while maintaining relevance
4. **Context Handoff**
   - When escalating between layers, the gateway creates a handoff package with:
     - Context summary
     - Current task
     - Previous attempts
     - Known issues
     - The request to the higher layer
   - Optimized for minimal tokens
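The compression step can be sketched as a pure function with an injected summarizer. The threshold and keep-count follow the numbers above; the function and type names are illustrative, not the gateway's actual API.

```typescript
interface Message { role: string; content: string }
type Summarize = (summary: string, older: Message[]) => Promise<string>;

const COMPRESS_THRESHOLD = 50; // compress once history exceeds this many messages
const KEEP_RECENT = 10;        // messages kept verbatim after compression

// Fold older messages into the running summary, keeping only recent ones.
async function compressContext(
  summary: string,
  messages: Message[],
  summarize: Summarize
): Promise<{ summary: string; messages: Message[] }> {
  if (messages.length <= COMPRESS_THRESHOLD) return { summary, messages };
  const older = messages.slice(0, -KEEP_RECENT);
  const recent = messages.slice(-KEEP_RECENT);
  return { summary: await summarize(summary, older), messages: recent };
}
```

Injecting `summarize` keeps the trigger logic independent of which model layer actually produces the summary.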
### Database Schema
```sql
-- Conversations
CREATE TABLE conversations (
id TEXT PRIMARY KEY,
user_id TEXT,
project_id TEXT,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
metadata JSONB DEFAULT '{}'::jsonb
);
-- Messages
CREATE TABLE messages (
id SERIAL PRIMARY KEY,
conversation_id TEXT REFERENCES conversations(id),
role TEXT NOT NULL,
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}'::jsonb,
created_at TIMESTAMP DEFAULT NOW()
);
-- Context summaries
CREATE TABLE context_summaries (
id SERIAL PRIMARY KEY,
conversation_id TEXT REFERENCES conversations(id),
summary TEXT NOT NULL,
version INTEGER DEFAULT 1,
created_at TIMESTAMP DEFAULT NOW()
);
-- LLM call logs
CREATE TABLE llm_calls (
id SERIAL PRIMARY KEY,
conversation_id TEXT REFERENCES conversations(id),
model_id TEXT NOT NULL,
layer TEXT NOT NULL,
input_tokens INTEGER DEFAULT 0,
output_tokens INTEGER DEFAULT 0,
estimated_cost DECIMAL(10, 6) DEFAULT 0,
duration_ms INTEGER,
success BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT NOW()
);
-- TODO lists
CREATE TABLE todo_lists (
id SERIAL PRIMARY KEY,
conversation_id TEXT REFERENCES conversations(id),
todo_data JSONB NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
```
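Writing a row into `llm_calls` is a natural place for a parameterized statement so values stay out of the SQL string. The sketch below only builds the SQL and parameter array; executing it (e.g. via `pg`'s `pool.query(sql, params)`) is omitted, and the builder function is hypothetical.

```typescript
interface LlmCallLog {
  conversationId: string;
  modelId: string;
  layer: string;
  inputTokens: number;
  outputTokens: number;
  estimatedCost: number;
  durationMs: number;
  success: boolean;
}

// Build a parameterized INSERT for the llm_calls table defined above.
function buildLlmCallInsert(log: LlmCallLog): { sql: string; params: unknown[] } {
  return {
    sql:
      "INSERT INTO llm_calls (conversation_id, model_id, layer, input_tokens, " +
      "output_tokens, estimated_cost, duration_ms, success) " +
      "VALUES ($1, $2, $3, $4, $5, $6, $7, $8)",
    params: [
      log.conversationId, log.modelId, log.layer, log.inputTokens,
      log.outputTokens, log.estimatedCost, log.durationMs, log.success,
    ],
  };
}
```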
---
## ⚙️ Configuration
### Environment Variables
Create a `.env` file (use `.env.example` as template):
```bash
# MCP Server
MCP_SERVER_NAME=ai-mcp-gateway
MCP_SERVER_VERSION=0.1.0
# API Keys
OPENROUTER_API_KEY=sk-or-v1-...
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
# OSS/Local Models (optional)
OSS_MODEL_ENDPOINT=http://localhost:11434
OSS_MODEL_ENABLED=false
# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
REDIS_DB=0
# PostgreSQL
DATABASE_URL=postgresql://user:pass@localhost:5432/ai_mcp_gateway
DB_HOST=localhost
DB_PORT=5432
DB_NAME=ai_mcp_gateway
DB_USER=postgres
DB_PASSWORD=
DB_SSL=false
# HTTP API
API_PORT=3000
API_HOST=0.0.0.0
API_CORS_ORIGIN=*
# Logging
LOG_LEVEL=info
LOG_FILE=logs/ai-mcp-gateway.log
# Routing Configuration
DEFAULT_LAYER=L0
ENABLE_CROSS_CHECK=true
ENABLE_AUTO_ESCALATE=true
MAX_ESCALATION_LAYER=L2
# Cost Tracking
ENABLE_COST_TRACKING=true
COST_ALERT_THRESHOLD=1.00
# Mode
MODE=mcp # or 'api' for HTTP server
```
### Model Configuration
Edit `src/config/models.ts` to:
- Add/remove models
- Adjust layer assignments
- Update pricing
- Enable/disable models
Example:
```typescript
{
id: 'my-custom-model',
provider: 'openrouter',
apiModelName: 'provider/model-name',
layer: 'L1',
relativeCost: 5,
pricePer1kInputTokens: 0.001,
pricePer1kOutputTokens: 0.002,
capabilities: {
code: true,
general: true,
reasoning: true,
},
contextWindow: 100000,
enabled: true,
}
```
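Given the per-1k-token prices in a model entry, the estimated cost of a call follows directly. A minimal sketch (the gateway's own cost module may round or aggregate differently):

```typescript
interface ModelPricing {
  pricePer1kInputTokens: number;
  pricePer1kOutputTokens: number;
}

// Estimated cost in USD: tokens / 1000 * price-per-1k, summed per direction.
function estimateCost(m: ModelPricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * m.pricePer1kInputTokens +
         (outputTokens / 1000) * m.pricePer1kOutputTokens;
}
```

With the example entry above (0.001 / 0.002 per 1k tokens), a call with 50 input and 200 output tokens costs ≈ $0.00045.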
---
## 📖 Usage
### Using the Code Agent
The Code Agent is the primary tool for coding tasks:
```jsonc
// Example MCP client call
{
"tool": "code_agent",
"arguments": {
"task": "Create a TypeScript function to validate email addresses",
"context": {
"language": "typescript",
"requirements": [
"Use regex pattern",
"Handle edge cases",
"Include unit tests"
]
},
"quality": "high"
}
}
```
**Response includes:**
- Generated code
- Routing summary (which models were used)
- Token usage and cost
- Quality assessment
### Running Tests
```jsonc
// Run Vitest tests
{
"tool": "run_vitest",
"arguments": {
"testPath": "tests/unit/mytest.test.ts"
}
}
// Run Playwright E2E tests
{
"tool": "run_playwright",
"arguments": {
"testPath": "tests/e2e/login.spec.ts"
}
}
```
### File Operations
```jsonc
// Read file
{
"tool": "fs_read",
"arguments": {
"path": "/path/to/file.ts"
}
}
// Write file
{
"tool": "fs_write",
"arguments": {
"path": "/path/to/output.ts",
"content": "console.log('Hello');"
}
}
// List directory
{
"tool": "fs_list",
"arguments": {
"path": "/path/to/directory"
}
}
```
### Git Operations
```jsonc
// Get diff
{
"tool": "git_diff",
"arguments": {
"staged": false
}
}
// Get status
{
"tool": "git_status",
"arguments": {}
}
```
---
## 🏗️ Model Layers
### Layer L0 - Free/Cheapest
- **Models**: Mistral 7B Free, Qwen 2 7B Free, OSS Local
- **Cost**: $0
- **Use for**: Simple tasks, drafts, code review
- **Capabilities**: Basic code, general knowledge
### Layer L1 - Low Cost
- **Models**: Gemini 1.5 Flash, GPT-4o Mini
- **Cost**: ~$0.08-0.75 per 1M tokens
- **Use for**: Standard coding tasks, refactoring
- **Capabilities**: Code, reasoning, vision
### Layer L2 - Mid-tier
- **Models**: Claude 3 Haiku, GPT-4o
- **Cost**: ~$1.38-12.5 per 1M tokens
- **Use for**: Complex tasks, high-quality requirements
- **Capabilities**: Advanced code, reasoning, vision
### Layer L3 - Premium
- **Models**: Claude 3.5 Sonnet, OpenAI o1
- **Cost**: ~$18-60 per 1M tokens
- **Use for**: Critical tasks, architecture design
- **Capabilities**: SOTA performance, deep reasoning
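Escalation walks this ladder one layer at a time, capped by `MAX_ESCALATION_LAYER`. A minimal sketch of that selection logic (the real router also weighs task type and model capabilities):

```typescript
const LAYERS = ["L0", "L1", "L2", "L3"] as const;
type Layer = (typeof LAYERS)[number];

// Next layer when escalating, or null when already at the configured cap.
function escalate(current: Layer, maxLayer: Layer): Layer | null {
  const next = LAYERS[LAYERS.indexOf(current) + 1];
  if (!next || LAYERS.indexOf(next) > LAYERS.indexOf(maxLayer)) return null;
  return next;
}
```

With the default `MAX_ESCALATION_LAYER=L2`, a task that fails on L0 and L1 stops at L2 rather than reaching the premium L3 tier.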
---
## 💻 Development
### Project Structure
```
ai-mcp-gateway/
├── src/
│   ├── index.ts              # Entry point
│   ├── config/               # Configuration
│   │   ├── env.ts
│   │   └── models.ts
│   ├── mcp/                  # MCP server
│   │   ├── server.ts
│   │   └── types.ts
│   ├── routing/              # Routing engine
│   │   ├── router.ts
│   │   └── cost.ts
│   ├── tools/                # MCP tools
│   │   ├── codeAgent/
│   │   ├── llm/
│   │   ├── testing/
│   │   ├── fs/
│   │   └── git/
│   └── logging/              # Logging & metrics
│       ├── logger.ts
│       └── metrics.ts
├── tests/                    # Tests
│   ├── unit/
│   ├── integration/
│   └── regression/
├── docs/                     # Documentation
│   ├── ai-orchestrator-notes.md
│   ├── ai-routing-heuristics.md
│   └── ai-common-bugs-and-fixes.md
├── playwright/               # E2E tests
├── package.json
├── tsconfig.json
├── vitest.config.ts
└── playwright.config.ts
```
### Scripts
```bash
# Development
pnpm dev # Watch mode with auto-rebuild
pnpm build # Build for production
pnpm start # Run built server
# Testing
pnpm test # Run all Vitest tests
pnpm test:watch # Run tests in watch mode
pnpm test:ui # Run tests with UI
pnpm test:e2e # Run Playwright E2E tests
# Code Quality
pnpm type-check # TypeScript type checking
pnpm lint # ESLint
pnpm format # Prettier
```
---
## 🧪 Testing
### Unit Tests
```bash
# Run all unit tests
pnpm test
# Run specific test file
pnpm vitest tests/unit/routing.test.ts
# Watch mode
pnpm test:watch
```
### Integration Tests
Integration tests verify interactions between components:
```bash
pnpm vitest tests/integration/
```
### Regression Tests
Regression tests prevent previously fixed bugs from reoccurring:
```bash
pnpm vitest tests/regression/
```
### E2E Tests
End-to-end tests using Playwright:
```bash
pnpm test:e2e
```
---
## 📚 Self-Improvement
The gateway includes a self-improvement system:
### 1. **Bug Tracking** (`docs/ai-common-bugs-and-fixes.md`)
- Documents encountered bugs
- Includes root causes and fixes
- Links to regression tests
### 2. **Pattern Learning** (`docs/ai-orchestrator-notes.md`)
- Tracks successful patterns
- Records optimization opportunities
- Documents lessons learned
### 3. **Routing Refinement** (`docs/ai-routing-heuristics.md`)
- Defines routing rules
- Documents when to escalate
- Maintains a model capability matrix
### Adding to Self-Improvement Docs
When you discover a bug or pattern:
1. **Document it** in the appropriate file
2. **Create a regression test** in `tests/regression/`
3. **Update routing heuristics** if needed
4. **Run tests** to verify the fix
---
## 🤝 Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes with tests
4. Update documentation
5. Submit a pull request
### Adding a New Model
1. Update `src/config/models.ts`:
```typescript
{
id: 'new-model-id',
provider: 'provider-name',
// ... config
}
```
2. Add provider client if needed in `src/tools/llm/`
3. Update `docs/ai-routing-heuristics.md`
### Adding a New Tool
1. Create tool in `src/tools/yourtool/index.ts`:
```typescript
export const yourTool = {
name: 'your_tool',
description: '...',
inputSchema: { ... },
handler: async (args) => { ... }
};
```
2. Register in `src/mcp/server.ts`
3. Add tests in `tests/unit/`
---
## 📄 License
MIT License - see the [LICENSE](LICENSE) file for details.
---
## 🙏 Acknowledgments
- [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic
- [OpenRouter](https://openrouter.ai/) for unified LLM access
- All the amazing open-source LLM providers
---
## 💬 Support
- **Issues**: [GitHub Issues](https://github.com/yourusername/ai-mcp-gateway/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/ai-mcp-gateway/discussions)
- **Documentation**: [Wiki](https://github.com/yourusername/ai-mcp-gateway/wiki)
---
## 🗺️ Roadmap
- [ ] Token usage analytics dashboard
- [ ] Caching layer for repeated queries
- [ ] More LLM providers (Google AI, Cohere, etc.)
- [ ] Streaming response support
- [ ] Web UI for configuration and monitoring
- [ ] Batch processing optimizations
- [ ] Advanced prompt templates
- [ ] A/B testing framework
---
**Made with ❤️ for efficient AI orchestration**