AI MCP Gateway

Cost-Optimized Multi-Model Orchestrator with Stateless Architecture

An intelligent Model Context Protocol (MCP) server and HTTP API that orchestrates multiple AI models (free and paid) with dynamic N-layer routing, cross-checking, cost optimization, and stateless context management via Redis + PostgreSQL.

TypeScript Node.js MCP License

✨ Features

Core Features

🎯 Smart Routing: Dynamic N-layer routing based on task complexity and quality requirements
💰 Cost Optimization: Prioritizes free/cheap models, escalates only when necessary
✅ Cross-Checking: Multiple models review each other's work for higher quality
🔧 Code Agent: Specialized AI agent for coding tasks with TODO-driven workflow
🧪 Test Integration: Built-in Vitest and Playwright test runners
📊 Metrics & Logging: Track costs, tokens, and performance
🔄 Self-Improvement: Documents patterns, bugs, and routing heuristics
🛠️ Extensible: Easy to add new models, providers, and tools

NEW: Stateless Architecture

🗄️ Redis Cache Layer: Hot storage for LLM responses, context summaries, routing hints
💾 PostgreSQL Database: Cold storage for conversations, messages, LLM calls, analytics
🌐 HTTP API Mode: Stateless REST API with /v1/route, /v1/code-agent, /v1/chat endpoints
📦 Context Management: Two-tier context with hot (Redis) + cold (DB) layers
🔗 Handoff Packages: Optimized inter-layer communication for model escalation
📝 TODO Tracking: Persistent GitHub Copilot-style TODO lists with Redis/DB storage

📋 Table of Contents

🚀 Quick Start

Prerequisites

Node.js >= 20.0.0
npm or pnpm (recommended)
API keys for desired providers (OpenRouter, Anthropic, OpenAI)
Optional: Redis (for caching)
Optional: PostgreSQL (for persistence)

Installation

# Clone the repository git clone https://github.com/yourusername/ai-mcp-gateway.git cd ai-mcp-gateway # Install dependencies npm install # Copy environment template cp .env.example .env # Edit .env and add your API keys and database settings nano .env

Build

# Build the project npm run build # Or run in development mode npm run dev

🏗️ Architecture

Stateless Design

The AI MCP Gateway is designed as a stateless application with external state management:

┌─────────────────────────────────────────────────┐ │ AI MCP Gateway (Stateless) │ │ ┌──────────────┐ ┌──────────────┐ │ │ │ MCP Server │ │ HTTP API │ │ │ │ (stdio) │ │ (REST) │ │ │ └──────┬───────┘ └──────┬───────┘ │ │ │ │ │ │ └─────────┬───────────┘ │ │ │ │ │ ┌─────────▼──────────┐ │ │ │ Routing Engine │ │ │ │ Context Manager │ │ │ └─────────┬──────────┘ │ └───────────────────┼─────────────────────────────┘ │ ┌───────────┼───────────┐ │ │ │ ┌────▼────┐ ┌───▼────┐ ┌───▼────┐ │ Redis │ │ DB │ │ LLMs │ │ (Hot) │ │(Cold) │ │ │ └─────────┘ └────────┘ └────────┘

Two-Tier Context Management

Hot Layer (Redis)
- Context summaries (conv:summary:{conversationId})
- Recent messages cache (conv:messages:{conversationId})
- LLM response cache (llm:cache:{model}:{hash})
- TODO lists (todo:list:{conversationId})
- TTL: 30-60 minutes
Cold Layer (PostgreSQL)
- Full conversation history
- All messages with metadata
- Context summaries (versioned)
- LLM call logs (tokens, cost, duration)
- Routing rules and analytics
- Persistent storage

🔄 Dual Mode Operation

The gateway supports two modes:

1. MCP Mode (stdio)

Standard Model Context Protocol server for desktop clients.

npm run start:mcp # or npm start

Configure in Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{ "mcpServers": { "ai-mcp-gateway": { "command": "node", "args": ["/path/to/ai-mcp-gateway/dist/index.js"] } } }

2. HTTP API Mode

Stateless REST API for web services and integrations.

npm run start:api # or MODE=api npm start

API runs on http://localhost:3000 (configurable via API_PORT).

🌐 HTTP API Usage

Endpoints

POST /v1/route

Intelligent model selection and routing.

curl -X POST http://localhost:3000/v1/route \ -H "Content-Type: application/json" \ -d '{ "conversationId": "conv-123", "message": "Explain async/await in JavaScript", "userId": "user-1", "qualityLevel": "normal" }'

Response:

{ "result": { "response": "Async/await is...", "model": "anthropic/claude-sonnet-4", "provider": "anthropic" }, "routing": { "summary": "L0 -> primary model", "fromCache": false }, "context": { "conversationId": "conv-123" }, "performance": { "durationMs": 1234, "tokens": { "input": 50, "output": 200 }, "cost": 0.002 } }

POST /v1/code-agent

Specialized coding assistant.

curl -X POST http://localhost:3000/v1/code-agent \ -H "Content-Type: application/json" \ -d '{ "conversationId": "conv-123", "task": "Create a React component for user profile", "files": ["src/components/UserProfile.tsx"] }'

POST /v1/chat

General chat endpoint with context.

curl -X POST http://localhost:3000/v1/chat \ -H "Content-Type: application/json" \ -d '{ "conversationId": "conv-123", "message": "What did we discuss earlier?" }'

GET /v1/context/:conversationId

Retrieve conversation context.

curl http://localhost:3000/v1/context/conv-123

GET /health

Health check endpoint.

curl http://localhost:3000/health

Response:

{ "status": "ok", "redis": true, "database": true, "timestamp": "2025-11-22T06:42:00.000Z" }

"args": ["/path/to/ai-mcp-gateway/dist/index.js"] }

} }

### Start the Server ```bash # Run the built server pnpm start # Or use the binary directly node dist/index.js

🏗️ Architecture

High-Level Overview

┌─────────────────────────────────────────────────────────┐ │ MCP Client │ │ (Claude Desktop, VS Code, etc.) │ └───────────────────────┬─────────────────────────────────┘ │ MCP Protocol ┌───────────────────────▼─────────────────────────────────┐ │ AI MCP Gateway Server │ │ │ │ ┌─────────────────────────────────────────────────┐ │ │ │ Tools Registry │ │ │ │ • code_agent • run_vitest │ │ │ │ • run_playwright • fs_read/write │ │ │ │ • git_diff • git_status │ │ │ └──────────────────┬──────────────────────────────┘ │ │ │ │ │ ┌──────────────────▼──────────────────────────────┐ │ │ │ Routing Engine │ │ │ │ • Task classification │ │ │ │ • Layer selection (L0→L1→L2→L3) │ │ │ │ • Cross-check orchestration │ │ │ │ • Auto-escalation │ │ │ └──────────────────┬──────────────────────────────┘ │ │ │ │ │ ┌──────────────────▼──────────────────────────────┐ │ │ │ LLM Clients │ │ │ │ • OpenRouter • Anthropic │ │ │ │ • OpenAI • OSS Local │ │ │ └──────────────────┬──────────────────────────────┘ │ └───────────────────────┼─────────────────────────────────┘ │ ┌───────────────┼───────────────┐ │ │ │ ┌───────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐ │ Free Models │ │ Paid Models│ │Local Models│ │ (Layer L0) │ │(Layer L1-L3)│ │ (Layer L0)│ └──────────────┘ └────────────┘ └────────────┘

Key Components

1. MCP Server (`src/mcp/`)

Handles MCP protocol communication
Registers and dispatches tools
Manages request/response lifecycle

2. Routing Engine (`src/routing/`)

Classifies tasks by type, complexity, quality
Selects optimal model layer
Orchestrates cross-checking between models
Auto-escalates when needed

3. LLM Clients (`src/tools/llm/`)

Unified interface for multiple providers
Handles API calls, token counting, cost calculation
Supports: OpenRouter, Anthropic, OpenAI, local models

4. Tools (`src/tools/`)

Code Agent: Main AI coding assistant
Testing: Vitest and Playwright runners
File System: Read/write/list operations
Git: Diff and status operations

5. Logging & Metrics (`src/logging/`)

Winston-based structured logging
Cost tracking and alerts
Performance metrics

🛠️ Available MCP Tools

The gateway exposes 14 MCP tools for various operations:

Code & Development Tools

Tool

Description

Key Parameters

code_agent

AI coding assistant with TODO tracking

task

context

quality

Testing Tools

Tool

Description

Key Parameters

run_vitest

Execute Vitest unit/integration tests

testPath

watch

run_playwright

Execute Playwright E2E tests

testPath

File System Tools

Tool	Description	Key Parameters
`fs_read`	Read file contents	`path` , `encoding`
`fs_write`	Write file contents	`path` , `content`
`fs_list`	List directory contents	`path` , `recursive`

Git Tools

Tool	Description	Key Parameters
`git_diff`	Show git diff	`staged`
`git_status`	Show git status	-

NEW: Cache Tools (Redis)

Tool	Description	Key Parameters
`redis_get`	Get value from Redis cache	`key`
`redis_set`	Set value in Redis cache	`key` , `value` , `ttl`
`redis_del`	Delete key from Redis cache	`key`

NEW: Database Tools (PostgreSQL)

Tool	Description	Key Parameters
`db_query`	Execute SQL query	`sql` , `params`
`db_insert`	Insert row into table	`table` , `data`
`db_update`	Update rows in table	`table` , `where` , `data`

Tool Usage Examples

Using Redis cache:

{ "tool": "redis_set", "arguments": { "key": "user:profile:123", "value": {"name": "John", "role": "admin"}, "ttl": 3600 } }

Querying database:

{ "tool": "db_query", "arguments": { "sql": "SELECT * FROM conversations WHERE user_id = $1 LIMIT 10", "params": ["user-123"] } }

📦 Context Management

How Context Works

Conversation Initialization
- Client sends conversationId with each request
- Gateway checks Redis for existing context summary
- Falls back to DB if Redis miss
- Creates new conversation if not exists
Context Storage
- Summary: Compressed project context (stack, architecture, decisions)
- Messages: Recent messages (last 50 in Redis, all in DB)
- TODO Lists: Persistent task tracking
- Metadata: User, project, timestamps
Context Compression
- When context grows large (>50 messages):
  - System generates new summary
  - Keeps only recent 5-10 messages in detail
  - Older messages summarized into context
- Reduces token usage while maintaining relevance
Context Handoff
- When escalating between layers:
  - Creates handoff package with:
    - Context summary
    - Current task
    - Previous attempts
    - Known issues
    - Request to higher layer
  - Optimized for minimal tokens

Database Schema

-- Conversations CREATE TABLE conversations ( id TEXT PRIMARY KEY, user_id TEXT, project_id TEXT, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW(), metadata JSONB DEFAULT '{}'::jsonb ); -- Messages CREATE TABLE messages ( id SERIAL PRIMARY KEY, conversation_id TEXT REFERENCES conversations(id), role TEXT NOT NULL, content TEXT NOT NULL, metadata JSONB DEFAULT '{}'::jsonb, created_at TIMESTAMP DEFAULT NOW() ); -- Context summaries CREATE TABLE context_summaries ( id SERIAL PRIMARY KEY, conversation_id TEXT REFERENCES conversations(id), summary TEXT NOT NULL, version INTEGER DEFAULT 1, created_at TIMESTAMP DEFAULT NOW() ); -- LLM call logs CREATE TABLE llm_calls ( id SERIAL PRIMARY KEY, conversation_id TEXT REFERENCES conversations(id), model_id TEXT NOT NULL, layer TEXT NOT NULL, input_tokens INTEGER DEFAULT 0, output_tokens INTEGER DEFAULT 0, estimated_cost DECIMAL(10, 6) DEFAULT 0, duration_ms INTEGER, success BOOLEAN DEFAULT true, created_at TIMESTAMP DEFAULT NOW() ); -- TODO lists CREATE TABLE todo_lists ( id SERIAL PRIMARY KEY, conversation_id TEXT REFERENCES conversations(id), todo_data JSONB NOT NULL, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );

⚙️ Configuration

Environment Variables

Create a .env file (use .env.example as template):

# MCP Server MCP_SERVER_NAME=ai-mcp-gateway MCP_SERVER_VERSION=0.1.0 # API Keys OPENROUTER_API_KEY=sk-or-v1-... ANTHROPIC_API_KEY=sk-ant-... OPENAI_API_KEY=sk-... # OSS/Local Models (optional) OSS_MODEL_ENDPOINT=http://localhost:11434 OSS_MODEL_ENABLED=false # Redis REDIS_HOST=localhost REDIS_PORT=6379 REDIS_PASSWORD= REDIS_DB=0 # PostgreSQL DATABASE_URL=postgresql://user:pass@localhost:5432/ai_mcp_gateway DB_HOST=localhost DB_PORT=5432 DB_NAME=ai_mcp_gateway DB_USER=postgres DB_PASSWORD= DB_SSL=false # HTTP API API_PORT=3000 API_HOST=0.0.0.0 API_CORS_ORIGIN=* # Logging LOG_LEVEL=info LOG_FILE=logs/ai-mcp-gateway.log # Routing Configuration DEFAULT_LAYER=L0 ENABLE_CROSS_CHECK=true ENABLE_AUTO_ESCALATE=true MAX_ESCALATION_LAYER=L2 # Cost Tracking ENABLE_COST_TRACKING=true COST_ALERT_THRESHOLD=1.00 # Mode MODE=mcp # or 'api' for HTTP server

Model Configuration

Edit src/config/models.ts to:

Add/remove models
Adjust layer assignments
Update pricing
Enable/disable models

Example:

{ id: 'my-custom-model', provider: 'openrouter', apiModelName: 'provider/model-name', layer: 'L1', relativeCost: 5, pricePer1kInputTokens: 0.001, pricePer1kOutputTokens: 0.002, capabilities: { code: true, general: true, reasoning: true, }, contextWindow: 100000, enabled: true, }

📖 Usage

Using the Code Agent

The Code Agent is the primary tool for coding tasks:

// Example MCP client call { "tool": "code_agent", "arguments": { "task": "Create a TypeScript function to validate email addresses", "context": { "language": "typescript", "requirements": [ "Use regex pattern", "Handle edge cases", "Include unit tests" ] }, "quality": "high" } }

Response includes:

Generated code
Routing summary (which models were used)
Token usage and cost
Quality assessment

Running Tests

// Run Vitest tests { "tool": "run_vitest", "arguments": { "testPath": "tests/unit/mytest.test.ts" } } // Run Playwright E2E tests { "tool": "run_playwright", "arguments": { "testPath": "tests/e2e/login.spec.ts" } }

File Operations

// Read file { "tool": "fs_read", "arguments": { "path": "/path/to/file.ts" } } // Write file { "tool": "fs_write", "arguments": { "path": "/path/to/output.ts", "content": "console.log('Hello');" } } // List directory { "tool": "fs_list", "arguments": { "path": "/path/to/directory" } }

Git Operations

// Get diff { "tool": "git_diff", "arguments": { "staged": false } } // Get status { "tool": "git_status", "arguments": {} }

🛠️ Available Tools

Tool Name	Description	Input
`code_agent`	AI coding assistant with multi-model routing	task, context, quality
`run_vitest`	Run Vitest unit/integration tests	testPath (optional)
`run_playwright`	Run Playwright E2E tests	testPath (optional)
`fs_read`	Read file contents	path
`fs_write`	Write file contents	path, content
`fs_list`	List directory contents	path
`git_diff`	Get git diff	path (optional), staged (bool)
`git_status`	Get git status	none

🎚️ Model Layers

Layer L0 - Free/Cheapest

Models: Mistral 7B Free, Qwen 2 7B Free, OSS Local
Cost: $0
Use for: Simple tasks, drafts, code review
Capabilities: Basic code, general knowledge

Layer L1 - Low Cost

Models: Gemini Flash 1.5, GPT-4o Mini
Cost: ~$0.08-0.75 per 1M tokens
Use for: Standard coding tasks, refactoring
Capabilities: Code, reasoning, vision

Layer L2 - Mid-tier

Models: Claude 3 Haiku, GPT-4o
Cost: ~$1.38-12.5 per 1M tokens
Use for: Complex tasks, high-quality requirements
Capabilities: Advanced code, reasoning, vision

Layer L3 - Premium

Models: Claude 3.5 Sonnet, OpenAI o1
Cost: ~$18-60 per 1M tokens
Use for: Critical tasks, architecture design
Capabilities: SOTA performance, deep reasoning

💻 Development

Project Structure

ai-mcp-gateway/ ├── src/ │ ├── index.ts # Entry point │ ├── config/ # Configuration │ │ ├── env.ts │ │ └── models.ts │ ├── mcp/ # MCP server │ │ ├── server.ts │ │ └── types.ts │ ├── routing/ # Routing engine │ │ ├── router.ts │ │ └── cost.ts │ ├── tools/ # MCP tools │ │ ├── codeAgent/ │ │ ├── llm/ │ │ ├── testing/ │ │ ├── fs/ │ │ └── git/ │ └── logging/ # Logging & metrics │ ├── logger.ts │ └── metrics.ts ├── tests/ # Tests │ ├── unit/ │ ├── integration/ │ └── regression/ ├── docs/ # Documentation │ ├── ai-orchestrator-notes.md │ ├── ai-routing-heuristics.md │ └── ai-common-bugs-and-fixes.md ├── playwright/ # E2E tests ├── package.json ├── tsconfig.json ├── vitest.config.ts └── playwright.config.ts

Scripts

# Development pnpm dev # Watch mode with auto-rebuild pnpm build # Build for production pnpm start # Run built server # Testing pnpm test # Run all Vitest tests pnpm test:watch # Run tests in watch mode pnpm test:ui # Run tests with UI pnpm test:e2e # Run Playwright E2E tests # Code Quality pnpm type-check # TypeScript type checking pnpm lint # ESLint pnpm format # Prettier

🧪 Testing

Unit Tests

# Run all unit tests pnpm test # Run specific test file pnpm vitest tests/unit/routing.test.ts # Watch mode pnpm test:watch

Integration Tests

Integration tests verify interactions between components:

pnpm vitest tests/integration/

Regression Tests

Regression tests prevent previously fixed bugs from reoccurring:

pnpm vitest tests/regression/

E2E Tests

End-to-end tests using Playwright:

pnpm test:e2e

🔄 Self-Improvement

The gateway includes a self-improvement system:

1. Bug Tracking (`docs/ai-common-bugs-and-fixes.md`)

Documents encountered bugs
Includes root causes and fixes
Links to regression tests

2. Pattern Learning (`docs/ai-orchestrator-notes.md`)

Tracks successful patterns
Records optimization opportunities
Documents lessons learned

3. Routing Refinement (`docs/ai-routing-heuristics.md`)

Defines routing rules
Documents when to escalate
Model capability matrix

Adding to Self-Improvement Docs

When you discover a bug or pattern:

Document it in the appropriate file
Create a regression test in tests/regression/
Update routing heuristics if needed
Run tests to verify the fix

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes with tests
Update documentation
Submit a pull request

Adding a New Model

Update src/config/models.ts:
{ id: 'new-model-id', provider: 'provider-name', // ... config }
Add provider client if needed in src/tools/llm/
Update docs/ai-routing-heuristics.md

Adding a New Tool

Create tool in src/tools/yourtool/index.ts:
export const yourTool = { name: 'your_tool', description: '...', inputSchema: { ... }, handler: async (args) => { ... } };
Register in src/mcp/server.ts
Add tests in tests/unit/

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Model Context Protocol by Anthropic
OpenRouter for unified LLM access
All the amazing open-source LLM providers

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: Wiki

🗺️ Roadmap

Token usage analytics dashboard
Caching layer for repeated queries
More LLM providers (Google AI, Cohere, etc.)
Streaming response support
Web UI for configuration and monitoring
Batch processing optimizations
Advanced prompt templates
A/B testing framework

Made with ❤️ for efficient AI orchestration

AI MCP Gateway

✨ Features

Core Features

NEW: Stateless Architecture

📋 Table of Contents

🚀 Quick Start

Prerequisites

Installation

Build

🏗️ Architecture

Stateless Design

Two-Tier Context Management

🔄 Dual Mode Operation

1. MCP Mode (stdio)

2. HTTP API Mode

🌐 HTTP API Usage

Endpoints

POST /v1/route

POST /v1/code-agent

POST /v1/chat

GET /v1/context/:conversationId

GET /health

🏗️ Architecture

High-Level Overview

Key Components

1. MCP Server (src/mcp/)

2. Routing Engine (src/routing/)

3. LLM Clients (src/tools/llm/)

4. Tools (src/tools/)

5. Logging & Metrics (src/logging/)

🛠️ Available MCP Tools

Code & Development Tools

Testing Tools

File System Tools

Git Tools

NEW: Cache Tools (Redis)

NEW: Database Tools (PostgreSQL)

Tool Usage Examples

📦 Context Management

How Context Works

Database Schema

⚙️ Configuration

Environment Variables

Model Configuration

📖 Usage

Using the Code Agent

Running Tests

File Operations

Git Operations

🛠️ Available Tools

🎚️ Model Layers

Layer L0 - Free/Cheapest

Layer L1 - Low Cost

Layer L2 - Mid-tier

Layer L3 - Premium

💻 Development

Project Structure

Scripts

🧪 Testing

Unit Tests

Integration Tests

Regression Tests

E2E Tests

🔄 Self-Improvement

1. Bug Tracking (docs/ai-common-bugs-and-fixes.md)

2. Pattern Learning (docs/ai-orchestrator-notes.md)

3. Routing Refinement (docs/ai-routing-heuristics.md)

Adding to Self-Improvement Docs

🤝 Contributing

Adding a New Model

Adding a New Tool

📄 License

🙏 Acknowledgments

📞 Support

🗺️ Roadmap

Related Resources

New MCP Servers

MCP directory API

1. MCP Server (`src/mcp/`)

2. Routing Engine (`src/routing/`)

3. LLM Clients (`src/tools/llm/`)

4. Tools (`src/tools/`)

5. Logging & Metrics (`src/logging/`)

1. Bug Tracking (`docs/ai-common-bugs-and-fixes.md`)

2. Pattern Learning (`docs/ai-orchestrator-notes.md`)

3. Routing Refinement (`docs/ai-routing-heuristics.md`)