# MCP Server (Model Compute Paradigm)

A modular, production-ready FastAPI server built to route and orchestrate multiple AI/LLM-powered models behind a unified, scalable interface. It supports **streaming chat**, **LLM-based routing**, and **multi-model pipelines** (like analyze → summarize → recommend), all asynchronous and fully Dockerized.

---
## Project Score (Production Readiness)

| Capability                    | Status         | Details                                                                  |
|-------------------------------|----------------|--------------------------------------------------------------------------|
| Multi-Model Orchestration     | ✅ Complete    | Dynamic routing between `chat`, `summarize`, `sentiment`, `recommend`    |
| LLM-Based Task Router         | ✅ Complete    | GPT-powered routing via the `"auto"` task type                           |
| Async FastAPI + Concurrency   | ✅ Complete    | Async/await + concurrent task execution with simulated/model API delays  |
| GPT Streaming Support         | ✅ Complete    | `text/event-stream` chunked responses for chat endpoints                 |
| Unit + Mocked API Tests       | ✅ Complete    | Pytest-based test suite with mocked `run()` responses                    |
| Dockerized + Clean Layout     | ✅ Complete    | Python 3.13 base image, no Conda dependency, production-ready Dockerfile |
| Metadata-Driven Registry      | ✅ Complete    | Model metadata loaded from external YAML config                          |
| Rate Limiting & Retry         | ⏳ In Progress | Handles the 429 retry loop; rate-limiting controls WIP                   |
| CI + Docs                     | ⏳ Next        | GitHub Actions + Swagger/Redoc planned                                   |

---
## Why This Project? (Motivation)

Modern ML/LLM deployments often involve:
- Multiple task types and model backends (OpenAI, HF, local, REST)
- Routing decisions based on input intent
- Combining outputs of multiple models (e.g., `summarize` + `recommend`)
- Handling 429 retries, async concurrency, streaming responses

However, building such an **LLM backend API server** that is:

- Async + concurrent
- Streamable
- Pluggable (via metadata)
- Testable
- Dockerized

… is **non-trivial** and not easily found in a single place.

---
## What We've Built (Solution)

This repo is a **production-ready PoC** of an MCP (Model-Compute Paradigm) architecture:

- ✅ **FastAPI-based microserver** to handle multiple tasks via the `/task` endpoint
- ✅ Task router that can (see the minimal sketch after this list):
  - Dispatch to specific model types (`chat`, `sentiment`, `summarize`, `recommend`)
  - Use an LLM to infer which task to run (`auto`)
  - Run multiple models in sequence (`analyze`)
- ✅ GPT streaming via `text/event-stream`
- ✅ Async/await architecture for concurrency
- ✅ Clean, modular code for easy extension
- ✅ Dockerized for deployment
- ✅ Tested using Pytest with mocking
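
To make the routing concrete, here is a minimal sketch of what such a `/task` endpoint can look like. The handler names, bodies, and dispatch table are illustrative stand-ins, not this repo's actual code:

```python
# Minimal sketch of a task-dispatch endpoint (illustrative, not the repo's code).
import asyncio

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class TaskRequest(BaseModel):
    type: str   # e.g. "chat", "sentiment", "summarize", "recommend"
    input: str

async def run_chat(text: str) -> str:
    await asyncio.sleep(0.1)              # stand-in for a model/API call
    return f"chat reply to: {text}"

async def run_sentiment(text: str) -> str:
    await asyncio.sleep(0.1)              # stand-in for an HF pipeline call
    return "positive"

HANDLERS = {"chat": run_chat, "sentiment": run_sentiment}

@app.post("/task")
async def run_task(req: TaskRequest):
    handler = HANDLERS.get(req.type)
    if handler is None:
        raise HTTPException(status_code=400, detail=f"Unknown task type: {req.type}")
    return {"type": req.type, "output": await handler(req.input)}
```
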
---
## Use Cases

| Use Case                               | MCP Server Support                                           |
|----------------------------------------|--------------------------------------------------------------|
| Build your own ChatGPT-style API       | ✅ `chat` task with streaming                                |
| Build an intelligent task router       | ✅ `auto` task with GPT-powered intent parsing               |
| Build AI pipelines (like RAG/RL)       | ✅ `analyze` task with sequential execution (sketched below) |
| Swap between OpenAI/HuggingFace APIs   | ✅ Via the `registry/models.yaml` config                     |
| Add custom models (e.g., OCR, vision)  | ✅ Just add a new module + registry entry                    |
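
As an illustration of the sequential execution behind `analyze`, the sketch below chains stub models so each step consumes the previous step's output. All function names and the data flow between steps are hypothetical:

```python
# Illustrative sketch of a sequential multi-model pipeline (hypothetical names).
import asyncio

async def run_sentiment(text: str) -> str:
    return "positive"                     # stub: an HF pipeline would go here

async def run_summarize(text: str) -> str:
    return text[:60]                      # stub: an LLM summary would go here

async def run_recommend(summary: str) -> str:
    return f"Based on '{summary}', consider a follow-up session."

async def run_analyze(text: str) -> dict:
    # Run the steps one after another, feeding each output forward.
    sentiment = await run_sentiment(text)
    summary = await run_summarize(text)
    recommendation = await run_recommend(summary)
    return {"sentiment": sentiment, "summary": summary, "recommendation": recommendation}

print(asyncio.run(run_analyze("Long review text about a yoga studio...")))
```
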
---
## Features

- **Async FastAPI** server
- **Task-based Model Routing** (`chat`, `sentiment`, `recommend`, `summarize`)
- **Model Registry** from YAML/JSON
- **Automatic Retry** and **Rate Limit Handling** for APIs (see the sketch after this list)
- **Streaming Responses** for Chat
- **Unit Tests + Mocked API Calls**
- **Dockerized** for production deployment
- Modular structure, ready for CI/CD
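
The retry behavior can be pictured as a backoff loop around the outbound API call. This is a hedged sketch assuming an `httpx`-style async client, not the repo's exact implementation:

```python
# Hedged sketch of 429-aware retry with exponential backoff (assumes httpx).
import asyncio

import httpx

async def post_with_retry(client: httpx.AsyncClient, url: str, payload: dict,
                          max_retries: int = 3) -> httpx.Response:
    delay = 1.0
    for attempt in range(max_retries + 1):
        response = await client.post(url, json=payload)
        if response.status_code != 429:
            response.raise_for_status()   # surface non-rate-limit errors
            return response
        if attempt == max_retries:
            break
        # Prefer the server's Retry-After hint (seconds); else back off exponentially.
        retry_after = response.headers.get("Retry-After")
        await asyncio.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    raise RuntimeError(f"Still rate-limited after {max_retries} retries: {url}")
```
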
---
## Architecture Overview

```plaintext
┌────────────┐
│  Frontend  │
└─────┬──────┘
      │
      ▼
┌────────────┐        YAML/JSON
│  FastAPI   │◄────── Model Registry
│  Server    │
└─────┬──────┘
      │
  ┌───┴─────────────┬────────────────┐
  ▼                 ▼                ▼
[chat]         [sentiment]     [recommender]
GPT-4          HF pipeline     stub logic / API
```
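
The YAML registry shown in the diagram could be loaded along these lines; both the schema and the loader below are illustrative assumptions, not the actual contents of `registry/models.yaml`:

```python
# Illustrative registry loading; this YAML schema is an assumption.
import yaml  # PyYAML

EXAMPLE_REGISTRY = """
models:
  chat:
    provider: openai
    model: gpt-4
    streaming: true
  sentiment:
    provider: huggingface
    model: distilbert-base-uncased-finetuned-sst-2-english
"""

def load_registry(raw_yaml: str) -> dict:
    """Parse model metadata keyed by task name."""
    return yaml.safe_load(raw_yaml)["models"]

registry = load_registry(EXAMPLE_REGISTRY)
print(registry["chat"]["model"])  # -> gpt-4
```
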
---
## Setup

### Install dependencies

```bash
git clone https://github.com/YOUR_USERNAME/mcp-server.git
cd mcp-server

# Optional: create a virtualenv
python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows

# or with conda
conda create -n <env_name>
conda activate <env_name>

pip install -r requirements.txt
```

### Run the server

```bash
uvicorn app:app --reload
```

Access the docs at: http://localhost:8000/docs

---

## Running Tests

```bash
pytest tests/
```

Unit tests mock external API calls using `unittest.mock.AsyncMock`.
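
The pattern looks roughly like the self-contained sketch below; `ChatModel` here is a stand-in for the repo's model wrappers, not real project code:

```python
# Self-contained illustration of AsyncMock; ChatModel is a stand-in class.
import asyncio
from unittest.mock import AsyncMock

class ChatModel:
    async def run(self, text: str) -> str:
        raise NotImplementedError("the real API call would live here")

def test_chat_model_run_is_mocked():
    model = ChatModel()
    model.run = AsyncMock(return_value="mocked reply")  # no network call is made
    result = asyncio.run(model.run("hi"))
    assert result == "mocked reply"
    model.run.assert_awaited_once_with("hi")
```
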
---

## Docker Support

### Build the image

```bash
docker build -t mcp-server .
```

### Run the container

```bash
docker run -p 8000:8000 mcp-server
```
---

## Example API Request

```bash
curl -X POST http://localhost:8000/task \
  -H "Content-Type: application/json" \
  -d '{
    "type": "chat",
    "input": "What are the benefits of restorative yoga?"
  }'
```
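
For the streaming `chat` path, a client can read the `text/event-stream` body incrementally instead of waiting for the full response. A sketch using `requests`, assuming the endpoint shape matches the curl example above:

```python
# Hedged sketch of consuming a streamed chat response (assumes `requests`).
import requests

with requests.post(
    "http://localhost:8000/task",
    json={"type": "chat", "input": "What are the benefits of restorative yoga?"},
    stream=True,   # read the body incrementally instead of buffering it all
) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        if line:   # event-stream chunks are separated by blank lines
            print(line)
```
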
---

## Directory Structure

```plaintext
mcp/
├── app.py                   # FastAPI entry
├── models/                  # ML models (chat, sentiment, etc.)
├── agent/
│   ├── task_router.py       # Task router
│   └── model_registry.py    # Registry loader
├── registry/models.yaml     # YAML registry of model metadata
├── tests/                   # Unit tests
├── Dockerfile
├── requirements.txt
├── README.md
└── .env / .gitignore
```
---

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you'd like to change.

---

## License

MIT

---

## Author

Built by Sriram Kumar Reddy Challa