Provides streaming chat capabilities using OpenAI's GPT models with automatic retry logic and rate limit handling for conversational AI tasks.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@ML Task Router MCP Serversummarize this article about climate change"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
# 🧠 MCP Server (Model Compute Paradigm)

A modular, production-ready FastAPI server built to route and orchestrate multiple AI/LLM-powered models behind a unified, scalable interface. It supports streaming chat, LLM-based routing, and multi-model pipelines (like `analyze` → `summarize` → `recommend`), all asynchronously and fully Dockerized.
## 🎯 Project Score (Production Readiness)

| Capability | Status | Details |
| --- | --- | --- |
| 🧠 Multi-Model Orchestration | ✅ Complete | Dynamic routing between model types (`chat`, `sentiment`, `summarize`, `recommend`) |
| 🤖 LLM-Based Task Router | ✅ Complete | GPT-powered routing via the `auto` task |
| ⚡ Async FastAPI + Concurrency | ✅ Complete | Async/await + concurrent task execution with simulated/model API delays |
| 📡 GPT Streaming Support | ✅ Complete | Streaming chat responses via `text/event-stream` |
| 🧪 Unit + Mocked API Tests | ✅ Complete | Pytest-based test suite with mocked API calls |
| 🐳 Dockerized + Clean Layout | ✅ Complete | Python 3.13 base image, no Conda dependency, production-ready Dockerfile |
| 📦 Metadata-Driven Registry | ✅ Complete | Model metadata loaded from external YAML config |
| 🚦 Rate Limiting & Retry | ⏳ In Progress | Handles 429 retry loop; rate limiting controls WIP |
| 🧪 CI + Docs | ⏳ Next | GitHub Actions + Swagger/ReDoc planned |
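For illustration, a metadata-driven registry of this kind is typically a YAML file loaded at startup. The sketch below is an assumption about what such a config and loader might look like; the field names (`provider`, `model_name`, `streaming`) and example model identifiers are illustrative, not this repo's actual schema.

```python
# Hypothetical sketch of a YAML-driven model registry (field names and model
# identifiers are assumptions, not the repo's actual schema).
import yaml  # pip install pyyaml

EXAMPLE_REGISTRY_YAML = """
models:
  chat:
    provider: openai
    model_name: gpt-4o-mini
    streaming: true
  sentiment:
    provider: huggingface
    model_name: distilbert-base-uncased-finetuned-sst-2-english
  summarize:
    provider: openai
    model_name: gpt-4o-mini
  recommend:
    provider: local
    model_name: simple-recommender
"""

def load_registry(yaml_text: str) -> dict:
    """Parse the YAML config into a task-name -> metadata mapping."""
    config = yaml.safe_load(yaml_text)
    return config.get("models", {})

if __name__ == "__main__":
    registry = load_registry(EXAMPLE_REGISTRY_YAML)
    print(sorted(registry))  # ['chat', 'recommend', 'sentiment', 'summarize']
```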
## 🧩 Why This Project? (Motivation)

Modern ML/LLM deployments often involve:

- Multiple task types and model backends (OpenAI, HF, local, REST)
- Routing decisions based on input intent
- Combining outputs of multiple models (e.g., `summarize` + `recommend`)
- Handling 429 retries, async concurrency, streaming responses

🧠 However, building such an LLM backend API server that is:

- Async + concurrent
- Streamable
- Pluggable (via metadata)
- Testable
- Dockerized

…is non-trivial and not easily found in a single place.
## 💡 What We've Built (Solution)

This repo is a production-ready PoC of an MCP (Model-Compute Paradigm) architecture:

- ✅ FastAPI-based microserver to handle multiple tasks via the `/task` endpoint (see the example request below)
- ✅ Task router that can:
  - 🔀 Dispatch to specific model types (`chat`, `sentiment`, `summarize`, `recommend`)
  - 🤖 Use an LLM to infer which task to run (`auto`)
  - 🧠 Run multiple models in sequence (`analyze`)
- ✅ GPT streaming via `text/event-stream`
- ✅ Async/await enabled architecture for concurrency
- ✅ Clean modular code for easy extension
- ✅ Dockerized for deployment
- ✅ Tested using Pytest with mocking
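As a rough usage sketch, here is what calling the `/task` endpoint could look like from Python. The host/port and the JSON field names (`task`, `input`) are assumptions for illustration; check the repo's request schema for the exact payload shape.

```python
# Minimal client sketch for the /task endpoint. The base URL and the JSON
# field names ("task", "input") are assumptions, not the documented schema.
import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment

# Non-streaming call: let the LLM router pick the task ("auto").
resp = requests.post(
    f"{BASE_URL}/task",
    json={"task": "auto", "input": "Summarize this article about climate change."},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())

# Streaming chat call: consume the text/event-stream response line by line.
with requests.post(
    f"{BASE_URL}/task",
    json={"task": "chat", "input": "Hello!"},
    headers={"Accept": "text/event-stream"},  # assumed; server may stream unconditionally
    stream=True,
    timeout=60,
) as stream:
    for line in stream.iter_lines(decode_unicode=True):
        if line:  # skip keep-alive blank lines
            print(line)
```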
## 🛠️ Use Cases

| Use Case | MCP Server Support |
| --- | --- |
| Build your own ChatGPT-style API | ✅ |
| Build intelligent task router | ✅ |
| Build AI pipelines (like RAG/RL) | ✅ |
| Swap between OpenAI/HuggingFace APIs | ✅ Via the model registry config |
| Add custom models (e.g., OCR, vision) | ✅ Just add a new module + registry entry |
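To give a feel for the "new module + registry entry" path, here is a hypothetical handler for a custom OCR task. The function name, signature, and registration shape are assumptions for illustration; the actual plug-in interface in this repo may differ.

```python
# Hypothetical custom model module (e.g. an OCR task). The handler signature
# and registry-entry shape are assumptions, not the repo's actual interface.
import asyncio


async def run_ocr(payload: str) -> dict:
    """Pretend OCR handler: returns a canned result after a simulated delay."""
    await asyncio.sleep(0.1)  # stand-in for a real OCR/model API call
    return {"task": "ocr", "text": f"extracted text from: {payload}"}


# A matching registry entry might map the task name to the handler plus
# metadata, mirroring the YAML-driven registry (shape is illustrative):
CUSTOM_REGISTRY_ENTRY = {
    "ocr": {
        "provider": "local",
        "handler": run_ocr,
        "description": "Extract text from images or documents",
    }
}

if __name__ == "__main__":
    print(asyncio.run(run_ocr("invoice.png")))
```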
## 🚀 Features

- ✅ Async FastAPI server
- 🧠 Task-based Model Routing (`chat`, `sentiment`, `recommender`, `summarize`)
- 📦 Model Registry from YAML/JSON
- 🔁 Automatic Retry and Rate Limit Handling for APIs (see the sketch below)
- 📡 Streaming Responses for Chat
- 🧪 Unit Tests + Mocked API Calls
- 🐳 Dockerized for production deployment
- 📦 Modular structure, ready for CI/CD
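For context on the retry behaviour, the snippet below sketches a generic async exponential-backoff loop for HTTP 429 responses, the kind of pattern the rate-limit handling refers to. It uses `httpx` and is a sketch under those assumptions, not this project's actual implementation.

```python
# Generic async retry loop for HTTP 429 (rate limit) responses with
# exponential backoff + jitter. Illustrative only; the repo's exact policy
# and exception handling may differ.
import asyncio
import random

import httpx  # async HTTP client; pip install httpx


async def post_with_retry(url: str, payload: dict, max_retries: int = 5) -> httpx.Response:
    """POST `payload`, retrying with backoff whenever the server returns 429."""
    async with httpx.AsyncClient(timeout=60) as client:
        for attempt in range(max_retries):
            response = await client.post(url, json=payload)
            if response.status_code != 429:
                response.raise_for_status()
                return response
            # Honor Retry-After if provided, otherwise back off exponentially.
            retry_after = response.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else (2 ** attempt) + random.random()
            await asyncio.sleep(delay)
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts")
```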