ML Task Router MCP Server

# 🧠 MCP Server (Model Compute Paradigm)

A modular, production-ready FastAPI server built to route and orchestrate multiple AI/LLM-powered models behind a unified, scalable interface. It supports **streaming chat**, **LLM-based routing**, and **multi-model pipelines** (like analyze → summarize → recommend), all asynchronously and fully Dockerized.

---

## 🎯 Project Score (Production Readiness)

| Capability | Status | Details |
|---|---|---|
| 🧠 Multi-Model Orchestration | ✅ Complete | Dynamic routing between `chat`, `summarize`, `sentiment`, `recommend` |
| 🤖 LLM-Based Task Router | ✅ Complete | GPT-powered routing via the `"auto"` task type |
| 🔁 Async FastAPI + Concurrency | ✅ Complete | Async/await + concurrent task execution with simulated/model API delays |
| 🔊 GPT Streaming Support | ✅ Complete | `text/event-stream` chunked responses for chat endpoints |
| 🧪 Unit + Mocked API Tests | ✅ Complete | Pytest-based test suite with mocked `run()` responses |
| 🐳 Dockerized + Clean Layout | ✅ Complete | Python 3.13 base image, no Conda dependency, production-ready Dockerfile |
| 📦 Metadata-Driven Registry | ✅ Complete | Model metadata loaded from external YAML config |
| 🔁 Rate Limiting & Retry | ⏳ In Progress | Handles the 429 retry loop; rate-limiting controls WIP |
| 🧪 CI + Docs | ⏳ Next | GitHub Actions + Swagger/Redoc planned |

---

## 🧩 Why This Project?
Modern ML/LLM deployments often involve:

- Multiple task types and model backends (OpenAI, HF, local, REST)
- Routing decisions based on input intent
- Combining the outputs of multiple models (e.g., `summarize` + `recommend`)
- Handling 429 retries, async concurrency, and streaming responses

🔧 However, building such an **LLM backend API server** that is:

- Async + concurrent
- Streamable
- Pluggable (via metadata)
- Testable
- Dockerized

…is **non-trivial** and not easily found in one place.

---

## 💡 What We've Built (Solution)

This repo is a **production-ready PoC** of an MCP (Model Compute Paradigm) architecture:

- ✅ **FastAPI-based microserver** that handles multiple tasks via the `/task` endpoint
- ✅ A task router that can:
  - 🔁 Dispatch to specific model types (`chat`, `sentiment`, `summarize`, `recommend`)
  - 🤖 Use an LLM to infer which task to run (`auto`)
  - 🧠 Run multiple models in sequence (`analyze`)
- ✅ GPT streaming via `text/event-stream`
- ✅ Async/await-enabled architecture for concurrency
- ✅ Clean, modular code for easy extension
- ✅ Dockerized for deployment
- ✅ Tested with Pytest and mocking

---

## 🛠️ Use Cases

| Use Case | MCP Server Support |
|---|---|
| Build your own ChatGPT-style API | ✅ `chat` task with streaming |
| Build an intelligent task router | ✅ `auto` task with GPT-powered intent parsing |
| Build AI pipelines (like RAG/RL) | ✅ `analyze` task with sequential execution |
| Swap between OpenAI/HuggingFace APIs | ✅ Via the `model_registry.yaml` config |
| Add custom models (e.g., OCR, vision) | ✅ Just add a new module + registry entry |

---

## 🚀 Features

- ✅ **Async FastAPI** server
- 🧠 **Task-Based Model Routing** (`chat`, `sentiment`, `recommender`, `summarize`)
- 📄 **Model Registry** loaded from YAML/JSON
- 🔁 **Automatic Retry** and **Rate-Limit Handling** for APIs
- 🔄 **Streaming Responses** for chat
- 🧪 **Unit Tests + Mocked API Calls**
- 🐳 **Dockerized** for production deployment
- 📦 Modular structure, ready for CI/CD

---

## 🏗 Architecture Overview

```plaintext
┌────────────┐
│  Frontend  │
└─────┬──────┘
      │
      ▼
┌────────────┐       YAML/JSON
│  FastAPI   │◄───── Model Registry
│  Server    │
└─────┬──────┘
      │
      ▼
┌─────────────┬──────────────┐
│             │              │
▼             ▼              ▼
[chat]    [sentiment]  [recommender]
GPT-4     HF pipeline  stub logic / API
```

---

## 🛠 Setup

### 📦 Install dependencies

```bash
git clone https://github.com/YOUR_USERNAME/mcp-server.git
cd mcp-server

# Optional: create a virtualenv
python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows

# ...or use Conda
conda create -n <env_name>
conda activate <env_name>

pip install -r requirements.txt
```

### ▶️ Run the server

```bash
uvicorn app:app --reload
```

Access the docs at: http://localhost:8000/docs

## 🧪 Running Tests

```bash
pytest tests/
```

Unit tests mock external API calls using `unittest.mock.AsyncMock`.

## 🐳 Docker Support

### 🔨 Build image

```bash
docker build -t mcp-server .
```

### 🚀 Run container

```bash
docker run -p 8000:8000 mcp-server
```

## 🧰 Example API Request

```bash
curl -X POST http://localhost:8000/task \
  -H "Content-Type: application/json" \
  -d '{
    "type": "chat",
    "input": "What are the benefits of restorative yoga?"
  }'
```

## 🔍 Directory Structure

```plaintext
mcp/
├── app.py                  # FastAPI entry point
├── models/                 # ML models (chat, sentiment, etc.)
├── agent/
│   ├── task_router.py      # Task router
│   └── model_registry.py   # Registry loader
├── registry/models.yaml    # YAML registry of model metadata
├── tests/                  # Unit tests
├── Dockerfile
├── requirements.txt
├── README.md
└── .env / .gitignore
```

## 🤝 Contributing

Pull requests are welcome.
For major changes, please open an issue first to discuss what you'd like to change.

## 📄 License

MIT

## ✨ Author

Built by Sriram Kumar Reddy Challa
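As an illustration of the routing behavior described above (direct dispatch, `auto` intent inference, and the sequential `analyze` pipeline), here is a minimal asyncio sketch with no FastAPI dependency. The handler bodies and the `infer_task` stub are hypothetical stand-ins, not the repo's actual code:

```python
# Minimal sketch of task routing: direct dispatch, "auto" inference,
# and a sequential "analyze" pipeline. All handlers are illustrative stubs.
import asyncio

async def chat(text: str) -> str:
    return f"chat: {text}"

async def summarize(text: str) -> str:
    return f"summary: {text[:20]}"

async def recommend(text: str) -> str:
    return f"recommendation for: {text}"

HANDLERS = {"chat": chat, "summarize": summarize, "recommend": recommend}

async def infer_task(text: str) -> str:
    # Stand-in for the GPT-powered intent parser behind the "auto" task.
    return "summarize" if len(text) > 40 else "chat"

async def route(task_type: str, text: str) -> str:
    if task_type == "auto":
        task_type = await infer_task(text)
    if task_type == "analyze":
        # Sequential pipeline: summarize first, then recommend on the summary.
        summary = await summarize(text)
        return await recommend(summary)
    return await HANDLERS[task_type](text)

result = asyncio.run(route("auto", "hi"))
print(result)  # chat: hi
```

In the real server this dispatch would live behind the `/task` endpoint, with handlers resolved from the YAML model registry instead of a hard-coded dict.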
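The mocked-API testing approach mentioned under Running Tests can be sketched with `unittest.mock.AsyncMock`. `handle_task` and the mocked `run()` return value here are hypothetical, for illustration only:

```python
# Sketch of mocking an async model call so tests never hit a real API.
import asyncio
from unittest.mock import AsyncMock

async def handle_task(model) -> str:
    # Production code would await a real model backend's run();
    # tests swap the backend for an AsyncMock.
    return await model.run("What are the benefits of restorative yoga?")

mock_model = AsyncMock()
mock_model.run.return_value = "Restorative yoga promotes relaxation."

response = asyncio.run(handle_task(mock_model))
print(response)  # Restorative yoga promotes relaxation.
mock_model.run.assert_awaited_once()
```

Because every attribute of an `AsyncMock` is itself awaitable, no real network call or API key is needed to exercise the routing logic.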
