
🧠 MCP Server (Model Compute Paradigm)

A modular, production-ready FastAPI server built to route and orchestrate multiple AI/LLM-powered models behind a unified, scalable interface. It supports streaming chat, LLM-based routing, and multi-model pipelines (like analyze → summarize → recommend), all asynchronously and fully Dockerized.


🎯 Project Score (Production Readiness)

| Capability | Status | Details |
|---|---|---|
| 🧠 Multi-Model Orchestration | ✅ Complete | Dynamic routing between chat, summarize, sentiment, recommend |
| 🤖 LLM-Based Task Router | ✅ Complete | GPT-powered routing via the "auto" task type |
| 🔁 Async FastAPI + Concurrency | ✅ Complete | Async/await + concurrent task execution with simulated/model API delays |
| 🔊 GPT Streaming Support | ✅ Complete | text/event-stream chunked responses for chat endpoints |
| 🧪 Unit + Mocked API Tests | ✅ Complete | Pytest-based test suite with mocked run() responses |
| 🐳 Dockerized + Clean Layout | ✅ Complete | Python 3.13 base image, no Conda dependency, production-ready Dockerfile |
| 📦 Metadata-Driven Registry | ✅ Complete | Model metadata loaded from external YAML config |
| 🔁 Rate Limiting & Retry | ⏳ In Progress | 429 retry loop handled; rate-limiting controls still in progress |
| 🧪 CI + Docs | ⏳ Next | GitHub Actions + Swagger/ReDoc planned |
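The 429 retry loop noted in the table could be sketched roughly as below. This is an illustrative stdlib-only sketch, not the repo's actual code: the `call_with_retry` wrapper, `RateLimitError` class, and backoff constants are all assumptions.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Raised when an upstream API answers HTTP 429 (Too Many Requests)."""


async def call_with_retry(call, max_retries=3, base_delay=1.0):
    """Retry a zero-argument async callable on 429s.

    Uses exponential backoff (1s, 2s, 4s, ...) plus a small random
    jitter so concurrent clients do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

A rate limiter (the "in progress" half of the row above) would sit on the other side: throttling outbound calls before they are made, rather than reacting to 429s after the fact.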


🧩 Why This Project? (Motivation)

Modern ML/LLM deployments often involve:

  • Multiple task types and model backends (OpenAI, HF, local, REST)

  • Routing decisions based on input intent

  • Combining outputs of multiple models (e.g., summarize + recommend)

  • Handling 429 retries, async concurrency, streaming responses

🔧 However, building an LLM backend API server that is:

  • Async + concurrent

  • Streamable

  • Pluggable (via metadata)

  • Testable

  • Dockerized

…is non-trivial, and examples that combine all of these are hard to find in one place.


💡 What We’ve Built (Solution)

This repo is a production-ready PoC of an MCP (Model-Compute Paradigm) architecture:

  • ✅ FastAPI-based microserver that handles multiple tasks via a /task endpoint

  • ✅ Task router that can:

    • 🔁 Dispatch to specific model types (chat, sentiment, summarize, recommend)

    • 🤖 Use an LLM to infer which task to run (auto)

    • 🧠 Run multiple models in sequence (analyze)

  • ✅ GPT streaming via text/event-stream

  • ✅ Async/await architecture for concurrency

  • ✅ Clean, modular code for easy extension

  • ✅ Dockerized for deployment

  • ✅ Tested using Pytest with mocking
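The routing idea above can be sketched as a dispatch table. Everything here is an illustrative assumption rather than the repo's actual code: the handler bodies are stubs, the `auto` branch stands in for the real LLM intent classifier, and the `analyze` result keys are invented for the example.

```python
import asyncio

# Hypothetical handlers; the real ones would call GPT, an HF pipeline, etc.
async def run_chat(text): return f"chat: {text}"
async def run_sentiment(text): return "positive"
async def run_summarize(text): return text[:50]
async def run_recommend(text): return ["restorative yoga"]

HANDLERS = {
    "chat": run_chat,
    "sentiment": run_sentiment,
    "summarize": run_summarize,
    "recommend": run_recommend,
}

async def route_task(task_type: str, text: str):
    """Dispatch a /task request to the right model handler."""
    if task_type == "auto":
        # Sketch only: the real router asks an LLM to classify intent.
        task_type = "chat"
    if task_type == "analyze":
        # Pipeline: run several models in sequence and combine results.
        return {
            "sentiment": await run_sentiment(text),
            "summary": await run_summarize(text),
            "recommendations": await run_recommend(text),
        }
    return await HANDLERS[task_type](text)
```

Adding a new task type then amounts to writing one handler coroutine and adding one `HANDLERS` entry, which is what keeps the server "pluggable".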


πŸ› οΈ Use Cases

Use Case

MCP Server Support

Build your own ChatGPT-style API

βœ… chat task with streaming

Build intelligent task router

βœ… auto task with GPT-powered intent parsing

Build AI pipelines (like RAG/RL)

βœ… analyze task with sequential execution

Swap between OpenAI/HuggingFace APIs

βœ… Via model_registry.yaml config

Add custom models (e.g., OCR, vision)

βœ… Just add a new module + registry entry
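A registry entry in model_registry.yaml might look roughly like this. The field names (`provider`, `model_name`, `module`, etc.) are assumptions about the schema made for illustration, not the repo's actual format:

```yaml
models:
  chat:
    provider: openai          # or huggingface, local, rest
    model_name: gpt-4
    streaming: true
  sentiment:
    provider: huggingface
    model_name: distilbert-base-uncased-finetuned-sst-2-english
  ocr:                        # example of a custom add-on model
    provider: local
    module: models.ocr        # new module referenced by the registry entry
```

Swapping OpenAI for HuggingFace, or adding an OCR model, would then be a config change plus one new module, with no router code edits.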


πŸš€ Features

  • ✅ Async FastAPI server

  • 🧠 Task-based model routing (chat, sentiment, recommender, summarize)

  • 📄 Model registry from YAML/JSON

  • 🔁 Automatic retry and rate-limit handling for APIs

  • 🔄 Streaming responses for chat

  • 🧪 Unit tests + mocked API calls

  • 🐳 Dockerized for production deployment

  • 📦 Modular structure, ready for CI/CD
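The streaming feature relies on the text/event-stream wire format. The helper below is a stdlib-only sketch of how chat chunks could be framed as Server-Sent Events; the real server would wrap such a generator in FastAPI's StreamingResponse, and the `[DONE]` sentinel is a common convention assumed here, not a guarantee about this repo.

```python
from typing import AsyncIterator, Iterable


def sse_event(data: str) -> str:
    """Frame one chunk as an SSE message: a `data:` line plus a blank line."""
    return f"data: {data}\n\n"


async def stream_chat(chunks: Iterable[str]) -> AsyncIterator[str]:
    # In the real server the chunks would come from the LLM's streaming
    # API; here they are supplied by the caller for illustration.
    for chunk in chunks:
        yield sse_event(chunk)
    yield sse_event("[DONE]")  # conventional end-of-stream sentinel
```

In FastAPI this would be served as `StreamingResponse(stream_chat(...), media_type="text/event-stream")`, which is what lets clients render the reply token by token.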


πŸ— Architecture Overview

```
┌────────────┐
│  Frontend  │
└─────┬──────┘
      │
      ▼
┌────────────┐       YAML/JSON
│  FastAPI   │◄───── Model Registry
│  Server    │
└─────┬──────┘
      │
      ├─────────────┬─────────────────┐
      ▼             ▼                 ▼
   [chat]      [sentiment]      [recommender]
   GPT-4       HF pipeline     stub logic / API
```

---

🛠 Setup

📦 Install dependencies

```bash
git clone https://github.com/YOUR_USERNAME/mcp-server.git
cd mcp-server

# Optional: create a virtualenv
python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows

# or use Conda
conda create -n <env_name>
conda activate <env_name>

pip install -r requirements.txt
```

▶️ Run the server

```bash
uvicorn app:app --reload
```

Access the docs at: http://localhost:8000/docs

🧪 Running Tests

```bash
pytest tests/
```

Unit tests mock external API calls using unittest.mock.AsyncMock.

🐳 Docker Support

🔨 Build image

```bash
docker build -t mcp-server .
```

🚀 Run container

```bash
docker run -p 8000:8000 mcp-server
```

🧰 Example API Request

```bash
curl -X POST http://localhost:8000/task \
  -H "Content-Type: application/json" \
  -d '{
    "type": "chat",
    "input": "What are the benefits of restorative yoga?"
  }'
```

🔍 Directory Structure

```
mcp/
├── app.py                  # FastAPI entry
├── models/                 # ML models (chat, sentiment, etc.)
├── agent/
│   ├── task_router.py      # Task router
│   └── model_registry.py   # Registry loader
├── registry/models.yaml    # YAML registry of model metadata
├── tests/                  # Unit tests
├── Dockerfile
├── requirements.txt
├── README.md
└── .env / .gitignore
```

🤝 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you’d like to change.

📄 License

MIT

✨ Author

Built by Sriram Kumar Reddy Challa
