
# 🧠 MCP Server (Model Compute Paradigm)

A modular, production-ready FastAPI server built to route and orchestrate multiple AI/LLM-powered models behind a unified, scalable interface. It supports streaming chat, LLM-based routing, and multi-model pipelines (like analyze → summarize → recommend), all asynchronously and fully Dockerized.


## 🎯 Project Score (Production Readiness)

| Capability | Status | Details |
|---|---|---|
| 🧠 Multi-Model Orchestration | ✅ Complete | Dynamic routing between `chat`, `summarize`, `sentiment`, `recommend` |
| 🤖 LLM-Based Task Router | ✅ Complete | GPT-powered routing via the `"auto"` task type (sketched below) |
| 🔁 Async FastAPI + Concurrency | ✅ Complete | Async/await + concurrent task execution with simulated/model API delays |
| 🔊 GPT Streaming Support | ✅ Complete | `text/event-stream` chunked responses for chat endpoints |
| 🧪 Unit + Mocked API Tests | ✅ Complete | Pytest-based test suite with mocked `run()` responses |
| 🐳 Dockerized + Clean Layout | ✅ Complete | Python 3.13 base image, no Conda dependency, production-ready Dockerfile |
| 📦 Metadata-Driven Registry | ✅ Complete | Model metadata loaded from external YAML config |
| 🔁 Rate Limiting & Retry | ⏳ In Progress | Handles the 429 retry loop; rate-limiting controls WIP |
| 🧪 CI + Docs | ⏳ Next | GitHub Actions + Swagger/ReDoc planned |
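To make the `"auto"` row concrete, here is a minimal sketch of how GPT-powered routing could classify an input into a task type. It assumes the official `openai` Python client; the prompt, model name, and function name are illustrative, not the repo's actual code.

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

TASK_TYPES = ["chat", "summarize", "sentiment", "recommend"]

async def infer_task(user_input: str) -> str:
    """Hypothetical sketch: ask an LLM which task type best fits the input."""
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": (
                    f"Classify the user's request as one of: {', '.join(TASK_TYPES)}. "
                    "Reply with the task name only."
                ),
            },
            {"role": "user", "content": user_input},
        ],
    )
    task = resp.choices[0].message.content.strip().lower()
    return task if task in TASK_TYPES else "chat"  # safe fallback
```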


## 🧩 Why This Project? (Motivation)

Modern ML/LLM deployments often involve:

- Multiple task types and model backends (OpenAI, HF, local, REST)
- Routing decisions based on input intent
- Combining outputs of multiple models (e.g., summarize + recommend)
- Handling 429 retries, async concurrency, streaming responses

🔧 However, building an LLM backend API server that is:

- Async + concurrent
- Streamable
- Pluggable (via metadata)
- Testable
- Dockerized

…is non-trivial, and it is hard to find all of these in a single place.


## 💡 What We've Built (Solution)

This repo is a production-ready PoC of an MCP (Model Compute Paradigm) architecture:

- ✅ FastAPI-based microserver that handles multiple tasks via a `/task` endpoint (sketched below)
- ✅ Task router that can:
  - 🔁 Dispatch to specific model types (`chat`, `sentiment`, `summarize`, `recommend`)
  - 🤖 Use an LLM to infer which task to run (`auto`)
  - 🧠 Run multiple models in sequence (`analyze`)
- ✅ GPT streaming via `text/event-stream`
- ✅ Async/await-enabled architecture for concurrency
- ✅ Clean, modular code for easy extension
- ✅ Dockerized for deployment
- ✅ Tested with Pytest and mocking
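As a rough illustration of how these pieces fit together, here is a minimal, self-contained sketch of a `/task` endpoint that dispatches by task type and streams chat output as `text/event-stream`. The stub handlers and hard-coded chunks are placeholders, not the repo's actual modules.

```python
import asyncio

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class TaskRequest(BaseModel):
    type: str   # "chat", "sentiment", "summarize", "recommend", "auto", "analyze"
    input: str

async def run_sentiment(text: str) -> dict:
    """Placeholder model; the real repo dispatches to modules under models/."""
    await asyncio.sleep(0.1)  # simulate model/API latency
    return {"sentiment": "positive"}

async def stream_chat(text: str):
    """Placeholder for GPT token chunks, emitted as server-sent events."""
    for chunk in ["Hello", ", ", "world!"]:
        yield f"data: {chunk}\n\n"
        await asyncio.sleep(0.05)

HANDLERS = {"sentiment": run_sentiment}

@app.post("/task")
async def handle_task(req: TaskRequest):
    if req.type == "chat":
        # Chunked streaming response for chat
        return StreamingResponse(stream_chat(req.input), media_type="text/event-stream")
    handler = HANDLERS.get(req.type)
    if handler is None:
        raise HTTPException(status_code=400, detail=f"Unknown task type: {req.type}")
    return await handler(req.input)
```

An `analyze` pipeline would follow the same pattern, awaiting several handlers in sequence and merging their outputs into one response.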


๐Ÿ› ๏ธ Use Cases

Use Case

MCP Server Support

Build your own ChatGPT-style API

โœ…

chat

task with streaming

Build intelligent task router

โœ…

auto

task with GPT-powered intent parsing

Build AI pipelines (like RAG/RL)

โœ…

analyze

task with sequential execution

Swap between OpenAI/HuggingFace APIs

โœ… Via

model_registry.yaml

config

Add custom models (e.g., OCR, vision)

โœ… Just add a new module + registry entry
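For illustration, a registry entry might look like the hypothetical snippet below; the field names are assumptions for the sketch, not the repo's actual schema.

```yaml
# Hypothetical registry entries; field names are illustrative only.
models:
  chat:
    provider: openai
    model_name: gpt-4
    streaming: true
  sentiment:
    provider: huggingface
    model_name: distilbert-base-uncased-finetuned-sst-2-english
  recommend:
    provider: local
    module: models.recommender
```

Under a scheme like this, swapping OpenAI for a Hugging Face backend becomes a config change rather than a code change.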


## 🚀 Features

- ✅ Async FastAPI server
- 🧠 Task-based model routing (`chat`, `sentiment`, `recommender`, `summarize`)
- 📄 Model registry from YAML/JSON
- 🔁 Automatic retry and rate-limit handling for APIs (see the sketch after this list)
- 🔄 Streaming responses for chat
- 🧪 Unit tests + mocked API calls
- 🐳 Dockerized for production deployment
- 📦 Modular structure, ready for CI/CD
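As a sketch of the retry behavior, here is one way a 429 retry loop with exponential backoff could look, assuming `httpx`; the function name, URL handling, and retry parameters are illustrative, not the repo's actual implementation.

```python
import asyncio

import httpx

async def call_with_retry(url: str, payload: dict, max_retries: int = 3) -> dict:
    """Hypothetical sketch: retry a POST on HTTP 429 with exponential backoff."""
    async with httpx.AsyncClient() as client:
        for attempt in range(max_retries + 1):
            resp = await client.post(url, json=payload)
            if resp.status_code != 429:
                resp.raise_for_status()
                return resp.json()
            # Honor Retry-After if the API provides it; otherwise back off exponentially.
            delay = float(resp.headers.get("Retry-After", 2 ** attempt))
            await asyncio.sleep(delay)
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```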


๐Ÿ— Architecture Overview

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ Frontend โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ–ผ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” YAML/JSON โ”‚ FastAPI โ”‚โ—„โ”€โ”€โ”€โ”€โ” Model Registry โ”‚ Server โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ–ผ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ โ”‚ โ”‚ โ–ผ โ–ผ โ–ผ [chat] [sentiment] [recommender] GPT-4 HF pipeline stub logic / API --- ๐Ÿ›  Setup ๐Ÿ“ฆ Install dependencies git clone https://github.com/YOUR_USERNAME/mcp-server.git cd mcp-server --- # Optional: create virtualenv python -m venv .venv source .venv/bin/activate # or .venv\Scripts\activate on Windows or conda create -n <env_name> conda activate <env_name> pip install -r requirements.txt โ–ถ๏ธ Run the server uvicorn app:app --reload Access the docs at: http://localhost:8000/docs ๐Ÿงช Running Tests pytest tests/ Unit tests mock external API calls using unittest.mock.AsyncMock. ๐Ÿณ Docker Support ๐Ÿ”จ Build image docker build -t mcp-server . ๐Ÿš€ Run container docker run -p 8000:8000 mcp-server ๐Ÿงฐ Example API Request curl -X POST http://localhost:8000/task \ -H "Content-Type: application/json" \ -d '{ "type": "chat", "input": "What are the benefits of restorative yoga?" }' ๐Ÿ” Directory Structure mcp/ โ”œโ”€โ”€ app.py # FastAPI entry โ”œโ”€โ”€ models/ # ML models (chat, sentiment, etc.) โ”œโ”€โ”€ agent/ โ”‚ โ”œโ”€โ”€ task_router.py # Task router โ”‚ โ””โ”€โ”€ model_registry.py # Registry loader โ”œโ”€โ”€ registry/models.yaml # YAML registry of model metadata โ”œโ”€โ”€ tests/ # Unit tests โ”œโ”€โ”€ Dockerfile โ”œโ”€โ”€ requirements.txt โ”œโ”€โ”€ README.md โ””โ”€โ”€ .env / .gitignore ๐Ÿค Contributing Pull requests are welcome. For major changes, please open an issue first to discuss what youโ€™d like to change. ๐Ÿ“„ License MIT โœจ Author Built by Sriram Kumar Reddy Challa