Stateless MCP Server on Azure App Service

by seligj95

Stateless MCP Server on Azure App Service — 2026-07-28 edition

A reference implementation of a stateless, horizontally scaled MCP server built on the MCP 2026-07-28 specification and deployed behind Azure App Service's built-in load balancer.

The 2026-07-28 revision makes MCP stateless at the protocol level: it removes the initialize handshake and the Mcp-Session-Id header, so any instance can serve any request with no prior context. That is a perfect match for App Service's built-in load balancer — scale out and every instance is interchangeable.

Part 2. This is the sequel to You can scale MCP servers behind a load balancer on App Service — here's how, which scaled a 2025-11-25 server. The original sample lives at app-service-mcp-stateless-scale-python. This repo is the standalone 2026-07-28 version.

  • Stateless Streamable HTTP (MCP 2026-07-28) — no handshake, no session

  • Three App Service instances by default, no sticky sessions

  • Explicit-handle tool (tally) — the stateless replacement for session state

  • Spec-compliant Python client that exercises the new headers + _meta

  • Staging deployment slot for zero-downtime updates

  • Application Insights auto-instrumentation with per-instance request tagging

  • k6 load test that visualizes load distribution

What changed from 2025-11-25

| Area | 2025-11-25 | 2026-07-28 (this sample) |
| --- | --- | --- |
| Handshake | initialize + notifications/initialized | Removed: every request self-describes via _meta (SEP-2575) |
| Sessions | Mcp-Session-Id header pins a client to state | Removed: explicit server-minted handles as tool args (SEP-2567) |
| Discovery | implied by initialize result | server/discover RPC, required (SEP-2575) |
| Headers | none required | Mcp-Method + Mcp-Name required on POST (SEP-2243) |
| List results | plain | ttlMs + cacheScope cache hints (SEP-2549) |
| Results | plain | resultType: "complete" on every result (SEP-2322) |
| Tracing | ad hoc | W3C Trace Context in _meta (SEP-414) |
| Tool schema | subset | full JSON Schema 2020-12 (SEP-2106) |
| Resource-not-found | -32002 | -32602 (Invalid Params) |

Full changelog: https://modelcontextprotocol.io/specification/draft/changelog
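
To make the changes above concrete, here is a minimal sketch of how a client might build one self-describing tools/call request. The Mcp-Method and Mcp-Name headers and the _meta object come from the table. The key names inside _meta (protocolVersion, traceparent) and the helper names are assumptions for illustration, not taken from the spec or from this repo. The traceparent value follows the W3C Trace Context layout (version, trace ID, span ID, flags).

```python
import json
import secrets


def new_traceparent() -> str:
    """Return a W3C Trace Context value: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def build_tool_call(tool: str, arguments: dict, request_id: int = 1):
    """Build (headers, body) for a single stateless tools/call POST."""
    method = "tools/call"
    headers = {
        "Content-Type": "application/json",
        "Mcp-Method": method,  # assumed to mirror the JSON-RPC method
        "Mcp-Name": tool,      # assumed to mirror the tool being called
    }
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": {
            "name": tool,
            "arguments": arguments,
            # Assumption: these _meta key names are illustrative only.
            "_meta": {
                "protocolVersion": "2026-07-28",
                "traceparent": new_traceparent(),
            },
        },
    }
    return headers, json.dumps(body)
```

Because nothing in the request refers to an earlier exchange, any instance behind the load balancer can handle it without shared context.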

What's in the box

.
├── main.py                       # FastAPI app — MCP 2026-07-28 over stateless HTTP
├── requirements.txt
├── azure.yaml                    # azd service definition
├── client/
│   └── mcp_client.py             # spec-compliant 2026-07-28 client (headers + _meta + handles)
├── infra/
│   ├── main.bicep                # Resource group scope
│   ├── main.parameters.json
│   ├── abbreviations.json
│   ├── app/
│   │   └── web.bicep             # App Service + staging slot
│   └── shared/
│       ├── app-service-plan.bicep
│       └── monitoring.bicep      # Log Analytics + App Insights
├── loadtest/
│   ├── k6-mcp.js                 # k6 script — tags hits per instance
│   └── README.md
├── static/style.css
└── templates/index.html          # Status page showing serving instance

MCP tools

| Tool | Purpose |
| --- | --- |
| whoami | Returns the App Service instance ID handling the request |
| echo | Echoes a message, tagged with the instance ID |
| lookup_fact | Static read-only fact lookup (stateless) |
| compute_primes | CPU-bound prime counter (useful for load testing each instance's CPU) |
| tally | Running total via an explicit signed handle (stateless cross-call state) |
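
As a rough illustration of how the simpler tools could be written, the sketch below reads WEBSITE_INSTANCE_ID (the environment variable App Service sets on each instance) and tags every result with it. The return shapes, the hostname fallback for local runs, and the sieve used for compute_primes are assumptions; the real implementations live in main.py and may differ.

```python
import os
import socket


def instance_id() -> str:
    """Return the App Service instance ID, or the hostname when running locally."""
    return os.environ.get("WEBSITE_INSTANCE_ID") or socket.gethostname()


def whoami() -> dict:
    return {"instance": instance_id()}


def echo(message: str) -> dict:
    return {"message": message, "instance": instance_id()}


def compute_primes(limit: int) -> dict:
    """Count the primes below limit with a sieve of Eratosthenes."""
    if limit < 2:
        return {"count": 0, "instance": instance_id()}
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            # Clear every multiple of n starting at n squared.
            sieve[n * n :: n] = bytearray(len(range(n * n, limit, n)))
    return {"count": sum(sieve), "instance": instance_id()}
```

None of these functions keeps anything in process memory between calls, which is what lets every instance answer identically apart from the instance tag.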

Why tally matters

In 2025-11-25, a tool that needed to remember something across calls leaned on the session. The 2026-07-28 spec removes sessions, so this server mints an explicit handle instead: tally returns a signed token that contains the running total. Pass it back on the next call and the total accumulates — even though the load balancer may route each call to a different instance. State travels with the request, not the connection. (For real workloads you'd back handles with a shared store like Azure Storage, Cosmos DB, or Redis; here the handle is self-contained so the sample needs zero extra infrastructure.)
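
The idea of a self-contained signed handle can be sketched with an HMAC over a small JSON payload. This is a minimal illustration, not the code in main.py: the handle format, the function names, and the SECRET constant are all assumptions, and it presumes every instance shares the same signing key (for example through an app setting).

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Assumption: all instances share this key; never hard-code a real secret.
SECRET = b"demo-only-secret"


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def mint_handle(total: int) -> str:
    """Encode the running total and append an HMAC so tampering is detectable."""
    payload = json.dumps({"total": total}).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def read_handle(handle: str) -> int:
    """Verify the signature and return the total, or raise ValueError."""
    try:
        encoded, signature = handle.split(".")
        payload = base64.urlsafe_b64decode(encoded)
    except ValueError as exc:
        raise ValueError("malformed handle") from exc
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        raise ValueError("bad signature")
    return json.loads(payload)["total"]


def tally(amount: int, handle: Optional[str] = None) -> dict:
    """Add amount to the total carried by handle and return a fresh handle."""
    total = (read_handle(handle) if handle else 0) + amount
    return {"total": total, "handle": mint_handle(total)}
```

Calling tally(5) and then tally(7, handle) with the handle from the first result yields a total of 12, regardless of which instance runs each call. The trade-off is that a signed handle proves integrity, not freshness: an old handle can be replayed, which is one reason real workloads would move to a shared store.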

Local development

python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python main.py

Open http://localhost:8000/. The MCP endpoint is at http://localhost:8000/mcp.

Try the bundled client

python client/mcp_client.py                 # against localhost
python client/mcp_client.py https://<your-app>.azurewebsites.net

It runs server/discover, lists tools (showing the cache hints), calls whoami a few times so you can watch the instance ID move, then drives the tally handle across calls — all with the required 2026-07-28 headers and _meta.
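
A client can use the ttlMs cache hint to avoid re-listing tools on every call. The sketch below is one way to honor it, under stated assumptions: ttlMs is read from the top level of the list result, a missing hint is treated as "do not cache", and fetch is a hypothetical callable that performs the tools/list request. The bundled client may handle the hints differently, and cacheScope is not modeled here.

```python
import time
from typing import Callable, Optional


class ToolListCache:
    """Cache one tools/list result for as long as its ttlMs hint allows."""

    def __init__(self, fetch: Callable[[], dict],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch  # hypothetical callable that performs tools/list
        self._clock = clock  # injectable so tests do not need to sleep
        self._result: Optional[dict] = None
        self._expires_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._result is None or now >= self._expires_at:
            self._result = self._fetch()
            # Assumption: the hint sits at the top level of the result.
            ttl_ms = self._result.get("ttlMs", 0)
            self._expires_at = now + ttl_ms / 1000.0
        return self._result
```

Using a monotonic clock keeps expiry correct even if the wall clock is adjusted while the client is running.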

Strict header mode

By default the server is lenient about the new Mcp-Method / Mcp-Name headers so that clients that have not yet adopted 2026-07-28 still work. To enforce them (return -32020 HeaderMismatch when they're missing or wrong), set:

MCP_STRICT_HEADERS=1 python main.py

The bundled client and load test always send them.
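
The strict check can be pictured as a comparison between the headers and the JSON-RPC body. The sketch below is an assumption about what "missing or wrong" means: it treats Mcp-Method as a mirror of the body's method and Mcp-Name as a mirror of params.name when one is present. The function name and error message text are illustrative; only the -32020 code and the header names come from the text above.

```python
from typing import Optional

HEADER_MISMATCH = -32020  # error code named in this section


def check_headers(headers: dict, body: dict, strict: bool) -> Optional[dict]:
    """Return a JSON-RPC error when strict mode rejects the request, else None."""
    if not strict:
        return None
    # HTTP header names are case-insensitive, so normalise before comparing.
    lowered = {key.lower(): value for key, value in headers.items()}
    expected = {"mcp-method": body.get("method")}
    # Assumption: Mcp-Name mirrors params.name when the request names an item.
    name = body.get("params", {}).get("name")
    if name is not None:
        expected["mcp-name"] = name
    for header, value in expected.items():
        if lowered.get(header) != value:
            return {
                "jsonrpc": "2.0",
                "id": body.get("id"),
                "error": {
                    "code": HEADER_MISMATCH,
                    "message": f"HeaderMismatch: {header}",
                },
            }
    return None
```

A server could derive the strict flag from the environment, for example with os.environ.get("MCP_STRICT_HEADERS") == "1".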

Deploy to Azure

azd auth login
azd up

azd up provisions:

  • A Premium v3 (P0v3) Linux App Service Plan with capacity: 3 — three live instances behind App Service's built-in load balancer.

  • The Web App, with clientAffinityEnabled: false. No ARR Affinity cookie is set, so the load balancer is free to send each request to any instance.

  • A staging deployment slot wired to the same plan for zero-downtime swaps.

  • A Log Analytics workspace + Application Insights resource, connected via APPLICATIONINSIGHTS_CONNECTION_STRING so the OpenTelemetry distro emits traces tagged with cloud_RoleInstance = WEBSITE_INSTANCE_ID.

Tune the scale-out level

azd env set INSTANCE_COUNT 5
azd provision

(The instanceCount Bicep parameter accepts values from 1 to 10 and is wired through infra/main.parameters.json.)

Connect VS Code to the deployed server

Update .vscode/mcp.json:

{
  "servers": {
    "stateless-mcp-app-service-2026": {
      "url": "https://<your-app>.azurewebsites.net/mcp",
      "type": "http"
    }
  }
}

Verify load distribution

  1. Hit the home page a few times — the Instance ID value should change.

  2. Run the bundled client or the k6 load test:

    BASE_URL=https://<your-app>.azurewebsites.net k6 run loadtest/k6-mcp.js

  3. Inspect Application Insights:

    requests
    | where timestamp > ago(15m)
    | where name contains "/mcp"
    | summarize count() by cloud_RoleInstance
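
The same per-instance breakdown can be approximated from the client side by calling whoami repeatedly and counting the answers. In this sketch, call_whoami is a hypothetical callable that performs one whoami request and returns the instance ID as a string; both function names are illustrative and not part of the repo.

```python
from collections import Counter
from typing import Callable


def sample_instances(call_whoami: Callable[[], str], requests: int) -> Counter:
    """Call whoami repeatedly and count how often each instance answered."""
    return Counter(call_whoami() for _ in range(requests))


def shares(hits: Counter) -> dict:
    """Convert raw hit counts into each instance's fraction of the traffic."""
    total = sum(hits.values())
    return {instance: count / total for instance, count in hits.items()}
```

With three instances and affinity disabled, each share should settle near one third over enough requests; a single instance near 1.0 suggests that affinity is still on or that only one instance is running.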

Architecture

                       ┌─────────────────────────────────────────┐
                       │       Azure App Service (P0v3 × 3)      │
                       │  ┌────────────┐ ┌────────────┐ ┌──────┐ │
   MCP client ── HTTP ─┤ ▶  instance0  │ │  instance1 │ │  …   │ │
   (stateless,         │  └────────────┘ └────────────┘ └──────┘ │
    no session,        │     ▲ built-in load balancer ▲          │
    no cookies)        │     │   clientAffinityEnabled=false     │
                       │  ┌──┴────────────────────────────────┐  │
                       │  │       Staging slot (same plan)    │  │
                       │  └───────────────────────────────────┘  │
                       └────────────────────┬────────────────────┘
                                            ▼
                                   Application Insights
                                  (cloud_RoleInstance =
                                   WEBSITE_INSTANCE_ID)

License

MIT.
