Stateless MCP Server on Azure App Service
Stateless MCP Server on Azure App Service — 2026-07-28 edition
A reference implementation of a stateless, horizontally scaled MCP server
built on the MCP 2026-07-28 specification and deployed behind Azure App
Service's built-in load balancer.
The 2026-07-28 revision makes MCP stateless at the protocol level: it removes
the initialize handshake and the Mcp-Session-Id header, so any instance can
serve any request with no prior context. That is a perfect match for App
Service's built-in load balancer — scale out and every instance is
interchangeable.
Part 2. This is the sequel to "You can scale MCP servers behind a load balancer on App Service — here's how", which scaled a
2025-11-25 server. The original sample lives at app-service-mcp-stateless-scale-python. This repo is the standalone 2026-07-28 version.
Stateless Streamable HTTP (MCP 2026-07-28): no handshake, no session
Three App Service instances by default, no sticky sessions
Explicit-handle tool (tally): the stateless replacement for session state
Spec-compliant Python client that exercises the new headers + _meta
Staging deployment slot for zero-downtime updates
Application Insights auto-instrumentation with per-instance request tagging
k6 load test that visualizes load distribution
What changed from 2025-11-25
Area | 2025-11-25 | 2026-07-28 |
Handshake | initialize | Removed; every request self-describes |
Sessions | Mcp-Session-Id | Removed; explicit server-minted handles as tool args (SEP-2567) |
Discovery | implied by initialize | server/discover |
Headers | none required | Mcp-Method, Mcp-Name |
List results | plain | |
Results | plain | |
Tracing | ad hoc | W3C Trace Context |
Tool schema | subset | full JSON Schema 2020-12 (SEP-2106) |
Resource-not-found | | |
Full changelog: https://modelcontextprotocol.io/specification/draft/changelog
What's in the box
.
├── main.py                    # FastAPI app: MCP 2026-07-28 over stateless HTTP
├── requirements.txt
├── azure.yaml                 # azd service definition
├── client/
│   └── mcp_client.py          # spec-compliant 2026-07-28 client (headers + _meta + handles)
├── infra/
│   ├── main.bicep             # Resource group scope
│   ├── main.parameters.json
│   ├── abbreviations.json
│   ├── app/
│   │   └── web.bicep          # App Service + staging slot
│   └── shared/
│       ├── app-service-plan.bicep
│       └── monitoring.bicep   # Log Analytics + App Insights
├── loadtest/
│   ├── k6-mcp.js              # k6 script: tags hits per instance
│   └── README.md
├── static/style.css
└── templates/index.html       # Status page showing serving instance
MCP tools
Tool | Purpose |
whoami | Returns the App Service instance ID handling the request |
| Echoes a message, tagged with the instance ID |
| Static read-only fact lookup (stateless) |
| CPU-bound prime counter (useful for load testing each instance's CPU) |
tally | Running total via an explicit signed handle; stateless cross-call state |
Why tally matters
In 2025-11-25, a tool that needed to remember something across calls leaned on
the session. The 2026-07-28 spec removes sessions, so this server mints an
explicit handle instead: tally returns a signed token that contains the
running total. Pass it back on the next call and the total accumulates — even
though the load balancer may route each call to a different instance. State
travels with the request, not the connection. (For real workloads you'd back
handles with a shared store like Azure Storage, Cosmos DB, or Redis; here the
handle is self-contained so the sample needs zero extra infrastructure.)
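As a rough illustration of the self-contained handle idea, here is a minimal sketch using only the Python standard library. It is not taken from main.py: the names SECRET, mint_handle, read_handle and the token layout (base64 payload, a dot, then an HMAC-SHA256 signature) are assumptions made for this example. The one requirement it tries to show is that every instance must share the same signing key, otherwise a handle minted on one instance would fail verification on another.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared secret. Every instance must load the same key
# (for example from an app setting) or handles will not verify across instances.
SECRET = b"replace-with-a-shared-secret"


def _sign(payload: bytes) -> str:
    """Hex HMAC-SHA256 signature of the payload under the shared secret."""
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def mint_handle(total: int) -> str:
    """Pack the running total into a signed, self-contained token."""
    payload = json.dumps({"total": total}, separators=(",", ":")).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    return f"{body}.{_sign(payload)}"


def read_handle(handle: str) -> int:
    """Verify the signature and return the total, or raise ValueError."""
    body, _, signature = handle.partition(".")
    payload = base64.urlsafe_b64decode(body.encode())
    # compare_digest avoids leaking signature bytes through timing differences.
    if not hmac.compare_digest(signature, _sign(payload)):
        raise ValueError("handle signature mismatch")
    return json.loads(payload)["total"]


def tally(amount: int, handle: Optional[str] = None) -> dict:
    """Add amount to the total carried by the handle (0 when no handle is given)."""
    current = read_handle(handle) if handle else 0
    new_total = current + amount
    return {"total": new_total, "handle": mint_handle(new_total)}
```

Calling tally(5) and then tally(7, handle) with the handle from the first result accumulates to 12 regardless of which process serves each call, because the only state involved is the token itself and the shared key.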
Local development
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python main.py
Open http://localhost:8000/. The MCP endpoint is at
http://localhost:8000/mcp.
Try the bundled client
python client/mcp_client.py # against localhost
python client/mcp_client.py https://<your-app>.azurewebsites.net
It runs server/discover, lists tools (showing the cache hints), calls
whoami a few times so you can watch the instance ID move, then drives the
tally handle across calls — all with the required 2026-07-28 headers and
_meta.
Strict header mode
By default the server is lenient about the new Mcp-Method / Mcp-Name headers
so that not-yet-2026-07-28 clients still work. To enforce them (return
-32020 HeaderMismatch when they're missing or wrong), set:
MCP_STRICT_HEADERS=1 python main.py
The bundled client and load test always send them.
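The lenient/strict split can be pictured as a small pure function. The sketch below is an assumption-laden illustration, not the code in main.py: it assumes Mcp-Method must equal the JSON-RPC method in the body and that Mcp-Name must equal params.name when the request names a tool. Only the -32020 HeaderMismatch code and the header names come from this README; check_headers is a hypothetical helper.

```python
from typing import Mapping, Optional

HEADER_MISMATCH = -32020  # error code named in this README


def check_headers(headers: Mapping[str, str], body: dict, strict: bool) -> Optional[dict]:
    """Return a JSON-RPC error object on mismatch, or None when the request passes."""
    if not strict:
        return None  # lenient mode: older clients that omit the headers still work

    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}

    # Assumption: Mcp-Method mirrors the JSON-RPC method in the body.
    expected = {"mcp-method": body.get("method")}
    # Assumption: Mcp-Name mirrors params.name when the request names a tool.
    name = (body.get("params") or {}).get("name")
    if name is not None:
        expected["mcp-name"] = name

    for header, want in expected.items():
        if lowered.get(header) != want:  # covers both missing and wrong values
            return {
                "jsonrpc": "2.0",
                "id": body.get("id"),
                "error": {"code": HEADER_MISMATCH, "message": f"HeaderMismatch: {header}"},
            }
    return None
```

Keeping the check free of framework objects makes it easy to unit test and to call from whatever request handler the server uses.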
Deploy to Azure
azd auth login
azd up
azd up provisions:
A Premium v3 (P0v3) Linux App Service Plan with capacity: 3, which gives three live instances behind App Service's built-in load balancer.
The Web App, with clientAffinityEnabled: false, meaning no ARR Affinity cookie, so the load balancer is free to round-robin every request.
A staging deployment slot wired to the same plan for zero-downtime swaps.
A Log Analytics workspace + Application Insights resource, connected via APPLICATIONINSIGHTS_CONNECTION_STRING so the OpenTelemetry distro emits traces tagged with cloud_RoleInstance = WEBSITE_INSTANCE_ID.
Tune the scale-out level
azd env set INSTANCE_COUNT 5
azd provision
(The instanceCount Bicep parameter accepts 1–10, wired through
infra/main.parameters.json.)
Connect VS Code to the deployed server
Update .vscode/mcp.json:
{
  "servers": {
    "stateless-mcp-app-service-2026": {
      "url": "https://<your-app>.azurewebsites.net/mcp",
      "type": "http"
    }
  }
}
Verify load distribution
Hit the home page a few times — the Instance ID value should change.
Run the bundled client or the k6 load test:
BASE_URL=https://<your-app>.azurewebsites.net k6 run loadtest/k6-mcp.js
Inspect Application Insights:
requests
| where timestamp > ago(15m)
| where name contains "/mcp"
| summarize count() by cloud_RoleInstance
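The same per-instance breakdown can be approximated client-side once you have collected the instance IDs returned by repeated whoami calls. The helpers below are a hypothetical sketch: instance_share and is_balanced are not part of the repo, and the 0.15 tolerance is an arbitrary illustration value rather than a recommended threshold.

```python
from collections import Counter
from typing import Dict, Iterable


def instance_share(instance_ids: Iterable[str]) -> Dict[str, float]:
    """Map each instance ID to its fraction of the observed requests."""
    counts = Counter(instance_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {instance: hits / total for instance, hits in counts.items()}


def is_balanced(shares: Dict[str, float], expected_instances: int,
                tolerance: float = 0.15) -> bool:
    """True when every expected instance appeared and none strays far from an even split."""
    if len(shares) != expected_instances:
        return False
    even = 1 / expected_instances
    return all(abs(share - even) <= tolerance for share in shares.values())
```

With three instances and affinity disabled, a few dozen calls should show all three IDs; a single ID dominating would suggest sticky routing is still in effect somewhere.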
Architecture
                    ┌─────────────────────────────────────────┐
                    │      Azure App Service (P0v3 × 3)       │
                    │ ┌────────────┐ ┌────────────┐ ┌──────┐  │
MCP client ── HTTP ─┤ ▶ instance0  │ │ instance1  │ │  …   │  │
(stateless,         │ └────────────┘ └────────────┘ └──────┘  │
 no session,        │     ▲ built-in load balancer ▲          │
 no cookies)        │     │ clientAffinityEnabled=false       │
                    │  ┌──┴────────────────────────────────┐  │
                    │  │ Staging slot (same plan)          │  │
                    │  └───────────────────────────────────┘  │
                    └────────────────────┬────────────────────┘
                                         ▼
                               Application Insights
                               (cloud_RoleInstance =
                                WEBSITE_INSTANCE_ID)
License
MIT.