Stateful MCP Server on ECS Fargate
Provides Redis-backed state management for MCP sessions, enabling persistence across task deployments.
Stateful MCP Servers on ECS Fargate
This repository is an end-to-end practical test of a production question:
Can a stateful MCP server survive ECS Fargate force deployments?
The final answer from the live test is:
Yes, but Redis-backed tool state alone is not enough. The MCP Streamable HTTP transport session registry must also stop depending on one container's memory. In this implementation we use stateless Streamable HTTP plus Redis-backed logical session state.
This project was based on the experiment from:
https://github.com/AvinashDalvi89/stateful-mcp-on-ecs-fargate-example
We extended it, deployed it on AWS, reproduced the problem, fixed it, force deployed again, captured AWS Console evidence, and then tore the resources down.
The Problem
ECS Fargate tasks are disposable. During a force deployment or rolling deployment:
ECS starts new replacement tasks.
The new tasks register with the ALB target group.
Old tasks are deregistered and enter draining.
ECS sends SIGTERM to old containers.
Old container memory disappears.
If an MCP server keeps session data only in process memory, the session can break when the client is routed to a new task.
ALB sticky sessions can delay this problem, but they do not solve it. When a target is draining, unhealthy, or removed, the ALB can route the client to a different task.
What We Tested
The original experiment used:
FastMCP server on ECS Fargate.
Streamable HTTP endpoint at /mcp.
ALB with sticky sessions.
ECS rolling deployment.
In-memory session store.
A client that repeatedly calls MCP tools.
We added:
ElastiCache Redis.
Redis-backed MCP tool state.
Health endpoint proof showing the active session backend.
Stateless Streamable HTTP mode.
Client-generated logical Mcp-Session-Id.
Evidence capture from AWS Console, CloudWatch, ECS, ALB, and test-client logs.
Key Discovery
The first Redis attempt still failed.
Redis moved our application/tool state out of the task, but FastMCP's stateful Streamable HTTP transport still kept active MCP transport sessions in process memory. When traffic moved to a new ECS task, the new task did not recognize the old Mcp-Session-Id and returned:
Session not found
That error happened before our tool handler ran, which means Redis-backed tool state was necessary but not sufficient.
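The failure mode can be illustrated with a small simulation. This is only a sketch and not FastMCP's actual code: `Task`, `open_session`, `handle_request`, and the dict-based `shared_store` are hypothetical stand-ins for an ECS task, the transport layer, and Redis.

```python
class SessionNotFound(Exception):
    """Raised by the simulated transport layer, before any tool runs."""


class Task:
    """Hypothetical stand-in for one ECS task running the MCP server."""

    def __init__(self, shared_store, stateless):
        self.shared_store = shared_store   # stands in for Redis
        self.stateless = stateless
        self.transport_sessions = set()    # lives in this task's memory only

    def open_session(self, session_id):
        # In stateful mode the transport registers the session locally.
        self.transport_sessions.add(session_id)

    def handle_request(self, session_id, key, value):
        # The transport check runs first; the tool logic below is never
        # reached if this task has not seen the session before.
        if not self.stateless and session_id not in self.transport_sessions:
            raise SessionNotFound(session_id)
        state = self.shared_store.setdefault(session_id, {})
        state[key] = value
        return dict(state)


shared_store = {}

# Stateful transport: Task B rejects a session that was opened on Task A,
# even though the tool state is available in the shared store.
task_a = Task(shared_store, stateless=False)
task_b = Task(shared_store, stateless=False)
task_a.open_session("s1")
task_a.handle_request("s1", "step", 1)
try:
    task_b.handle_request("s1", "step", 2)
    stateful_survived = True
except SessionNotFound:
    stateful_survived = False

# Stateless transport: any task can serve the request from shared state.
task_c = Task(shared_store, stateless=True)
task_d = Task(shared_store, stateless=True)
task_c.handle_request("s2", "step", 1)
result = task_d.handle_request("s2", "done", True)
```

In the stateful case the shared store still holds the state written through Task A, but Task B never gets far enough to read it.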
The working solution was:
MCP_STATELESS_HTTP=true
+ Redis-backed logical session state
+ client-provided Mcp-Session-Id
Final Architecture
Client
-> Application Load Balancer
-> ECS Fargate service
-> Task A: FastMCP server
-> Task B: FastMCP server
-> ElastiCache Redis
Redis is outside the Fargate task. This is important.
Do not run Redis as a sidecar inside the same Fargate task for this use case. A sidecar Redis container dies with the task and does not solve deployment replacement.
Final Result
During the successful force-deployment test:
{
  "http_status_counts": {
    "200": 141
  },
  "unique_task_ids": [
    "eb83d8d37aa448758abe33e410d17864",
    "8454af6040484b64b252adf5d0448fff"
  ],
  "first_task_id": "eb83d8d37aa448758abe33e410d17864",
  "last_task_id": "8454af6040484b64b252adf5d0448fff",
  "max_state_size": 70,
  "session_not_found_count": 0,
  "error_rows": 0,
  "session_complete": true
}
This proves:
The client crossed from one ECS task to another.
The session continued after task replacement.
Redis state accumulated up to 70 keys.
Every MCP request returned HTTP 200.
There were zero Session not found errors.
The session completed during ECS deployment replacement.
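A summary like the one above could be derived from the client's JSONL log along these lines. This is a sketch under assumptions: the row keys (`http_status`, `task_id`, `state_size`, `error`) and the helper names `summarize` and `load_jsonl` are hypothetical, and the real log schema may differ. `session_complete` is left out because the source does not say how it is computed.

```python
import json
from collections import Counter


def summarize(rows):
    """Reduce client log rows to summary counters.

    Row keys (http_status, task_id, state_size, error) are assumed for
    illustration only.
    """
    task_ids = []
    for row in rows:
        tid = row.get("task_id")
        if tid and tid not in task_ids:
            task_ids.append(tid)  # keep first-seen order
    errors = [r["error"] for r in rows if r.get("error")]
    return {
        "http_status_counts": dict(Counter(str(r["http_status"]) for r in rows)),
        "unique_task_ids": task_ids,
        "first_task_id": task_ids[0] if task_ids else None,
        "last_task_id": rows[-1].get("task_id") if rows else None,
        "max_state_size": max((r.get("state_size", 0) for r in rows), default=0),
        "session_not_found_count": sum("Session not found" in e for e in errors),
        "error_rows": len(errors),
    }


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Tiny in-memory sample instead of a real log file.
sample = [
    {"http_status": 200, "task_id": "task-a", "state_size": 1},
    {"http_status": 200, "task_id": "task-a", "state_size": 2},
    {"http_status": 200, "task_id": "task-b", "state_size": 3},
]
summary = summarize(sample)
```

Two distinct values in `unique_task_ids` together with a zero `session_not_found_count` is the signal that the session crossed a task boundary without breaking.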
What Changed In The Code
Redis And Memory Session Stores
src/session_store.py now contains:
SessionStore: common interface.
InMemorySessionStore: local/demo backend.
RedisSessionStore: shared backend for ECS tasks.
create_session_store(): selects backend from environment.
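One way such a module could be laid out is sketched below. The class and factory names come from this README, but the method names (`get`, `set_value`), the `session:<id>` key layout, and the use of Redis hashes are assumptions made for illustration, not the repository's actual code. The Redis backend assumes the redis-py package is installed.

```python
import os
from abc import ABC, abstractmethod


class SessionStore(ABC):
    """Common interface shared by both backends (method names assumed)."""

    @abstractmethod
    def get(self, session_id): ...

    @abstractmethod
    def set_value(self, session_id, key, value): ...


class InMemorySessionStore(SessionStore):
    """Local/demo backend: state is lost when the process exits."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        state = self._data.get(session_id)
        return dict(state) if state is not None else None

    def set_value(self, session_id, key, value):
        self._data.setdefault(session_id, {})[key] = value


class RedisSessionStore(SessionStore):
    """Shared backend: one Redis hash per session, TTL refreshed on write."""

    def __init__(self, url, ttl_seconds=86400):
        import redis  # redis-py; imported lazily so the demo path works without it

        self._client = redis.Redis.from_url(url, decode_responses=True)
        self._ttl = ttl_seconds

    def get(self, session_id):
        # HGETALL returns an empty dict for a missing key.
        return self._client.hgetall(f"session:{session_id}") or None

    def set_value(self, session_id, key, value):
        name = f"session:{session_id}"
        self._client.hset(name, key, value)
        self._client.expire(name, self._ttl)


def create_session_store(env=os.environ):
    """Pick the Redis backend when REDIS_URL is set, else fall back to memory."""
    url = env.get("REDIS_URL")
    if url:
        ttl = int(env.get("SESSION_TTL_SECONDS", "86400"))
        return RedisSessionStore(url, ttl)
    return InMemorySessionStore()


store = create_session_store(env={})  # no REDIS_URL, so in-memory
store.set_value("client-123", "step", "1")
```

Keeping both backends behind one interface means the tool handlers do not need to know which one is active.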
Backend selection:
REDIS_URL set -> RedisSessionStore
REDIS_URL missing -> InMemorySessionStore
Stateless Streamable HTTP
src/server.py reads:
MCP_STATELESS_HTTP=true
When enabled, FastMCP starts with:
mcp.http_app(stateless_http=True)
This avoids depending on a per-task in-memory Streamable HTTP transport session registry.
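A minimal sketch of how the flag might be wired up. The helper names `env_flag` and `build_app` and the set of accepted truthy strings are assumptions; only the environment variable name and the `http_app(stateless_http=...)` call come from this README.

```python
import os


def env_flag(name, default=False, env=os.environ):
    """Interpret common truthy strings such as 'true', '1', 'yes', 'on'."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def build_app(mcp, env=os.environ):
    """Return the HTTP app for a FastMCP instance passed in as `mcp`."""
    stateless = env_flag("MCP_STATELESS_HTTP", env=env)
    return mcp.http_app(stateless_http=stateless)
```

Passing the FastMCP instance in as an argument keeps the flag parsing testable without starting a server.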
Logical MCP Session ID
In stateless mode, the server does not issue a transport session ID. The test client generates a stable logical session ID:
client-<uuid>
It sends this value on every tool call:
Mcp-Session-Id: client-...
FastMCP exposes that header through ctx.session_id, and our tools use it as the Redis key.
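On the client side, generating the ID once and attaching it to every request could look like the sketch below. The function names are hypothetical and the actual test client may build its requests differently. The `Accept` value reflects the Streamable HTTP requirement that clients accept both JSON and SSE responses.

```python
import uuid


def new_logical_session_id():
    """Create a stable logical session ID in the client-<uuid> form."""
    return f"client-{uuid.uuid4()}"


def mcp_headers(session_id):
    """Headers to send with every MCP POST for this logical session."""
    return {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "Mcp-Session-Id": session_id,
    }


# Generate once per logical session, then reuse for all tool calls.
session_id = new_logical_session_id()
headers = mcp_headers(session_id)
```

Because the client owns the ID, it stays the same no matter which ECS task answers a given request.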
Lazy Session Creation
set_session_value() creates the logical Redis session on first write if the key does not exist.
get_session_state() still fails for a never-seen session, which keeps reads honest.
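The asymmetry between writes and reads can be sketched as follows. The function names match the ones above, but their signatures, the `UnknownSession` exception, and the module-level dict standing in for Redis are assumptions for illustration.

```python
class UnknownSession(KeyError):
    """Raised when a read targets a session that was never written."""


_sessions = {}  # stands in for Redis in this sketch


def set_session_value(session_id, key, value):
    """Write a value, creating the logical session on first write."""
    state = _sessions.setdefault(session_id, {})
    state[key] = value
    return len(state)


def get_session_state(session_id):
    """Read the full state; a never-seen session is an error, not {}."""
    if session_id not in _sessions:
        raise UnknownSession(session_id)
    return dict(_sessions[session_id])


# A read before any write fails.
try:
    get_session_state("client-abc")
    read_before_write_failed = False
except UnknownSession:
    read_before_write_failed = True

# The first write creates the session, after which reads succeed.
size = set_session_value("client-abc", "step", 1)
state = get_session_state("client-abc")
```

Failing reads for unknown IDs makes a mistyped or expired session ID visible instead of silently returning empty state.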
Health Endpoint
/health now returns the active backend:
{
  "status": "healthy",
  "active_sessions": 0,
  "session_store": "redis"
}
AWS Infrastructure
sam/infrastructure.yaml creates:
VPC
Public subnets
Private subnets
NAT gateway
Application Load Balancer
ALB target group
ALB sticky sessions
ECR repository
CloudWatch log groups
ElastiCache Redis
Redis security group allowing inbound traffic only from ECS tasks
sam/ecs.yaml creates:
ECS cluster
ECS task execution role
Fargate task definition
ECS service
ALB service attachment
The container receives:
REDIS_URL=redis://<elasticache-endpoint>:6379/0
SESSION_TTL_SECONDS=86400
MCP_STATELESS_HTTP=true
Repository Layout
.
├── Dockerfile
├── Makefile
├── README.md
├── evidence/
│   ├── README.md
│   ├── screenshots/
│   ├── 08-health.json
│   ├── 12-client-during-force-deploy.jsonl
│   ├── 21-client-stateless-force-deploy.jsonl
│   ├── 25-stateless-client-summary.json
│   └── 28-cloudwatch-tail.txt
├── sam/
│   ├── infrastructure.yaml
│   └── ecs.yaml
├── src/
│   ├── server.py
│   ├── session_store.py
│   ├── tools.py
│   ├── health.py
│   └── shutdown.py
└── test_client/
    └── test_client.py
Prerequisites
AWS CLI configured with permissions for ECS, ECR, CloudFormation, EC2, ELB, CloudWatch Logs, IAM, and ElastiCache.
Docker Desktop or Docker Engine.
Python 3.12+.
AWS region used in this test: ap-south-1.
SAM is optional. The Makefile uses sam deploy, but this test was also run with direct aws cloudformation deploy.
Deploy
1. Deploy Infrastructure
make deploy-infra
AWS CLI equivalent:
aws cloudformation deploy \
--region ap-south-1 \
--template-file sam/infrastructure.yaml \
--stack-name mcp-infrastructure \
--capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND \
--no-fail-on-empty-changeset
2. Build And Push Image
make build IMAGE_TAG=redis
3. Deploy ECS
make deploy-ecs IMAGE_TAG=redis
4. Verify Health
curl http://<ALB_DNS_NAME>/health
Expected:
{
  "status": "healthy",
  "session_store": "redis"
}
Run The Force Deployment Test
Start the client:
python test_client/test_client.py \
--endpoint http://<ALB_DNS_NAME>/mcp \
--calls 70 \
--delay 2
While the client is running, force a new ECS deployment:
aws ecs update-service \
--region ap-south-1 \
--cluster mcp-fargate-cluster \
--service mcp-fargate-service \
--force-new-deployment
Wait for service stability:
aws ecs wait services-stable \
--region ap-south-1 \
--cluster mcp-fargate-cluster \
--services mcp-fargate-service
Then inspect:
evidence/21-client-stateless-force-deploy.jsonl
evidence/25-stateless-client-summary.json
Evidence
Evidence is included in evidence/.
Important files:
08-health.json: live /health endpoint showing Redis mode.
12-client-during-force-deploy.jsonl: Redis-only attempt that still hit transport-level session failure.
21-client-stateless-force-deploy.jsonl: final successful stateless+Redis run.
25-stateless-client-summary.json: parsed success summary.
28-cloudwatch-tail.txt: ECS task logs from CloudWatch.
screenshots/: AWS Console screenshots.
See evidence/README.md for a detailed evidence map.
AWS Console Evidence Captured
The screenshots show:
ECR image pushed.
ECS task definition revision.
ECS service healthy before deployment test.
ECS logs from task.
Force new deployment menu.
Deployment in progress.
ALB target group draining old target.
ECR image after rebuild.
Revision 2 deployment in progress.
Revision 2 deployment success.
CloudFormation teardown in progress.
Teardown
Delete ECS first:
aws cloudformation delete-stack \
--region ap-south-1 \
--stack-name mcp-ecs
Then delete infrastructure:
aws cloudformation delete-stack \
--region ap-south-1 \
--stack-name mcp-infrastructure
If CloudFormation cannot delete ECR because images still exist:
aws ecr list-images \
--region ap-south-1 \
--repository-name mcp-fargate-server
aws ecr batch-delete-image \
--region ap-south-1 \
--repository-name mcp-fargate-server \
--image-ids imageDigest=<digest>
In the captured test run, teardown completed after deleting the remaining ECR images.
Production Notes
Use ElastiCache with Multi-AZ or MemoryDB for stronger production durability.
Enable encryption in transit and Redis authentication for production.
Put Redis in private subnets.
Allow Redis inbound traffic only from the ECS task security group.
Do not rely on ALB sticky sessions as your durability layer.
Keep MCP tool operations idempotent where possible.
For server-sent event resumability, consider external event storage as a separate concern.
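As an illustration of the idempotency note, a tool that mutates state can record the result per request ID so that a retried call during a deployment does not apply twice. This is a sketch under assumptions: `increment_once`, the `seen_requests` key, and the idea that the client supplies a `request_id` are hypothetical and not part of this repository.

```python
def increment_once(session, request_id, counter):
    """Increment a counter at most once per request_id.

    `session` is a plain dict standing in for the Redis-backed session
    state. A replayed request returns the recorded result unchanged.
    """
    seen = session.setdefault("seen_requests", {})
    if request_id in seen:
        return seen[request_id]  # replay: do not apply again
    new_value = session.get(counter, 0) + 1
    session[counter] = new_value
    seen[request_id] = new_value
    return new_value


session = {}
first = increment_once(session, "req-1", "calls")
retry = increment_once(session, "req-1", "calls")   # same request retried
second = increment_once(session, "req-2", "calls")
```

With real Redis the check and the write would need to happen atomically, for example in a transaction or script, which this in-memory sketch does not attempt.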
Main Lesson
For MCP on ECS Fargate, there are two different kinds of state:
Application/tool state.
MCP transport/session-manager state.
Moving only application state to Redis can still fail if the transport session manager is stateful and in memory.
This repository demonstrates a practical ECS-safe pattern:
stateless Streamable HTTP + external Redis logical session state