# A2AMCP Architecture

## Overview

A2AMCP (Agent-to-Agent Model Context Protocol) is a distributed communication system that enables AI agents to coordinate while working on parallel development tasks. It provides persistent state management, conflict prevention, and real-time messaging through a centralized MCP server backed by Redis.

## System Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                        User's Machine                        │
│                                                              │
│  ┌─────────────────┐        ┌──────────────────────────┐     │
│  │  Orchestrator   │        │      Docker Network      │     │
│  │                 │        │                          │     │
│  │ - Task Manager  │        │  ┌───────────────────┐   │     │
│  │ - Agent Spawner │        │  │   A2AMCP Server   │   │     │
│  │ - Monitor       │        │  │                   │   │     │
│  └─────────────────┘        │  │ - MCP Handler     │   │     │
│           │                 │  │ - Message Router  │   │     │
│         Spawns              │  │ - State Manager   │   │     │
│           ↓                 │  │                   │   │     │
│                             │  └────────┬──────────┘   │     │
│  ┌─────────────────────┐    │           │              │     │
│  │    tmux sessions    │    │           │              │     │
│  │                     │    │  ┌────────┴──────────┐   │     │
│  │   ┌──────────────┐  │    │  │   Redis Server    │   │     │
│  │   │ Agent task-1 │←─┼────┼─→│                   │   │     │
│  │   │ (Claude Code)│  │    │  │ - Persistence     │   │     │
│  │   └──────────────┘  │    │  │ - Pub/Sub         │   │     │
│  │                     │    │  │ - Namespacing     │   │     │
│  │   ┌──────────────┐  │    │  │                   │   │     │
│  │   │ Agent task-2 │←─┼────┼─→│   project:app:*   │   │     │
│  │   │ (Claude Code)│  │    │  │   project:web:*   │   │     │
│  │   └──────────────┘  │    │  │                   │   │     │
│  │                     │    │  └───────────────────┘   │     │
│  │   ┌──────────────┐  │    │                          │     │
│  │   │ Agent task-N │←─┼────┼──────────────────────────┘     │
│  │   │ (Claude Code)│  │    │                                │
│  │   └──────────────┘  │    │                                │
│  └─────────────────────┘    │                                │
│                             │                                │
└──────────────────────────────────────────────────────────────┘
```

## Core Components

### 1. MCP Server

The central communication hub implemented in Python:

```python
class AgentCommunicationServer:
    def __init__(self, redis_url):
        self.server = Server("a2amcp")
        self.redis_client = redis.from_url(redis_url)
        self._setup_tools()  # Register MCP tools
```

**Responsibilities:**
- Handle MCP tool calls from agents
- Route messages between agents
- Manage agent lifecycle
- Monitor heartbeats
- Clean up dead agents

**Key Features:**
- Stateless design (all state in Redis)
- Async/await for concurrent operations
- Automatic cleanup via heartbeat monitoring
- Project namespace isolation

### 2. Redis Backend

Provides persistent storage and pub/sub messaging:

**Data Structure:**
```
redis-server/
├── project:{project_id}:agents            # Hash: agent registry
├── project:{project_id}:heartbeat:{id}    # String: last heartbeat
├── project:{project_id}:locks             # Hash: file locks
├── project:{project_id}:interfaces        # Hash: shared types
├── project:{project_id}:todos:{id}        # List: agent todos
├── project:{project_id}:messages:{id}     # List: message queue
└── project:{project_id}:recent_changes    # List: change history
```

**Benefits:**
- Persistence across restarts
- Horizontal scalability
- Built-in pub/sub for events
- Atomic operations
- TTL for automatic cleanup

### 3. AI Agents

Individual Claude Code instances running in tmux sessions:

**Agent Lifecycle:**
```
Start → Register → Heartbeat Loop → Work → Unregister
           ↓             ↓                      ↑
           └── Check Messages ←─────────────────┘
```

**Agent Capabilities:**
- Register with project and task info
- Maintain heartbeat for liveness
- Create and update todo lists
- Lock files before modification
- Share interfaces and contracts
- Query other agents
- Broadcast messages

### 4. Communication Protocol

#### Message Types

1. **Direct Query**
   ```
   Agent A → MCP Server → Redis → Agent B's Queue
                                         ↓
   Agent A ← MCP Server ← Redis ← Agent B Response
   ```

2. **Broadcast**
   ```
   Agent A → MCP Server → Redis → All Other Agents' Queues
   ```

3. **Event Notification**
   ```
   System Event → Redis Pub/Sub → Subscribed Agents
   ```

#### Message Format

```json
{
  "id": "unique-message-id",
  "from": "task-001",
  "type": "query|broadcast|response|event",
  "content": "message content",
  "timestamp": "ISO-8601",
  "metadata": {}
}
```

## Key Design Decisions

### 1. MCP vs HTTP

**Why MCP:**
- Native integration with Claude Code
- No need for HTTP servers per agent
- Simpler security model
- Direct tool calls from AI context

**Trade-offs:**
- Limited to MCP-compatible agents
- Less flexibility than HTTP
- Requires MCP runtime

### 2. Centralized vs Distributed

**Why Centralized Server:**
- Simpler consistency model
- Easier debugging
- Single point for monitoring
- Reduced network complexity

**Trade-offs:**
- Single point of failure (mitigated by Redis persistence)
- Potential bottleneck (mitigated by async operations)

### 3. Redis vs Other Storage

**Why Redis:**
- In-memory performance
- Built-in data structures
- Pub/sub capabilities
- Persistence options
- Wide language support

**Alternatives Considered:**
- PostgreSQL: Too heavy for message queuing
- RabbitMQ: Overkill for our use case
- In-memory only: No persistence

### 4. Project Namespacing

**Implementation:**
```
project:{project_id}:{resource_type}:{resource_id}
```

**Benefits:**
- Complete isolation between projects
- Easy cleanup of project data
- Parallel project support
- Clear data organization

## Scalability Considerations

### Current Limits

- **Agents per project**: 100+ (tested)
- **Messages per second**: 1000+
- **File operations**: No hard limit
- **Storage**: Limited by Redis memory

### Scaling Strategies

1. **Vertical Scaling**
   - Increase Redis memory
   - Upgrade server CPU/RAM
   - Optimize message processing

2. **Horizontal Scaling**
   - Redis Cluster for data sharding
   - Multiple MCP servers with load balancing
   - Read replicas for queries

3. **Performance Optimizations**
   - Message batching
   - Caching frequent queries
   - Lazy loading of todos
   - Compression for large messages

## Security Model

### Current Security

1. **Network Isolation**
   - Docker network isolation
   - Local-only by default
   - No external exposure

2. **Project Isolation**
   - Namespace separation
   - No cross-project access
   - Project-specific operations

### Future Security Enhancements

1. **Authentication**
   - API key per project
   - JWT tokens for agents
   - OAuth integration

2. **Authorization**
   - Role-based access
   - Operation permissions
   - Resource quotas

3. **Encryption**
   - TLS for network traffic
   - Encrypted storage
   - Secure key management

## Fault Tolerance

### Failure Scenarios

1. **Agent Crash**
   - Detected via heartbeat timeout
   - Automatic cleanup (locks, registry)
   - Other agents continue

2. **MCP Server Crash**
   - State preserved in Redis
   - Agents reconnect on restart
   - Messages queued in Redis

3. **Redis Crash**
   - AOF persistence for recovery
   - RDB snapshots for backup
   - Potential data loss window

### Recovery Mechanisms

1. **Heartbeat Monitoring**
   ```python
   async def _heartbeat_monitor(self):
       while True:
           check_all_heartbeats()
           cleanup_dead_agents()
           await asyncio.sleep(30)
   ```

2. **Lock Cleanup**
   - Automatic on agent death
   - Manual unlock available
   - Timeout-based release (future)

3. **Message Replay**
   - Messages persist until acknowledged
   - At-least-once delivery
   - Idempotent operations recommended

## Integration Points

### 1. Orchestrator Integration

```python
from a2amcp import A2AMCPClient, Project, AgentSpawner

client = A2AMCPClient("localhost:5000")
project = Project(client, "my-app")
spawner = AgentSpawner(project)

# Spawn agents with A2AMCP awareness
sessions = await spawner.spawn_multiple(tasks, worktree_base)
```

### 2. Monitoring Integration

- Prometheus metrics (planned)
- Health check endpoints
- WebSocket for real-time updates
- REST API for dashboards

### 3. CI/CD Integration

- Pre-flight checks for conflicts
- Automated testing with mock agents
- Performance benchmarks
- Integration test suites

## Future Architecture Evolution

### 1. Event Sourcing

- Complete audit trail
- Time-travel debugging
- Replay capabilities

### 2. GraphQL API

- Flexible queries
- Real-time subscriptions
- Better client efficiency

### 3. Plugin System

- Custom message handlers
- External tool integration
- Workflow automation

### 4. Multi-Region Support

- Geo-distributed agents
- Regional Redis clusters
- Edge computing compatibility

## Performance Characteristics

### Latency

- Message delivery: <10ms (local)
- Query response: <100ms (typical)
- File lock: <5ms
- Registration: <20ms

### Throughput

- Messages: 1000+ msg/sec
- Concurrent agents: 100+
- File operations: 500+ ops/sec

### Resource Usage

- MCP Server: ~100MB RAM
- Redis: ~1GB for 100 agents
- Network: ~1MB/s per active agent

## Conclusion

A2AMCP's architecture prioritizes simplicity, reliability, and developer experience. By leveraging proven technologies (MCP, Redis, Docker) and focusing on the specific needs of AI agent coordination, it provides a solid foundation for multi-agent development workflows. The architecture is designed to evolve with the ecosystem, supporting future enhancements while maintaining backward compatibility and operational simplicity.
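
As a closing illustration, the key pattern from Project Namespacing and the envelope from Message Format can be sketched in a few lines of Python. This is a minimal sketch using only the standard library, not A2AMCP's actual code: the helper names `project_key` and `build_message` are hypothetical, and the use of a UUID for `id` and a UTC timestamp are assumptions made for the example.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical helpers for illustration only; these names are not part of A2AMCP.

ALLOWED_TYPES = {"query", "broadcast", "response", "event"}


def project_key(project_id: str, resource_type: str,
                resource_id: Optional[str] = None) -> str:
    """Build a key of the form project:{project_id}:{resource_type}[:{resource_id}]."""
    parts = ["project", project_id, resource_type]
    if resource_id is not None:
        parts.append(resource_id)
    return ":".join(parts)


def build_message(sender: str, msg_type: str, content: str,
                  metadata: Optional[dict] = None) -> dict:
    """Build a message envelope with the fields listed under Message Format."""
    if msg_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown message type: {msg_type!r}")
    return {
        "id": str(uuid.uuid4()),  # assumption: any unique string would do
        "from": sender,
        "type": msg_type,
        "content": content,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO-8601, UTC assumed
        "metadata": metadata or {},
    }


if __name__ == "__main__":
    # Key for the message queue of agent task-002 in project "my-app".
    queue_key = project_key("my-app", "messages", "task-002")
    message = build_message("task-001", "query", "What does the User interface look like?")
    print(queue_key)  # project:my-app:messages:task-002
    print(json.dumps(message, indent=2))
```

Because every key begins with `project:{project_id}:`, all of a project's data shares one prefix, which is what makes per-project isolation and cleanup straightforward.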
