MCP Agent Mail

testing-tasks-v2.md•15.9 KiB

# MCP Agent Mail Test Suite - Comprehensive Plan ## P0 - Critical Regression Tests These tests prevent recurrence of known bugs and must pass before any release. --- ## Regression: Datetime Naive/Aware Handling priority: 0 labels: regression, critical, datetime **Background:** We fixed a bug where SQLite (which stores naive datetimes) was compared against timezone-aware Python datetimes, causing `TypeError: can't compare offset-naive and offset-aware datetimes`. ### Test Cases 1. `_naive_utc()` returns naive datetime when given None 2. `_naive_utc()` strips timezone from aware datetime 3. `_utcnow_naive()` model factory returns naive datetime 4. All model default_factory fields produce naive datetimes 5. File reservation expiration comparison works (the original failure point) 6. AgentLink timestamp comparisons work 7. MessageRecipient timestamp updates work 8. ProjectSiblingSuggestion timestamp comparisons work ### Files to Test - `src/mcp_agent_mail/models.py`: `_utcnow_naive()` - `src/mcp_agent_mail/app.py`: `_naive_utc()`, all datetime assignments ### Success Criteria - All datetime comparisons in WHERE clauses succeed - All model field updates with timestamps succeed - No `TypeError` for offset-naive/aware comparisons --- ## Regression: Session Context Management priority: 0 labels: regression, critical, database **Background:** We fixed a bug where `await session.commit()` was outside the `async with get_session()` block in `force_release_file_reservation`, causing commits to silently fail. ### Test Cases 1. `force_release_file_reservation` actually persists the release 2. All database writes in file reservation functions persist correctly 3. All database writes in contact functions persist correctly 4. Transaction rollback on error works correctly ### Files to Test - `src/mcp_agent_mail/app.py`: All `async with get_session()` blocks ### Success Criteria - Database state changes are actually persisted - Verify via direct SQL query after each operation --- ## Regression: Agent Name Validation priority: 0 labels: regression, critical, validation **Background:** Agent names must follow adjective+noun format. Invalid names should be rejected. ### Test Cases 1. Valid names accepted: "BlueLake", "GreenMountain", "RedStone" 2. Invalid names rejected: "BackendHarmonizer", "DatabaseMigrator", "agent1" 3. Placeholder detection works: "YourAgentName", "AgentName", etc. 4. Case-insensitive uniqueness enforced ### Success Criteria - Clear error messages for invalid names - Suggestions for valid alternatives --- ## P1 - Core Functionality Tests These test the primary user-facing functionality. --- ## Core: Message Delivery Flow priority: 1 labels: core, messaging Complete test of message delivery from send to acknowledgment. ### Test Cases 1. Send message to single recipient (to) 2. Send message with cc recipients 3. Send message with bcc recipients (not visible to others) 4. Send message to self 5. Send message with thread_id 6. Reply to message (preserves thread) 7. Fetch inbox shows unread messages 8. Mark message as read 9. Acknowledge message (ack_required=true) 10. Search messages by subject/body 11. Summarize thread extracts key points ### Verification - Database records created correctly - Git archive artifacts created (inbox/outbox .md files) - Timestamps are naive UTC --- ## Core: File Reservation Lifecycle priority: 1 labels: core, file-reservations Complete test of file reservation from claim to release. ### Test Cases 1. Create exclusive reservation 2. Create shared reservation 3. Conflict detection: exclusive vs exclusive 4. Conflict detection: exclusive vs shared 5. No conflict: shared vs shared 6. Pattern overlap detection (src/** vs src/main.py) 7. TTL expiration releases reservation 8. Manual release before TTL 9. Stale detection (agent inactive) 10. Force release with notification 11. Renew reservation extends TTL ### Verification - Git archive artifacts created (file_reservations/*.json) - Conflicts returned with holder information - Released reservations have released_ts set --- ## Core: Contact Management Flow priority: 1 labels: core, contacts Complete test of contact request/approval workflow. ### Test Cases 1. Request contact from Agent A to Agent B 2. Agent B receives contact request in inbox 3. Agent B approves contact 4. Agent A can now message Agent B 5. Agent B denies contact 6. Denied agent cannot message 7. Contact policy: open (anyone can message) 8. Contact policy: contacts_only (approved only) 9. Contact policy: block_all (nobody) 10. Contact expiration after TTL 11. Cross-project contacts ### Verification - AgentLink records created with correct status - Policy enforcement blocks/allows messages --- ## Core: Project and Agent Setup priority: 1 labels: core, setup Test project and agent registration flows. ### Test Cases 1. ensure_project creates new project 2. ensure_project is idempotent 3. Project slug generated from human_key 4. register_agent creates new agent 5. register_agent updates existing agent 6. create_agent_identity always creates new 7. Agent profile written to Git archive 8. last_active_ts updated on activity 9. whois returns agent details ### Verification - Database records correct - Git archive has agents/{name}/profile.json --- ## P1 - MCP Protocol Tests Test all MCP tools and resources work correctly. --- ## MCP Tools: Happy Path Coverage priority: 1 labels: mcp, tools Test each MCP tool with valid inputs via FastMCP Client. ### Tools to Test 1. health_check - returns status ok 2. ensure_project - creates/returns project 3. register_agent - creates/updates agent 4. create_agent_identity - creates new agent 5. whois - returns agent profile 6. send_message - delivers message 7. reply_message - replies in thread 8. fetch_inbox - returns messages 9. mark_message_read - sets read_ts 10. acknowledge_message - sets ack_ts 11. search_messages - FTS query works 12. summarize_thread - extracts summary 13. file_reservation_paths - creates reservations 14. release_file_reservations - releases 15. renew_file_reservations - extends TTL 16. force_release_file_reservation - force release 17. request_contact - creates link 18. respond_contact - approves/denies 19. list_contacts - returns links 20. set_contact_policy - updates policy 21. All macro_* tools ### Verification - Each tool returns expected response shape - Side effects occur (database, Git archive) --- ## MCP Resources: Read Access priority: 1 labels: mcp, resources Test all MCP resources return correct data. ### Resources to Test 1. resource://project/{slug} - project details 2. resource://agents/{project} - agent list 3. resource://inbox/{agent}?project= - inbox messages 4. resource://outbox/{agent}?project= - outbox messages 5. resource://thread/{id}?project= - thread messages 6. resource://file-reservations/{project} - active reservations ### Verification - Correct JSON structure - Query parameters work (limit, include_bodies) - Missing resources return appropriate error --- ## P2 - Error Handling Tests Test that errors are handled gracefully with clear messages. --- ## Errors: Invalid Inputs priority: 2 labels: errors, validation Test error handling for invalid inputs. ### Test Cases 1. Invalid project_key format 2. Non-existent project 3. Invalid agent name format 4. Non-existent agent 5. Agent name placeholder detection 6. Empty message body 7. Invalid importance level 8. Invalid contact policy 9. Invalid file reservation pattern 10. TTL below minimum (< 60s) 11. Missing required parameters ### Verification - Clear error messages returned - Suggestions provided where applicable - No stack traces exposed to user --- ## Errors: Database Failures priority: 2 labels: errors, database Test graceful handling of database issues. ### Test Cases 1. Database file missing (should auto-create) 2. Schema migration on startup 3. Concurrent write handling 4. Transaction rollback on error 5. Session cleanup on exception ### Verification - Errors logged appropriately - No data corruption - Recovery is automatic where possible --- ## Errors: Git Archive Failures priority: 2 labels: errors, git Test graceful handling of Git archive issues. ### Test Cases 1. Archive directory missing (should auto-create) 2. Git repo not initialized (should auto-init) 3. Concurrent archive writes (locking) 4. Invalid file paths sanitized 5. Large attachment handling ### Verification - Errors logged appropriately - No partial writes - Lock released on error --- ## P2 - CLI Integration Tests Test CLI commands work correctly. --- ## CLI: Mail Commands priority: 2 labels: cli, integration Test mail-related CLI commands. ### Commands to Test 1. `mcp-agent-mail mail status <path>` - shows project status 2. `mcp-agent-mail mail inbox <project> <agent>` - shows inbox 3. `mcp-agent-mail mail send` - sends message 4. `mcp-agent-mail mail ack <project> <agent> <msg_id>` - acknowledges 5. `mcp-agent-mail mail search <project> <query>` - searches ### Verification - Exit codes correct (0 success, non-zero failure) - JSON output format (--json flag) - Rich console output (default) --- ## CLI: Guard Commands priority: 2 labels: cli, guards Test guard-related CLI commands. ### Commands to Test 1. `mcp-agent-mail guard status <path>` - shows guard status 2. `mcp-agent-mail guard install <project> <path>` - installs hooks 3. `mcp-agent-mail guard uninstall <path>` - removes hooks ### Verification - Hook files created in .git/hooks/ - Chain runner preserves existing hooks - Uninstall cleans up properly --- ## CLI: Archive Commands priority: 2 labels: cli, archive Test archive backup/restore commands. ### Commands to Test 1. `mcp-agent-mail archive save --label <name>` - creates backup 2. `mcp-agent-mail archive list` - lists backups 3. `mcp-agent-mail archive restore <path>` - restores backup ### Verification - ZIP file created with correct structure - Database and storage both backed up - Restore recovers all data --- ## P2 - HTTP Transport Tests Test HTTP/SSE transport layer. --- ## HTTP: Server and Transport priority: 2 labels: http, transport Test HTTP server functionality. ### Test Cases 1. Server starts on configured port 2. Health endpoint returns 200 3. SSE connection established 4. Tool calls work over HTTP 5. Resource reads work over HTTP 6. CORS headers present ### Verification - Use httpx client for requests - Verify SSE event stream format --- ## HTTP: Authentication priority: 2 labels: http, auth Test HTTP authentication modes. ### Test Cases 1. Static bearer token accepted 2. Invalid static token rejected (401) 3. JWT token accepted (when configured) 4. Expired JWT rejected 5. Invalid JWT signature rejected 6. Missing auth rejected (when required) ### Verification - Correct HTTP status codes - Clear error messages in response --- ## HTTP: Rate Limiting priority: 2 labels: http, rate-limit Test rate limiting functionality. ### Test Cases 1. Requests within limit succeed 2. Requests over limit return 429 3. Rate limit headers present 4. Per-client tracking works 5. Redis-backed limiter (when configured) ### Verification - X-RateLimit-* headers correct - Retry-After header on 429 --- ## P2 - Guard Hook Tests Test pre-commit and pre-push guards. --- ## Guards: Pre-commit Enforcement priority: 2 labels: guards, git Test pre-commit guard blocks conflicting commits. ### Test Cases 1. No reservations - commit succeeds 2. Own reservation - commit succeeds 3. Other agent's exclusive reservation - commit blocked 4. Shared reservation - commit succeeds 5. Warning mode (AGENT_MAIL_GUARD_MODE=warn) - warns but allows 6. Bypass mode (AGENT_MAIL_BYPASS=1) - allows all ### Verification - Exit code 1 when blocked - Clear conflict message with holder info - Suggested resolution steps --- ## Guards: Pre-push Enforcement priority: 2 labels: guards, git Test pre-push guard blocks conflicting pushes. ### Test Cases 1. No conflicts in commits - push succeeds 2. Conflict in any commit - push blocked 3. Multiple commits checked 4. Remote branch comparison ### Verification - All commits enumerated correctly - Conflicts identified across commit range --- ## P3 - Security Tests Test security-sensitive areas. --- ## Security: Input Sanitization priority: 3 labels: security, xss Test XSS and injection prevention. ### Test Cases 1. HTML in message body sanitized 2. Script tags removed 3. Event handlers removed 4. Markdown rendered safely 5. Attachment metadata sanitized ### Verification - No executable content in output - Markdown rendering is safe --- ## Security: Path Traversal priority: 3 labels: security, paths Test path traversal prevention. ### Test Cases 1. File reservation pattern: "../../../etc/passwd" rejected 2. Attachment path traversal blocked 3. Archive extraction path validation 4. Agent name with path separators rejected ### Verification - Paths normalized and validated - No access outside project scope --- ## P3 - Concurrent Access Tests Test behavior under concurrent load. --- ## Concurrency: Multiple Agents priority: 3 labels: concurrency Test multiple agents operating simultaneously. ### Test Cases 1. 10 agents sending messages concurrently 2. Multiple agents claiming same file (conflict handling) 3. Concurrent inbox fetches 4. Concurrent archive writes (locking) 5. No data corruption under load ### Verification - All operations complete successfully - No deadlocks - Data integrity maintained --- ## P3 - Performance Tests Test performance characteristics. --- ## Performance: Baseline Benchmarks priority: 3 labels: performance Establish performance baselines. ### Benchmarks 1. Message send latency (p50, p95, p99) 2. Inbox fetch latency with 100 messages 3. Search latency with 1000 messages 4. File reservation conflict check with 100 reservations 5. Archive write latency ### Targets - Message send: < 100ms p95 - Inbox fetch: < 200ms p95 - Search: < 500ms p95 --- ## P4 - E2E Scenario Tests Complete end-to-end scenarios. --- ## E2E: Multi-Agent Development Workflow priority: 4 labels: e2e, scenario Simulate realistic multi-agent development. ### Scenario Three agents collaborate on a feature: 1. BlueLake reserves backend/** 2. GreenMountain reserves frontend/** 3. BlueLake sends "Starting API work" [bd-100] 4. GreenMountain replies "UI ready when you are" 5. BlueLake completes, releases reservation 6. RedStone reviews, sends feedback in thread 7. All acknowledge completion ### Verification - All messages in correct thread - Reservations properly released - Audit trail complete in Git --- ## E2E: Disaster Recovery priority: 4 labels: e2e, recovery Test full backup/restore cycle. ### Scenario 1. Create project with messages, agents, reservations 2. Simulate some time passing (messages sent) 3. Save archive with label "pre-disaster" 4. Delete database and storage 5. Restore from archive 6. Verify all data recovered 7. Continue operations normally ### Verification - Message timestamps preserved - Thread IDs maintained - File reservations restored - Git archive intact --- ## Milestones --- ## Milestone: Critical Path Coverage priority: 1 labels: milestone deps: Regression: Datetime Naive/Aware Handling, Regression: Session Context Management, Core: Message Delivery Flow, Core: File Reservation Lifecycle All P0 and P1 tests passing. Critical functionality verified. ### Criteria - All regression tests pass - Core messaging flow tested - Core file reservation flow tested - No known data loss scenarios --- ## Milestone: Full Integration Coverage priority: 2 labels: milestone deps: Milestone: Critical Path Coverage, CLI: Mail Commands, CLI: Guard Commands, HTTP: Server and Transport All integration points tested. ### Criteria - CLI commands tested - HTTP transport tested - Guard hooks tested - Error handling verified --- ## Milestone: Production Ready priority: 3 labels: milestone deps: Milestone: Full Integration Coverage, Security: Input Sanitization, Concurrency: Multiple Agents Ready for production deployment. ### Criteria - Security tests pass - Concurrent access safe - Performance acceptable - E2E scenarios pass

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Dicklesworthstone/mcp_agent_mail'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

testing-tasks-v2.md•15.9 KiB