Atomic CRM MCP Server

incident-response-runbook.md•8 KiB

# Incident Response Runbook ## Overview This runbook provides step-by-step procedures for responding to security incidents and operational issues in the Atomic CRM MCP Server. ## Incident Severity Levels | Level | Name | Description | Response Time | |-------|------|-------------|---------------| | P1 | Critical | Service down, data breach, security compromise | Immediate (< 15 min) | | P2 | High | Major feature unavailable, performance degradation | < 1 hour | | P3 | Medium | Minor feature issues, elevated error rates | < 4 hours | | P4 | Low | Cosmetic issues, documentation updates | < 24 hours | ## Incident Response Team | Role | Responsibilities | |------|------------------| | Incident Commander | Overall coordination, communication, decision making | | Technical Lead | Technical investigation, mitigation implementation | | Communications | Stakeholder notifications, status updates | | Security Analyst | Security assessment, forensics, containment | --- ## Incident Response Procedures ### Phase 1: Detection and Triage #### 1.1 Alert Sources - Health check failures (Kubernetes probes) - Error rate spikes (logging thresholds) - Security alerts (audit log anomalies) - User reports - External notifications #### 1.2 Initial Assessment ```bash # Check service health curl -s http://localhost:3000/health | jq # Check recent errors in logs # Look for: error, fatal, critical log levels # Check database connectivity curl -s http://localhost:3000/health/ready | jq # Check pool statistics # Look for: high utilization, waiting connections ``` #### 1.3 Severity Classification - **P1**: Multiple users affected, data at risk, service completely unavailable - **P2**: Single major feature down, significant performance impact - **P3**: Intermittent issues, elevated error rates but service functional - **P4**: Minor issues, workarounds available --- ### Phase 2: Containment #### 2.1 Security Incident Containment **SQL Injection Attempt Detected:** ```bash # 1. Block offending IP immediately iptables -A INPUT -s <IP_ADDRESS> -j DROP # 2. Review audit logs for the IP SELECT * FROM audit_log WHERE ip_address = '<IP_ADDRESS>' ORDER BY created_at DESC; # 3. Check for successful data exfiltration SELECT * FROM audit_log WHERE ip_address = '<IP_ADDRESS>' AND event_type IN ('DATA_READ', 'DATA_EXPORT') ORDER BY created_at DESC; # 4. Rotate database credentials if breach suspected # Via Supabase dashboard or CLI ``` **Authentication Bypass Attempt:** ```bash # 1. Revoke all sessions for affected user(s) # Add tokens to blocklist in Redis # 2. Force password reset for affected accounts # Via Supabase Auth API # 3. Enable additional monitoring # Increase log level to debug ``` **Rate Limit Exceeded:** ```bash # 1. Check if legitimate traffic spike or attack # Review request patterns in logs # 2. If attack, block IPs # If legitimate, consider temporary rate limit increase # 3. Enable request queuing if needed ``` #### 2.2 Service Degradation Containment **Database Pool Exhaustion:** ```bash # 1. Check pool stats # Look for: utilization > 90%, waiting > 20 # 2. Kill long-running queries SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'active' AND query_start < NOW() - INTERVAL '5 minutes' AND query NOT LIKE '%pg_stat_activity%'; # 3. Scale horizontally if needed # Deploy additional instances ``` **Memory Pressure:** ```bash # 1. Check memory usage free -m ps aux --sort=-%mem | head -20 # 2. Restart service if OOM imminent kubectl rollout restart deployment/atomic-crm-mcp # 3. Review memory limits kubectl describe pod <pod-name> ``` --- ### Phase 3: Investigation #### 3.1 Log Analysis **Key Log Locations:** - Application logs: stdout/stderr (JSON format) - Audit logs: `audit_log` table in database - Access logs: Ingress controller logs - Database logs: Supabase dashboard **Search Patterns:** ```bash # Find errors in last hour # Look for: "level":"error" in logs # Find authentication failures # Look for: "AUTH_FAILURE", "AUTH_LOGIN_FAILED" # Find SQL injection attempts # Look for: "forbidden keyword", "SQL injection" # Find rate limit violations # Look for: "rate limit exceeded" ``` #### 3.2 Database Investigation ```sql -- Recent audit events SELECT event_type, COUNT(*) FROM audit_log WHERE created_at > NOW() - INTERVAL '1 hour' GROUP BY event_type; -- Failed operations SELECT * FROM audit_log WHERE success = false AND created_at > NOW() - INTERVAL '1 hour' ORDER BY created_at DESC; -- Suspicious patterns SELECT ip_address, COUNT(*) as request_count FROM audit_log WHERE created_at > NOW() - INTERVAL '1 hour' GROUP BY ip_address HAVING COUNT(*) > 100 ORDER BY request_count DESC; ``` --- ### Phase 4: Remediation #### 4.1 Service Recovery **Restart Service:** ```bash # Kubernetes kubectl rollout restart deployment/atomic-crm-mcp # Docker docker restart atomic-crm-mcp # PM2 pm2 restart atomic-crm-mcp ``` **Database Recovery:** ```bash # Check database health curl -s http://localhost:3000/health/ready # Reset connection pool # Service restart will reset pool # Restore from backup if data corruption # Via Supabase dashboard ``` #### 4.2 Security Remediation **After Breach:** 1. Rotate all credentials (database, JWT secret, API keys) 2. Review and update RLS policies 3. Patch vulnerability that allowed breach 4. Update security rules/firewall 5. Notify affected users 6. File security incident report **After DDoS:** 1. Review and update rate limits 2. Implement additional rate limiting at CDN/load balancer 3. Consider WAF rules 4. Review auto-scaling policies --- ### Phase 5: Post-Incident #### 5.1 Documentation Create incident report including: - Timeline of events - Root cause analysis - Impact assessment - Actions taken - Lessons learned - Preventive measures #### 5.2 Follow-up Actions - [ ] Update runbook with new procedures - [ ] Implement additional monitoring - [ ] Schedule security review - [ ] Update documentation - [ ] Conduct team training if needed --- ## Common Incident Scenarios ### Scenario 1: Database Connection Pool Exhaustion **Symptoms:** - Health check failures - Request timeouts - "Connection pool exhausted" errors in logs **Resolution:** 1. Check pool stats via health endpoint 2. Identify long-running queries 3. Kill problematic queries 4. Restart service if needed 5. Review and optimize queries ### Scenario 2: Authentication Service Down **Symptoms:** - All requests return 401 - JWT verification failures in logs **Resolution:** 1. Check Supabase Auth service status 2. Verify JWT secret configuration 3. Check token expiration settings 4. Implement fallback if available ### Scenario 3: Rate Limit False Positive **Symptoms:** - Legitimate users getting 429 errors - Complaints about access denied **Resolution:** 1. Review rate limit configuration 2. Check if IP is shared (NAT, proxy) 3. Whitelist legitimate IPs if needed 4. Adjust rate limits for specific endpoints ### Scenario 4: Data Integrity Issue **Symptoms:** - Unexpected data in database - Missing records - Audit log anomalies **Resolution:** 1. Stop write operations if critical 2. Review audit logs for changes 3. Identify affected records 4. Restore from backup if needed 5. Investigate root cause --- ## Emergency Contacts | Role | Contact | Backup | |------|---------|--------| | Incident Commander | [Primary] | [Backup] | | Technical Lead | [Primary] | [Backup] | | Security Team | [Primary] | [Backup] | | Database Admin | [Primary] | [Backup] | ## External Resources - Supabase Status: https://status.supabase.com - Supabase Support: support@supabase.io - Security Advisory: security@atomic-crm.com --- ## Appendix: Useful Commands ```bash # Service health curl -s http://localhost:3000/health | jq # Pool statistics (via debug endpoint if available) curl -s http://localhost:3000/debug/pool-stats | jq # Recent errors # Filter logs for: level=error # Active connections SELECT * FROM pg_stat_activity WHERE state = 'active'; # Kill specific connection SELECT pg_terminate_backend(<pid>); # Clear Redis cache redis-cli FLUSHDB # Check rate limit status redis-cli KEYS "rate-limit:*" ```

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/manbelt/atomic-crm-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

incident-response-runbook.md•8 KiB