MCP Git Server

by MementoRC
TROUBLESHOOTING.md (24.5 kB)
# MCP Git Server Troubleshooting Guide

## Table of Contents

1. [Common Issues](#common-issues)
2. [Error Diagnostics](#error-diagnostics)
3. [Performance Issues](#performance-issues)
4. [Session Management Problems](#session-management-problems)
5. [Validation Errors](#validation-errors)
6. [Circuit Breaker Issues](#circuit-breaker-issues)
7. [Memory and Resource Leaks](#memory-and-resource-leaks)
8. [Debugging Tools](#debugging-tools)
9. [Log Analysis](#log-analysis)
10. [Recovery Procedures](#recovery-procedures)

## Common Issues

### Server Won't Start

**Symptoms:**

- Server fails to initialize
- Port binding errors
- Import errors

**Diagnosis:**

```python
# Check basic server startup
import asyncio
from mcp_server_git.server import MCPGitServer

async def test_startup():
    try:
        server = MCPGitServer()
        await server.start()
        print("✓ Server started successfully")
        await server.shutdown()
    except Exception as e:
        print(f"✗ Server startup failed: {e}")
        import traceback
        traceback.print_exc()

asyncio.run(test_startup())
```

**Solutions:**

1. **Port already in use:**

   ```bash
   # Check what's using the port
   lsof -i :8080

   # Use a different port
   export MCP_GIT_PORT=8081
   ```

2. **Missing dependencies:**

   ```bash
   pixi install
   pixi run dev-setup
   ```
3. **Permission issues:**

   ```bash
   # Check file permissions
   ls -la src/mcp_server_git/

   # Fix permissions if needed
   chmod +x src/mcp_server_git/__main__.py
   ```

### Client Connection Failures

**Symptoms:**

- Clients can't connect
- Connections drop immediately
- Authentication failures

**Diagnosis:**

```python
# Test client connection
import asyncio
import json

import websockets

async def test_connection():
    try:
        async with websockets.connect("ws://localhost:8080") as websocket:
            # Send a test message
            test_msg = {"type": "ping", "id": "test_123"}
            await websocket.send(json.dumps(test_msg))

            # Receive the response
            response = await websocket.recv()
            print(f"✓ Connection successful: {response}")
    except Exception as e:
        print(f"✗ Connection failed: {e}")

asyncio.run(test_connection())
```

**Solutions:**

1. **Check that the server is running:**

   ```bash
   curl -f http://localhost:8080/health || echo "Server not responding"
   ```

2. **Verify network connectivity:**

   ```bash
   telnet localhost 8080
   ```

3. **Check firewall settings:**

   ```bash
   # Linux
   sudo ufw status

   # macOS
   sudo pfctl -sr
   ```

### Message Processing Errors

**Symptoms:**

- Messages not being processed
- Invalid message format errors
- Validation failures

**Diagnosis:**

```python
from mcp_server_git.models.validation import validate_message
from mcp_server_git.models.notifications import CancelledNotification

# Test message validation
test_message = {
    "type": "notifications/cancelled",
    "id": "test_123",
    "request_id": "req_456",
}

result = validate_message(test_message, CancelledNotification)
if not result.is_valid:
    print(f"Validation failed: {result.error}")
else:
    print("✓ Message validation successful")
```

## Error Diagnostics

### Error Classification

**Critical Errors (Require session termination):**

```python
# These errors should close the session
CRITICAL_ERRORS = [
    "authentication_failed",
    "protocol_violation",
    "security_breach",
    "corrupted_session_state",
]

def is_critical_error(error_type):
    return error_type in CRITICAL_ERRORS
```
**Recoverable Errors (Can retry):**

```python
# These errors should trigger retry logic
RECOVERABLE_ERRORS = [
    "network_timeout",
    "temporary_resource_unavailable",
    "rate_limit_exceeded",
    "transient_validation_error",
]

def should_retry(error_type):
    return error_type in RECOVERABLE_ERRORS
```

### Error Pattern Analysis

```python
from collections import defaultdict
from datetime import datetime, timedelta

class ErrorAnalyzer:
    def __init__(self):
        self.error_counts = defaultdict(int)
        self.error_timestamps = defaultdict(list)

    def record_error(self, error_type, timestamp=None):
        if timestamp is None:
            timestamp = datetime.now()
        self.error_counts[error_type] += 1
        self.error_timestamps[error_type].append(timestamp)

    def get_error_rate(self, error_type, window_minutes=60):
        """Get the error rate over the last N minutes."""
        cutoff = datetime.now() - timedelta(minutes=window_minutes)
        recent_errors = [
            ts for ts in self.error_timestamps[error_type]
            if ts > cutoff
        ]
        return len(recent_errors) / window_minutes  # errors per minute

    def detect_error_spikes(self, threshold=5):
        """Detect error spikes that might indicate systemic issues."""
        spikes = {}
        for error_type in self.error_counts:
            rate = self.get_error_rate(error_type)
            if rate > threshold:
                spikes[error_type] = rate
        return spikes

# Usage
analyzer = ErrorAnalyzer()

# In your error handler
async def handle_error_with_analysis(error, error_type):
    analyzer.record_error(error_type)

    # Check for error spikes
    spikes = analyzer.detect_error_spikes()
    if spikes:
        logger.warning(f"Error spikes detected: {spikes}")
```

## Performance Issues

### High Latency

**Symptoms:**

- Slow message processing
- Client timeouts
- High response times

**Diagnosis:**

```python
from mcp_server_git.optimizations import PerformanceTimer

# Measure operation performance
async def diagnose_performance():
    with PerformanceTimer("message_processing") as timer:
        # Simulate message processing
        await process_test_message()

    if timer.elapsed_ms > 1000:  # 1 second threshold
        print(f"⚠ High latency detected: {timer.elapsed_ms}ms")
    else:
        print(f"✓ Normal latency: {timer.elapsed_ms}ms")
```

**Solutions:**

1. **Enable validation caching:**

   ```python
   from mcp_server_git.optimizations import enable_validation_cache

   enable_validation_cache()
   ```

2. **Profile bottlenecks:**

   ```python
   from mcp_server_git.optimizations import CPUProfiler

   with CPUProfiler("performance_analysis"):
       await process_messages(large_message_batch)
   ```

3. **Optimize database queries:**

   ```python
   # Use async database operations
   async def get_session_data(session_id):
       async with database.connection() as conn:
           return await conn.fetch_one(
               "SELECT * FROM sessions WHERE id = ?", session_id
           )
   ```

### High Memory Usage

**Symptoms:**

- Memory usage keeps growing
- Out-of-memory errors
- Slow garbage collection

**Diagnosis:**

```python
import asyncio
import gc

from mcp_server_git.optimizations import MemoryMonitor

def diagnose_memory():
    monitor = MemoryMonitor()

    # Take an initial sample
    initial = monitor.take_sample("start")

    # Force garbage collection
    gc.collect()
    after_gc = monitor.take_sample("after_gc")

    print(f"Memory before GC: {initial:.2f}MB")
    print(f"Memory after GC: {after_gc:.2f}MB")
    print(f"GC recovered: {initial - after_gc:.2f}MB")

    # Check for memory leaks
    if initial - after_gc < 10:  # Less than 10MB recovered
        print("⚠ Potential memory leak detected")

    # Get garbage collection stats
    stats = gc.get_stats()
    print(f"GC stats: {stats}")

async def periodic_memory_check():
    while True:
        diagnose_memory()
        await asyncio.sleep(300)  # Every 5 minutes

# Schedule from inside a running event loop
asyncio.create_task(periodic_memory_check())
```

**Solutions:**

1. **Clear caches periodically:**

   ```python
   from mcp_server_git.optimizations import clear_validation_cache

   async def cache_cleanup_task():
       while True:
           await asyncio.sleep(3600)  # Every hour
           clear_validation_cache()
           gc.collect()
   ```
2. **Limit the session count:**

   ```python
   class MemoryAwareSessionManager(SessionManager):
       MAX_SESSIONS = 1000

       async def create_session(self, session_id=None):
           if len(self.sessions) >= self.MAX_SESSIONS:
               # Close the oldest sessions first
               await self.cleanup_oldest_sessions(count=100)
           return await super().create_session(session_id)
   ```

### High CPU Usage

**Symptoms:**

- CPU usage constantly high
- Server becomes unresponsive
- Slow message processing

**Diagnosis:**

```python
import asyncio
from datetime import datetime

import psutil

async def monitor_cpu_usage():
    """Monitor CPU usage and identify bottlenecks."""
    while True:
        cpu_percent = psutil.cpu_percent(interval=1)

        if cpu_percent > 80:  # High CPU threshold
            # Get detailed CPU info
            cpu_count = psutil.cpu_count()
            load_avg = psutil.getloadavg()

            logger.warning("High CPU usage detected", extra={
                "cpu_percent": cpu_percent,
                "cpu_count": cpu_count,
                "load_average": load_avg,
                "timestamp": datetime.now().isoformat(),
            })

            # Get process-specific CPU usage
            process = psutil.Process()
            process_cpu = process.cpu_percent()
            threads = process.num_threads()

            logger.warning(f"Process CPU usage: {process_cpu}%, Threads: {threads}")

        await asyncio.sleep(10)  # Check every 10 seconds
```

**Solutions:**

1. **Use async operations:**

   ```python
   import asyncio
   import time

   # Instead of blocking operations
   def blocking_operation():
       time.sleep(1)  # Bad - blocks the event loop

   # Use async alternatives
   async def async_operation():
       await asyncio.sleep(1)  # Good - yields control
   ```
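Blocking work that cannot be rewritten as a coroutine (for example a synchronous `git` subprocess call) can instead be pushed off the event loop with `asyncio.to_thread`. A minimal sketch, with `slow_git_status` standing in for any blocking call:

```python
import asyncio
import time

def slow_git_status() -> str:
    # Stand-in for a blocking call such as subprocess.run(["git", "status"])
    time.sleep(0.1)
    return "clean"

async def main() -> None:
    # Run the blocking function in a worker thread, keeping the loop free
    status, _ = await asyncio.gather(
        asyncio.to_thread(slow_git_status),
        asyncio.sleep(0.1),  # the event loop stays responsive meanwhile
    )
    print(status)  # clean

asyncio.run(main())
```

This keeps one slow call from freezing every connected client, at the cost of a worker thread per in-flight blocking call.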
2. **Implement backpressure:**

   ```python
   class BackpressureAwareServer(MCPGitServer):
       def __init__(self):
           super().__init__()
           self.processing_semaphore = asyncio.Semaphore(10)  # Max 10 concurrent

       async def handle_message(self, message, session):
           async with self.processing_semaphore:
               return await super().handle_message(message, session)
   ```

## Session Management Problems

### Sessions Not Cleaning Up

**Symptoms:**

- Growing number of inactive sessions
- Memory usage increasing over time
- Performance degradation

**Diagnosis:**

```python
async def diagnose_session_cleanup():
    sessions = await session_manager.get_all_sessions()

    active_sessions = [s for s in sessions if s.is_active]
    idle_sessions = [s for s in sessions if s.idle_time > 3600]  # 1 hour
    error_sessions = [s for s in sessions if s.state == SessionState.ERROR]

    print(f"Total sessions: {len(sessions)}")
    print(f"Active sessions: {len(active_sessions)}")
    print(f"Idle sessions (>1h): {len(idle_sessions)}")
    print(f"Error sessions: {len(error_sessions)}")

    if len(idle_sessions) > 100:
        print("⚠ Too many idle sessions - cleanup needed")
```

**Solutions:**

1. **Implement aggressive cleanup:**

   ```python
   async def aggressive_session_cleanup():
       while True:
           # Clean up idle sessions more frequently
           closed = await session_manager.cleanup_idle_sessions(max_idle_time=1800)  # 30 min
           if closed > 0:
               logger.info(f"Cleaned up {closed} idle sessions")
           await asyncio.sleep(300)  # Every 5 minutes
   ```
2. **Fix the heartbeat mechanism:**

   ```python
   # Ensure the heartbeat manager is working
   if not heartbeat_manager.running:
       logger.error("Heartbeat manager not running - restarting")
       await heartbeat_manager.start()
   ```

### Heartbeat Failures

**Symptoms:**

- Sessions timing out unexpectedly
- Clients getting disconnected
- Heartbeat messages not processed

**Diagnosis:**

```python
async def diagnose_heartbeat_system():
    # Check heartbeat manager status
    if not heartbeat_manager.running:
        print("✗ Heartbeat manager not running")
        return

    # Check heartbeat intervals
    print(f"Heartbeat interval: {heartbeat_manager.heartbeat_interval}s")
    print(f"Missed threshold: {heartbeat_manager.missed_heartbeat_threshold}")

    # Check recent heartbeats
    for session_id, last_heartbeat in heartbeat_manager.session_heartbeats.items():
        time_since = (datetime.now() - last_heartbeat).total_seconds()
        if time_since > heartbeat_manager.heartbeat_interval * 2:
            print(f"⚠ Session {session_id} hasn't sent a heartbeat in {time_since}s")
```

**Solutions:**

1. **Adjust heartbeat parameters:**

   ```python
   # More tolerant heartbeat settings
   heartbeat_manager = HeartbeatManager(
       session_manager=session_manager,
       heartbeat_interval=60.0,       # Longer interval
       missed_heartbeat_threshold=5,  # More tolerance
   )
   ```
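For comparison with the server-side settings above, a client is expected to send heartbeats more often than the server's interval so that a single lost message is tolerated. A hypothetical client-side loop — the message shape and `send_json` callable are assumptions, not the server's actual wire format:

```python
import asyncio
import itertools

async def heartbeat_loop(send_json, session_id: str, interval: float = 30.0, beats=None):
    """Send a heartbeat at half the server's interval; `beats` bounds the demo."""
    for seq in itertools.count():
        if beats is not None and seq >= beats:
            return
        await send_json({"type": "heartbeat", "session_id": session_id, "seq": seq})
        await asyncio.sleep(interval / 2)

# Demo with an in-memory sink instead of a real websocket
sent = []
async def fake_send(msg):
    sent.append(msg)

asyncio.run(heartbeat_loop(fake_send, "sess_1", interval=0.02, beats=3))
print(len(sent))  # 3
```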
2. **Implement heartbeat recovery:**

   ```python
   async def recover_heartbeat_failures():
       """Attempt to recover from heartbeat failures."""
       sessions = await session_manager.get_all_sessions()

       for session in sessions:
           if session.session_id not in heartbeat_manager.session_heartbeats:
               # Re-initialize the heartbeat for this session
               await heartbeat_manager.record_heartbeat(session.session_id)
               logger.info(f"Recovered heartbeat for session {session.session_id}")
   ```

## Validation Errors

### Message Format Issues

**Symptoms:**

- Validation errors for valid-looking messages
- Inconsistent validation results
- Unknown field errors

**Diagnosis:**

```python
from mcp_server_git.models.enhanced_validation import enhanced_validate_message

def diagnose_validation_error(message_data):
    # Try different validation modes
    strict_result = enhanced_validate_message(message_data, strict_mode=True)
    lenient_result = enhanced_validate_message(message_data, strict_mode=False)

    print(f"Strict validation: {'✓' if strict_result.is_valid else '✗'}")
    print(f"Lenient validation: {'✓' if lenient_result.is_valid else '✗'}")

    if not strict_result.is_valid:
        print(f"Strict errors: {strict_result.error}")
    if not lenient_result.is_valid:
        print(f"Lenient errors: {lenient_result.error}")

    # Show validation warnings
    if hasattr(lenient_result, "validation_warnings"):
        print(f"Warnings: {lenient_result.validation_warnings}")
```

**Solutions:**

1. **Use enhanced validation:**

   ```python
   # Enable fallback validation
   from mcp_server_git.models.enhanced_validation import enhanced_validate_message

   result = enhanced_validate_message(data, strict_mode=False)
   if result.is_valid:
       # Process even if there were minor validation issues
       return await process_message(result.model)
   ```
2. **Log validation details:**

   ```python
   def log_validation_failure(message_data, error):
       logger.error("Validation failed", extra={
           "message_type": message_data.get("type", "unknown"),
           "message_id": message_data.get("id", "unknown"),
           "error_type": type(error).__name__,
           "error_details": str(error),
           "message_keys": list(message_data.keys()),
       })
   ```

## Circuit Breaker Issues

### Circuit Stuck Open

**Symptoms:**

- Operations failing with "circuit open" errors
- Circuit not recovering automatically
- All requests being rejected

**Diagnosis:**

```python
import time

def diagnose_circuit_breaker(circuit):
    print(f"Circuit: {circuit.name}")
    print(f"State: {circuit.state.value}")
    print(f"Failure count: {circuit.failure_count}")
    print(f"Failure threshold: {circuit.failure_threshold}")
    print(f"Last failure time: {circuit.last_failure_time}")
    print(f"Recovery timeout: {circuit.recovery_timeout}")

    if circuit.state == CircuitState.OPEN:
        time_since_failure = time.time() - circuit.last_failure_time
        print(f"Time since last failure: {time_since_failure}s")

        if time_since_failure > circuit.recovery_timeout:
            print("⚠ Circuit should have moved to half-open by now")
```

**Solutions:**

1. **Manually reset the circuit:**

   ```python
   # Reset the circuit if it's stuck
   if circuit.state == CircuitState.OPEN:
       time_since_failure = time.time() - circuit.last_failure_time
       if time_since_failure > circuit.recovery_timeout * 2:  # Double the timeout
           logger.warning(f"Manually resetting stuck circuit: {circuit.name}")
           circuit.reset()
   ```
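The open-to-half-open transition that the diagnosis above checks for can be sketched as a minimal state machine. This is a hypothetical simplification, not the server's actual `CircuitBreaker` class:

```python
import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class MiniCircuit:
    def __init__(self, failure_threshold=3, recovery_timeout=30.0):
        self.state = State.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.last_failure_time = 0.0

    def record_failure(self, now=None):
        now = time.time() if now is None else now
        self.failure_count += 1
        self.last_failure_time = now
        if self.failure_count >= self.failure_threshold:
            self.state = State.OPEN  # too many failures: reject requests

    def allow_request(self, now=None):
        """OPEN circuits admit one probe after recovery_timeout (half-open)."""
        now = time.time() if now is None else now
        if self.state == State.OPEN and now - self.last_failure_time > self.recovery_timeout:
            # This transition is what a "stuck open" circuit fails to make
            self.state = State.HALF_OPEN
        return self.state != State.OPEN

circuit = MiniCircuit(failure_threshold=2, recovery_timeout=30.0)
circuit.record_failure(now=100.0)
circuit.record_failure(now=101.0)          # threshold reached -> OPEN
print(circuit.allow_request(now=105.0))    # False (still within recovery window)
print(circuit.allow_request(now=140.0))    # True  (timed out -> HALF_OPEN probe)
print(circuit.state.value)                 # half_open
```

If a real circuit stays OPEN past `recovery_timeout`, the transition logic above is what to inspect before falling back to a manual `reset()`.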
2. **Adjust circuit parameters:**

   ```python
   # More lenient circuit breaker
   circuit = CircuitBreaker(
       name="operations",
       failure_threshold=10,   # Higher threshold
       recovery_timeout=60.0,  # Longer recovery time
       half_open_max_calls=3,  # More test calls
   )
   ```

## Memory and Resource Leaks

### Detecting Memory Leaks

```python
import asyncio
import gc
import tracemalloc
from datetime import datetime

from mcp_server_git.optimizations import MemoryMonitor

class MemoryLeakDetector:
    def __init__(self):
        self.monitor = MemoryMonitor()
        self.snapshots = []

    def start_tracking(self):
        """Start memory leak detection."""
        tracemalloc.start()
        self.monitor.take_sample("leak_detection_start")

    def take_snapshot(self, label):
        """Take a memory snapshot."""
        gc.collect()  # Force garbage collection

        snapshot = tracemalloc.take_snapshot()
        memory_mb = self.monitor.take_sample(label)

        self.snapshots.append({
            "label": label,
            "snapshot": snapshot,
            "memory_mb": memory_mb,
            "timestamp": datetime.now(),
        })

        return memory_mb

    def analyze_leaks(self):
        """Analyze potential memory leaks."""
        if len(self.snapshots) < 2:
            return

        # Compare the first and last snapshots
        first = self.snapshots[0]
        last = self.snapshots[-1]

        # Memory growth analysis
        growth = last["memory_mb"] - first["memory_mb"]
        print(f"Memory growth: {growth:.2f}MB")

        # Top memory allocations
        top_stats = last["snapshot"].compare_to(
            first["snapshot"], "lineno"
        )

        print("Top 10 memory allocations:")
        for stat in top_stats[:10]:
            print(f"  {stat}")

    def get_top_allocators(self, count=10):
        """Get the top memory allocators."""
        if not self.snapshots:
            return []

        snapshot = self.snapshots[-1]["snapshot"]
        top_stats = snapshot.statistics("lineno")

        return [(stat.traceback, stat.size) for stat in top_stats[:count]]

# Usage
leak_detector = MemoryLeakDetector()
leak_detector.start_tracking()

# In your main loop
async def memory_monitoring_loop():
    while True:
        leak_detector.take_snapshot(f"periodic_{datetime.now().isoformat()}")

        if len(leak_detector.snapshots) > 10:
            leak_detector.analyze_leaks()
            # Keep only the last 5 snapshots (more memory efficient)
            del leak_detector.snapshots[:-5]

        await asyncio.sleep(600)  # Every 10 minutes
```

### Resource Cleanup

```python
from datetime import datetime, timedelta

class ResourceManager:
    def __init__(self):
        self.active_resources = {}
        self.resource_count = 0

    async def acquire_resource(self, resource_type, resource_id=None):
        """Acquire a resource with automatic cleanup."""
        if resource_id is None:
            resource_id = f"{resource_type}_{self.resource_count}"
            self.resource_count += 1

        # Create the resource
        if resource_type == "file":
            resource = await self._create_file_resource(resource_id)
        elif resource_type == "connection":
            resource = await self._create_connection_resource(resource_id)
        else:
            raise ValueError(f"Unknown resource type: {resource_type}")

        self.active_resources[resource_id] = {
            "resource": resource,
            "type": resource_type,
            "created_at": datetime.now(),
        }

        return resource_id, resource

    async def release_resource(self, resource_id):
        """Release a specific resource."""
        if resource_id not in self.active_resources:
            logger.warning(f"Resource {resource_id} not found for release")
            return

        resource_info = self.active_resources[resource_id]
        resource = resource_info["resource"]

        # Type-specific cleanup
        if resource_info["type"] == "file":
            await self._cleanup_file_resource(resource)
        elif resource_info["type"] == "connection":
            await self._cleanup_connection_resource(resource)

        del self.active_resources[resource_id]
        logger.debug(f"Released resource {resource_id}")

    async def cleanup_old_resources(self, max_age_seconds=3600):
        """Clean up resources older than the specified age."""
        cutoff = datetime.now() - timedelta(seconds=max_age_seconds)

        to_cleanup = [
            resource_id
            for resource_id, info in self.active_resources.items()
            if info["created_at"] < cutoff
        ]

        for resource_id in to_cleanup:
            await self.release_resource(resource_id)

        return len(to_cleanup)
```
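To guarantee that release runs even when the operation using a resource raises, the acquire/release pair can be wrapped in an async context manager. A minimal sketch against a simplified stand-in manager — the real `ResourceManager` method signatures are assumptions based on the code above:

```python
import asyncio
from contextlib import asynccontextmanager

class TrackingManager:
    """Simplified stand-in exposing an acquire/release contract."""
    def __init__(self):
        self.active = {}
        self.counter = 0

    async def acquire(self, kind):
        self.counter += 1
        rid = f"{kind}_{self.counter}"
        self.active[rid] = object()
        return rid

    async def release(self, rid):
        self.active.pop(rid, None)

@asynccontextmanager
async def scoped_resource(manager, kind):
    rid = await manager.acquire(kind)
    try:
        yield rid
    finally:
        # Runs on success *and* on exceptions, so nothing leaks
        await manager.release(rid)

async def main():
    mgr = TrackingManager()
    try:
        async with scoped_resource(mgr, "file") as rid:
            raise RuntimeError("operation failed")
    except RuntimeError:
        pass
    print(len(mgr.active))  # 0 - released despite the exception

asyncio.run(main())
```

This removes the class of leak where an early `return` or exception skips the `release_resource` call.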
## Debugging Tools

### Debug Mode Setup

```python
import logging
from mcp_server_git.logging_config import setup_debug_logging

# Enable comprehensive debug logging
setup_debug_logging()

# Set log levels for specific components
logging.getLogger("mcp_server_git.session").setLevel(logging.DEBUG)
logging.getLogger("mcp_server_git.validation").setLevel(logging.DEBUG)
logging.getLogger("mcp_server_git.error_handling").setLevel(logging.DEBUG)
```

### Interactive Debugging

```python
import os
import pdb

from mcp_server_git.server import MCPGitServer

class DebuggableMCPServer(MCPGitServer):
    async def handle_message(self, message, session):
        if os.getenv("MCP_DEBUG_BREAKPOINT"):
            pdb.set_trace()  # Interactive debugging
        return await super().handle_message(message, session)
```

### Performance Profiling

```python
import cProfile
import pstats

from mcp_server_git.optimizations import CPUProfiler

# Profile specific operations
def profile_operation(operation_name, func, *args, **kwargs):
    profiler = cProfile.Profile()
    profiler.enable()

    try:
        result = func(*args, **kwargs)
        return result
    finally:
        profiler.disable()

        # Save the profile data
        profiler.dump_stats(f"{operation_name}_profile.stats")

        # Print the top functions
        stats = pstats.Stats(profiler)
        stats.sort_stats("tottime")
        stats.print_stats(10)  # Top 10 functions
```

## Recovery Procedures

### Server Recovery

```python
async def emergency_server_recovery():
    """Emergency recovery procedure for server issues."""
    logger.info("Starting emergency server recovery")

    try:
        # 1. Stop all active operations
        await stop_all_operations()

        # 2. Clear all caches
        clear_validation_cache()
        gc.collect()

        # 3. Reset circuit breakers
        for circuit in get_all_circuit_breakers():
            circuit.reset()

        # 4. Clean up sessions
        session_manager = get_session_manager()
        sessions = await session_manager.get_all_sessions()
        for session in sessions:
            await session_manager.close_session(session.session_id)

        # 5. Restart critical components
        await restart_heartbeat_manager()
        await restart_validation_system()

        logger.info("Emergency recovery completed successfully")
        return True
    except Exception as e:
        logger.error(f"Emergency recovery failed: {e}")
        return False

async def graceful_restart():
    """Gracefully restart the server."""
    logger.info("Starting graceful server restart")

    # 1. Stop accepting new connections
    await server.stop_accepting_connections()

    # 2. Wait for active operations to complete
    await wait_for_operations_to_complete(timeout=300)  # 5 minutes

    # 3. Save session state if needed
    await save_session_state()

    # 4. Shut down components
    await server.shutdown()

    # 5. Restart the server
    await server.start()

    # 6. Restore session state if needed
    await restore_session_state()

    logger.info("Graceful restart completed")
```

This troubleshooting guide provides diagnostics and solutions for the most common issues that arise when running the MCP Git Server with its enhanced protocol-compliance and stability features.
