Amazon Security Lake MCP Server

asl-mcp
claude

security_lake_tool_improvements.md•8.2 kB

# Amazon Security Lake MCP Tool - Recommended Improvements ## 🚨 Critical Issues Found ### 1. **Timestamp Query Bug (High Priority)** **Problem**: The search tools fail with `TYPE_MISMATCH: Cannot apply operator: bigint <= timestamp(0)` errors. **Root Cause**: Incompatible timestamp filtering in SQL queries when working with Iceberg tables. **Fix Needed**: ```sql -- Current (broken): WHERE time <= 1735689600000 -- bigint compared to timestamp -- Should be (fixed): WHERE time <= TIMESTAMP '2025-01-01 00:00:00' -- OR WHERE CAST(time AS bigint) <= 1735689600000 -- OR WHERE time <= from_unixtime(1735689600000/1000) ``` ### 2. **Empty Results Despite Data Present** **Problem**: Search functions return 0 results even though 25,224+ records exist in tables. **Root Cause**: Query filtering logic prevents any data from being returned. ## 📋 Recommended Code Changes ### Priority 1: Fix Timestamp Handling **File**: Search query generation functions ```python # Current problematic timestamp filtering def build_time_filter(start_time, end_time): # BROKEN: Direct bigint comparison to timestamp columns return f"time >= {start_timestamp} AND time <= {end_timestamp}" # FIXED: Proper timestamp handling for Iceberg tables def build_time_filter(start_time, end_time): if start_time: start_ts = f"TIMESTAMP '{start_time}'" if end_time: end_ts = f"TIMESTAMP '{end_time}'" filters = [] if start_time: filters.append(f"time >= {start_ts}") if end_time: filters.append(f"time <= {end_ts}") return " AND ".join(filters) if filters else "1=1" ``` ### Priority 2: Improve Query Logic **Current Issues**: - Default time ranges may be too restrictive - No fallback when time filters return empty results - Hard-coded assumptions about data freshness **Recommended Changes**: ```python def search_security_lake_data(query_params): # 1. Try query with time filters first results = execute_query_with_filters(query_params) # 2. If empty and time filters present, try without time filters if not results and has_time_filters(query_params): results = execute_query_without_time_filters(query_params) # 3. If still empty, try a simple count to verify data exists if not results: count = execute_count_query(query_params.table) if count > 0: return { "warning": f"Table has {count} records but filters returned no results", "suggestion": "Try broader time range or remove filters" } return results ``` ### Priority 3: Better Error Handling ```python def execute_athena_query(sql_query): try: result = athena_client.start_query_execution(query_params) return wait_for_completion(result['QueryExecutionId']) except ClientError as e: error_code = e.response['Error']['Code'] if 'TYPE_MISMATCH' in str(e): return { "error": "Timestamp format incompatibility", "suggestion": "Check timestamp column formats in query", "technical_detail": str(e) } elif 'COLUMN_NOT_FOUND' in str(e): return { "error": "Column not found - possible schema mismatch", "suggestion": "Verify table schema matches OCSF version" } else: return {"error": str(e)} ``` ## 🔧 Feature Enhancements ### 1. **Schema Validation** ```python def validate_table_schema(table_name): """Verify table schema matches expected OCSF format""" schema = get_table_schema(table_name) required_fields = ['time', 'severity', 'class_uid', 'metadata'] missing_fields = [f for f in required_fields if f not in schema] if missing_fields: return False, f"Missing required fields: {missing_fields}" return True, "Schema valid" ``` ### 2. **Data Freshness Check** ```python def check_data_freshness(table_name): """Check when data was last updated""" query = f""" SELECT MAX(time) as latest_record, COUNT(*) as total_records, COUNT(DISTINCT DATE(time)) as days_with_data FROM {table_name} """ result = execute_query(query) return { "latest_data": result['latest_record'], "total_records": result['total_records'], "data_span_days": result['days_with_data'] } ``` ### 3. **Flexible Querying** ```python def adaptive_search(search_params): """Try multiple query strategies to find data""" strategies = [ lambda: search_with_full_filters(search_params), lambda: search_with_relaxed_time_filter(search_params), lambda: search_without_time_filter(search_params), lambda: search_recent_data_only(search_params) ] for strategy in strategies: try: results = strategy() if results and len(results) > 0: return results except Exception as e: continue # Try next strategy return {"error": "No data found with any search strategy"} ``` ## 🧪 Testing Recommendations ### 1. **Unit Tests for Timestamp Handling** ```python def test_timestamp_conversion(): # Test various timestamp formats test_cases = [ ("2025-01-01T00:00:00Z", "ISO format"), (1735689600000, "Unix milliseconds"), (1735689600, "Unix seconds"), ] for timestamp, description in test_cases: result = convert_to_athena_timestamp(timestamp) assert result.startswith("TIMESTAMP"), f"Failed for {description}" ``` ### 2. **Integration Tests** ```python def test_end_to_end_search(): # Test with known data results = search_security_hub_findings({"limit": 1}) assert len(results) > 0, "Should find at least one record" # Test with impossible filters results = search_with_filters({"start_time": "2030-01-01"}) assert len(results) == 0, "Future date should return no results" ``` ## 📊 Monitoring & Debugging ### 1. **Query Performance Metrics** ```python def log_query_performance(query, execution_time, result_count): metrics = { "query_hash": hashlib.md5(query.encode()).hexdigest(), "execution_time_ms": execution_time, "result_count": result_count, "timestamp": datetime.utcnow().isoformat() } # Log to CloudWatch or local metrics ``` ### 2. **Data Source Health Check** ```python def health_check_all_sources(): """Check all configured data sources""" sources = ['security_hub', 'vpc_flow', 'route53', 'eks_audit', 'lambda_execution'] health_status = {} for source in sources: try: count = get_record_count(source) latest = get_latest_record_time(source) health_status[source] = { "status": "healthy" if count > 0 else "empty", "record_count": count, "latest_data": latest } except Exception as e: health_status[source] = {"status": "error", "error": str(e)} return health_status ``` ## 🎯 Implementation Priority 1. **Immediate (Critical)**: - Fix timestamp TYPE_MISMATCH errors - Add fallback queries when filters return empty results 2. **Short Term (1-2 weeks)**: - Implement adaptive search strategies - Add better error messages and suggestions 3. **Medium Term (1 month)**: - Add comprehensive schema validation - Implement data freshness monitoring - Add performance metrics 4. **Long Term (Ongoing)**: - Advanced query optimization - Predictive data source health monitoring - Auto-remediation for common issues ## 🔍 Key Files to Modify Based on typical MCP tool structure: - `src/amazon_security_lake/search.py` - Core search logic - `src/amazon_security_lake/athena_client.py` - Query execution - `src/amazon_security_lake/formatters.py` - Timestamp handling - `tests/test_search_integration.py` - Add comprehensive tests The tool is fundamentally well-designed but needs these critical bug fixes to unlock its full potential with the 25K+ Security Hub findings already available!

Latest Blog Posts

Model Context Protocol Proxies: Enabling Enterprise Control with Virtual MCPs
By Om-Shree-0709 on December 9, 2025.
AI Security
Virtual MCP
Kubernetes Operator
The State of MCP in 2025: Who's Building What and Why It Matters
By punkpeye on December 7, 2025.
mcp
startups
MCP hosting with persistent storage
By punkpeye on December 6, 2025.
changelog

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kebabmane/asl-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server