# Geographic Data Handling and Normalization Guide

This guide provides comprehensive documentation on geographic data processing, unknown value handling, and normalization patterns used throughout the Firewalla MCP Server.

## Table of Contents

- [Overview](#overview)
- [Geographic Data Sources](#geographic-data-sources)
- [Unknown Value Handling](#unknown-value-handling)
- [Data Normalization Patterns](#data-normalization-patterns)
- [Geographic Enrichment Process](#geographic-enrichment-process)
- [Caching and Performance](#caching-and-performance)
- [Quality Assurance](#quality-assurance)
- [Error Handling](#error-handling)
- [Best Practices](#best-practices)
- [Troubleshooting](#troubleshooting)

## Overview

The Firewalla MCP Server implements sophisticated geographic data handling to enrich network flows, security alarms, and device information with location-based intelligence. This system addresses the challenge of inconsistent, missing, or "unknown" geographic data while maintaining performance and reliability.

### Geographic Data Challenges

1. **Inconsistent Sources**: Different APIs return varying geographic data formats
2. **Missing Data**: Not all IP addresses have complete geographic information
3. **Unknown Values**: APIs often return "unknown", null, or placeholder values
4. **Performance Impact**: Geographic enrichment adds processing overhead
5. **Cache Management**: Balancing accuracy with performance through intelligent caching

### Data Consistency Goals

- **Standardized Format**: All geographic data follows consistent field naming and structure
- **Reliable Fallbacks**: Graceful handling of missing or invalid data
- **Performance Optimization**: Intelligent caching reduces API calls and processing time
- **Quality Metrics**: Tracking and reporting of data quality and completeness

## Geographic Data Sources

### Primary Data Sources

#### 1. Firewalla MSP API Geographic Data

```typescript
interface FirewallaGeoData {
  country?: string;  // "United States", "China", null, "unknown"
  region?: string;   // "California", "Beijing", null, "unknown"
  city?: string;     // "San Francisco", "Beijing", null, "unknown"
  asn?: string;      // "AS15169", "AS4134", null, "unknown"
  org?: string;      // "Google LLC", "China Telecom", null, "unknown"
  isp?: string;      // "Google", "China Telecom", null, "unknown"
  lat?: number;      // 37.7749, 39.9042, null
  lng?: number;      // -122.4194, 116.4074, null
}
```

#### 2. Enhanced Geographic Enrichment

```typescript
/**
 * Canonical representation returned by all normalization helpers.
 * Keep this in sync with the EnrichedGeoData contract below.
 */
interface NormalizedGeoData {
  country: string;
  country_code: string;
  continent: string;
  region: string;
  city: string;
  asn: string | number | 'Unknown';
  asn_name: string;
  coordinates: { lat: number | null; lng: number | null };
  hosting_provider: string | null;
  is_cloud_provider: boolean;
  is_vpn: boolean;
  is_proxy: boolean;
  geographic_risk_score: number;  // 0.0-1.0
  threat_intelligence: Record<string, any>;
  data_quality: {
    completeness_score: number;  // 0-1
    confidence_level: 'high' | 'medium' | 'low';
    last_updated: string;        // ISO-8601
    source: string;
  };
  /**
   * Internal bookkeeping injected by the normalizer.
   * Not transmitted over the wire.
   */
  _normalization_info?: {
    fallbacks_used?: number;
    validation_passed?: boolean;
    fallback?: boolean;
    original_data?: any;
    [key: string]: any;
  };
}

interface EnrichedGeoData {
  // Normalized core fields
  country: string;       // Always present, standardized
  country_code: string;  // ISO 3166-1 alpha-2 code
  continent: string;     // Derived from country
  region: string;        // State/province, normalized
  city: string;          // City name, normalized

  // Network infrastructure
  asn: string;                // Autonomous System Number
  asn_name: string;           // AS organization name
  hosting_provider?: string;  // Cloud/hosting provider classification
  is_cloud_provider: boolean;
  is_vpn: boolean;
  is_proxy: boolean;

  // Risk and threat intelligence
  geographic_risk_score: number;  // 0.0-1.0 risk rating
  threat_intelligence: {
    high_risk_country: boolean;
    known_threat_source: boolean;
    malware_hosting: boolean;
  };

  // Data quality metrics
  data_quality: {
    completeness_score: number;  // 0.0-1.0 completeness rating
    confidence_level: 'high' | 'medium' | 'low';
    last_updated: string;        // ISO timestamp
    source: string;              // Data source identifier
  };
}
```

### Data Source Reliability

```typescript
const geoDataReliability = {
  firewalla_api: {
    availability: '99.5%',
    completeness: '85%',  // 85% of IPs have some geographic data
    accuracy: '92%',      // High accuracy for available data
    update_frequency: 'daily'
  },
  maxmind_fallback: {
    availability: '99.9%',
    completeness: '95%',  // Higher completeness
    accuracy: '90%',      // Slightly lower accuracy
    update_frequency: 'weekly'
  },
  threat_intelligence: {
    availability: '98%',
    completeness: '60%',  // Only for known threat sources
    accuracy: '95%',      // High accuracy for threat data
    update_frequency: 'hourly'
  }
}
```

## Unknown Value Handling

### Unknown Value Patterns

The system encounters various forms of "unknown" or missing data:

#### 1. Explicit Unknown Values

```typescript
const explicitUnknownPatterns = [
  'unknown', 'n/a', 'null', 'undefined', '', '-', '?', 'unavailable'
];
```

#### 2. Null and Undefined Values

```typescript
// Common null/undefined patterns
const nullPatterns = {
  javascript_null: null,
  javascript_undefined: undefined,
  json_null: 'null',
  empty_string: '',
  whitespace_only: ' '
};
```

#### 3. Invalid or Placeholder Data

```typescript
const invalidDataPatterns = {
  invalid_coordinates: { lat: 0, lng: 0 },  // Ocean coordinates
  placeholder_country: 'XX',                // Invalid country code
  generic_asn: 'AS0',                       // Invalid ASN
  test_data: 'test',                        // Test environment data
  localhost_data: '127.0.0.1'               // Local IP addresses
};
```

### Normalization Functions

#### 1. Country Normalization

```typescript
function normalizeCountry(rawCountry: any): string {
  // Handle null, undefined, empty values
  if (!rawCountry || typeof rawCountry !== 'string' || rawCountry.trim() === '') {
    return 'Unknown';
  }

  // Handle explicit unknown patterns
  const trimmed = rawCountry.trim();
  const normalized = trimmed.toLowerCase();
  if (explicitUnknownPatterns.includes(normalized)) {
    return 'Unknown';
  }

  // Handle the placeholder country code
  if (normalized === 'xx') {
    return 'Unknown';
  }

  // Standardize country names
  const countryMappings: Record<string, string> = {
    'us': 'United States',
    'usa': 'United States',
    'united states of america': 'United States',
    'cn': 'China',
    'prc': 'China',
    "people's republic of china": 'China',
    'ru': 'Russia',
    'russian federation': 'Russia'
  };

  // Fall back to capitalizing the first letter of the trimmed input
  return countryMappings[normalized] ||
    trimmed.charAt(0).toUpperCase() + trimmed.slice(1).toLowerCase();
}

// Example usage and results
// normalizeCountry(null)                 => 'Unknown'
// normalizeCountry('unknown')            => 'Unknown'
// normalizeCountry('')                   => 'Unknown'
// normalizeCountry('us')                 => 'United States'
// normalizeCountry('CHINA')              => 'China'
// normalizeCountry('russian federation') => 'Russia'
```

#### 2. ASN Normalization

```typescript
function normalizeASN(rawASN: any): string {
  if (!rawASN || typeof rawASN !== 'string' || rawASN.trim() === '') {
    return 'Unknown';
  }

  const cleaned = rawASN.trim().toLowerCase();

  // Handle explicit unknown patterns
  if (explicitUnknownPatterns.includes(cleaned)) {
    return 'Unknown';
  }

  // Handle invalid ASN patterns
  if (cleaned === 'as0' || cleaned === '0') {
    return 'Unknown';
  }

  // Ensure proper ASN format
  if (/^as\d+$/i.test(cleaned)) {
    return cleaned.toUpperCase();
  }
  if (/^\d+$/.test(cleaned)) {
    return `AS${cleaned}`;
  }

  return 'Unknown';
}

// Example usage and results
// normalizeASN(null)      => 'Unknown'
// normalizeASN('unknown') => 'Unknown'
// normalizeASN('AS0')     => 'Unknown'
// normalizeASN('15169')   => 'AS15169'
// normalizeASN('as4134')  => 'AS4134'
```

#### 3. Coordinate Normalization

```typescript
function normalizeCoordinates(lat: any, lng: any): { lat: number | null, lng: number | null } {
  const parsedLat = parseFloat(lat);
  const parsedLng = parseFloat(lng);

  // Check for valid numeric values
  if (isNaN(parsedLat) || isNaN(parsedLng)) {
    return { lat: null, lng: null };
  }

  // Check for placeholder coordinates (0,0 often indicates unknown)
  if (parsedLat === 0 && parsedLng === 0) {
    return { lat: null, lng: null };
  }

  // Validate coordinate ranges
  if (parsedLat < -90 || parsedLat > 90 || parsedLng < -180 || parsedLng > 180) {
    return { lat: null, lng: null };
  }

  return {
    lat: Math.round(parsedLat * 10000) / 10000,  // 4 decimal precision
    lng: Math.round(parsedLng * 10000) / 10000
  };
}
```

## Data Normalization Patterns

### Batch Normalization Process

```typescript
interface NormalizationConfig {
  fields: Record<string, (value: any) => any>;
  fallbacks: Record<string, any>;
  validation: Record<string, (value: any) => boolean>;
  qualityMetrics: boolean;
}

// Helper function to normalize individual geographic data items
function normalizeGeoDataItem(
  item: any,
  config: NormalizationConfig
): NormalizedGeoData {
  const normalized: any = {
    _normalization_info: { fallbacks_used: 0, validation_passed: true }
  };

  // Apply field transformations
  for (const [field, transformer] of Object.entries(config.fields)) {
    const value = item[field];

    // Apply validation if provided
    if (config.validation[field] && !config.validation[field](value)) {
      normalized[field] = config.fallbacks[field];
      normalized._normalization_info.fallbacks_used++;
    } else {
      normalized[field] = transformer(value);
    }
  }

  return normalized as NormalizedGeoData;
}

// Helper function to create fallback geographic data
function createFallbackGeoData(item: any): NormalizedGeoData {
  // Minimal but structurally correct fallback
  return {
    country: 'Unknown',
    country_code: 'XX',
    continent: 'Unknown',
    region: 'Unknown',
    city: 'Unknown',
    asn: 'Unknown',
    asn_name: 'Unknown',
    coordinates: { lat: null, lng: null },
    hosting_provider: null,
    is_cloud_provider: false,
    is_vpn: false,
    is_proxy: false,
    geographic_risk_score: 0,
    threat_intelligence: {},
    data_quality: {
      completeness_score: 0,
      confidence_level: 'low',
      last_updated: new Date().toISOString(),
      source: 'fallback'
    },
    _normalization_info: { fallback: true, original_data: item }
  } as NormalizedGeoData;
}

// Simple logger interface for the example
const logger = {
  info: (message: string, data: any) => {
    console.log(`[INFO] ${message}`, JSON.stringify(data, null, 2));
  },
  warn: (message: string, data: any) => {
    console.warn(`[WARN] ${message}`, JSON.stringify(data, null, 2));
  }
};

function batchNormalizeGeoData(
  rawData: any[],
  config: NormalizationConfig
): NormalizedGeoData[] {
  const startTime = Date.now();
  const results: NormalizedGeoData[] = [];
  const qualityStats = {
    total: rawData.length,
    normalized: 0,
    fallbacks_used: 0,
    validation_failures: 0
  };

  for (const item of rawData) {
    try {
      const normalized = normalizeGeoDataItem(item, config);
      results.push(normalized);
      qualityStats.normalized++;

      // Track fallback usage (treat a missing counter as zero)
      if ((normalized._normalization_info?.fallbacks_used ?? 0) > 0) {
        qualityStats.fallbacks_used++;
      }
    }
    catch (error) {
      qualityStats.validation_failures++;
      // Create fallback item with minimal data
      results.push(createFallbackGeoData(item));
    }
  }

  const processingTime = Date.now() - startTime;

  // Log quality metrics
  if (config.qualityMetrics) {
    logger.info('Geographic data normalization completed', {
      stats: qualityStats,
      processing_time: processingTime,
      success_rate: qualityStats.total > 0 ? qualityStats.normalized / qualityStats.total : 0
    });
  }

  return results;
}
```

### Field-Specific Normalization

#### 1. Country and Region Normalization

```typescript
const geographicNormalization = {
  country: (value: any) => {
    const normalized = normalizeCountry(value);
    return {
      value: normalized,
      confidence: normalized === 'Unknown' ? 'low' : 'high',
      source: normalized === 'Unknown' ? 'fallback' : 'api'
    };
  },

  region: (value: any) => {
    if (!value || typeof value !== 'string') {
      return { value: 'Unknown', confidence: 'low', source: 'fallback' };
    }
    const cleaned = value.trim();
    if (explicitUnknownPatterns.includes(cleaned.toLowerCase())) {
      return { value: 'Unknown', confidence: 'low', source: 'fallback' };
    }
    return { value: cleaned, confidence: 'medium', source: 'api' };
  },

  city: (value: any) => {
    if (!value || typeof value !== 'string') {
      return { value: 'Unknown', confidence: 'low', source: 'fallback' };
    }
    const cleaned = value.trim();
    if (explicitUnknownPatterns.includes(cleaned.toLowerCase())) {
      return { value: 'Unknown', confidence: 'low', source: 'fallback' };
    }
    return { value: cleaned, confidence: 'medium', source: 'api' };
  }
};
```

#### 2. Network Infrastructure Normalization

```typescript
const networkNormalization = {
  asn: (value: any) => {
    const normalized = normalizeASN(value);
    return {
      value: normalized,
      confidence: normalized === 'Unknown' ? 'low' : 'high',
      numeric: normalized !== 'Unknown' ? parseInt(normalized.replace('AS', ''), 10) : null
    };
  },

  hosting_provider: (value: any) => {
    if (!value || typeof value !== 'string') {
      return { value: null, confidence: 'low' };
    }
    const cleaned = value.toLowerCase().trim();

    // Map common hosting providers
    const providerMappings = {
      'google': 'Google',
      'amazon': 'Amazon',
      'cloudflare': 'Cloudflare',
      'microsoft': 'Microsoft',
      'alibaba': 'Alibaba Cloud',
      'tencent': 'Tencent Cloud'
    };

    for (const [key, provider] of Object.entries(providerMappings)) {
      if (cleaned.includes(key)) {
        return { value: provider, confidence: 'high' };
      }
    }

    return { value: value.trim(), confidence: 'medium' };
  }
};
```

### Quality Score Calculation

```typescript
function calculateDataQuality(normalizedData: NormalizedGeoData): number {
  let score = 0;
  let maxScore = 0;

  // Core geographic fields (40% of total score)
  const coreFields = ['country', 'region', 'city'] as const;
  for (const field of coreFields) {
    maxScore += 40 / coreFields.length;
    if (normalizedData[field] && normalizedData[field] !== 'Unknown') {
      score += 40 / coreFields.length;
    }
  }

  // Network infrastructure (30% of total score)
  const networkFields = ['asn', 'hosting_provider'] as const;
  for (const field of networkFields) {
    maxScore += 30 / networkFields.length;
    if (normalizedData[field] && normalizedData[field] !== 'Unknown') {
      score += 30 / networkFields.length;
    }
  }

  // Coordinates (20% of total score)
  maxScore += 20;
  if (normalizedData.coordinates &&
      normalizedData.coordinates.lat !== null &&
      normalizedData.coordinates.lng !== null) {
    score += 20;
  }

  // Threat intelligence (10% of total score)
  maxScore += 10;
  if (normalizedData.threat_intelligence &&
      Object.keys(normalizedData.threat_intelligence).length > 0) {
    score += 10;
  }

  return Number((score / maxScore).toFixed(2));  // Return 0.0-1.0
}
```

## Geographic Enrichment Process

### Enrichment Pipeline

```typescript
// Interface definitions for type safety
interface NetworkFlow {
  id: string;
  source_ip: string;
  destination_ip: string;
  protocol: string;
  bytes: number;
  timestamp: string;
  [key: string]: any;  // Allow additional properties
}

interface EnrichedNetworkFlow extends NetworkFlow {
  source_geo: NormalizedGeoData | null;
  destination_geo: NormalizedGeoData | null;
  enrichment_metadata: {
    enriched_at: string;
    version: string;
    fallback?: boolean;
  };
}

class GeographicEnrichmentPipeline {
  private cache = new GeographicCache();

  private getCachedGeoData(ips: string[]): Record<string, NormalizedGeoData> {
    const cachedData: Record<string, NormalizedGeoData> = {};
    for (const ip of ips) {
      const geoData = this.cache.getGeoData(ip);
      if (geoData) {
        cachedData[ip] = geoData;
      }
    }
    return cachedData;
  }

  private updateGeoDataCache(geoData: Record<string, NormalizedGeoData>): void {
    // Update cache with already-normalized geo data
    for (const [ip, data] of Object.entries(geoData)) {
      if (data && typeof data === 'object') {
        // Data is already normalized, just cache it directly
        this.cache.setGeoData(ip, data);
      }
    }
  }

  async enrichFlowData(flows: NetworkFlow[]): Promise<EnrichedNetworkFlow[]> {
    const enriched: EnrichedNetworkFlow[] = [];

    for (const flow of flows) {
      try {
        // 1. Extract IPs for enrichment
        const ipsToEnrich = this.extractIPsFromFlow(flow);

        // 2. Check cache for existing data
        const cachedData = this.getCachedGeoData(ipsToEnrich);

        // 3. Enrich missing IPs
        const missingIPs = ipsToEnrich.filter(ip => !cachedData[ip]);
        const newGeoData = await this.enrichIPsWithGeoData(missingIPs);

        // 4. Normalize only new data (cached data is already normalized)
        const normalizedNewData = this.normalizeGeoDataBatch(newGeoData);

        // 5. Combine cached and newly normalized data
        const allGeoData = { ...cachedData, ...normalizedNewData };

        // 6. Apply to flow (risk scores already calculated in normalization)
        const enrichedFlow = this.applyGeoDataToFlow(flow, allGeoData);
        enriched.push(enrichedFlow);

        // 7. Update cache with normalized new data
        this.updateGeoDataCache(normalizedNewData);
      } catch (error) {
        logger.warn('Geographic enrichment failed for flow', {
          flow_id: flow.id,
          // `error` is `unknown` in strict TypeScript, so narrow before reading `.message`
          error: error instanceof Error ? error.message : String(error)
        });

        // Add flow with minimal geographic data
        enriched.push(this.createFallbackEnrichedFlow(flow));
      }
    }

    return enriched;
  }

  // Helper functions for geographic data normalization
  private normalizeCountry(country: any): string {
    // Delegate to the global normalizeCountry function for consistency
    return normalizeCountry(country);
  }

  private normalizeCountryCode(country: any): string {
    if (typeof country !== 'string') return 'XX';

    const normalized = country.trim().toLowerCase();

    // Common country name to ISO-3166-1 alpha-2 code mappings
    // Note: In production, consider using a library like 'i18n-iso-countries' for comprehensive support
    const countryToCode: Record<string, string> = {
      'united states': 'US', 'united states of america': 'US', 'usa': 'US',
      'united kingdom': 'GB', 'uk': 'GB', 'great britain': 'GB',
      'china': 'CN', 'people\'s republic of china': 'CN', 'prc': 'CN',
      'russia': 'RU', 'russian federation': 'RU',
      'germany': 'DE', 'france': 'FR', 'japan': 'JP', 'canada': 'CA',
      'australia': 'AU', 'brazil': 'BR', 'india': 'IN', 'south korea': 'KR',
      'mexico': 'MX', 'spain': 'ES', 'italy': 'IT', 'netherlands': 'NL',
      'singapore': 'SG', 'switzerland': 'CH', 'sweden': 'SE', 'norway': 'NO',
      'denmark': 'DK', 'finland': 'FI', 'poland': 'PL', 'ukraine': 'UA',
      'belgium': 'BE', 'austria': 'AT', 'ireland': 'IE', 'new zealand': 'NZ',
      'israel': 'IL', 'united arab emirates': 'AE', 'uae': 'AE',
      'saudi arabia': 'SA', 'turkey': 'TR', 'south africa': 'ZA',
      'argentina': 'AR', 'chile': 'CL', 'colombia': 'CO', 'egypt': 'EG',
      'greece': 'GR', 'portugal': 'PT', 'czech republic': 'CZ', 'romania': 'RO',
      'hungary': 'HU', 'vietnam': 'VN', 'thailand': 'TH', 'malaysia': 'MY',
      'indonesia': 'ID', 'philippines': 'PH', 'pakistan': 'PK', 'bangladesh': 'BD',
      'nigeria': 'NG', 'kenya': 'KE', 'ethiopia': 'ET', 'morocco': 'MA',
      'peru': 'PE', 'venezuela': 'VE', 'ecuador': 'EC', 'taiwan': 'TW',
      'hong kong': 'HK', 'luxembourg': 'LU', 'bulgaria': 'BG', 'croatia': 'HR',
      'serbia': 'RS', 'slovakia': 'SK', 'slovenia': 'SI', 'lithuania': 'LT',
      'latvia': 'LV', 'estonia': 'EE', 'iceland': 'IS', 'malta': 'MT',
      'cyprus': 'CY'
    };

    // If we have a mapping, use it
    if (countryToCode[normalized]) {
      return countryToCode[normalized];
    }

    // Check if it's already a 2-letter code
    if (normalized.length === 2 && /^[a-z]{2}$/.test(normalized)) {
      return normalized.toUpperCase();
    }

    // Default to unknown
    return 'XX';
  }

  private deriveContinent(country: any): string {
    // Simple continent mapping - in production use a proper geo library
    const continentMap: Record<string, string> = {
      'US': 'North America',
      'CA': 'North America',
      'CN': 'Asia',
      'JP': 'Asia',
      'DE': 'Europe',
      'FR': 'Europe',
      'GB': 'Europe',
      'AU': 'Oceania'
    };
    const code = this.normalizeCountryCode(country);
    return continentMap[code] || 'Unknown';
  }

  private normalizeRegion(region: any): string {
    return typeof region === 'string' ? region : 'Unknown';
  }

  private normalizeCity(city: any): string {
    return typeof city === 'string' ? city : 'Unknown';
  }

  private parseASNNumber(asn: any): string | number | 'Unknown' {
    if (typeof asn === 'number' && asn > 0) {
      return asn;
    }
    if (typeof asn === 'string' && asn.trim() !== '') {
      // Extract the number from strings like "AS15169" or "15169".
      // The "AS" prefix is optional as a whole; the pattern is anchored so
      // unrelated strings that merely contain digits are not misparsed.
      const match = asn.trim().match(/^(?:AS)?(\d+)$/i);
      if (match) {
        const num = parseInt(match[1], 10);
        return num > 0 ? num : 'Unknown';
      }
      return asn;  // Return string as-is if not AS format
    }
    return 'Unknown';
  }

  private normalizeASNName(org: any): string {
    return typeof org === 'string' ? org : 'Unknown';
  }

  private normalizeCoordinates(lat: any, lng: any): { lat: number | null; lng: number | null } {
    // Use the standalone normalizeCoordinates helper function defined earlier
    return normalizeCoordinates(lat, lng);
  }

  private classifyHostingProvider(org: any, isp: any): string | null {
    if (!org && !isp) return null;

    const providerMappings: Record<string, string> = {
      amazon: 'Amazon',
      aws: 'Amazon',
      google: 'Google',
      microsoft: 'Microsoft',
      azure: 'Microsoft',
      cloudflare: 'Cloudflare',
      alibaba: 'Alibaba Cloud',
      tencent: 'Tencent Cloud',
      digitalocean: 'DigitalOcean',
      ovh: 'OVH',
      hetzner: 'Hetzner'
    };

    const combined = `${org ?? ''} ${isp ?? ''}`.toLowerCase();
    for (const [keyword, provider] of Object.entries(providerMappings)) {
      if (combined.includes(keyword)) return provider;
    }

    // Check for generic hosting keywords
    const hostingKeywords = ['hosting', 'cloud', 'datacenter', 'server', 'vps'];
    if (hostingKeywords.some(keyword => combined.includes(keyword))) {
      return 'Generic Hosting Provider';
    }

    return null;
  }

  private isCloudProvider(org: any, isp: any): boolean {
    const cloudProviders = ['amazon', 'google', 'microsoft', 'azure', 'aws'];
    const combined = `${org ?? ''} ${isp ?? ''}`.toLowerCase();
    return cloudProviders.some(provider => combined.includes(provider));
  }

  private isVPNProvider(org: any, isp: any): boolean {
    const vpnKeywords = ['vpn', 'proxy', 'tunnel'];
    const orgStr = String(org || '').toLowerCase();
    const ispStr = String(isp || '').toLowerCase();
    return vpnKeywords.some(keyword =>
      orgStr.includes(keyword) || ispStr.includes(keyword)
    );
  }

  private isProxyProvider(org: any): boolean {
    const proxyKeywords = ['proxy', 'anonymizer'];
    const orgStr = String(org || '').toLowerCase();
    return proxyKeywords.some(keyword => orgStr.includes(keyword));
  }

  private calculateGeographicRisk(data: any): number {
    // Simple risk scoring - in production use proper threat intelligence
    let risk = 0;
    const maxPossibleRisk = 0.60;  // Sum of all risk factors

    // Check if we have normalized data fields or raw data fields
    if ('is_vpn' in data && 'is_proxy' in data && 'is_cloud_provider' in data) {
      // Working with NormalizedGeoData
      if (data.is_vpn) risk += 0.30;
      if (data.is_proxy) risk += 0.20;
      if (data.is_cloud_provider) risk += 0.10;
    } else {
      // Working with raw data
      if (this.isVPNProvider(data.org, data.isp)) risk += 0.30;
      if (this.isProxyProvider(data.org)) risk += 0.20;
      if (this.isCloudProvider(data.org, data.isp)) risk += 0.10;
    }

    // Scale to 0-1 range (current max is 0.6)
    return Number((risk / maxPossibleRisk).toFixed(2));
  }

  private enrichThreatIntelligence(ip: string, data: any): any {
    // Placeholder for threat intelligence enrichment
    return {
      reputation_score: 50,  // neutral
      threat_categories: [],
      last_seen_malicious: null
    };
  }

  // Simple quality check for raw API data before normalization
  private calculateRawDataCompleteness(data: any): number {
    let quality = 0;
    if (data.country) quality += 25;
    if (data.city) quality += 25;
    if (data.org) quality += 25;
    if (data.asn) quality += 25;
    return quality / 100;  // Normalize to 0-1 scale
  }

  private calculateConfidenceLevel(data: any): 'high' | 'medium' | 'low' {
    // Simple confidence calculation based on data completeness
    const score = this.calculateRawDataCompleteness(data);
    if (score >= 0.75) return 'high';
    if (score >= 0.50) return 'medium';
    return 'low';
  }

  private normalizeGeoDataBatch(rawGeoData: Record<string, any>): Record<string, NormalizedGeoData> {
    const normalized: Record<string, NormalizedGeoData> = {};
    for (const [ip, data] of Object.entries(rawGeoData)) {
      normalized[ip] = {
        country: this.normalizeCountry(data.country),
        country_code: this.normalizeCountryCode(data.country),
        continent: this.deriveContinent(data.country),
        region: this.normalizeRegion(data.region),
        city: this.normalizeCity(data.city),
        asn: this.parseASNNumber(data.asn),
        asn_name: this.normalizeASNName(data.org),
        coordinates: this.normalizeCoordinates(data.lat, data.lng),
        hosting_provider: this.classifyHostingProvider(data.org, data.isp),
        is_cloud_provider: this.isCloudProvider(data.org, data.isp),
        is_vpn: this.isVPNProvider(data.org, data.isp),
        is_proxy: this.isProxyProvider(data.org),
        geographic_risk_score: this.calculateGeographicRisk(data),
        threat_intelligence: this.enrichThreatIntelligence(ip, data),
        data_quality: {
          completeness_score: this.calculateRawDataCompleteness(data),
          confidence_level: this.calculateConfidenceLevel(data),
          last_updated: new Date().toISOString(),
          source: 'firewalla_api'
        }
      };
    }
    return normalized;
  }

  // Missing helper methods for enrichFlowData
  private extractIPsFromFlow(flow: NetworkFlow): string[] {
    const ips: string[] = [];
    if (flow.source_ip) ips.push(flow.source_ip);
    if (flow.destination_ip) ips.push(flow.destination_ip);
    return [...new Set(ips)];  // Remove duplicates
  }

  private async enrichIPsWithGeoData(ips: string[]): Promise<Record<string, any>> {
    // In production, this would call your geo IP API service
    // Example implementation with mock data for demonstration
    const result: Record<string, any> = {};
    for (const ip of ips) {
      // Simulate API response with realistic data structure
      result[ip] = {
        ip: ip,
        country: 'United States',
        country_code: 'US',
        region: 'California',
        city: 'San Francisco',
        lat: 37.7749,
        lng: -122.4194,
        asn: 'AS15169',
        org: 'Google LLC',
        isp: 'Google',
        timezone: 'America/Los_Angeles',
        is_vpn: false,
        is_proxy: false,
        is_datacenter: true
      };
    }
    return result;
  }

  // NOTE: This method is now redundant as risk scores are calculated during normalization
  // Kept for backward compatibility or if separate risk recalculation is needed
  private calculateRiskScores(geoData: Record<string, NormalizedGeoData>): Record<string, NormalizedGeoData> {
    // Apply risk scoring to each geographic entry
    const riskEnriched: Record<string, NormalizedGeoData> = {};
    for (const [ip, data] of Object.entries(geoData)) {
      riskEnriched[ip] = {
        ...data,
        geographic_risk_score: this.calculateGeographicRisk(data)
      };
    }
    return riskEnriched;
  }

  private applyGeoDataToFlow(flow: NetworkFlow, geoData: Record<string, NormalizedGeoData>): EnrichedNetworkFlow {
    return {
      ...flow,
      source_geo: geoData[flow.source_ip] || null,
      destination_geo: geoData[flow.destination_ip] || null,
      enrichment_metadata: {
        enriched_at: new Date().toISOString(),
        version: '1.0'
      }
    };
  }

  private createFallbackEnrichedFlow(flow: NetworkFlow): EnrichedNetworkFlow {
    return {
      ...flow,
      source_geo: null,
      destination_geo: null,
      enrichment_metadata: {
        enriched_at: new Date().toISOString(),
        version: '1.0',
        fallback: true
      }
    };
  }
}
```

### Risk Score Calculation

```typescript
// Note: calculateGeographicRisk is implemented as a private method in the GeographicEnrichmentPipeline class above
```

## Caching and Performance

### Cache Configuration

```typescript
interface GeoCacheConfig {
  ttl: number;         // Time to live in seconds
  maxEntries: number;  // Maximum cache entries
  // Note: The following flags are placeholders for future enhancements
  compressionEnabled: boolean;  // Not implemented in sample cache
  persistToDisk: boolean;       // Not implemented in sample cache
}

// Type definition for log entries used in cache warming
interface LogEntry {
  timestamp: string;
  source_ip?: string;
  destination_ip?: string;
  [key: string]: any;  // Allow additional fields
}

const geoCacheConfig: GeoCacheConfig = {
  ttl: 3600,          // 1 hour cache
  maxEntries: 10000,  // 10k IP addresses
  compressionEnabled: true,
  persistToDisk: true
};
```

### Cache Performance Metrics

```typescript
interface CachedGeoData extends NormalizedGeoData {
  _cachedAt: number;
}

class GeographicCache {
  private cache = new Map<string, CachedGeoData>();
  private hitCount = 0;
  private missCount = 0;
  private totalRequests = 0;

  getGeoData(ip: string): NormalizedGeoData | null {
    this.totalRequests++;

    const cached = this.cache.get(ip);
    if (cached && (Date.now() - cached._cachedAt) < geoCacheConfig.ttl * 1000) {
      this.hitCount++;
      // Return without the internal _cachedAt property
      const { _cachedAt, ...geoData } = cached;
      return geoData;
    }

    // Remove expired entry if exists
    if (cached) {
      this.cache.delete(ip);
    }

    // Clean up other expired entries to prevent stale data accumulation
    // Run cleanup periodically to avoid O(n) cost on every cache miss
    if (this.totalRequests % 100 === 0) {
      this.removeExpiredEntries();
    }

    this.missCount++;
    return null;
  }

  setGeoData(ip: string, data: NormalizedGeoData): void {
    // Add timestamp for TTL checking
    const cachedData: CachedGeoData = { ...data, _cachedAt: Date.now() };
    this.cache.set(ip, cachedData);

    // Prune cache if needed
    this.pruneIfNeeded();
  }

  private removeExpiredEntries(): void {
    const now = Date.now();
    const ttlMs = geoCacheConfig.ttl * 1000;
    for (const [ip, data] of this.cache.entries()) {
      if (now - data._cachedAt > ttlMs) {
        this.cache.delete(ip);
      }
    }
  }

  private pruneIfNeeded(): void {
    if (this.cache.size <= geoCacheConfig.maxEntries) return;

    // Remove expired entries first
    this.removeExpiredEntries();

    // If still over limit, use FIFO eviction (oldest entries first)
    // Map maintains insertion order, so first entries are oldest
    //
    // Note: This cache uses FIFO (First In, First Out) eviction strategy.
    // FIFO is simpler than LRU and performs well for geographic data where
    // access patterns are often temporal (recent IPs are more likely to be accessed again).
    //
    // For LRU (Least Recently Used) behavior, consider using a dedicated
    // LRU cache library like 'lru-cache' npm package.
    if (this.cache.size > geoCacheConfig.maxEntries) {
      const entriesToRemove = this.cache.size - geoCacheConfig.maxEntries;
      const keysIterator = this.cache.keys();
      for (let i = 0; i < entriesToRemove; i++) {
        const oldestKey = keysIterator.next().value;
        if (oldestKey !== undefined) {
          this.cache.delete(oldestKey);
        }
      }
    }
  }

  getPerformanceMetrics() {
    return {
      hit_rate: this.totalRequests ? this.hitCount / this.totalRequests : 0,
      miss_rate: this.totalRequests ? this.missCount / this.totalRequests : 0,
      total_requests: this.totalRequests,
      cache_size: this.cache.size,
      memory_usage: typeof process !== 'undefined' ? process.memoryUsage().heapUsed : 0
    };
  }
}
```

### Cache Optimization Strategies

```typescript
// Helper function stubs for cache optimization
async function preloadIPRange(cidr: string): Promise<void> {
  // Implementation would iterate through IP range and pre-cache
}

function extractUniqueIPs(logs: any[]): string[] {
  // Implementation would extract and deduplicate IPs from log entries
  return [];
}

async function enrichAndCacheIPBatch(ips: string[]): Promise<void> {
  // Implementation would enrich multiple IPs in parallel and cache results
}

const cacheOptimization = {
  // Preload common IP ranges
  preloadCommonRanges: async () => {
    const commonRanges = [
      '8.8.8.0/24',      // Google DNS
      '1.1.1.0/24',      // Cloudflare DNS
      '192.168.0.0/16',  // Private networks
      '10.0.0.0/8'       // Private networks
    ];

    for (const range of commonRanges) {
      await preloadIPRange(range);
    }
  },

  // Batch cache warming
  warmCacheFromLogs: async (logEntries: LogEntry[]) => {
    const uniqueIPs = extractUniqueIPs(logEntries);
    const batchSize = 100;

    for (let i = 0; i < uniqueIPs.length; i += batchSize) {
      const batch = uniqueIPs.slice(i, i + batchSize);
      await enrichAndCacheIPBatch(batch);
    }
  },

  // Smart eviction based on access patterns
  intelligentEviction: () => {
    // Keep frequently accessed IPs
    // Evict old, rarely accessed entries
    // Prioritize high-quality data
  }
};
```

## Quality Assurance

### Data Quality Monitoring

```typescript
interface DataQualityMetrics {
  completeness: {
    country: number;      // % of records with valid country
    region: number;       // % of records with valid region
    city: number;         // % of records with valid city
    asn: number;          // % of records with valid ASN
    coordinates: number;  // % of records with valid coordinates
  };
  accuracy: {
    country_validation: number;     // % passing country validation
    coordinate_validation: number;  // % passing coordinate validation
    asn_validation: number;         // % passing ASN validation
  };
  consistency: {
    country_region_match: number;     // % where country/region are consistent
    coordinate_country_match: number; // % where coordinates match country
  };
  timeliness: {
    cache_hit_rate: number;           // % of requests served from cache
    average_enrichment_time: number;  // Average ms for enrichment
    stale_data_percentage: number;    // % of data older than threshold
  };
}

function generateQualityReport(enrichedData: NormalizedGeoData[]): DataQualityMetrics {
  const total = enrichedData.length;

  return {
    completeness: {
      country: calculateFieldCompleteness(enrichedData, 'country'),
      region: calculateFieldCompleteness(enrichedData, 'region'),
      city: calculateFieldCompleteness(enrichedData, 'city'),
      asn: calculateFieldCompleteness(enrichedData, 'asn'),
      coordinates: calculateCoordinateCompleteness(enrichedData)
    },
    accuracy: {
      country_validation: validateCountryData(enrichedData),
      coordinate_validation: validateCoordinateData(enrichedData),
      asn_validation: validateASNData(enrichedData)
    },
    consistency: {
      country_region_match: validateCountryRegionConsistency(enrichedData),
      coordinate_country_match: validateCoordinateCountryConsistency(enrichedData)
    },
    timeliness: {
      cache_hit_rate: getCurrentCacheHitRate(),
      average_enrichment_time: getAverageEnrichmentTime(),
      stale_data_percentage: calculateStaleDataPercentage(enrichedData)
    }
  };
}

// Helper function stubs for generateQualityReport
function calculateFieldCompleteness(data: NormalizedGeoData[], field: keyof NormalizedGeoData): number {
  const validCount = data.filter(item => item[field] && item[field] !== 'Unknown').length;
  return data.length > 0 ? validCount / data.length : 0;
}

function calculateCoordinateCompleteness(data: NormalizedGeoData[]): number {
  const validCount = data.filter(item =>
    item.coordinates &&
    item.coordinates.lat !== null &&
    item.coordinates.lng !== null &&
    !(item.coordinates.lat === 0 && item.coordinates.lng === 0) // ocean placeholder
  ).length;
  return data.length > 0 ? validCount / data.length : 0;
}

function validateCountryData(data: NormalizedGeoData[]): number {
  const validCount = data.filter(item =>
    item.country_code &&
    /^[A-Z]{2}$/.test(item.country_code) &&
    item.country_code !== 'XX' // 'XX' is the placeholder code used for fallback records
  ).length;
  return data.length > 0 ? validCount / data.length : 0;
}

function validateCoordinateData(data: NormalizedGeoData[]): number {
  const validCount = data.filter(item =>
    item.coordinates &&
    item.coordinates.lat !== null &&
    item.coordinates.lng !== null &&
    Math.abs(item.coordinates.lat) <= 90 &&
    Math.abs(item.coordinates.lng) <= 180
  ).length;
  return data.length > 0 ? validCount / data.length : 0;
}

function validateASNData(data: NormalizedGeoData[]): number {
  const validCount = data.filter(item => {
    if (!item.asn || item.asn === 'Unknown') return false;

    // Handle numeric ASNs
    if (typeof item.asn === 'number') {
      return item.asn > 0;
    }

    // Handle string ASNs (e.g., "AS15169")
    if (typeof item.asn === 'string') {
      return /^AS\d+$/i.test(item.asn);
    }

    return false;
  }).length;
  return data.length > 0 ? validCount / data.length : 0;
}

function validateCountryRegionConsistency(data: NormalizedGeoData[]): number {
  if (data.length === 0) return 1;

  // Define expected country-region mappings (examples)
  const countryRegions: Record<string, string[]> = {
    'US': ['California', 'Texas', 'New York', 'Florida', 'Illinois', 'Pennsylvania'],
    'GB': ['England', 'Scotland', 'Wales', 'Northern Ireland'],
    'CA': ['Ontario', 'Quebec', 'British Columbia', 'Alberta'],
    'AU': ['New South Wales', 'Victoria', 'Queensland', 'Western Australia'],
    'DE': ['Bavaria', 'Berlin', 'Hamburg', 'North Rhine-Westphalia']
  };

  let consistentCount = 0;
  for (const item of data) {
    const expectedRegions = countryRegions[item.country_code];
    // Countries without a mapping are assumed consistent
    if (!expectedRegions || expectedRegions.includes(item.region)) {
      consistentCount++;
    }
  }

  return consistentCount / data.length;
}

function validateCoordinateCountryConsistency(data: NormalizedGeoData[]): number {
  if (data.length === 0) return 1;

  // Simple bounding box validation for major countries.
  // The US box covers the contiguous states only; Alaska and Hawaii fall outside it.
  const countryBounds: Record<string, { minLat: number; maxLat: number; minLng: number; maxLng: number }> = {
    'US': { minLat: 24.5, maxLat: 49.4, minLng: -125, maxLng: -66.9 },
    'GB': { minLat: 49.9, maxLat: 60.9, minLng: -8.6, maxLng: 1.8 },
    'CA': { minLat: 41.7, maxLat: 83.1, minLng: -141, maxLng: -52.6 },
    'AU': { minLat: -43.6, maxLat: -10.7, minLng: 112.9, maxLng: 153.6 },
    'CN': { minLat: 18.2, maxLat: 53.6, minLng: 73.5, maxLng: 134.8 }
  };

  let consistentCount = 0;
  for (const item of data) {
    const bounds = countryBounds[item.country_code];
    if (!bounds) {
      consistentCount++; // Unknown country, assume valid
    } else {
      const { lat, lng } = item.coordinates;
      if (lat !== null && lng !== null &&
          lat >= bounds.minLat && lat <= bounds.maxLat &&
          lng >= bounds.minLng && lng <= bounds.maxLng) {
        consistentCount++;
      }
    }
  }

  return consistentCount / data.length;
}

// Note: In production, this would access the actual cache instance
function getCurrentCacheHitRate(): number {
  // This would typically be tracked by the GeographicCache class
  // For documentation purposes, showing expected range
  return 0.85; // Typical cache hit rate for geographic data
}

// Note: In production, this would access actual performance metrics
function getAverageEnrichmentTime(): number {
  // This would typically be tracked by performance monitoring
  // For documentation purposes, showing expected value
  return 45; // Average enrichment time in milliseconds
}

function calculateStaleDataPercentage(data: NormalizedGeoData[]): number {
  const staleThreshold = Date.now() - (24 * 60 * 60 * 1000); // 24 hours ago
  const staleCount = data.filter(item => {
    const lastUpdated = new Date(item.data_quality.last_updated).getTime();
    return lastUpdated < staleThreshold;
  }).length;
  return data.length > 0 ? staleCount / data.length : 0;
}
```

### Quality Improvement Strategies

```typescript
const qualityImprovementStrategies = {
  // Handle common data issues
  dataCleaningRules: {
    // Fix common country name variations
    countryAliases: {
      'US': 'United States',
      'UK': 'United Kingdom',
      'UAE': 'United Arab Emirates'
    },

    // Standardize region names
    regionStandardization: {
      'CA': 'California',
      'NY': 'New York',
      'TX': 'Texas'
    },

    // Validate and clean ASN data
    asnValidation: (asn: string) => {
      return /^AS\d+$/.test(asn) ? asn : 'Unknown';
    }
  },

  // Fallback data sources
  fallbackSources: [
    'maxmind_database',
    'ip2location_api',
    'ipinfo_api',
    'local_geo_database'
  ],

  // Data validation rules
  validationRules: {
    coordinates: (lat: number, lng: number) => {
      return lat >= -90 && lat <= 90 && lng >= -180 && lng <= 180;
    },

    countryRegionConsistency: (country: string, region: string) => {
      // TODO: Implement getValidRegionsForCountry with actual country-region mappings
      // For now, using the helper function defined below.
      // Countries without a mapping are treated as consistent, matching
      // validateCountryRegionConsistency above.
      const validRegions = getValidRegionsForCountry(country);
      return validRegions.length === 0 || validRegions.includes(region);
    }
  }
};

// Helper function stub for country-region validation
function getValidRegionsForCountry(country: string): string[] {
  // TODO: Implement with comprehensive country-region mappings
  // This is a simplified example with a few countries
  const regionMappings: Record<string, string[]> = {
    'US': ['California', 'Texas', 'New York', 'Florida', 'Illinois'],
    'GB': ['England', 'Scotland', 'Wales', 'Northern Ireland'],
    'CA': ['Ontario', 'Quebec', 'British Columbia', 'Alberta'],
    'AU': ['New South Wales', 'Victoria', 'Queensland', 'Western Australia']
  };
  return regionMappings[country] || [];
}
```

## Error Handling

### Common Geographic Data Errors

```typescript
enum GeoDataErrorType {
  API_UNAVAILABLE = 'api_unavailable',
  INVALID_IP = 'invalid_ip',
  NO_GEO_DATA = 'no_geo_data',
  NORMALIZATION_FAILED = 'normalization_failed',
  CACHE_ERROR = 'cache_error',
  VALIDATION_FAILED = 'validation_failed'
}

interface GeoDataError {
  type: GeoDataErrorType;
  message: string;
  ip?: string;
  details?: any;
  fallback_applied: boolean;
}
```

### Error Recovery Patterns

```typescript
class GeoDataErrorHandler {
  async handleEnrichmentError(
    error: GeoDataError,
    ip: string,
    context: any
  ): Promise<NormalizedGeoData> {
    switch (error.type) {
      case GeoDataErrorType.API_UNAVAILABLE:
        // Try fallback API or use cached data
        return (await this.tryFallbackSources(ip)) ?? this.createMinimalGeoData(ip);

      case GeoDataErrorType.INVALID_IP:
        // Handle private/local IPs
        return this.createPrivateIPGeoData(ip);

      case GeoDataErrorType.NO_GEO_DATA:
        // Create unknown geo data with metadata
        return this.createUnknownGeoData(ip, 'no_data_available');

      case GeoDataErrorType.NORMALIZATION_FAILED:
        // Log error and use raw data with basic normalization
        logger.warn('Geographic normalization failed', { ip, error: error.message });
        return this.applyBasicNormalization(ip, context.raw_data);

      default:
        return this.createFallbackGeoData(ip);
    }
  }

  private createMinimalGeoData(ip: string): NormalizedGeoData {
    return {
      country: 'Unknown',
      country_code: 'XX',
      continent: 'Unknown',
      region: 'Unknown',
      city: 'Unknown',
      asn: 'Unknown',
      asn_name: 'Unknown',
      coordinates: { lat: null, lng: null },
      hosting_provider: null,
      is_cloud_provider: false,
      is_vpn: false,
      is_proxy: false,
      geographic_risk_score: 0.0,
      threat_intelligence: {
        high_risk_country: false,
        known_threat_source: false,
        malware_hosting: false
      },
      data_quality: {
        completeness_score: 0.0,
        confidence_level: 'low',
        last_updated: new Date().toISOString(),
        source: 'fallback'
      }
    };
  }

  // Stub implementations for recovery helpers
  private async tryFallbackSources(ip: string): Promise<NormalizedGeoData | null> {
    // TODO: Implement fallback API calls (e.g., MaxMind, IP2Location)
    return null;
  }

  private createPrivateIPGeoData(ip: string): NormalizedGeoData {
    // TODO: Handle private/local IP addresses (192.168.*, 10.*, 172.16-31.*, etc.)
    return this.createMinimalGeoData(ip);
  }

  private createUnknownGeoData(ip: string, reason: string): NormalizedGeoData {
    // TODO: Create unknown data with specific reason metadata
    const geoData = this.createMinimalGeoData(ip);
    (geoData as any)._error_reason = reason;
    return geoData;
  }

  private applyBasicNormalization(ip: string, rawData: any): NormalizedGeoData {
    // TODO: Apply minimal normalization to raw data
    return this.createMinimalGeoData(ip);
  }

  private createFallbackGeoData(ip: string): NormalizedGeoData {
    // TODO: Create ultimate fallback data
    return this.createMinimalGeoData(ip);
  }
}
```

## Best Practices

### Implementation Best Practices

1. **Graceful Degradation**: Always provide fallback values for missing data
2. **Performance Optimization**: Use intelligent caching to reduce API calls
3. **Data Validation**: Validate all geographic data before normalization
4. **Error Logging**: Log data quality issues for monitoring and improvement
5. **Regular Updates**: Refresh cached data periodically to maintain accuracy

### Query Optimization

```typescript
// Geographic query optimization patterns
const geographicQueryOptimization = {
  // Use specific countries instead of wildcards
  preferred: 'country:China OR country:Russia',
  avoid: 'country:*',

  // Combine geographic with other filters
  efficient: 'country:China AND severity:high AND timestamp:>NOW-1h',
  inefficient: 'country:China',

  // Use geographic filters appropriately
  good: {
    countries: ['China', 'Russia'],  // Specific list
    min_risk_score: 0.7              // Risk-based filtering
  },
  problematic: {
    countries: ['*'],                // Wildcard usage
    min_risk_score: 0.0              // No filtering
  }
};
```

### Data Quality Guidelines

```typescript
const dataQualityGuidelines = {
  // Minimum acceptable data quality thresholds
  quality_thresholds: {
    completeness: 0.8,     // 80% data completeness
    accuracy: 0.9,         // 90% data accuracy
    cache_hit_rate: 0.7,   // 70% cache hit rate
    enrichment_time: 500   // Max 500ms enrichment time
  },

  // Monitoring and alerting
  monitoring: {
    quality_checks: 'every_hour',
    performance_checks: 'every_5_minutes',
    cache_health: 'every_minute',
    alert_thresholds: {
      quality_drop: 0.1,             // Alert if quality drops 10%
      performance_degradation: 2.0,  // Alert if enrichment time doubles
      cache_miss_spike: 0.2          // Alert if cache miss rate spikes 20%
    }
  }
};
```

## Troubleshooting

### Common Issues and Solutions

#### 1. High Cache Miss Rate

```typescript
// Symptoms: Slow geographic enrichment, high API usage
// Diagnosis: Check cache hit rate metrics
const cacheHealthCheck = {
  hit_rate: 0.3,      // Low hit rate (target: >0.7)
  miss_rate: 0.7,     // High miss rate
  eviction_rate: 0.5  // High eviction rate
};

// Solutions:
// 1. Increase cache size
// 2. Increase TTL for stable data
// 3. Implement cache warming strategies
// 4. Review query patterns for cacheable data
```

#### 2. Poor Data Quality

```typescript
// Symptoms: Many "Unknown" values in geographic data
// Diagnosis: Check data quality metrics
const qualityIssues = {
  completeness: {
    country: 0.6,      // Low country completeness (target: >0.8)
    coordinates: 0.3   // Very low coordinate data
  },
  unknown_percentage: 0.4  // 40% unknown values
};

// Solutions:
// 1. Review IP address sources (private vs public)
// 2. Check API data source reliability
// 3. Implement additional fallback sources
// 4. Improve normalization algorithms
```

#### 3. Performance Issues

```typescript
// Symptoms: Slow response times, timeouts
// Diagnosis: Check enrichment performance
const performanceIssues = {
  average_enrichment_time: 2000,  // 2 seconds (target: <500ms)
  timeout_rate: 0.1,              // 10% timeout rate
  concurrent_requests: 50         // High concurrency
};

// Solutions:
// 1. Implement request batching
// 2. Add circuit breaker for failing APIs
// 3. Optimize normalization algorithms
// 4. Scale geographic enrichment workers
```

### Debugging Geographic Data Issues

```bash
# Enable geographic data debugging
DEBUG=firewalla:geo,firewalla:cache npm run mcp:start

# Monitor data quality
DEBUG=firewalla:quality npm run mcp:start

# Track cache performance
DEBUG=firewalla:cache:performance npm run mcp:start

# Full geographic debugging
DEBUG=firewalla:geo:* npm run mcp:start
```

### Performance Monitoring Commands

```bash
# Monitor cache hit rates
npm run geo:cache:stats

# Check data quality metrics
npm run geo:quality:report

# Analyze unknown data patterns
npm run geo:analyze:unknowns

# Performance benchmark
npm run geo:benchmark
```
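
### Example: Separating Private and Public IPs

The "Poor Data Quality" solutions above suggest reviewing private versus public IP sources, and `createPrivateIPGeoData` is left as a TODO in the error handler. The following is a minimal sketch of how IPv4 addresses might be classified before enrichment, so that addresses with no meaningful geographic data never count as "Unknown" lookups. The names `classifyIPv4`, `partitionForEnrichment`, and the scope labels are hypothetical and chosen for illustration; they are not part of the Firewalla MCP Server. IPv6 is out of scope in this sketch and is reported as `invalid`, so a real implementation would need a separate branch for it.

```typescript
// Hypothetical helper types and names; not part of the server's actual API.
type IPv4Scope = 'private' | 'loopback' | 'link_local' | 'cgnat' | 'public' | 'invalid';

function classifyIPv4(ip: string): IPv4Scope {
  const parts = ip.split('.');
  if (parts.length !== 4) return 'invalid';

  const octets: number[] = [];
  for (const part of parts) {
    // Each octet must be 1-3 decimal digits with a value of 0-255
    if (!/^\d{1,3}$/.test(part)) return 'invalid';
    const value = Number(part);
    if (value > 255) return 'invalid';
    octets.push(value);
  }

  const [a = -1, b = -1] = octets;
  if (a === 10) return 'private';                         // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return 'private';  // 172.16.0.0/12
  if (a === 192 && b === 168) return 'private';           // 192.168.0.0/16
  if (a === 127) return 'loopback';                       // 127.0.0.0/8
  if (a === 169 && b === 254) return 'link_local';        // 169.254.0.0/16
  if (a === 100 && b >= 64 && b <= 127) return 'cgnat';   // 100.64.0.0/10
  return 'public';
}

// Split a batch so that only publicly routable addresses reach the geo API.
function partitionForEnrichment(ips: string[]): { enrich: string[]; skip: string[] } {
  const enrich: string[] = [];
  const skip: string[] = [];
  for (const ip of ips) {
    (classifyIPv4(ip) === 'public' ? enrich : skip).push(ip);
  }
  return { enrich, skip };
}
```

Addresses in `skip` could be given placeholder records directly, which keeps them out of the completeness metrics for external lookups and avoids spending API calls on addresses that cannot be geolocated.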
