Skip to main content
Glama
MODEL_CACHING.mdβ€’15.8 kB
# Model Caching System Guide πŸš€ **Complete guide to the OpenRouter MCP Server's intelligent caching system** ## Overview The OpenRouter MCP Server features a sophisticated dual-layer caching system that provides lightning-fast model access while ensuring data freshness. This system combines in-memory caching for instant access with persistent file storage for cross-session reliability, enhanced with comprehensive metadata enrichment. ## Features ### 1. Dynamic Model Fetching - Models are fetched directly from the OpenRouter API - No hardcoded model list - always up-to-date - Supports all latest models (January 2025): - **OpenAI**: GPT-4o, GPT-4 Turbo, o1, o1-mini - **Anthropic**: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku - **Google**: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini Pro 1.5 - **Meta**: Llama 3.3 70B, Llama 3.2 Vision models - **Mistral**: Mistral Large, Mixtral, Devstral - **DeepSeek**: DeepSeek V3 (SOTA reasoning), DeepSeek Chat, DeepSeek Coder - **xAI**: Grok 2, Grok 2 Vision - And many more... ### 2. Advanced Caching Architecture #### 🏎️ Dual-Layer System ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Memory Cache │◄──── Cache Manager β”œβ”€β”€β”€β–Ίβ”‚ File Cache β”‚ β”‚ (Fast Access) β”‚ β”‚ (Intelligence) β”‚ β”‚ (Persistence) β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ β€’ Sub-ms access β”‚ β”‚ β€’ TTL management β”‚ β”‚ β€’ Cross-session β”‚ β”‚ β€’ 1000+ models β”‚ β”‚ β€’ Auto refresh β”‚ β”‚ β€’ JSON storage β”‚ β”‚ β€’ Enhanced data β”‚ β”‚ β€’ Stats tracking β”‚ β”‚ β€’ Backup source β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` #### Key Features - **⚑ Memory Cache**: Sub-millisecond model access with enhanced metadata - **πŸ’Ύ File Cache**: Persistent storage across server restarts - **⏰ TTL Management**: Configurable expiration (1-24 hours) - **πŸ”„ Smart Refresh**: Automatic background updates - **πŸ“Š Statistics**: Real-time cache performance metrics - **πŸ›‘οΈ Fallback**: API fallback when cache fails ### 3. Rich Metadata Enhancement Every cached model includes comprehensive metadata automatically extracted during caching: #### 🏷️ Core Metadata - **Provider Classification**: OpenAI, Anthropic, Google, Meta, DeepSeek, XAI, etc. - **Category System**: Chat, Image, Audio, Reasoning, Code, Multimodal, Embedding - **Capability Matrix**: Vision, Functions, Tools, Streaming, JSON mode, PDF support - **Performance Tiers**: Premium, Standard, Economy (quality-based) - **Cost Analysis**: Free, Low, Medium, High (pricing-based) - **Quality Scoring**: 0-10 scale based on context, pricing, capabilities #### πŸ” Advanced Features - **Version Parsing**: Family identification (GPT-4, Claude-3) with release dates - **Latest Detection**: Identifies newest model versions automatically - **Context Analysis**: Flags for long-context models (>100K tokens) - **Tag System**: Searchable tags for flexible filtering - **Statistics**: Usage patterns and performance metrics ## Usage ### Cache Integration ```python from openrouter_mcp.models.cache import ModelCache # Initialize cache with custom settings cache = ModelCache( ttl_hours=2, # Cache for 2 hours max_memory_items=1000, # Memory limit cache_file="models.json" # Custom cache file ) # Get models (uses cache automatically) models = await cache.get_models() print(f"Loaded {len(models)} models from cache") # Force refresh from API models = await cache.get_models(force_refresh=True) # Check cache status if cache.is_expired(): print("Cache is expired, will refresh on next request") else: print(f"Cache is fresh, {cache.ttl_seconds}s TTL") ``` ### Advanced Cache Management ```python # Cache statistics and monitoring stats = cache.get_cache_stats() print(f""" πŸ“Š Cache Statistics: Total models: {stats['total_models']} Providers: {len(stats['providers'])} ({', '.join(stats['providers'][:3])}, ...) Vision models: {stats['vision_capable_count']} Reasoning models: {stats['reasoning_model_count']} Cache size: {stats['cache_size_mb']:.2f} MB Last updated: {stats['last_updated']} Expired: {stats['is_expired']} """) # Manual cache operations await cache.refresh_cache(force=True) # Force refresh # Performance monitoring import time start = time.time() models = await cache.get_models() load_time = (time.time() - start) * 1000 print(f"Loaded {len(models)} models in {load_time:.1f}ms") ``` ### Enhanced Model Data Access ```python # All models are automatically enhanced models = await cache.get_models() for model in models[:3]: # First 3 models print(f"πŸ€– {model['id']}") print(f" Provider: {model['provider']}") print(f" Category: {model['category']}") print(f" Quality: {model['quality_score']:.1f}/10") print(f" Performance: {model['performance_tier']}") print(f" Cost: {model['cost_tier']}") print(f" Vision: {model['capabilities']['supports_vision']}") print(f" Context: {model['capabilities']['max_tokens']:,} tokens") print(f" Tags: {', '.join(model['tags'][:5])}") print() # Get specific model metadata gpt4_meta = cache.get_model_metadata("openai/gpt-4o") if "error" not in gpt4_meta: print(f"GPT-4o Quality Score: {gpt4_meta['quality_score']}") ``` ### Advanced Filtering ```python # Filter by provider openai_models = cache.filter_models_by_metadata(provider="openai") print(f"OpenAI models: {len(openai_models)}") # Filter by category vision_models = cache.filter_models_by_metadata( category="multimodal", capabilities={"supports_vision": True} ) print(f"Vision models: {len(vision_models)}") # Complex filtering premium_coding = cache.filter_models_by_metadata( category="code", performance_tier="premium", cost_tier="medium", min_quality_score=7.0 ) print(f"Premium coding models: {len(premium_coding)}") # Capability-based filtering long_context = cache.filter_models_by_metadata( capabilities={"min_context_length": 100000} ) print(f"Long context models: {len(long_context)}") # Tag-based filtering latest_models = cache.filter_models_by_metadata(tags=["latest"]) print(f"Latest models: {len(latest_models)}") ``` ## Intelligent Categorization System ### 🏷️ Automatic Categories | Category | Detection Method | Examples | Use Cases | |----------|------------------|----------|-----------| | **chat** | Default text models | GPT-4, Claude Sonnet | General conversation | | **image** | `textβ†’image` modality | DALL-E 3, Stable Diffusion | Image generation | | **audio** | `audioβ†’text` patterns | Whisper, TTS models | Speech processing | | **embedding** | Pattern: `embed`, `vector` | text-embedding-3-large | Vector search | | **multimodal** | `text+imageβ†’text` | GPT-4V, Claude Vision | Image analysis | | **reasoning** | Pattern: `o1`, reasoning indicators | O1-Preview, O1-Mini | Complex problem solving | | **code** | Pattern: `code`, `codex`, `coder` | CodeLlama, DeepSeek-Coder | Programming assistance | ### 🎯 Performance Tiers ```python # Get models by performance tier tiers = cache.get_models_by_performance_tier() print(f"Premium models: {len(tiers['premium'])}") # Examples: GPT-4o, Claude Opus, Gemini Ultra print(f"Standard models: {len(tiers['standard'])}") # Examples: GPT-3.5, Claude Sonnet, Gemini Pro print(f"Economy models: {len(tiers['economy'])}") # Examples: Llama 2, Mistral 7B, free models ``` ### πŸ’° Cost Optimization ```python # Find cost-effective models budget_models = cache.filter_models_by_metadata( cost_tier="low", min_quality_score=6.0 ) free_models = cache.filter_models_by_metadata(cost_tier="free") print(f"Free models available: {len(free_models)}") # Best value models value_models = cache.filter_models_by_metadata( performance_tier="standard", cost_tier="medium" ) ``` ## Technical Architecture ### πŸ—‚οΈ Storage Layers #### Memory Cache ```python class ModelCache: def __init__(self): self._memory_cache: List[Dict[str, Any]] = [] # Enhanced models self._last_update: Optional[datetime] = None # Timestamp self.ttl_seconds = ttl_hours * 3600 # Expiration ``` #### File Cache Structure ```json { "models": [ { "id": "openai/gpt-4o", "name": "GPT-4o", "provider": "openai", "category": "chat", "capabilities": { "supports_vision": true, "supports_function_calling": true, "max_tokens": 128000, "max_output_tokens": 4096 }, "performance_tier": "premium", "quality_score": 9.5, "tags": ["openai", "chat", "premium", "vision", "latest"] } ], "updated_at": "2025-01-15T10:30:00" } ``` ### ⚑ Performance Metrics | Operation | Cold Start | Warm Cache | Notes | |-----------|------------|------------|---------| | **API Fetch** | 2-3 seconds | - | Initial load + enhancement | | **Memory Access** | 100ms | <1ms | List operations | | **Filtering** | 50ms | 2-5ms | 500 models, complex filters | | **Statistics** | 20ms | 5-10ms | Cache analysis | | **File Load** | 100-200ms | - | Startup only | ### πŸš€ Optimization Features 1. **Batch Enhancement**: All models processed together 2. **Lazy Statistics**: Computed on-demand 3. **Efficient Filtering**: In-memory operations 4. **Smart Persistence**: Only saves when changed 5. **Background Refresh**: Non-blocking updates ```python # Example: High-performance filtering import time # Benchmark filtering operations start = time.time() vision_premium = cache.filter_models_by_metadata( capabilities={"supports_vision": True}, performance_tier="premium", min_quality_score=8.0 ) filter_time = (time.time() - start) * 1000 print(f"Filtered {len(vision_premium)} models in {filter_time:.1f}ms") ``` ## Testing & Validation ### πŸ§ͺ Comprehensive Test Suite ```bash # Run all cache tests python -m pytest tests/test_models_cache.py tests/test_metadata.py -v # Performance testing python -m pytest tests/test_performance.py --benchmark-only # Integration testing python -m pytest tests/test_integration.py -v ``` ### Test Coverage Areas #### Core Functionality - βœ… Cache initialization and configuration - βœ… TTL expiration and auto-refresh - βœ… Memory and file cache synchronization - βœ… API fallback mechanisms - βœ… Error handling and recovery #### Metadata System - βœ… Provider detection accuracy (99%+) - βœ… Category classification precision - βœ… Capability extraction completeness - βœ… Quality scoring consistency - βœ… Version parsing edge cases #### Performance - βœ… Sub-millisecond memory access - βœ… Efficient filtering operations - βœ… Memory usage optimization - βœ… Concurrent access safety - βœ… Large dataset handling (1000+ models) ### πŸ“Š Performance Benchmarks ```python # Benchmark script example import asyncio import time from openrouter_mcp.models.cache import ModelCache async def benchmark_cache(): cache = ModelCache(ttl_hours=1) # Cold start benchmark start = time.time() models = await cache.get_models(force_refresh=True) cold_time = time.time() - start # Warm cache benchmark start = time.time() models = await cache.get_models() warm_time = time.time() - start # Filter benchmark start = time.time() filtered = cache.filter_models_by_metadata( provider="openai", performance_tier="premium" ) filter_time = time.time() - start print(f""" πŸ“Š Performance Benchmarks: Cold start: {cold_time:.2f}s Warm access: {warm_time*1000:.1f}ms Filtering: {filter_time*1000:.1f}ms Models cached: {len(models)} Filter results: {len(filtered)} """) # Run benchmark asyncio.run(benchmark_cache()) ``` ## Key Benefits ### πŸš€ Performance - **Sub-second Access**: Memory cache provides instant model data - **99%+ Cache Hit Rate**: Typical applications rarely hit the API - **Background Updates**: Non-blocking cache refresh - **Optimized Filtering**: Fast metadata-based queries ### πŸ”„ Reliability - **Dual Redundancy**: Memory + file cache layers - **API Fallback**: Graceful degradation when cache fails - **Smart Recovery**: Auto-reload from file cache on restart - **Error Resilience**: Continues with partial data if needed ### πŸ“Š Intelligence - **Auto-Enhancement**: Every model enriched with metadata - **Quality Scoring**: 0-10 scale for model comparison - **Smart Categorization**: Automatic provider/category detection - **Latest Detection**: Identifies newest model versions ### πŸ’° Cost Efficiency - **Reduced API Calls**: 95%+ reduction in API requests - **Intelligent Refresh**: Only updates when needed - **Cost Awareness**: Built-in cost tier classification - **Usage Optimization**: Track and minimize API usage ## Configuration & Environment ### πŸ”§ Environment Variables ```env # Cache Configuration CACHE_TTL_HOURS=1 # Cache lifetime (1-24 hours) CACHE_MAX_ITEMS=1000 # Memory cache limit CACHE_FILE=openrouter_model_cache.json # Cache file location # Performance Tuning CACHE_ENABLE_STATS=true # Enable statistics collection CACHE_AUTO_REFRESH=true # Background refresh enabled CACHE_FALLBACK_API=true # API fallback on cache failure ``` ### πŸ› οΈ Programmatic Configuration ```python # Custom cache setup cache = ModelCache( ttl_hours=6, # Cache for 6 hours max_memory_items=2000, # Higher memory limit cache_file="/custom/cache.json" # Custom location ) # Production settings production_cache = ModelCache( ttl_hours=24, # Daily refresh max_memory_items=5000, # Large memory cache cache_file="/data/models.json" # Persistent storage ) ``` ## Troubleshooting ### Common Issues **Cache not refreshing:** ```python # Force cache refresh models = await cache.get_models(force_refresh=True) # Check expiration if cache.is_expired(): print("Cache expired, will auto-refresh on next request") ``` **High memory usage:** ```python # Monitor cache size stats = cache.get_cache_stats() if stats['cache_size_mb'] > 100: print("Consider reducing max_memory_items") ``` **Performance issues:** ```python # Benchmark operations import time start = time.time() models = await cache.get_models() print(f"Cache access: {(time.time() - start)*1000:.1f}ms") # Check cache hit rate stats = cache.get_cache_stats() print(f"Last updated: {stats['last_updated']}") ``` ## Future Roadmap ### Planned Features - πŸ”„ **Redis Support**: Distributed caching for multi-instance deployments - πŸ“ˆ **Usage Analytics**: Model popularity and usage pattern tracking - πŸ€– **Smart Selection**: AI-powered model recommendation engine - βš–οΈ **Load Balancing**: Distribute requests across model variants - πŸ“Š **Performance Benchmarks**: Real-world model performance data - πŸ” **Advanced Search**: Natural language model discovery - πŸ’‘ **Cost Optimizer**: Automatic cost-performance optimization ### Integration Possibilities - **Prometheus Metrics**: Cache performance monitoring - **Grafana Dashboards**: Visual cache analytics - **Kubernetes**: Cloud-native deployment patterns - **Multi-region**: Geographic cache distribution --- **Last Updated**: 2025-01-12 **Version**: 1.0.0

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/physics91/openrouter-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server