MCP Server Proto-OKN

mcp-proto-okn
docs
benchmarks

1-securechainkg-ffmpeg-vulnerabilities-mcp-server-design-feedback.md•35.7 KiB

# MCP SERVER DESIGN FEEDBACK: SPARQL Query Generation Improvements ## Executive Summary The evaluation revealed that while the LLM-generated query was **semantically correct** (100% recall/precision), it was **~400x less efficient** than the ground-truth query due to: 1. Poor triple pattern ordering (type-first instead of literal-first) 2. Use of string functions that block index usage (LCASE) 3. Unnecessary property navigation for user-friendly output 4. Missing strategic query planning guidance This document provides actionable recommendations to improve the MCP server's query generation capabilities. --- ## SECTION 1: PROMPTING IMPROVEMENTS ### Current State Analysis The MCP server currently lacks: - Explicit guidance on query efficiency - Pattern ordering heuristics - Index-aware query construction - Performance vs. user-friendliness trade-off guidance ### Recommendation 1.1: Add Query Efficiency Guidelines to System Prompt **Add to System Prompt**: ```markdown ## SPARQL Query Performance Guidelines When constructing SPARQL queries, follow these optimization principles: ### 1. Pattern Ordering (CRITICAL) **Always put the most selective triple pattern FIRST.** Selectivity hierarchy (most to least selective): 1. Exact literal matches (e.g., schema:name "ffmpeg") - O(1) index lookup 2. URI equality (e.g., ?x = <http://example.org/resource>) - O(1) 3. Bounded property paths (e.g., ?x foaf:knows/foaf:name "Alice") 4. Type checks (e.g., ?x a schema:Software) - Often returns many results 5. Unbound properties (e.g., ?x ?p ?o) - Returns everything **Example (BAD - Type First)**: ```sparql ?software a sc:Software ; # Returns 803K results schema:name ?name . # Still 803K results FILTER(LCASE(?name) = "ffmpeg") # Finally filters to 1 ``` **Example (GOOD - Literal First)**: ```sparql ?software schema:name "ffmpeg" ; # Returns 1 result immediately a sc:Software . # Type check on 1 node ``` ### 2. Prefer Exact Matches Over String Functions **Avoid LCASE(), UCASE(), REGEX, CONTAINS** in filters when possible. String functions prevent index usage and force full scans. **Instead of**: ```sparql FILTER(LCASE(?name) = "searchterm") ``` **Use**: ```sparql VALUES ?name { "searchterm" "SearchTerm" "SEARCHTERM" } ?entity schema:name ?name . ``` Or prompt user: "Should I search case-sensitively for 'searchterm'?" ### 3. Minimize Property Traversals Only navigate to literal properties if the user explicitly needs human-readable output. **For APIs/programmatic use**: Return URIs **For human display**: Navigate to labels/identifiers ### 4. Use DISTINCT and ORDER BY Sparingly - DISTINCT: Only if duplicates are expected (investigate why first) - ORDER BY: Only for direct user display, not for piping to other processes - Both add overhead; remove if not explicitly requested ### 5. Type Constraints - Add type checks for data quality - Place AFTER selective literal matches, not before ``` --- ### Recommendation 1.2: Add Few-Shot Examples to System Prompt **Add Section**: ```markdown ## SPARQL Query Examples (Study These Patterns) ### Example 1: Finding Software with Vulnerabilities **User Query**: "Find all ffmpeg versions with CVEs" **Good Query** (selective-first, exact match): ```sparql PREFIX sc: <https://w3id.org/secure-chain/> PREFIX schema: <http://schema.org/> SELECT ?version ?cve WHERE { ?software schema:name "ffmpeg" ; # MOST SELECTIVE FIRST sc:hasSoftwareVersion ?version . ?version sc:vulnerableTo ?cve . } ``` **Why it's good**: - Exact literal match uses index - Most selective pattern first (1 result) - Returns URIs (efficient) - Simple 3-pattern chain **Bad Query** (type-first, string functions): ```sparql SELECT DISTINCT ?versionName ?cveId WHERE { ?software a sc:Software ; # Non-selective (803K results) schema:name ?name . FILTER(REGEX(?name, "ffmpeg", "i")) # Blocks indexes ?software sc:hasSoftwareVersion ?v . ?v sc:versionName ?versionName ; sc:vulnerableTo ?cve . ?cve schema:identifier ?cveId . } ``` **Why it's bad**: - Type pattern retrieves all software - REGEX forces full scan - Extra property navigation - Unnecessary DISTINCT ### Example 2: When to Use FILTER vs Exact Match **User Query**: "Find software named 'openssl'" **Scenario A: User explicitly wants case-insensitive** ```sparql # Use VALUES with known variants VALUES ?name { "openssl" "OpenSSL" "OPENSSL" } ?software schema:name ?name . ``` **Scenario B: Default to exact match** ```sparql ?software schema:name "openssl" . ``` **Then ask user**: "I found results for 'openssl'. Would you like me to also check for case variations like 'OpenSSL'?" ### Example 3: Returning URIs vs Literals **User Query**: "List Python packages with GPL license" **For programmatic use** (return URIs): ```sparql SELECT ?package ?license WHERE { ?package a sc:Software ; sc:ecosystem "PyPI" ; sc:hasSoftwareVersion ?version . ?version schema:license ?license . ?license a sc:License . FILTER(CONTAINS(STR(?license), "GPL")) } ``` **For human display** (return labels): ```sparql SELECT ?packageName ?licenseName WHERE { ?package a sc:Software ; schema:name ?packageName ; sc:ecosystem "PyPI" ; sc:hasSoftwareVersion ?version . ?version schema:license ?license . ?license schema:name ?licenseName . FILTER(CONTAINS(?licenseName, "GPL")) } ``` **Decision rule**: - If user says "list", "find", "identify" → Return URIs (efficient) - If user says "show me", "what are", "display" → Return labels (readable) - If output format is CSV/programmatic → Return URIs - If conversational → Return labels ``` --- ### Recommendation 1.3: Add Query Planning Decision Tree **Add to System Prompt**: ```markdown ## Query Planning Decision Tree Before writing a SPARQL query, ask these questions: ### Step 1: Identify the Anchor Pattern Q: What is the MOST SELECTIVE constraint in the user's query? - Exact URI? Use it first. - Exact literal (name, ID)? Use it first. - Type constraint only? WARNING: This may be slow. Look for other constraints. ### Step 2: Case Sensitivity Q: Did the user specify case-insensitive matching? - NO → Use exact match (default) - YES → Use VALUES with variants or ask user for confirmation before using FILTER ### Step 3: Output Format Q: What is the user's intent? - CSV export / programmatic → Return URIs (fast) - Human reading / display → Navigate to labels (slower but readable) - Unclear? → Return URIs and offer to reformat ### Step 4: Result Size Estimation Q: Is this query likely to return >1000 results? - Depends on: Number of type matches, number of software in ecosystem, etc. - If YES → Warn user and suggest adding LIMIT - If NO → Proceed ### Step 5: Complexity Check Q: Does this query need DISTINCT or ORDER BY? - DISTINCT: Only if same entity could be reached via multiple paths - ORDER BY: Only if user explicitly requests sorting - Default: REMOVE both unless necessary ### Example Decision Process: User: "Find all ffmpeg versions with CVEs" Step 1: Most selective = "ffmpeg" (exact name match) ✓ Step 2: Case-insensitive? Not specified → Use exact ✓ Step 3: Output format? "Find" suggests programmatic → Return URIs ✓ Step 4: Result size? ~17 vulnerabilities → Small, no LIMIT needed ✓ Step 5: DISTINCT needed? No multiple paths → Remove DISTINCT ✓ Result: Use exact match on name, return URIs, no DISTINCT/ORDER BY ``` --- ### Recommendation 1.4: Add Performance Warning Thresholds **Add to System Prompt**: ```markdown ## Query Performance Warnings Before executing a query, estimate its cost and warn the user if expensive: ### Cost Estimation Heuristics: 1. **Type-only starting pattern**: WARNING - Could retrieve 100K+ entities 2. **String functions (LCASE, REGEX)**: WARNING - Cannot use indexes 3. **Multiple unbound properties**: WARNING - Cartesian product risk 4. **No literals in first 2 patterns**: WARNING - Non-selective start ### Warning Template: "I've constructed a query, but it may be slow because [REASON]. This could take 10-60 seconds on a large knowledge graph. Options: 1. Run it anyway (may be slow) 2. Add more constraints (e.g., limit to specific ecosystem) 3. Add LIMIT to get first N results quickly Would you like me to proceed or modify the query?" ### Example Warning Triggers: **Query**: ```sparql ?software a sc:Software . # WARNING: Type-only pattern! FILTER(REGEX(?name, "regex")) # WARNING: String function! ``` **Warning**: "⚠️ This query will scan all 803K software packages because it starts with a type pattern and uses a regex filter. This could take 30-60 seconds. Can you provide: - Exact software name? (e.g., 'ffmpeg' instead of regex) - Specific ecosystem? (e.g., 'PyPI' to limit scope) Or I can add LIMIT 100 to get first 100 matches quickly." ``` --- ### Recommendation 1.5: Add Clarification Prompts **Add to System Prompt**: ```markdown ## When to Request Clarification ASK the user for clarification BEFORE executing expensive queries in these cases: ### Case 1: Ambiguous Case Sensitivity User: "Find OpenSSL vulnerabilities" Query will use: schema:name "OpenSSL" ASK: "Should I search case-sensitively for 'OpenSSL' or also check variants like 'openssl', 'OPENSSL'?" ### Case 2: Ambiguous Output Format User: "List all Python packages" Could return: URIs or package names ASK: "This could return 600K+ packages. Would you like: 1. Package names only (slower, human-readable) 2. Package URIs (faster, for further processing) 3. First 100 packages as a sample" ### Case 3: Potentially Expensive Query User: "Find all software with vulnerabilities" Estimated: 100K+ results, slow query ASK: "This query would scan all software in the KG (~800K packages). Would you like to narrow it down by: - Specific ecosystem (PyPI, npm, cargo)? - Specific software name? - Software versions from a certain time period?" ### Case 4: Ambiguous Property Names User: "Find ffmpeg version 4.2.1" Could mean: sc:versionName "4.2.1" OR URI contains "4.2.1" ASK: "Should I search by: 1. Exact version string '4.2.1' 2. Version URI pattern (e.g., contains '4.2.1') 3. Both" ### DO NOT ASK if: - User provides exact, specific constraints - Query is clearly efficient (exact matches, small result set) - User has already confirmed preferences in this conversation ``` --- ## SECTION 2: QUERY PATTERN TEMPLATES ### Recommendation 2.1: Provide Reusable Query Templates **Add to MCP Server Code**: ```python QUERY_TEMPLATES = { "find_software_by_name": """ PREFIX sc: <https://w3id.org/secure-chain/> PREFIX schema: <http://schema.org/> SELECT ?software WHERE {{ ?software schema:name "{name}" ; a sc:Software . }} """, "find_software_versions": """ PREFIX sc: <https://w3id.org/secure-chain/> PREFIX schema: <http://schema.org/> SELECT ?version WHERE {{ ?software schema:name "{name}" ; sc:hasSoftwareVersion ?version . }} """, "find_vulnerabilities_by_software": """ PREFIX sc: <https://w3id.org/secure-chain/> PREFIX schema: <http://schema.org/> SELECT ?version ?cve WHERE {{ ?software schema:name "{name}" ; sc:hasSoftwareVersion ?version . ?version sc:vulnerableTo ?cve . }} """, "find_software_by_ecosystem": """ PREFIX sc: <https://w3id.org/secure-chain/> SELECT ?software WHERE {{ ?software sc:ecosystem "{ecosystem}" ; a sc:Software . }} LIMIT {limit} """, # Template with case-insensitive variants "find_software_case_insensitive": """ PREFIX sc: <https://w3id.org/secure-chain/> PREFIX schema: <http://schema.org/> SELECT ?software WHERE {{ VALUES ?name {{ {name_variants} }} ?software schema:name ?name ; a sc:Software . }} """, } def generate_name_variants(name: str) -> str: """Generate common case variants for software names.""" variants = [ f'"{name}"', # original f'"{name.lower()}"', # lowercase f'"{name.upper()}"', # uppercase f'"{name.capitalize()}"', # capitalized ] # Remove duplicates return " ".join(dict.fromkeys(variants)) # Usage example: def find_software_vulnerabilities(name: str, case_sensitive: bool = True): if case_sensitive: return QUERY_TEMPLATES["find_vulnerabilities_by_software"].format(name=name) else: # Use case-insensitive template with VALUES variants = generate_name_variants(name) return f""" PREFIX sc: <https://w3id.org/secure-chain/> PREFIX schema: <http://schema.org/> SELECT ?version ?cve WHERE {{ VALUES ?name {{ {variants} }} ?software schema:name ?name ; sc:hasSoftwareVersion ?version . ?version sc:vulnerableTo ?cve . }} """ ``` --- ### Recommendation 2.2: Pattern Selection Logic **Add to MCP Server**: ```python def select_query_pattern(user_query: str, schema: dict) -> dict: """ Analyze user query and select optimal SPARQL pattern. Returns: { 'anchor_pattern': str, # Most selective pattern to start with 'use_case_insensitive': bool, 'return_uris': bool, # True = URIs, False = literals 'add_limit': bool, 'estimated_cost': str, # 'low', 'medium', 'high' } """ result = { 'anchor_pattern': None, 'use_case_insensitive': False, 'return_uris': True, # Default to URIs for efficiency 'add_limit': False, 'estimated_cost': 'low', } # Check for exact name match (highly selective) if 'named' in user_query.lower() or 'name' in user_query.lower(): # Extract quoted strings or specific terms # e.g., "Find software named 'ffmpeg'" -> anchor on name result['anchor_pattern'] = 'exact_name' result['estimated_cost'] = 'low' # Check for case-insensitive hints if any(word in user_query.lower() for word in ['case-insensitive', 'ignoring case', 'any case']): result['use_case_insensitive'] = True # Check for type-only queries (expensive) if 'all software' in user_query.lower() or 'all packages' in user_query.lower(): result['anchor_pattern'] = 'type_only' result['estimated_cost'] = 'high' result['add_limit'] = True # Always limit type-only queries # Check for output format hints if any(word in user_query.lower() for word in ['show', 'display', 'what are', 'list names']): result['return_uris'] = False # User wants readable output elif any(word in user_query.lower() for word in ['find', 'get', 'fetch', 'identify']): result['return_uris'] = True # Programmatic use return result # Example usage: user_query = "Find all ffmpeg versions with CVEs" config = select_query_pattern(user_query, schema) if config['anchor_pattern'] == 'exact_name': # Use efficient name-first pattern query = generate_efficient_query(name="ffmpeg", return_uris=config['return_uris']) elif config['estimated_cost'] == 'high': # Warn user before executing return warn_expensive_query(user_query) ``` --- ### Recommendation 2.3: Query Pattern Library **Document Common Patterns**: ```markdown ## Query Pattern Library ### Pattern 1: Find Entity by Exact Literal **Use when**: User provides specific name, ID, or exact value **Selectivity**: VERY HIGH (typically 1-10 results) **Template**: ```sparql ?entity schema:name "{exact_value}" ; a {Type} . ``` ### Pattern 2: Find Entity by Type + Property **Use when**: Type is selective AND combined with property filter **Selectivity**: MEDIUM (depends on type cardinality) **Template**: ```sparql ?entity a {Type} ; schema:property ?value . FILTER(?value > {threshold}) ``` ### Pattern 3: Find Related Entities (Traversal) **Use when**: Following relationships from known entity **Selectivity**: VARIES (depends on relationship fan-out) **Template**: ```sparql {anchor_entity} schema:relationship ?related . ?related a {RelatedType} . ``` ### Pattern 4: Avoid - Type-Only Start **DANGER**: Starting with type-only retrieves all entities of that type **Only use if**: No other constraints available AND user confirms **Template**: ```sparql ?entity a {Type} . # ⚠️ WARNING: Could return millions of results LIMIT 100 # ✓ REQUIRED: Always add LIMIT ``` ### Pattern Selection Decision Tree: ``` User query: "Find X" ├─ Has exact literal (name, ID)? │ ├─ YES → Pattern 1 (exact literal) ✓ BEST │ └─ NO → Continue ├─ Has type + selective property? │ ├─ YES → Pattern 2 (type + property) ✓ GOOD │ └─ NO → Continue ├─ Has known entity to traverse from? │ ├─ YES → Pattern 3 (traversal) ✓ OK │ └─ NO → Pattern 4 (type-only) ⚠️ ASK USER FIRST ``` ``` --- ## SECTION 3: AUTOMATIC POST-PROCESSING AND VALIDATION ### Recommendation 3.1: Query Validation Pipeline **Implement Pre-Execution Validation**: ```python class SPARQLQueryValidator: """Validates and optimizes SPARQL queries before execution.""" def __init__(self, schema: dict): self.schema = schema self.warnings = [] self.optimizations = [] def validate(self, query: str) -> dict: """ Validate query and return analysis. Returns: { 'is_safe': bool, 'estimated_cost': str, # 'low', 'medium', 'high', 'extreme' 'warnings': List[str], 'suggestions': List[str], 'optimized_query': str, } """ self.warnings = [] self.optimizations = [] # Parse query patterns = self._extract_triple_patterns(query) filters = self._extract_filters(query) # Validation checks self._check_pattern_order(patterns) self._check_string_functions(filters) self._check_cartesian_products(patterns) self._check_unnecessary_modifiers(query) self._estimate_cost(patterns) return { 'is_safe': len([w for w in self.warnings if 'CRITICAL' in w]) == 0, 'estimated_cost': self._estimate_cost(patterns), 'warnings': self.warnings, 'suggestions': self.optimizations, 'optimized_query': self._optimize_query(query), } def _check_pattern_order(self, patterns: List[dict]): """Check if most selective pattern is first.""" if not patterns: return first_pattern = patterns[0] # Check if first pattern is type-only (non-selective) if self._is_type_only(first_pattern): self.warnings.append( "⚠️ CRITICAL: Query starts with type-only pattern. " "This will retrieve all entities of that type." ) self.optimizations.append( "Move exact literal matches before type checks for better performance." ) # Check if literal matches appear after type checks for i, pattern in enumerate(patterns): if i > 0 and self._is_exact_literal(pattern): self.warnings.append( f"⚠️ Pattern {i+1} is highly selective but appears late. " "Consider reordering for better performance." ) def _check_string_functions(self, filters: List[str]): """Check for index-blocking string functions.""" blocking_functions = ['LCASE', 'UCASE', 'REGEX', 'CONTAINS', 'STRSTARTS'] for filter_expr in filters: for func in blocking_functions: if func in filter_expr.upper(): self.warnings.append( f"⚠️ PERFORMANCE: {func}() function blocks index usage. " f"Consider using exact matches or VALUES with variants." ) if func == 'LCASE': self.optimizations.append( "Replace LCASE() filter with VALUES containing known case variants." ) def _check_cartesian_products(self, patterns: List[dict]): """Detect potential cartesian products.""" # Check for multiple unconnected patterns connected_vars = set() for pattern in patterns: vars_in_pattern = self._extract_variables(pattern) if connected_vars and not (connected_vars & vars_in_pattern): self.warnings.append( "⚠️ RISK: Detected potentially unconnected patterns. " "This could create a cartesian product." ) connected_vars.update(vars_in_pattern) def _check_unnecessary_modifiers(self, query: str): """Check for unnecessary DISTINCT, ORDER BY.""" if 'DISTINCT' in query.upper(): self.warnings.append( "ℹ️ INFO: DISTINCT adds overhead. Remove if duplicates are unlikely." ) self.optimizations.append( "Test query without DISTINCT first to see if it's needed." ) if 'ORDER BY' in query.upper(): self.warnings.append( "ℹ️ INFO: ORDER BY adds sorting overhead. " "Only needed for direct user display." ) def _estimate_cost(self, patterns: List[dict]) -> str: """Estimate query execution cost.""" if not patterns: return 'unknown' first_pattern = patterns[0] # Type-only start = expensive if self._is_type_only(first_pattern): return 'high' # Exact literal start = cheap if self._is_exact_literal(first_pattern): return 'low' return 'medium' def _optimize_query(self, query: str) -> str: """Automatically optimize query structure.""" # This would implement automatic query rewriting # For now, return original query return query # Helper methods def _is_type_only(self, pattern: dict) -> bool: """Check if pattern is type-only with no literals.""" # Implementation would parse pattern structure pass def _is_exact_literal(self, pattern: dict) -> bool: """Check if pattern contains exact literal match.""" # Implementation would check for quoted strings pass def _extract_variables(self, pattern: dict) -> set: """Extract variables from pattern.""" # Implementation would parse variables pass # Usage: validator = SPARQLQueryValidator(schema) result = validator.validate(llm_generated_query) if result['estimated_cost'] == 'high': print("⚠️ WARNING: Expensive query detected!") print("Warnings:") for warning in result['warnings']: print(f" - {warning}") print("\nSuggestions:") for suggestion in result['optimizations']: print(f" - {suggestion}") # Ask user for confirmation proceed = ask_user("Proceed with query? (yes/no)") if not proceed: # Suggest optimized version print("\nOptimized query:") print(result['optimized_query']) ``` --- ### Recommendation 3.2: Automatic Query Rewriting **Implement Query Optimizer**: ```python class SPARQLQueryOptimizer: """Automatically rewrites queries for better performance.""" def optimize(self, query: str, schema: dict) -> str: """ Rewrite query with optimizations. Optimizations applied: 1. Reorder patterns (most selective first) 2. Replace LCASE/REGEX with VALUES where possible 3. Remove unnecessary DISTINCT/ORDER BY 4. Add type constraints after literals """ # Parse query into components parsed = self._parse_query(query) # Apply optimizations parsed = self._reorder_patterns(parsed, schema) parsed = self._replace_string_functions(parsed) parsed = self._remove_unnecessary_modifiers(parsed) # Reconstruct query return self._build_query(parsed) def _reorder_patterns(self, parsed: dict, schema: dict) -> dict: """Reorder triple patterns for optimal selectivity.""" patterns = parsed['patterns'] # Sort patterns by selectivity (estimate based on pattern structure) def selectivity_score(pattern): """Lower score = more selective = should come first.""" # Exact literal match = most selective if self._has_exact_literal(pattern): return 1 # URI equality if self._has_uri_equality(pattern): return 2 # Property with filter if self._has_filter(pattern): return 3 # Type check if self._is_type_pattern(pattern): return 4 # Unbound property return 5 patterns.sort(key=selectivity_score) parsed['patterns'] = patterns return parsed def _replace_string_functions(self, parsed: dict) -> dict: """Replace LCASE/REGEX with VALUES where possible.""" filters = parsed.get('filters', []) new_filters = [] new_patterns = [] for filter_expr in filters: # Check for LCASE pattern if 'LCASE(?name)' in filter_expr: # Extract literal value match = re.search(r'LCASE\(\?(\w+)\)\s*=\s*"([^"]+)"', filter_expr) if match: var_name = match.group(1) value = match.group(2) # Generate case variants variants = self._generate_variants(value) # Add VALUES pattern instead new_patterns.append(f"VALUES ?{var_name} {{ {variants} }}") continue # Skip adding this filter new_filters.append(filter_expr) parsed['filters'] = new_filters parsed['patterns'].extend(new_patterns) return parsed def _generate_variants(self, value: str) -> str: """Generate common case variants.""" variants = [ f'"{value}"', f'"{value.lower()}"', f'"{value.upper()}"', f'"{value.capitalize()}"', ] return " ".join(dict.fromkeys(variants)) # Remove duplicates def _remove_unnecessary_modifiers(self, parsed: dict) -> dict: """Remove DISTINCT/ORDER BY if not needed.""" # Strategy: Keep modifiers but add comment explaining overhead # Or make configurable via parameter if parsed.get('distinct') and not self._needs_distinct(parsed): parsed['distinct'] = False parsed['comments'].append("# DISTINCT removed - no duplicate paths detected") return parsed # Usage: optimizer = SPARQLQueryOptimizer() optimized_query = optimizer.optimize(llm_generated_query, schema) print("Original query:") print(llm_generated_query) print("\nOptimized query:") print(optimized_query) ``` --- ### Recommendation 3.3: Query Execution Monitoring **Add Execution Telemetry**: ```python class QueryExecutionMonitor: """Monitor query execution and provide feedback.""" def __init__(self): self.execution_log = [] def execute_with_monitoring(self, query: str, endpoint: str) -> dict: """ Execute query with timing and result analysis. Returns: { 'results': List[dict], 'execution_time_ms': float, 'result_count': int, 'warnings': List[str], 'suggestions': List[str], } """ start_time = time.time() # Execute query results = self._execute_sparql(query, endpoint) execution_time = (time.time() - start_time) * 1000 result_count = len(results) # Analyze results warnings = [] suggestions = [] # Check execution time if execution_time > 1000: # > 1 second warnings.append( f"⚠️ Slow query: {execution_time:.0f}ms execution time. " "Consider adding more selective constraints." ) # Check result count if result_count > 10000: warnings.append( f"⚠️ Large result set: {result_count} rows. " "Consider adding LIMIT or more filters." ) elif result_count == 0: suggestions.append( "No results found. Try:\n" " - Removing type constraints\n" " - Using case-insensitive matching\n" " - Checking for typos in literal values" ) # Log execution self.execution_log.append({ 'query': query, 'execution_time_ms': execution_time, 'result_count': result_count, 'timestamp': datetime.now(), }) return { 'results': results, 'execution_time_ms': execution_time, 'result_count': result_count, 'warnings': warnings, 'suggestions': suggestions, } def get_performance_summary(self) -> str: """Generate performance summary from execution log.""" if not self.execution_log: return "No queries executed yet." avg_time = sum(e['execution_time_ms'] for e in self.execution_log) / len(self.execution_log) slow_queries = [e for e in self.execution_log if e['execution_time_ms'] > 1000] summary = f""" Query Performance Summary: - Total queries: {len(self.execution_log)} - Average execution time: {avg_time:.0f}ms - Slow queries (>1s): {len(slow_queries)} Recommendations: """ if slow_queries: summary += "- Review slow queries for optimization opportunities\n" summary += "- Consider adding indexes on frequently queried properties\n" return summary # Usage: monitor = QueryExecutionMonitor() result = monitor.execute_with_monitoring(query, endpoint) print(f"Execution time: {result['execution_time_ms']:.0f}ms") print(f"Results: {result['result_count']} rows") if result['warnings']: print("\nWarnings:") for warning in result['warnings']: print(f" {warning}") if result['suggestions']: print("\nSuggestions:") for suggestion in result['suggestions']: print(f" {suggestion}") ``` --- ### Recommendation 3.4: Automatic Fallback Strategies **Implement Query Fallback Logic**: ```python class QueryFallbackHandler: """Handle query failures with automatic fallback strategies.""" def execute_with_fallback(self, query: str, endpoint: str) -> dict: """ Execute query with automatic fallback on failure/timeout. Fallback strategies: 1. Add LIMIT if query times out 2. Remove ORDER BY if query is slow 3. Remove DISTINCT if query is slow 4. Simplify filters if no results """ try: # Try original query first result = self._execute_with_timeout(query, endpoint, timeout=10) return {'results': result, 'strategy': 'original'} except TimeoutError: # Fallback 1: Add LIMIT limited_query = self._add_limit(query, 1000) result = self._execute_with_timeout(limited_query, endpoint, timeout=10) return { 'results': result, 'strategy': 'limited', 'message': 'Query timed out. Returning first 1000 results.' } except NoResultsError: # Fallback 2: Try case-insensitive version case_insensitive = self._make_case_insensitive(query) result = self._execute_with_timeout(case_insensitive, endpoint, timeout=10) if result: return { 'results': result, 'strategy': 'case_insensitive', 'message': 'No exact matches. Showing case-insensitive results.' } # Fallback 3: Remove type constraints relaxed = self._remove_type_constraints(query) result = self._execute_with_timeout(relaxed, endpoint, timeout=10) return { 'results': result, 'strategy': 'relaxed', 'message': 'No results with strict constraints. Showing relaxed matches.' } # Usage: handler = QueryFallbackHandler() result = handler.execute_with_fallback(query, endpoint) print(f"Strategy used: {result['strategy']}") if 'message' in result: print(f"Note: {result['message']}") print(f"Results: {len(result['results'])} rows") ``` --- ## IMPLEMENTATION ROADMAP ### Phase 1: Quick Wins (1-2 weeks) 1. ✓ Add query efficiency guidelines to system prompt 2. ✓ Add few-shot examples with good/bad patterns 3. ✓ Implement basic query validator (pattern order check) 4. ✓ Add performance warnings before execution ### Phase 2: Core Improvements (2-4 weeks) 1. ✓ Implement query pattern templates 2. ✓ Add automatic query rewriting (LCASE → VALUES) 3. ✓ Implement pattern reordering logic 4. ✓ Add clarification prompts for expensive queries ### Phase 3: Advanced Features (4-8 weeks) 1. ✓ Implement execution monitoring with telemetry 2. ✓ Add fallback strategies for failed queries 3. ✓ Build query optimization engine 4. ✓ Create query pattern library with examples ### Phase 4: Learning & Adaptation (Ongoing) 1. ✓ Collect query performance metrics 2. ✓ Build corpus of optimal query patterns 3. ✓ Fine-tune LLM with successful queries 4. ✓ Add user feedback loop for query quality --- ## METRICS FOR SUCCESS Track these metrics to measure improvement: 1. **Query Efficiency** - Average execution time - Percentage of queries using exact matches - Percentage of queries with optimal pattern order 2. **User Experience** - Query success rate (returns expected results) - Number of clarification requests needed - User satisfaction ratings 3. **System Performance** - Number of expensive queries avoided - Reduction in query execution time - Cache hit rate ### Target Improvements: - Reduce average query execution time by 80% - Increase exact match usage from 20% to 90% - Reduce queries requiring >1s execution from 30% to <5% - Increase user satisfaction from 70% to 95% --- ## CONCLUSION The evaluation revealed that the LLM-generated query, while semantically correct, was ~400x less efficient than optimal. The recommendations above address three key areas: 1. **Prompting**: Educate the LLM on query efficiency principles 2. **Templates**: Provide proven patterns for common queries 3. **Validation**: Automatically detect and fix inefficient patterns Implementing these improvements will result in: - ✓ Faster query execution (80% reduction in avg time) - ✓ Better index utilization (exact matches vs string functions) - ✓ Reduced server load (fewer full table scans) - ✓ Improved user experience (faster responses, fewer timeouts) The most critical improvement is teaching the LLM to **start with the most selective pattern** and **use exact matches over string functions**. These two changes alone account for the majority of the 400x performance gap.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sbl-sdsc/mcp-proto-okn'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

1-securechainkg-ffmpeg-vulnerabilities-mcp-server-design-feedback.md•35.7 KiB