#!/usr/bin/env python
"""
Advanced Docstring Refiner Demo for Ultimate MCP Server.
This script demonstrates the autonomous documentation refinement tool, which analyzes, tests, and
improves documentation (descriptions, schemas, examples) for MCP tools so that LLM agents can use
them more reliably. The demo showcases multiple refinement approaches and visualization techniques
while providing comprehensive performance metrics and cost analysis.
Features:
- Single and multi-tool refinement demonstrations
- Custom test generation strategy configuration
- Provider fallbacks and model selection optimization
- Visual diffs of documentation improvements
- Cost estimation and optimization techniques
- Schema-focused refinement capabilities
- Model comparison and performance analysis
- Practical testing with intentionally flawed tools
- Adaptive refinement based on tool complexity
Command-line Arguments:
--demo {all,single,multi,custom-testing,optimize,all-tools,schema-focus,practical,model-comparison}:
Specific demo to run (default: all)
--tool TOOL:
Specify a specific tool to refine (bypasses automatic selection)
--iterations N:
Number of refinement iterations to run
--model MODEL:
Specify a model to use for refinement (e.g., gpt-4.1-mini, claude-3-5-haiku)
--provider PROVIDER:
Specify a provider to use for refinement (e.g., openai, anthropic)
--visualize {minimal,standard,full}:
Control visualization detail level (default: standard)
--cost-limit FLOAT:
Maximum cost limit in USD (default: 5.0)
--output-dir DIR:
Directory to save results
--save-results:
Save refinement results to files
--verbose, -v:
Increase output verbosity
--create-flawed:
Create flawed example tools for practical testing
Demo Modes:
single:
Demonstrates refining a single tool with detailed progress tracking
and visualization of description, schema, and example improvements.
multi:
Demonstrates refining multiple tools simultaneously, showcasing parallel
processing and cross-tool analysis of documentation patterns.
custom-testing:
Demonstrates advanced test generation strategies with fine-grained control
over the types and quantities of test cases.
optimize:
Showcases cost optimization techniques for large-scale refinement,
comparing standard and cost-optimized approaches.
all-tools:
Demonstrates the capability to refine all available tools in a single run,
with resource management and prioritization features.
schema-focus:
Focuses specifically on schema improvements, with detailed visualization
of JSON schema patches and validation improvements.
practical:
Creates and refines intentionally flawed example tools to demonstrate
the system's ability to identify and fix common documentation issues.
model-comparison:
Compares the performance of different LLM models for refinement tasks,
with detailed metrics on success rates, cost, and processing time.
Dependencies:
    - ultimate_mcp_server: Core framework for interfacing with LLMs and tools
- rich: For beautiful console output and visualizations
- asyncio: For asynchronous processing of refinement operations
- Required API keys for providers (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
Usage Examples:
# Run all demos with standard visualization
python docstring_refiner_demo.py
# Run just the single tool refinement demo with a specific tool
python docstring_refiner_demo.py --demo single --tool generate_completion
# Run the model comparison demo with full visualization and save results
python docstring_refiner_demo.py --demo model-comparison --visualize full --save-results
# Run the multi-tool demo with a specific model and cost limit
python docstring_refiner_demo.py --demo multi --model gpt-4.1-mini --cost-limit 2.5
# Create and test flawed example tools
python docstring_refiner_demo.py --demo practical --create-flawed
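    # Programmatic sketch (illustrative only; mirrors the parameter names used by the
    # demo functions below and assumes a configured Gateway whose MCP instance has the
    # refine_tool_documentation tool registered)
    result = await gateway.mcp.call_tool("refine_tool_documentation", {
        "tool_names": ["generate_completion"],
        "max_iterations": 2,
        "refinement_model_config": {"provider": "openai", "model": "gpt-4.1-mini"},
        "validation_level": "full",
        "enable_winnowing": True,
    })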
Return Values:
The script returns exit code 0 on successful completion, or exit code 1 if
critical errors occur during execution.
Functions:
    The script defines the following helper and demo functions:
setup_gateway_and_tools(): Initializes the Gateway and ensures required tools are available
get_suitable_tools(): Finds appropriate tools for demonstrations based on complexity
display_refinement_progress(): Callback for tracking refinement progress events
create_text_diff(), create_side_by_side_diff(): Generate visual diffs of documentation changes
display_refinement_result(): Formats and displays refinement results with appropriate detail level
create_flawed_example_tools(): Creates example tools with intentional documentation flaws
Demo functions (demo_*): Implement specific demonstration scenarios
Implementation Notes:
- The script uses the global MCP instance from the Gateway for all tool operations
- Refinement operations are tracked through a CostTracker instance for budget management
- All demonstrations include graceful fallbacks for providers and models
- Progress updates are displayed using Rich's Progress components
- Results can be saved to files for later analysis or integration
Author:
Ultimate MCP Server Team
Version:
1.0.0
"""
import argparse
import asyncio
import datetime
import difflib
import json
import random
import sys
import tempfile
import time
from pathlib import Path
from typing import Dict, List, Optional
# Add project root to path for imports when running as script
sys.path.insert(0, str(Path(__file__).parent.parent))
# Rich for beautiful console output
from rich import box
from rich.console import Console, Group
from rich.markup import escape
from rich.panel import Panel
from rich.progress import (
BarColumn,
Progress,
SpinnerColumn,
TaskProgressColumn,
TextColumn,
TimeElapsedColumn,
TimeRemainingColumn,
)
from rich.rule import Rule
from rich.syntax import Syntax
from rich.table import Table
from rich.tree import Tree
# Project imports
from ultimate_mcp_server.constants import Provider
from ultimate_mcp_server.core.server import Gateway
from ultimate_mcp_server.tools.base import with_error_handling
from ultimate_mcp_server.tools.docstring_refiner import (
RefinementProgressEvent,
)
from ultimate_mcp_server.utils import get_logger
from ultimate_mcp_server.utils.display import CostTracker
from ultimate_mcp_server.utils.logging.console import console
# Initialize logger
logger = get_logger("example.docstring_refiner")
# Create a separate console for detailed output
detail_console = Console(highlight=False)
# Global MCP instance (will be populated from Gateway)
mcp = None
# Global settings that can be modified by command line args
SETTINGS = {
"output_dir": None,
"visualization_level": "standard", # "minimal", "standard", "full"
"cost_limit": 5.0, # USD
"preferred_providers": [Provider.OPENAI.value, Provider.ANTHROPIC.value, Provider.GEMINI.value],
"fallback_providers": [Provider.DEEPSEEK.value, Provider.GROK.value],
"save_results": False,
"verbose": False,
}
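# NOTE: parse_arguments() updates SETTINGS in place (visualization level, cost limit,
# output directory, save/verbosity flags) before any demo runs.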
def parse_arguments():
"""Parse command line arguments for the demo."""
parser = argparse.ArgumentParser(
description="Advanced Docstring Refiner Demo for Ultimate MCP Server",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""Available demos:
all - Run all demos (default)
single - Single tool refinement
multi - Multi-tool refinement
custom-testing - Custom test generation strategies
optimize - Cost optimization techniques
all-tools - Refine all available tools
schema-focus - Focus on schema improvements
practical - Practical testing with flawed tools
model-comparison - Compare different LLM models for refinement
"""
)
# Demo selection
parser.add_argument(
"--demo",
default="all",
choices=[
"all", "single", "multi", "custom-testing", "optimize",
"all-tools", "schema-focus", "practical",
"model-comparison"
],
help="Specific demo to run (default: all)"
)
# Tool selection
parser.add_argument(
"--tool",
help="Specify a specific tool to refine (bypasses automatic selection)"
)
# Iteration control
parser.add_argument(
"--iterations",
type=int,
default=None,
help="Number of refinement iterations to run"
)
# Model specification
parser.add_argument(
"--model",
default=None,
help="Specify a model to use for refinement (e.g., gpt-4.1-mini, claude-3-5-haiku)"
)
# Provider specification
parser.add_argument(
"--provider",
default=None,
help=f"Specify a provider to use for refinement (e.g., {Provider.OPENAI.value}, {Provider.ANTHROPIC.value})"
)
# Visualization options
parser.add_argument(
"--visualize",
choices=["minimal", "standard", "full"],
default="standard",
help="Control visualization detail level"
)
# Cost limit
parser.add_argument(
"--cost-limit",
type=float,
default=5.0,
help="Maximum cost limit in USD"
)
# Output directory
parser.add_argument(
"--output-dir",
help="Directory to save results"
)
# Save results
parser.add_argument(
"--save-results",
action="store_true",
help="Save refinement results to files"
)
# Verbosity
parser.add_argument(
"-v", "--verbose",
action="store_true",
help="Increase output verbosity"
)
# Create flawed tools for testing
parser.add_argument(
"--create-flawed",
action="store_true",
help="Create flawed example tools for practical testing"
)
args = parser.parse_args()
# Update settings
SETTINGS["visualization_level"] = args.visualize
SETTINGS["cost_limit"] = args.cost_limit
SETTINGS["save_results"] = args.save_results
SETTINGS["verbose"] = args.verbose
if args.output_dir:
output_dir = Path(args.output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
SETTINGS["output_dir"] = output_dir
return args
async def setup_gateway_and_tools(create_flawed_tools=False):
"""Set up the gateway and ensure docstring refiner tool is available."""
global mcp
logger.debug("Initializing Gateway for docstring refiner demo...")
logger.info("Initializing Gateway for docstring refiner demo...", emoji_key="start")
# Create Gateway instance with all tools
logger.debug("Creating Gateway instance with all tools")
gateway = Gateway("docstring-refiner-demo", register_tools=True) # Register all tools, not just minimal tools
# Initialize providers (needed for the tool to function)
try:
logger.debug("Initializing providers...")
await gateway._initialize_providers()
logger.success("Successfully initialized providers", emoji_key="success")
logger.debug("Successfully initialized providers")
except Exception as e:
logger.error(f"Error initializing providers: {e}", emoji_key="error", exc_info=True)
logger.exception("Error initializing providers")
console.print(Panel(
f"Error initializing providers: {escape(str(e))}\n\n"
"Check that your API keys are set correctly in environment variables:\n"
"- OPENAI_API_KEY\n"
"- ANTHROPIC_API_KEY\n"
"- GEMINI_API_KEY\n",
title="❌ Provider Initialization Failed",
border_style="red",
expand=False
))
# Continue anyway, as some providers might still work
# Store the MCP server instance
mcp = gateway.mcp
logger.debug("Stored MCP server instance")
# Display available providers with available models
logger.debug("Getting provider information")
provider_tree = Tree("[bold cyan]Available Providers & Models[/bold cyan]")
provider_info = []
for provider_name, provider in gateway.providers.items():
if provider:
try:
models = await provider.list_models()
provider_branch = provider_tree.add(f"[yellow]{provider_name}[/yellow]")
# Group models by category/capability
categorized_models = {}
for model in models:
model_id = model.get("id", "unknown")
if "4" in model_id:
category = "GPT-4 Family"
elif "3" in model_id:
category = "GPT-3 Family"
elif "claude" in model_id.lower():
category = "Claude Family"
elif "gemini" in model_id.lower():
category = "Gemini Family"
elif "deepseek" in model_id.lower():
category = "DeepSeek Family"
else:
category = "Other Models"
if category not in categorized_models:
categorized_models[category] = []
categorized_models[category].append(model_id)
# Add models to the tree by category
for category, model_list in categorized_models.items():
category_branch = provider_branch.add(f"[cyan]{category}[/cyan]")
for model_id in sorted(model_list):
category_branch.add(f"[green]{model_id}[/green]")
# Get default model for provider info
default_model = provider.get_default_model()
provider_info.append(f"{provider_name} (default: {default_model})")
except Exception as e:
logger.warning(f"Could not get models for {provider_name}: {e}", emoji_key="warning")
logger.warning(f"Could not get models for {provider_name}: {e}")
provider_info.append(f"{provider_name} (models unavailable)")
provider_branch = provider_tree.add(f"[yellow]{provider_name}[/yellow]")
provider_branch.add(f"[red]Error listing models: {escape(str(e))}[/red]")
# Display provider info based on visualization level
if SETTINGS["visualization_level"] == "full":
console.print(Panel(provider_tree, border_style="dim cyan", padding=(1, 2)))
else:
console.print(Panel(
f"Available providers: {', '.join(provider_info)}",
title="Provider Configuration",
border_style="cyan",
expand=False
))
# Verify the docstring_refiner tool is available
logger.debug("Checking for available tools")
tool_list = await mcp.list_tools()
available_tools = [t.name for t in tool_list]
logger.debug(f"Available tools before registration: {available_tools}")
# Display all available tools
tool_tree = Tree("[bold cyan]Available MCP Tools[/bold cyan]")
# Group tools by namespace for better visualization
tool_namespaces = {}
for tool_name in available_tools:
if ":" in tool_name:
namespace, name = tool_name.split(":", 1)
if namespace not in tool_namespaces:
tool_namespaces[namespace] = []
tool_namespaces[namespace].append(name)
else:
if "root" not in tool_namespaces:
tool_namespaces["root"] = []
tool_namespaces["root"].append(tool_name)
# Add tools to tree with proper grouping
for namespace, tools in tool_namespaces.items():
if namespace == "root":
for tool in sorted(tools):
tool_tree.add(f"[green]{tool}[/green]")
else:
ns_branch = tool_tree.add(f"[yellow]{namespace}[/yellow]")
for tool in sorted(tools):
ns_branch.add(f"[green]{tool}[/green]")
# Display tool info based on visualization level
if SETTINGS["visualization_level"] in ["standard", "full"]:
console.print(Panel(tool_tree, border_style="dim cyan", padding=(1, 2)))
else:
console.print(f"[cyan]Tools available:[/cyan] {len(available_tools)}")
# Check if refine_tool_documentation is available
if "refine_tool_documentation" in available_tools:
logger.success("refine_tool_documentation tool available.", emoji_key="success")
else:
logger.warning("refine_tool_documentation tool not found in available tools list.", emoji_key="warning")
console.print(Panel(
"The refine_tool_documentation tool is not registered automatically.\n"
"This demo will attempt to register it manually as a fallback.",
title="⚠️ Tool Availability Notice",
border_style="yellow"
))
# Manually register the refine_tool_documentation tool as a fallback
# Note: This should no longer be necessary since the tool is now included in STANDALONE_TOOL_FUNCTIONS
        # in ultimate_mcp_server/tools/__init__.py, but we keep it as a fallback in case of issues
try:
print("Attempting to manually register refine_tool_documentation tool as fallback...")
from ultimate_mcp_server.tools.docstring_refiner import refine_tool_documentation
print("Imported refine_tool_documentation successfully")
# Create a simplified wrapper to avoid Pydantic validation issues
@with_error_handling
async def docstring_refiner_wrapper(
tool_names=None,
refine_all_available=False,
max_iterations=1,
ctx=None
):
"""
Refine the documentation of MCP tools.
Args:
tool_names: List of tools to refine, or None to use refine_all_available
refine_all_available: Whether to refine all available tools
max_iterations: Maximum number of refinement iterations
ctx: MCP context
Returns:
Refinement results
"""
print(f"Wrapper called with tool_names={tool_names}, refine_all_available={refine_all_available}")
# Simply pass through to the actual implementation
return await refine_tool_documentation(
tool_names=tool_names,
refine_all_available=refine_all_available,
max_iterations=max_iterations,
ctx=ctx
)
# Register our simplified wrapper instead
mcp.tool(name="refine_tool_documentation")(docstring_refiner_wrapper)
print("Registered fallback wrapper tool successfully")
logger.success("Successfully registered fallback wrapper for refine_tool_documentation tool", emoji_key="success")
except Exception as e:
logger.error(f"Failed to register fallback refine_tool_documentation tool: {e}", emoji_key="error", exc_info=True)
print(f"Error registering fallback tool: {type(e).__name__}: {str(e)}")
import traceback
print("Stack trace:")
traceback.print_exc()
console.print(Panel(
f"Error registering the fallback refine_tool_documentation tool: {escape(str(e))}\n\n"
"This demo requires the docstring_refiner tool to be properly registered.",
title="❌ Registration Failed",
border_style="red",
expand=False
))
console.print(Panel(
"This demo requires the docstring_refiner tool to be properly registered.\n"
"Check that you have the correct version of the Ultimate MCP Server and dependencies installed.",
title="⚠️ Demo Requirements Not Met",
border_style="red",
expand=False
))
return gateway
# Create flawed example tools if requested
if create_flawed_tools:
created_tools = await create_flawed_example_tools(mcp)
if created_tools:
console.print(Panel(
f"Created {len(created_tools)} flawed example tools for testing:\n" +
"\n".join([f"- [cyan]{name}[/cyan]" for name in created_tools]),
title="🛠️ Flawed Tools Created",
border_style="yellow",
expand=False
))
return gateway
async def create_flawed_example_tools(mcp_instance):
"""Create flawed example tools for demonstration purposes."""
created_tools = []
try:
# Create a temporary directory to store any needed files
temp_dir = tempfile.mkdtemp(prefix="docstring_refiner_flawed_tools_")
logger.info(f"Created temporary directory for flawed tools: {temp_dir}", emoji_key="setup")
# Define several flawed tools with various issues
# Tool 1: Ambiguous Description
@mcp_instance.tool()
async def flawed_process_text(text: str, mode: str = "simple", include_metadata: bool = False):
"""Process the given text.
This tool does processing on text.
Args:
text: Text to process
mode: Processing mode (simple, advanced, expert)
include_metadata: Whether to include metadata in result
"""
# Actual implementation doesn't matter for the demo
result = {"processed": text[::-1]} # Just reverse the text
if include_metadata:
result["metadata"] = {"length": len(text), "mode": mode}
return result
created_tools.append("flawed_process_text")
# Tool 2: Missing Parameter Descriptions
@mcp_instance.tool()
async def flawed_scrape_website(url, depth=1, extract_links=True, timeout=30.0):
"""Website scraper tool.
Extracts content from websites.
"""
# Simulate scraping
return {
"title": f"Page at {url}",
"content": f"Scraped content with depth {depth}",
"links": ["https://example.com/1", "https://example.com/2"] if extract_links else []
}
created_tools.append("flawed_scrape_website")
# Tool 3: Confusing Schema & Inconsistent Description
@mcp_instance.tool()
async def flawed_data_processor(config, inputs, format="json"):
"""Processes data.
The analyzer takes configuration and processes input data.
The system allows different engine versions and parameters.
"""
# Just return dummy data
return {
"outputs": [f"Processed: {i}" for i in inputs],
"engine_used": config.get("engine", "v1"),
"format": format
}
created_tools.append("flawed_data_processor")
# Tool 4: Misleading Examples in Description but no schema examples
@mcp_instance.tool()
async def flawed_product_search(query, filters=None, sort="rating", page=1, per_page=20):
"""Search for products in the database.
Example usage:
```
search_products("laptop", {"category": "electronics", "min_price": 500}, sort_by="newest")
```
The search function allows querying for items along with filtering and sorting options.
"""
# Return dummy results
return {
"results": [{"id": i, "name": f"{query} product {i}", "price": random.randint(10, 1000)} for i in range(1, 6)],
"total": 243,
"page": page,
"per_page": per_page
}
created_tools.append("flawed_product_search")
# Tool 5: Schema with type issues (number vs integer conflicts)
@mcp_instance.tool()
async def flawed_calculator(values, operation, precision=2, scale_factor=1.0):
"""Statistical calculator.
Calculate statistics on a set of values. The operation determines which
statistic to calculate. Valid operations are:
- sum: Calculate the sum of all values
- average: Calculate the mean of the values
- max: Find the maximum value
- min: Find the minimum value
The precision parameter must be an integer between 0 and 10.
After calculation, the result is multiplied by the scale_factor.
"""
# Perform the calculation
if operation == "sum":
result = sum(values)
elif operation == "average":
result = sum(values) / len(values) if values else 0
elif operation == "max":
result = max(values) if values else None
elif operation == "min":
result = min(values) if values else None
else:
result = None
# Apply scale and precision
if result is not None:
result = round(result * scale_factor, precision)
return {"result": result}
created_tools.append("flawed_calculator")
logger.success(f"Successfully created {len(created_tools)} flawed example tools", emoji_key="success")
return created_tools
except Exception as e:
logger.error(f"Error creating flawed example tools: {e}", emoji_key="error", exc_info=True)
console.print(f"[bold red]Error creating flawed example tools:[/bold red] {escape(str(e))}")
return []
async def display_refinement_progress(event: RefinementProgressEvent):
"""Handle progress events from the refinement process."""
# Create a formatted message based on the event type
if event.stage == "starting_iteration":
message = f"[bold cyan]Starting iteration {event.iteration}/{event.total_iterations} for {event.tool_name}[/bold cyan]"
elif event.stage == "agent_simulation":
message = f"[blue]Simulating agent usage for {event.tool_name}...[/blue]"
elif event.stage == "test_generation":
message = f"[blue]Generating test cases for {event.tool_name}...[/blue]"
elif event.stage == "test_execution_start":
message = f"[blue]Executing tests for {event.tool_name}...[/blue]"
elif event.stage == "test_execution_progress":
message = f"[blue]Test execution progress: {event.progress_pct:.1f}%[/blue]"
elif event.stage == "test_execution_end":
success_rate = event.details.get("success_rate") if event.details else None
if success_rate is not None:
message = f"[green]Tests completed for {event.tool_name} - Success rate: {success_rate:.1%}[/green]"
else:
message = f"[green]Tests completed for {event.tool_name}[/green]"
elif event.stage == "analysis_start":
message = f"[blue]Analyzing results for {event.tool_name}...[/blue]"
elif event.stage == "analysis_end":
message = f"[green]Analysis completed for {event.tool_name}[/green]"
elif event.stage == "schema_patching":
message = f"[blue]Applying schema patches for {event.tool_name}...[/blue]"
elif event.stage == "winnowing":
message = f"[blue]Optimizing documentation for {event.tool_name}...[/blue]"
elif event.stage == "iteration_complete":
message = f"[bold green]Iteration {event.iteration} complete for {event.tool_name}[/bold green]"
elif event.stage == "tool_complete":
message = f"[bold magenta]Refinement complete for {event.tool_name}[/bold magenta]"
elif event.stage == "error":
message = f"[bold red]Error during refinement for {event.tool_name}: {event.message}[/bold red]"
else:
message = f"[dim]{event.message}[/dim]"
# Print the message
detail_console.print(message)
# Print additional details if in verbose mode
if SETTINGS["verbose"] and event.details:
try:
detail_console.print(f"[dim cyan]Details: {json.dumps(event.details, default=str)}[/dim cyan]")
except Exception:
detail_console.print(f"[dim cyan]Details: {event.details}[/dim cyan]")
# Return True to confirm the callback was processed
return True
def create_text_diff(original: str, improved: str) -> Panel:
"""Create a colorized diff between original and improved text."""
diff = difflib.unified_diff(
original.splitlines(),
improved.splitlines(),
lineterm='',
n=3 # Context lines
)
# Convert diff to rich text with colors
rich_diff = []
for line in diff:
if line.startswith('+'):
rich_diff.append(f"[green]{escape(line)}[/green]")
elif line.startswith('-'):
rich_diff.append(f"[red]{escape(line)}[/red]")
elif line.startswith('@@'):
rich_diff.append(f"[cyan]{escape(line)}[/cyan]")
else:
rich_diff.append(escape(line))
# Return as panel
if rich_diff:
diff_panel = Panel(
"\n".join(rich_diff),
title="Documentation Changes (Diff)",
border_style="yellow",
expand=False
)
return diff_panel
else:
return Panel(
"[dim italic]No differences found[/dim italic]",
title="Documentation Changes (Diff)",
border_style="dim",
expand=False
)
def create_side_by_side_diff(original: str, improved: str, title: str = "Documentation Comparison") -> Panel:
"""Create a side-by-side comparison of original and improved text."""
# Wrap in panels with highlighting
original_panel = Panel(
escape(original),
title="Original",
border_style="dim red",
expand=True
)
improved_panel = Panel(
escape(improved),
title="Improved",
border_style="green",
expand=True
)
# Create side-by-side group
comparison = Group(
Rule("Before / After"),
Group(
original_panel,
improved_panel
)
)
return Panel(
comparison,
title=title,
border_style="cyan",
expand=False
)
def display_refinement_result(
result: Dict,
console: Console = console,
visualization_level: str = "standard",
save_to_file: bool = False,
output_dir: Optional[Path] = None
):
"""Display the results of the docstring refinement process."""
console.print(Rule("[bold green]Refinement Results[/bold green]", style="green"))
# Summary statistics
stats_table = Table(title="[bold]Summary Statistics[/bold]", box=box.ROUNDED, show_header=False, expand=False)
stats_table.add_column("Metric", style="cyan", no_wrap=True)
stats_table.add_column("Value", style="white")
stats_table.add_row("Total Tools Refined", str(len(result.get("refined_tools", []))))
stats_table.add_row("Total Iterations", str(result.get("total_iterations_run", 0)))
stats_table.add_row("Total Tests Executed", str(result.get("total_test_calls_attempted", 0)))
stats_table.add_row("Total Test Failures", str(result.get("total_test_calls_failed", 0)))
stats_table.add_row("Total Validation Failures", str(result.get("total_schema_validation_failures", 0)))
stats_table.add_row("Total Processing Time", f"{result.get('total_processing_time', 0.0):.2f}s")
stats_table.add_row("Total Cost", f"${result.get('total_refinement_cost', 0.0):.6f}")
console.print(stats_table)
# Save results to file if requested
if save_to_file and output_dir:
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
result_file = output_dir / f"refinement_results_{timestamp}.json"
try:
with open(result_file, 'w') as f:
json.dump(result, f, indent=2, default=str)
console.print(f"[green]Results saved to:[/green] {result_file}")
except Exception as e:
console.print(f"[red]Error saving results to file:[/red] {e}")
# Tools refined
refined_tools = result.get("refined_tools", [])
if refined_tools:
console.print("\n[bold]Tools Refined:[/bold]")
# Results tallying
total_description_improvements = 0
total_schema_improvements = 0
total_example_improvements = 0
flaw_categories_observed = {}
for i, tool in enumerate(refined_tools):
tool_name = tool.get("tool_name", "Unknown tool")
initial_success_rate = tool.get("initial_success_rate", 0.0)
final_success_rate = tool.get("final_success_rate", 0.0)
improvement_factor = tool.get("improvement_factor", 0.0)
# Decide on panel color based on improvement
if improvement_factor > 0.5:
border_style = "green"
elif improvement_factor > 0:
border_style = "blue"
else:
border_style = "yellow"
# Create a panel for each tool
success_change = (final_success_rate - initial_success_rate) * 100
success_change_str = (
f"[green]+{success_change:.1f}%[/green]" if success_change > 0 else
f"[red]{success_change:.1f}%[/red]" if success_change < 0 else
"[yellow]No change[/yellow]"
)
tool_panel_content = [
f"Initial Success Rate: [yellow]{initial_success_rate:.1%}[/yellow]",
f"Final Success Rate: [green]{final_success_rate:.1%}[/green]",
f"Change: {success_change_str}",
f"Improvement Factor: [cyan]{improvement_factor:.2f}x[/cyan]"
]
console.print(Panel(
Group(*tool_panel_content),
title=f"[bold]{i+1}. {tool_name}[/bold]",
border_style=border_style,
expand=False
))
# Display the final proposed changes
final_changes = tool.get("final_proposed_changes", {})
iterations = tool.get("iterations", [])
if final_changes:
# Check if description was improved
original_desc = None
for iter_data in iterations:
if iter_data.get("iteration") == 1:
# Get the original description from the first iteration
original_desc = iter_data.get("documentation_used", {}).get("description", "")
break
final_desc = final_changes.get("description", "")
# Count this as an improvement if descriptions differ
if original_desc and final_desc and original_desc != final_desc:
total_description_improvements += 1
# Display description changes based on visualization level
if visualization_level in ["standard", "full"]:
console.print("[bold cyan]Description Changes:[/bold cyan]")
if visualization_level == "full":
# Show diff view for detailed visualization
console.print(create_text_diff(original_desc, final_desc))
# Show side-by-side comparison
console.print(create_side_by_side_diff(
original_desc,
final_desc,
title="Description Comparison"
))
# Display schema patches if any
schema_patches = tool.get("final_proposed_schema_patches", [])
if schema_patches:
total_schema_improvements += 1
if visualization_level in ["standard", "full"]:
console.print("[bold cyan]Schema Patches Applied:[/bold cyan]")
console.print(Panel(
Syntax(json.dumps(schema_patches, indent=2), "json", theme="default", line_numbers=False),
title="JSON Patch Operations",
border_style="magenta",
expand=False
))
# Display examples
examples = final_changes.get("examples", [])
if examples:
total_example_improvements += len(examples)
if visualization_level in ["standard", "full"]:
console.print("[bold cyan]Generated Examples:[/bold cyan]")
examples_to_show = examples if visualization_level == "full" else examples[:3]
for j, example in enumerate(examples_to_show):
args = example.get("args", {})
comment = example.get("comment", "No description")
addresses_failure = example.get("addresses_failure_pattern", "")
# Add failure pattern as subtitle if present
subtitle = f"Addresses: {addresses_failure}" if addresses_failure else None
console.print(Panel(
Syntax(json.dumps(args, indent=2), "json", theme="default", line_numbers=False),
title=f"Example {j+1}: {comment}",
subtitle=subtitle,
border_style="dim green",
expand=False
))
if len(examples) > 3 and visualization_level == "standard":
console.print(f"[dim]...and {len(examples) - 3} more examples[/dim]")
# Collect flaw categories if available
for iter_data in iterations:
analysis = iter_data.get("analysis", {})
if analysis:
flaws = analysis.get("identified_flaw_categories", [])
for flaw in flaws:
if flaw not in flaw_categories_observed:
flaw_categories_observed[flaw] = 0
flaw_categories_observed[flaw] += 1
console.print() # Add spacing between tools
# Display improvement summary
console.print(Rule("[bold blue]Improvement Summary[/bold blue]", style="blue"))
improvement_table = Table(box=box.SIMPLE, show_header=True, header_style="bold cyan")
improvement_table.add_column("Improvement Type", style="blue")
improvement_table.add_column("Count", style="cyan")
improvement_table.add_column("Details", style="white")
improvement_table.add_row(
"Description Improvements",
str(total_description_improvements),
f"{total_description_improvements} of {len(refined_tools)} tools ({total_description_improvements/len(refined_tools)*100:.0f}%)"
)
improvement_table.add_row(
"Schema Improvements",
str(total_schema_improvements),
f"{total_schema_improvements} of {len(refined_tools)} tools ({total_schema_improvements/len(refined_tools)*100:.0f}%)"
)
improvement_table.add_row(
"Example Additions",
str(total_example_improvements),
f"Average {total_example_improvements/len(refined_tools):.1f} examples per tool"
)
console.print(improvement_table)
# Display flaw categories if any were observed
if flaw_categories_observed and visualization_level in ["standard", "full"]:
console.print("\n[bold cyan]Documentation Flaws Identified:[/bold cyan]")
flaws_table = Table(box=box.SIMPLE, show_header=True, header_style="bold magenta")
flaws_table.add_column("Flaw Category", style="magenta")
flaws_table.add_column("Occurrences", style="cyan")
flaws_table.add_column("Description", style="white")
# Map flaw categories to descriptions
flaw_descriptions = {
"MISSING_DESCRIPTION": "Documentation is missing key information",
"AMBIGUOUS_DESCRIPTION": "Description is unclear or can be interpreted in multiple ways",
"INCORRECT_DESCRIPTION": "Description contains incorrect information",
"MISSING_SCHEMA_CONSTRAINT": "Schema is missing important constraints",
"INCORRECT_SCHEMA_CONSTRAINT": "Schema contains incorrect constraints",
"OVERLY_RESTRICTIVE_SCHEMA": "Schema is unnecessarily restrictive",
"TYPE_CONFUSION": "Parameter types are inconsistent or unclear",
"MISSING_EXAMPLE": "Documentation lacks necessary examples",
"MISLEADING_EXAMPLE": "Examples provided are incorrect or misleading",
"INCOMPLETE_EXAMPLE": "Examples are present but insufficient",
"PARAMETER_DEPENDENCY_UNCLEAR": "Dependencies between parameters are not explained",
"CONFLICTING_CONSTRAINTS": "Schema contains contradictory constraints",
"AGENT_FORMULATION_ERROR": "Documentation hinders LLM agent's ability to use the tool",
"SCHEMA_PREVALIDATION_FAILURE": "Schema validation issues",
"TOOL_EXECUTION_ERROR": "Issues with tool execution",
"UNKNOWN": "Unspecified documentation issue"
}
# Sort flaws by occurrence count
sorted_flaws = sorted(flaw_categories_observed.items(), key=lambda x: x[1], reverse=True)
for flaw, count in sorted_flaws:
flaws_table.add_row(
flaw,
str(count),
flaw_descriptions.get(flaw, "No description available")
)
console.print(flaws_table)
# Error reporting
errors = result.get("errors_during_refinement_process", [])
if errors:
console.print("[bold red]Errors During Refinement:[/bold red]")
for error in errors:
console.print(f"- [red]{escape(error)}[/red]")
async def get_suitable_tools(
mcp_instance,
count: int = 1,
complexity: str = "medium",
exclude_tools: Optional[List[str]] = None
) -> List[str]:
"""
Find suitable tools for refinement based on complexity.
Args:
mcp_instance: The MCP server instance
count: Number of tools to return
complexity: Desired complexity level ("simple", "medium", "complex")
exclude_tools: List of tool names to exclude
Returns:
List of suitable tool names
"""
exclude_tools = exclude_tools or []
# Get all available tools
tool_list = await mcp_instance.list_tools()
# Filter out excluded tools and refine_tool_documentation itself
available_tools = [
t.name for t in tool_list
if t.name not in exclude_tools and t.name != "refine_tool_documentation"
]
if not available_tools:
return []
# Define complexity criteria based on schema properties
if complexity == "simple":
# Simple tools have few required parameters and a flat schema
preferred_tools = []
for tool_name in available_tools:
try:
tool_def = next((t for t in tool_list if t.name == tool_name), None)
if not tool_def:
continue
input_schema = getattr(tool_def, "inputSchema", {})
if not input_schema:
continue
properties = input_schema.get("properties", {})
required = input_schema.get("required", [])
# Simple tools have few properties and required fields
if len(properties) <= 3 and len(required) <= 1:
# Check for nested objects which would increase complexity
has_nested = any(
isinstance(prop, dict) and prop.get("type") == "object"
for prop in properties.values()
)
if not has_nested:
preferred_tools.append(tool_name)
except Exception:
continue
elif complexity == "complex":
# Complex tools have deep nested structures and many required parameters
preferred_tools = []
for tool_name in available_tools:
try:
tool_def = next((t for t in tool_list if t.name == tool_name), None)
if not tool_def:
continue
input_schema = getattr(tool_def, "inputSchema", {})
if not input_schema:
continue
properties = input_schema.get("properties", {})
required = input_schema.get("required", [])
# Complex tools have many properties or required fields
if len(properties) >= 5 or len(required) >= 3:
# Check for nested objects which would increase complexity
has_nested = any(
isinstance(prop, dict) and prop.get("type") == "object"
for prop in properties.values()
)
if has_nested:
preferred_tools.append(tool_name)
except Exception:
continue
else: # medium complexity (default)
# Medium tools are somewhere in between
preferred_tools = []
for tool_name in available_tools:
try:
tool_def = next((t for t in tool_list if t.name == tool_name), None)
if not tool_def:
continue
input_schema = getattr(tool_def, "inputSchema", {})
if not input_schema:
continue
properties = input_schema.get("properties", {})
# Medium tools have a moderate number of properties
if 3 <= len(properties) <= 6:
preferred_tools.append(tool_name)
except Exception:
continue
# If we couldn't find tools matching the complexity criteria, fall back to any available tool
if not preferred_tools:
preferred_tools = available_tools
# Prioritize tools without namespaces (i.e., not "namespace:tool_name")
prioritized_tools = [t for t in preferred_tools if ":" not in t]
# If we still need more tools and have prioritized all we could, add namespace tools
if len(prioritized_tools) < count:
namespace_tools = [t for t in preferred_tools if ":" in t]
prioritized_tools.extend(namespace_tools)
# Return the requested number of tools (or fewer if not enough are available)
return prioritized_tools[:min(count, len(prioritized_tools))]
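# Illustrative call pattern (mirrors how the demos below select tools; `simple_tools`
# is a previously selected list used for exclusion):
#   medium_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="medium",
#                                           exclude_tools=simple_tools)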
async def demo_single_tool_refinement(
gateway: Gateway,
tracker: CostTracker,
target_tool: Optional[str] = None,
refinement_provider: Optional[str] = None,
refinement_model: Optional[str] = None,
max_iterations: Optional[int] = None
):
"""Demonstrate refining documentation for a single tool."""
console.print(Rule("[bold cyan]Single Tool Refinement[/bold cyan]", style="cyan"))
# Use specified tool or find a suitable one
selected_tool = None
if target_tool:
# Check if specified tool exists
tool_list = await gateway.mcp.list_tools()
available_tools = [t.name for t in tool_list]
if target_tool in available_tools:
selected_tool = target_tool
else:
logger.warning(f"Specified tool '{target_tool}' not found", emoji_key="warning")
console.print(f"[yellow]Warning:[/yellow] Specified tool '{target_tool}' not found. Selecting automatically.")
# Auto-select if needed
if not selected_tool:
suitable_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="medium")
if suitable_tools:
selected_tool = suitable_tools[0]
else:
logger.error("No suitable tools found for refinement demo", emoji_key="error")
console.print("[bold red]Error:[/bold red] No suitable tools found for refinement demo.")
return
console.print(f"Selected tool for refinement: [cyan]{selected_tool}[/cyan]")
# Determine provider and model
provider = refinement_provider or Provider.OPENAI.value
# Find best available model if not specified
if not refinement_model:
try:
if provider == Provider.OPENAI.value:
model = "gpt-4.1" # Prefer this for best results
# Check if model is available
provider_instance = gateway.providers.get(provider)
if provider_instance:
models = await provider_instance.list_models()
model_ids = [m.get("id") for m in models]
if model not in model_ids:
model = "gpt-4.1-mini" # Fall back to mini
elif provider == Provider.ANTHROPIC.value:
model = "claude-3-5-sonnet"
else:
# Use default model for other providers
provider_instance = gateway.providers.get(provider)
if provider_instance:
model = provider_instance.get_default_model()
else:
model = None
except Exception as e:
logger.warning(f"Error determining model for {provider}: {e}", emoji_key="warning")
model = None
# If we still don't have a model, try a different provider
if not model:
for fallback_provider in SETTINGS["fallback_providers"]:
try:
provider_instance = gateway.providers.get(fallback_provider)
if provider_instance:
model = provider_instance.get_default_model()
provider = fallback_provider
break
except Exception:
continue
# If still no model, use a reasonable default
if not model:
model = "gpt-4.1-mini"
provider = Provider.OPENAI.value
else:
model = refinement_model
# Define refinement parameters
iterations = max_iterations or 2 # Default to 2 for demo
params = {
"tool_names": [selected_tool],
"max_iterations": iterations,
"refinement_model_config": {
"provider": provider,
"model": model,
"temperature": 0.2,
},
"validation_level": "full",
"enable_winnowing": True,
"progress_callback": display_refinement_progress,
}
console.print(Panel(
Syntax(json.dumps({k: v for k, v in params.items() if k != "progress_callback"}, indent=2), "json"),
title="Refinement Parameters",
border_style="dim cyan",
expand=False
))
# Create a progress display
console.print("\n[bold cyan]Refinement Progress:[/bold cyan]")
detail_console.print(f"\n[bold]Starting refinement for {selected_tool}...[/bold]")
# Estimate cost
estimated_cost = 0.03 * iterations # Very rough estimate per iteration
console.print(f"[cyan]Estimated cost:[/cyan] ${estimated_cost:.2f} USD")
# Check if cost would exceed limit
if estimated_cost > SETTINGS["cost_limit"]:
console.print(Panel(
f"Estimated cost (${estimated_cost:.2f}) exceeds the set limit (${SETTINGS['cost_limit']:.2f}).\n"
"Adjusting iterations to stay within budget.",
title="⚠️ Cost Limit Warning",
border_style="yellow",
expand=False
))
# Adjust iterations to stay under limit
adjusted_iterations = max(1, int(SETTINGS["cost_limit"] / 0.03))
params["max_iterations"] = adjusted_iterations
console.print(f"[yellow]Reducing iterations from {iterations} to {adjusted_iterations}[/yellow]")
with Progress(
TextColumn("[bold blue]{task.description}"),
BarColumn(complete_style="green", finished_style="green"),
TaskProgressColumn(),
TimeElapsedColumn(),
console=console,
expand=True
) as progress:
task_id = progress.add_task("[cyan]Refining tool documentation...", total=100)
# Execute the refinement
start_time = time.time()
try:
result = await gateway.mcp.call_tool("refine_tool_documentation", params)
            # Animate the overall progress bar; the tool call above has already returned,
            # and per-stage progress was reported separately via display_refinement_progress.
elapsed = 0
while progress.tasks[task_id].completed < 100 and elapsed < 60:
progress.update(task_id, completed=min(95, elapsed * 1.5))
await asyncio.sleep(0.5)
elapsed = time.time() - start_time
progress.update(task_id, completed=100)
# Track cost if available
if isinstance(result, dict) and "total_refinement_cost" in result:
tracker.add_generic_cost(
cost=result.get("total_refinement_cost", 0.0),
description=f"Refinement of {selected_tool}",
provider=provider,
model=model
)
# Display the results
display_refinement_result(
result,
console=console,
visualization_level=SETTINGS["visualization_level"],
save_to_file=SETTINGS["save_results"],
output_dir=SETTINGS["output_dir"]
)
return result
except Exception as e:
progress.update(task_id, completed=100, description="[bold red]Refinement failed!")
logger.error(f"Error during single tool refinement: {e}", emoji_key="error", exc_info=True)
console.print(f"[bold red]Error during refinement:[/bold red] {escape(str(e))}")
return None
async def demo_multi_tool_refinement(
gateway: Gateway,
tracker: CostTracker,
target_tools: Optional[List[str]] = None,
refinement_provider: Optional[str] = None,
refinement_model: Optional[str] = None,
max_iterations: Optional[int] = None
):
"""Demonstrate refining documentation for multiple tools simultaneously."""
console.print(Rule("[bold cyan]Multi-Tool Refinement[/bold cyan]", style="cyan"))
# Use specified tools or find suitable ones
selected_tools = []
if target_tools:
# Check which specified tools exist
tool_list = await gateway.mcp.list_tools()
available_tools = [t.name for t in tool_list]
for tool_name in target_tools:
if tool_name in available_tools:
selected_tools.append(tool_name)
else:
logger.warning(f"Specified tool '{tool_name}' not found", emoji_key="warning")
console.print(f"[yellow]Warning:[/yellow] Specified tool '{tool_name}' not found. Skipping.")
# Auto-select if needed
if not selected_tools:
# Get various complexity levels for a diverse mix
simple_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="simple")
medium_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="medium", exclude_tools=simple_tools)
complex_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="complex", exclude_tools=simple_tools + medium_tools)
selected_tools = simple_tools + medium_tools + complex_tools
if not selected_tools:
# Fall back to any available tools
selected_tools = await get_suitable_tools(gateway.mcp, count=3, complexity="medium")
if not selected_tools:
logger.error("No suitable tools found for multi-tool refinement demo", emoji_key="error")
console.print("[bold red]Error:[/bold red] No suitable tools found for multi-tool refinement demo.")
return
console.print(f"Selected tools for refinement: [cyan]{', '.join(selected_tools)}[/cyan]")
# Determine provider and model
provider = refinement_provider or Provider.OPENAI.value
# Find best available model if not specified
if not refinement_model:
try:
if provider == Provider.OPENAI.value:
model = "gpt-4.1-mini" # Use mini for multi-tool to save cost
# Check if model is available
provider_instance = gateway.providers.get(provider)
if provider_instance:
models = await provider_instance.list_models()
model_ids = [m.get("id") for m in models]
if model not in model_ids:
model = provider_instance.get_default_model()
elif provider == Provider.ANTHROPIC.value:
model = "claude-3-5-haiku"
else:
# Use default model for other providers
provider_instance = gateway.providers.get(provider)
if provider_instance:
model = provider_instance.get_default_model()
else:
model = None
except Exception as e:
logger.warning(f"Error determining model for {provider}: {e}", emoji_key="warning")
model = None
# If we still don't have a model, try a different provider
if not model:
for fallback_provider in SETTINGS["fallback_providers"]:
try:
provider_instance = gateway.providers.get(fallback_provider)
if provider_instance:
model = provider_instance.get_default_model()
provider = fallback_provider
break
except Exception:
continue
# If still no model, use a reasonable default
if not model:
model = "gpt-4.1-mini"
provider = Provider.OPENAI.value
else:
model = refinement_model
# Define refinement parameters with variations from the first demo
iterations = max_iterations or 1 # Default to 1 for multi-tool
params = {
"tool_names": selected_tools,
"max_iterations": iterations,
"refinement_model_config": {
"provider": provider,
"model": model,
"temperature": 0.3,
},
# Add an ensemble for better analysis if using full visualization
"analysis_ensemble_configs": [
{
"provider": Provider.ANTHROPIC.value if provider != Provider.ANTHROPIC.value else Provider.OPENAI.value,
"model": "claude-3-5-haiku" if provider != Provider.ANTHROPIC.value else "gpt-4.1-mini",
"temperature": 0.2,
}
] if SETTINGS["visualization_level"] == "full" else None,
"validation_level": "basic", # Use basic validation for speed
"enable_winnowing": False, # Skip winnowing for demo speed
"progress_callback": display_refinement_progress,
}
console.print(Panel(
Syntax(json.dumps({k: v for k, v in params.items() if k not in ["progress_callback", "analysis_ensemble_configs"]}, indent=2), "json"),
title="Multi-Tool Refinement Parameters",
border_style="dim cyan",
expand=False
))
# Estimate cost - higher with multiple tools
estimated_cost = 0.02 * iterations * len(selected_tools)
console.print(f"[cyan]Estimated cost:[/cyan] ${estimated_cost:.2f} USD")
# Check if cost would exceed limit
if estimated_cost > SETTINGS["cost_limit"]:
console.print(Panel(
f"Estimated cost (${estimated_cost:.2f}) exceeds the set limit (${SETTINGS['cost_limit']:.2f}).\n"
"Reducing tool count to stay within budget.",
title="⚠️ Cost Limit Warning",
border_style="yellow",
expand=False
))
# Reduce the number of tools
max_tools = max(1, int(SETTINGS["cost_limit"] / (0.02 * iterations)))
selected_tools = selected_tools[:max_tools]
params["tool_names"] = selected_tools
console.print(f"[yellow]Reducing tools to: {', '.join(selected_tools)}[/yellow]")
# Create a progress display
console.print("\n[bold cyan]Multi-Tool Refinement Progress:[/bold cyan]")
detail_console.print(f"\n[bold]Starting refinement for {len(selected_tools)} tools...[/bold]")
# We'll create a task for each tool
with Progress(
TextColumn("[bold blue]{task.description}"),
BarColumn(complete_style="green", finished_style="green"),
TaskProgressColumn(),
TimeElapsedColumn(),
console=console,
expand=True
) as progress:
# Create a task for overall progress
overall_task = progress.add_task("[cyan]Overall progress...", total=100)
# Execute the refinement
start_time = time.time()
try:
result = await gateway.mcp.call_tool("refine_tool_documentation", params)
# Simulate progress updates
# The actual progress is displayed through display_refinement_progress
elapsed = 0
while progress.tasks[overall_task].completed < 100 and elapsed < 120:
progress.update(overall_task, completed=min(95, elapsed * 0.8))
await asyncio.sleep(0.5)
elapsed = time.time() - start_time
progress.update(overall_task, completed=100)
# Track cost if available
if isinstance(result, dict) and "total_refinement_cost" in result:
tracker.add_generic_cost(
cost=result.get("total_refinement_cost", 0.0),
description=f"Multi-tool refinement ({len(selected_tools)} tools)",
provider=provider,
model=model
)
# Display the results
display_refinement_result(
result,
console=console,
visualization_level=SETTINGS["visualization_level"],
save_to_file=SETTINGS["save_results"],
output_dir=SETTINGS["output_dir"]
)
return result
except Exception as e:
progress.update(overall_task, completed=100, description="[bold red]Refinement failed!")
logger.error(f"Error during multi-tool refinement: {e}", emoji_key="error", exc_info=True)
console.print(f"[bold red]Error during multi-tool refinement:[/bold red] {escape(str(e))}")
return None
async def demo_custom_test_generation(
gateway: Gateway,
tracker: CostTracker,
target_tool: Optional[str] = None,
refinement_provider: Optional[str] = None,
refinement_model: Optional[str] = None,
max_iterations: Optional[int] = None
):
"""Demonstrate refinement with custom test generation strategies."""
console.print(Rule("[bold cyan]Custom Test Generation Strategy[/bold cyan]", style="cyan"))
# Choose a single tool to refine
selected_tool = None
if target_tool:
# Check if specified tool exists
tool_list = await gateway.mcp.list_tools()
available_tools = [t.name for t in tool_list]
if target_tool in available_tools:
selected_tool = target_tool
else:
logger.warning(f"Specified tool '{target_tool}' not found", emoji_key="warning")
console.print(f"[yellow]Warning:[/yellow] Specified tool '{target_tool}' not found. Selecting automatically.")
# Auto-select if needed (prefer complex tools for custom test demo)
if not selected_tool:
complex_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="complex")
if complex_tools:
selected_tool = complex_tools[0]
else:
# Fall back to medium complexity
medium_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="medium")
if medium_tools:
selected_tool = medium_tools[0]
else:
# Last resort - any tool
simple_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="simple")
if simple_tools:
selected_tool = simple_tools[0]
if not selected_tool:
logger.error("No suitable tools found for custom test generation demo", emoji_key="error")
console.print("[bold red]Error:[/bold red] No suitable tools found for custom test generation demo.")
return
console.print(f"Selected tool for custom test generation: [cyan]{selected_tool}[/cyan]")
# Determine provider and model
provider = refinement_provider or Provider.OPENAI.value
# Find best available model if not specified
if not refinement_model:
try:
if provider == Provider.OPENAI.value:
model = "gpt-4.1" # Prefer this for best results
# Check if model is available
provider_instance = gateway.providers.get(provider)
if provider_instance:
models = await provider_instance.list_models()
model_ids = [m.get("id") for m in models]
if model not in model_ids:
model = "gpt-4.1-mini" # Fall back to mini
elif provider == Provider.ANTHROPIC.value:
model = "claude-3-5-sonnet"
else:
# Use default model for other providers
provider_instance = gateway.providers.get(provider)
if provider_instance:
model = provider_instance.get_default_model()
else:
model = None
except Exception as e:
logger.warning(f"Error determining model for {provider}: {e}", emoji_key="warning")
model = None
# If we still don't have a model, try a different provider
if not model:
for fallback_provider in SETTINGS["fallback_providers"]:
try:
provider_instance = gateway.providers.get(fallback_provider)
if provider_instance:
model = provider_instance.get_default_model()
provider = fallback_provider
break
except Exception:
continue
# If still no model, use a reasonable default
if not model:
model = "gpt-4.1-mini"
provider = Provider.OPENAI.value
else:
model = refinement_model
# Define refinement parameters with custom test generation strategy
iterations = max_iterations or 1
params = {
"tool_names": [selected_tool],
"max_iterations": iterations,
"refinement_model_config": {
"provider": provider,
"model": model,
"temperature": 0.2,
},
# Custom test generation strategy
"generation_config": {
"positive_required_only": 3, # More tests with just required params
"positive_optional_mix": 5, # More tests with mixed optional params
"negative_type": 4, # More type validation checks
"negative_required": 3, # More tests with missing required params
"edge_boundary_min": 2, # More tests with boundary values
"edge_boundary_max": 2,
"llm_realistic_combo": 5, # More LLM-generated realistic tests
"llm_ambiguity_probe": 3, # More tests probing ambiguities
},
"validation_level": "full",
"enable_winnowing": True,
"progress_callback": display_refinement_progress,
}
console.print(Panel(
Group(
Syntax(json.dumps({k: v for k, v in params.items() if k not in ["progress_callback", "generation_config"]}, indent=2), "json"),
"\n[bold cyan]Custom Test Generation Strategy:[/bold cyan]",
Syntax(json.dumps(params["generation_config"], indent=2), "json"),
),
title="Custom Test Generation Parameters",
border_style="dim cyan",
expand=False
))
# Estimate cost (higher due to more test cases)
estimated_cost = 0.04 * iterations
console.print(f"[cyan]Estimated cost:[/cyan] ${estimated_cost:.2f} USD")
# Check if cost would exceed limit
if estimated_cost > SETTINGS["cost_limit"]:
console.print(Panel(
f"Estimated cost (${estimated_cost:.2f}) exceeds the set limit (${SETTINGS['cost_limit']:.2f}).\n"
"Reducing iterations to stay within budget.",
title="⚠️ Cost Limit Warning",
border_style="yellow",
expand=False
))
# Adjust iterations to stay under limit
params["max_iterations"] = 1
# Create a progress display
console.print("\n[bold cyan]Custom Test Generation Progress:[/bold cyan]")
detail_console.print(f"\n[bold]Starting refinement with custom test strategy for {selected_tool}...[/bold]")
with Progress(
TextColumn("[bold blue]{task.description}"),
BarColumn(complete_style="green", finished_style="green"),
TaskProgressColumn(),
TimeElapsedColumn(),
console=console,
expand=True
) as progress:
task_id = progress.add_task("[cyan]Refining with custom test strategy...", total=100)
# Execute the refinement
start_time = time.time()
try:
result = await gateway.mcp.call_tool("refine_tool_documentation", params)
# Simulate progress updates
elapsed = 0
while progress.tasks[task_id].completed < 100 and elapsed < 60:
progress.update(task_id, completed=min(95, elapsed * 1.5))
await asyncio.sleep(0.5)
elapsed = time.time() - start_time
progress.update(task_id, completed=100)
# Track cost if available
if isinstance(result, dict) and "total_refinement_cost" in result:
tracker.add_generic_cost(
cost=result.get("total_refinement_cost", 0.0),
description=f"Custom test strategy for {selected_tool}",
provider=provider,
model=model
)
# Display the results
display_refinement_result(
result,
console=console,
visualization_level=SETTINGS["visualization_level"],
save_to_file=SETTINGS["save_results"],
output_dir=SETTINGS["output_dir"]
)
return result
except Exception as e:
progress.update(task_id, completed=100, description="[bold red]Refinement failed!")
logger.error(f"Error during custom test generation: {e}", emoji_key="error", exc_info=True)
console.print(f"[bold red]Error during custom test generation:[/bold red] {escape(str(e))}")
return None
async def demo_all_tools_refinement(
gateway: Gateway,
tracker: CostTracker,
refinement_provider: Optional[str] = None,
refinement_model: Optional[str] = None,
max_iterations: Optional[int] = None
):
"""Demonstrate refining documentation for all available tools."""
console.print(Rule("[bold cyan]All Tools Refinement[/bold cyan]", style="cyan"))
# Get all available tools (excluding refine_tool_documentation itself)
tool_list = await gateway.mcp.list_tools()
available_tools = [
t.name for t in tool_list
if t.name != "refine_tool_documentation"
]
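    # The refiner tool itself is excluded so the run does not try to refine its own documentation.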
if not available_tools:
logger.error("No tools available for refinement", emoji_key="error")
console.print("[bold red]Error:[/bold red] No tools available for refinement.")
return
console.print(f"[cyan]Found {len(available_tools)} tools available for refinement[/cyan]")
# Determine provider and model
provider = refinement_provider or Provider.OPENAI.value
# Find best available model if not specified
if not refinement_model:
try:
if provider == Provider.OPENAI.value:
model = "gpt-4.1-mini" # Use mini for multi-tool to save cost
# Check if model is available
provider_instance = gateway.providers.get(provider)
if provider_instance:
models = await provider_instance.list_models()
model_ids = [m.get("id") for m in models]
if model not in model_ids:
model = provider_instance.get_default_model()
elif provider == Provider.ANTHROPIC.value:
model = "claude-3-5-haiku"
else:
# Use default model for other providers
provider_instance = gateway.providers.get(provider)
if provider_instance:
model = provider_instance.get_default_model()
else:
model = None
except Exception as e:
logger.warning(f"Error determining model for {provider}: {e}", emoji_key="warning")
model = None
# If we still don't have a model, try a different provider
if not model:
for fallback_provider in SETTINGS["fallback_providers"]:
try:
provider_instance = gateway.providers.get(fallback_provider)
if provider_instance:
model = provider_instance.get_default_model()
provider = fallback_provider
break
except Exception:
continue
# If still no model, use a reasonable default
if not model:
model = "gpt-4.1-mini"
provider = Provider.OPENAI.value
else:
model = refinement_model
# Define refinement parameters
iterations = max_iterations or 1 # Default to 1 for all-tools
params = {
"refine_all_available": True, # This is the key difference for this demo
"max_iterations": iterations,
"refinement_model_config": {
"provider": provider,
"model": model,
"temperature": 0.3,
},
"validation_level": "basic", # Use basic validation for speed
"enable_winnowing": False, # Skip winnowing for demo speed
"progress_callback": display_refinement_progress,
}
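    # With refine_all_available=True no tool_names list is needed; if the cost check
    # below trips, the call is switched back to an explicit tool_names subset instead.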
console.print(Panel(
Syntax(json.dumps({k: v for k, v in params.items() if k != "progress_callback"}, indent=2), "json"),
title="All Tools Refinement Parameters",
border_style="dim cyan",
expand=False
))
# Estimate cost - higher with multiple tools
estimated_cost = 0.01 * iterations * len(available_tools) # Lower per-tool cost with bulk processing
console.print(f"[cyan]Estimated cost:[/cyan] ${estimated_cost:.2f} USD")
# Check if cost would exceed limit
if estimated_cost > SETTINGS["cost_limit"]:
console.print(Panel(
f"Estimated cost (${estimated_cost:.2f}) exceeds the set limit (${SETTINGS['cost_limit']:.2f}).\n"
"Switching to targeted refinement to stay within budget.",
title="⚠️ Cost Limit Warning",
border_style="yellow",
expand=False
))
# Switch to using targeted tool_names instead of refine_all_available
        max_tools = max(1, int(SETTINGS["cost_limit"] / (0.02 * iterations)))  # conservative $0.02/tool to leave headroom under the limit
selected_tools = random.sample(available_tools, min(max_tools, len(available_tools)))
params["refine_all_available"] = False
params["tool_names"] = selected_tools
console.print(f"[yellow]Reducing to {len(selected_tools)} randomly selected tools[/yellow]")
# Create a progress display
console.print("\n[bold cyan]All Tools Refinement Progress:[/bold cyan]")
detail_console.print(f"\n[bold]Starting refinement for all {len(available_tools)} tools...[/bold]")
with Progress(
SpinnerColumn(),
TextColumn("[bold blue]{task.description}"),
BarColumn(complete_style="green", finished_style="green"),
TaskProgressColumn(),
TimeRemainingColumn(),
console=console,
expand=True
) as progress:
task_id = progress.add_task("[cyan]Refining all tools...", total=100)
# Execute the refinement
start_time = time.time()
try:
            call_task = asyncio.create_task(gateway.mcp.call_tool("refine_tool_documentation", params))
            # Advance the progress bar on a timer while the (longer) multi-tool refinement runs
            while not call_task.done():
                elapsed = time.time() - start_time
                progress.update(task_id, completed=min(95, elapsed * 0.3))
                await asyncio.sleep(1.0)
            result = await call_task
            progress.update(task_id, completed=100)
# Track cost if available
if isinstance(result, dict) and "total_refinement_cost" in result:
tracker.add_generic_cost(
cost=result.get("total_refinement_cost", 0.0),
description=f"All tools refinement ({len(available_tools)} tools)",
provider=provider,
model=model
)
# Display the results
display_refinement_result(
result,
console=console,
visualization_level=SETTINGS["visualization_level"],
save_to_file=SETTINGS["save_results"],
output_dir=SETTINGS["output_dir"]
)
return result
except Exception as e:
progress.update(task_id, completed=100, description="[bold red]Refinement failed!")
logger.error(f"Error during all tools refinement: {e}", emoji_key="error", exc_info=True)
console.print(f"[bold red]Error during all tools refinement:[/bold red] {escape(str(e))}")
return None
async def demo_schema_focused_refinement(
gateway: Gateway,
tracker: CostTracker,
target_tool: Optional[str] = None,
refinement_provider: Optional[str] = None,
refinement_model: Optional[str] = None
):
"""Demonstrate refinement focused specifically on schema improvements."""
console.print(Rule("[bold cyan]Schema-Focused Refinement[/bold cyan]", style="cyan"))
# Choose a complex tool to refine
selected_tool = None
if target_tool:
# Check if specified tool exists
tool_list = await gateway.mcp.list_tools()
available_tools = [t.name for t in tool_list]
if target_tool in available_tools:
selected_tool = target_tool
else:
logger.warning(f"Specified tool '{target_tool}' not found", emoji_key="warning")
console.print(f"[yellow]Warning:[/yellow] Specified tool '{target_tool}' not found. Selecting automatically.")
# Auto-select if needed (prefer complex tools for schema refinement)
if not selected_tool:
complex_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="complex")
if complex_tools:
selected_tool = complex_tools[0]
else:
# Fall back to medium complexity
medium_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="medium")
if medium_tools:
selected_tool = medium_tools[0]
else:
# Last resort - any tool
simple_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="simple")
if simple_tools:
selected_tool = simple_tools[0]
if not selected_tool:
logger.error("No suitable tools found for schema-focused refinement demo", emoji_key="error")
console.print("[bold red]Error:[/bold red] No suitable tools found for schema-focused refinement demo.")
return
console.print(f"Selected tool for schema-focused refinement: [cyan]{selected_tool}[/cyan]")
# Get tool schema
tool_list = await gateway.mcp.list_tools()
tool_def = next((t for t in tool_list if t.name == selected_tool), None)
if not tool_def or not hasattr(tool_def, "inputSchema"):
logger.error(f"Could not get schema for tool {selected_tool}", emoji_key="error")
console.print(f"[bold red]Error:[/bold red] Could not get schema for tool {selected_tool}.")
return
input_schema = getattr(tool_def, "inputSchema", {})
# Display the original schema
console.print("[bold cyan]Original Schema:[/bold cyan]")
console.print(Panel(
Syntax(json.dumps(input_schema, indent=2), "json", theme="default", line_numbers=False),
title="Original Input Schema",
border_style="dim cyan",
expand=False
))
# Determine provider and model
provider = refinement_provider or Provider.OPENAI.value
# Find best available model if not specified
if not refinement_model:
try:
if provider == Provider.OPENAI.value:
model = "gpt-4.1" # Prefer this for best schema analysis
# Check if model is available
provider_instance = gateway.providers.get(provider)
if provider_instance:
models = await provider_instance.list_models()
model_ids = [m.get("id") for m in models]
if model not in model_ids:
model = "gpt-4.1-mini" # Fall back to mini
elif provider == Provider.ANTHROPIC.value:
model = "claude-3-5-sonnet"
else:
# Use default model for other providers
provider_instance = gateway.providers.get(provider)
if provider_instance:
model = provider_instance.get_default_model()
else:
model = None
except Exception as e:
logger.warning(f"Error determining model for {provider}: {e}", emoji_key="warning")
model = None
# If we still don't have a model, try a different provider
if not model:
for fallback_provider in SETTINGS["fallback_providers"]:
try:
provider_instance = gateway.providers.get(fallback_provider)
if provider_instance:
model = provider_instance.get_default_model()
provider = fallback_provider
break
except Exception:
continue
# If still no model, use a reasonable default
if not model:
model = "gpt-4.1-mini"
provider = Provider.OPENAI.value
else:
model = refinement_model
# Define refinement parameters focused on schema improvements
params = {
"tool_names": [selected_tool],
"max_iterations": 1, # Single iteration focused on schema
"refinement_model_config": {
"provider": provider,
"model": model,
"temperature": 0.2,
},
# Custom test generation strategy focused on schema edge cases
"generation_config": {
"positive_required_only": 2,
"positive_optional_mix": 3,
"negative_type": 4, # More type validation checks
"negative_required": 3, # More tests with missing required params
"negative_enum": 3, # More enum testing
"negative_format": 3, # More format testing
"negative_range": 3, # More range testing
"negative_length": 3, # More length testing
"negative_pattern": 3, # More pattern testing
"edge_boundary_min": 3, # More tests with min boundary values
"edge_boundary_max": 3, # More tests with max boundary values
"llm_ambiguity_probe": 2, # Probe for ambiguities
},
"validation_level": "full", # Strict validation
"enable_winnowing": False, # No winnowing needed
"progress_callback": display_refinement_progress,
}
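    # The negative_* and edge_boundary_* categories above deliberately stress enum,
    # format, range, length, pattern, and boundary constraints; the failures they
    # surface are what drive the proposed schema patches shown later in this demo.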
console.print(Panel(
Syntax(json.dumps({k: v for k, v in params.items() if k not in ["progress_callback", "generation_config"]}, indent=2), "json"),
title="Schema-Focused Refinement Parameters",
border_style="dim cyan",
expand=False
))
# Estimate cost
estimated_cost = 0.035 # Schema focus costs a bit more due to edge case testing
console.print(f"[cyan]Estimated cost:[/cyan] ${estimated_cost:.2f} USD")
# Create a progress display
console.print("\n[bold cyan]Schema-Focused Refinement Progress:[/bold cyan]")
detail_console.print(f"\n[bold]Starting schema-focused refinement for {selected_tool}...[/bold]")
with Progress(
TextColumn("[bold blue]{task.description}"),
BarColumn(complete_style="green", finished_style="green"),
TaskProgressColumn(),
TimeElapsedColumn(),
console=console,
expand=True
) as progress:
task_id = progress.add_task("[cyan]Refining schema...", total=100)
# Execute the refinement
start_time = time.time()
try:
            call_task = asyncio.create_task(gateway.mcp.call_tool("refine_tool_documentation", params))
            # Advance the progress bar on a timer while the schema-focused refinement runs
            while not call_task.done():
                elapsed = time.time() - start_time
                progress.update(task_id, completed=min(95, elapsed * 1.5))
                await asyncio.sleep(0.5)
            result = await call_task
            progress.update(task_id, completed=100)
# Track cost if available
if isinstance(result, dict) and "total_refinement_cost" in result:
tracker.add_generic_cost(
cost=result.get("total_refinement_cost", 0.0),
description=f"Schema-focused refinement of {selected_tool}",
provider=provider,
model=model
)
# Extract schema patches from the result
refined_tools = result.get("refined_tools", [])
target_tool_result = next((t for t in refined_tools if t.get("tool_name") == selected_tool), None)
if target_tool_result and target_tool_result.get("final_proposed_schema_patches"):
schema_patches = target_tool_result.get("final_proposed_schema_patches", [])
patched_schema = target_tool_result.get("final_schema_after_patches", {})
if schema_patches:
console.print("[bold green]Schema Refinement Results:[/bold green]")
console.print(Panel(
Syntax(json.dumps(schema_patches, indent=2), "json", theme="default", line_numbers=False),
title="Applied Schema Patches",
border_style="magenta",
expand=False
))
if patched_schema:
console.print(Panel(
Syntax(json.dumps(patched_schema, indent=2), "json", theme="default", line_numbers=False),
title="Refined Schema",
border_style="green",
expand=False
))
# Generate a side-by-side comparison
console.print(create_side_by_side_diff(
json.dumps(input_schema, indent=2),
json.dumps(patched_schema, indent=2),
title="Schema Before/After Comparison"
))
else:
console.print("[yellow]No schema patches were applied.[/yellow]")
# Display the full results
display_refinement_result(
result,
console=console,
visualization_level=SETTINGS["visualization_level"],
save_to_file=SETTINGS["save_results"],
output_dir=SETTINGS["output_dir"]
)
return result
except Exception as e:
progress.update(task_id, completed=100, description="[bold red]Refinement failed!")
logger.error(f"Error during schema-focused refinement: {e}", emoji_key="error", exc_info=True)
console.print(f"[bold red]Error during schema-focused refinement:[/bold red] {escape(str(e))}")
return None
async def demo_model_comparison(
gateway: Gateway,
tracker: CostTracker,
target_tool: Optional[str] = None
):
"""Demonstrate comparing different LLM models for refinement."""
console.print(Rule("[bold cyan]Model Comparison for Refinement[/bold cyan]", style="cyan"))
# Choose a single tool to refine
selected_tool = None
if target_tool:
# Check if specified tool exists
tool_list = await gateway.mcp.list_tools()
available_tools = [t.name for t in tool_list]
if target_tool in available_tools:
selected_tool = target_tool
else:
logger.warning(f"Specified tool '{target_tool}' not found", emoji_key="warning")
console.print(f"[yellow]Warning:[/yellow] Specified tool '{target_tool}' not found. Selecting automatically.")
# Auto-select if needed
if not selected_tool:
medium_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="medium")
if medium_tools:
selected_tool = medium_tools[0]
else:
# Fall back to any available tool
simple_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="simple")
if simple_tools:
selected_tool = simple_tools[0]
if not selected_tool:
logger.error("No suitable tools found for model comparison demo", emoji_key="error")
console.print("[bold red]Error:[/bold red] No suitable tools found for model comparison demo.")
return
console.print(f"Selected tool for model comparison: [cyan]{selected_tool}[/cyan]")
# Define models to compare
models_to_compare = []
# Check which models are available
for provider_name in SETTINGS["preferred_providers"] + SETTINGS["fallback_providers"]:
provider_instance = gateway.providers.get(provider_name)
if provider_instance:
try:
available_models = await provider_instance.list_models()
model_ids = [m.get("id") for m in available_models]
if provider_name == Provider.OPENAI.value:
if "gpt-4.1" in model_ids:
models_to_compare.append((provider_name, "gpt-4.1"))
if "gpt-4.1-mini" in model_ids:
models_to_compare.append((provider_name, "gpt-4.1-mini"))
elif provider_name == Provider.ANTHROPIC.value:
if "claude-3-5-sonnet" in model_ids:
models_to_compare.append((provider_name, "claude-3-5-sonnet"))
if "claude-3-5-haiku" in model_ids:
models_to_compare.append((provider_name, "claude-3-5-haiku"))
elif provider_name == Provider.GEMINI.value:
if "gemini-2.0-pro" in model_ids:
models_to_compare.append((provider_name, "gemini-2.0-pro"))
elif provider_name == Provider.DEEPSEEK.value:
if "deepseek-chat" in model_ids:
models_to_compare.append((provider_name, "deepseek-chat"))
# If we already have 3+ models, stop looking
if len(models_to_compare) >= 3:
break
except Exception as e:
logger.warning(f"Error listing models for {provider_name}: {e}", emoji_key="warning")
# If we don't have enough models, add some defaults that might work
if len(models_to_compare) < 2:
fallback_models = [
(Provider.OPENAI.value, "gpt-4.1-mini"),
(Provider.ANTHROPIC.value, "claude-3-5-haiku"),
(Provider.GEMINI.value, "gemini-2.0-pro")
]
for provider, model in fallback_models:
if (provider, model) not in models_to_compare:
models_to_compare.append((provider, model))
if len(models_to_compare) >= 3:
break
# Limit to max 3 models for a reasonable comparison
models_to_compare = models_to_compare[:3]
if not models_to_compare:
logger.error("No models available for comparison", emoji_key="error")
console.print("[bold red]Error:[/bold red] No models available for comparison.")
return
console.print(f"Models being compared: [cyan]{', '.join([f'{p}/{m}' for p, m in models_to_compare])}[/cyan]")
# Estimate total cost
estimated_cost = 0.03 * len(models_to_compare)
console.print(f"[cyan]Estimated total cost:[/cyan] ${estimated_cost:.2f} USD")
# Check if cost would exceed limit
if estimated_cost > SETTINGS["cost_limit"]:
console.print(Panel(
f"Estimated cost (${estimated_cost:.2f}) exceeds the set limit (${SETTINGS['cost_limit']:.2f}).\n"
"Reducing the number of models to compare.",
title="⚠️ Cost Limit Warning",
border_style="yellow",
expand=False
))
max_models = max(2, int(SETTINGS["cost_limit"] / 0.03))
models_to_compare = models_to_compare[:max_models]
console.print(f"[yellow]Comparing only: {', '.join([f'{p}/{m}' for p, m in models_to_compare])}[/yellow]")
# Create a progress display
console.print("\n[bold cyan]Model Comparison Progress:[/bold cyan]")
# Results storage
model_results = {}
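    # model_results maps (provider, model) -> {"result": ..., "processing_time": seconds, "cost": USD}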
# Run refinement with each model
for provider, model in models_to_compare:
detail_console.print(f"\n[bold]Starting refinement with {provider}/{model}...[/bold]")
params = {
"tool_names": [selected_tool],
"max_iterations": 1,
"refinement_model_config": {
"provider": provider,
"model": model,
"temperature": 0.2,
},
"validation_level": "basic",
"enable_winnowing": False,
"progress_callback": display_refinement_progress,
}
with Progress(
TextColumn(f"[bold blue]Testing {provider}/{model}..."),
BarColumn(complete_style="green", finished_style="green"),
TaskProgressColumn(),
TimeElapsedColumn(),
console=console,
expand=True
) as progress:
task_id = progress.add_task(f"[cyan]Refining with {model}...", total=100)
# Execute the refinement
start_time = time.time()
try:
                call_task = asyncio.create_task(gateway.mcp.call_tool("refine_tool_documentation", params))
                # Advance the progress bar on a timer while this model's refinement runs
                while not call_task.done():
                    elapsed = time.time() - start_time
                    progress.update(task_id, completed=min(95, elapsed * 1.5))
                    await asyncio.sleep(0.5)
                result = await call_task
                progress.update(task_id, completed=100)
# Track cost if available
if isinstance(result, dict) and "total_refinement_cost" in result:
tracker.add_generic_cost(
cost=result.get("total_refinement_cost", 0.0),
description=f"{provider}/{model} refinement of {selected_tool}",
provider=provider,
model=model
)
# Store result for comparison
model_results[(provider, model)] = {
"result": result,
"processing_time": time.time() - start_time,
"cost": result.get("total_refinement_cost", 0.0) if isinstance(result, dict) else 0.0
}
except Exception as e:
progress.update(task_id, completed=100, description=f"[bold red]{model} failed!")
logger.error(f"Error during refinement with {provider}/{model}: {e}", emoji_key="error", exc_info=True)
console.print(f"[bold red]Error during refinement with {provider}/{model}:[/bold red] {escape(str(e))}")
# Compare and display results
if model_results:
console.print(Rule("[bold blue]Model Comparison Results[/bold blue]", style="blue"))
# Create comparison table
comparison_table = Table(title="Model Performance Comparison", box=box.ROUNDED)
comparison_table.add_column("Model", style="cyan")
comparison_table.add_column("Initial Success", style="dim yellow")
comparison_table.add_column("Final Success", style="green")
comparison_table.add_column("Improvement", style="magenta")
comparison_table.add_column("Processing Time", style="blue")
comparison_table.add_column("Cost", style="red")
for (provider, model), data in model_results.items():
result = data["result"]
refined_tools = result.get("refined_tools", [])
# Find the specific tool result
tool_result = next((t for t in refined_tools if t.get("tool_name") == selected_tool), None)
if tool_result:
initial_success = tool_result.get("initial_success_rate", 0.0)
final_success = tool_result.get("final_success_rate", 0.0)
improvement = tool_result.get("improvement_factor", 0.0)
comparison_table.add_row(
f"{provider}/{model}",
f"{initial_success:.1%}",
f"{final_success:.1%}",
f"{improvement:.2f}x",
f"{data['processing_time']:.2f}s",
f"${data['cost']:.6f}"
)
console.print(comparison_table)
# Find the best model
best_model = None
best_improvement = -1
for (provider, model), data in model_results.items():
result = data["result"]
refined_tools = result.get("refined_tools", [])
tool_result = next((t for t in refined_tools if t.get("tool_name") == selected_tool), None)
if tool_result:
improvement = tool_result.get("improvement_factor", 0.0)
if improvement > best_improvement:
best_improvement = improvement
best_model = (provider, model)
if best_model:
console.print(f"[bold green]Best model:[/bold green] [cyan]{best_model[0]}/{best_model[1]}[/cyan] with {best_improvement:.2f}x improvement")
# Show detailed results for the best model
best_data = model_results[best_model]
console.print("\n[bold cyan]Detailed Results for Best Model:[/bold cyan]")
display_refinement_result(
best_data["result"],
console=console,
visualization_level=SETTINGS["visualization_level"],
save_to_file=SETTINGS["save_results"],
output_dir=SETTINGS["output_dir"]
)
return model_results
else:
console.print("[yellow]No results available for comparison.[/yellow]")
return None
async def demo_cost_optimization(
gateway: Gateway,
tracker: CostTracker,
target_tool: Optional[str] = None
):
"""Demonstrate cost optimization techniques for documentation refinement."""
console.print(Rule("[bold cyan]Cost Optimization Techniques[/bold cyan]", style="cyan"))
# Choose a single tool to refine
selected_tool = None
if target_tool:
# Check if specified tool exists
tool_list = await gateway.mcp.list_tools()
available_tools = [t.name for t in tool_list]
if target_tool in available_tools:
selected_tool = target_tool
else:
logger.warning(f"Specified tool '{target_tool}' not found", emoji_key="warning")
console.print(f"[yellow]Warning:[/yellow] Specified tool '{target_tool}' not found. Selecting automatically.")
# Auto-select if needed
if not selected_tool:
medium_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="medium")
if medium_tools:
selected_tool = medium_tools[0]
else:
# Fall back to any available tool
simple_tools = await get_suitable_tools(gateway.mcp, count=1, complexity="simple")
if simple_tools:
selected_tool = simple_tools[0]
if not selected_tool:
logger.error("No suitable tools found for cost optimization demo", emoji_key="error")
console.print("[bold red]Error:[/bold red] No suitable tools found for cost optimization demo.")
return
console.print(f"Selected tool for cost optimization: [cyan]{selected_tool}[/cyan]")
# Create a table of optimization techniques
optimization_table = Table(title="Cost Optimization Techniques", box=box.SIMPLE_HEAD)
optimization_table.add_column("Technique", style="cyan")
optimization_table.add_column("Description", style="white")
optimization_table.add_column("Est. Savings", style="green")
optimization_table.add_row(
"Smaller Models",
"Use smaller, faster models for initial iterations or simple tools",
"50-80%"
)
optimization_table.add_row(
"Reduced Iterations",
"Single iteration can capture most improvements",
"30-60%"
)
optimization_table.add_row(
"Basic Validation",
"Use 'basic' validation level instead of 'full'",
"10-20%"
)
optimization_table.add_row(
"Focused Strategies",
"Custom test generation focused on important cases",
"20-40%"
)
optimization_table.add_row(
"Bulk Processing",
"Refine multiple related tools at once",
"30-50%"
)
optimization_table.add_row(
"Skip Winnowing",
"Disable winnowing for quick improvements",
"5-10%"
)
console.print(optimization_table)
# Define and display standard vs. optimized configurations
standard_config = {
"tool_names": [selected_tool],
"max_iterations": 3,
"refinement_model_config": {
"provider": Provider.OPENAI.value,
"model": "gpt-4.1",
"temperature": 0.2,
},
"validation_level": "full",
"enable_winnowing": True
}
optimized_config = {
"tool_names": [selected_tool],
"max_iterations": 1,
"refinement_model_config": {
"provider": Provider.OPENAI.value,
"model": "gpt-4.1-mini",
"temperature": 0.3,
},
"validation_level": "basic",
"enable_winnowing": False,
# Focused test generation to save costs
"generation_config": {
"positive_required_only": 2,
"positive_optional_mix": 2,
"negative_type": 2,
"negative_required": 1,
"negative_enum": 0,
"negative_format": 0,
"negative_range": 0,
"negative_length": 0,
"negative_pattern": 0,
"edge_empty": 0,
"edge_null": 0,
"edge_boundary_min": 0,
"edge_boundary_max": 0,
"llm_realistic_combo": 2,
"llm_ambiguity_probe": 1,
"llm_simulation_based": 0
}
}
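    # A count of 0 for a category simply skips that kind of test case, which is the
    # main lever (besides the smaller model and single iteration) for the savings below.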
# Compare costs
standard_est_cost = 0.09 # 3 iterations with gpt-4.1
optimized_est_cost = 0.015 # 1 iteration with gpt-4.1-mini and reduced tests
savings_pct = ((standard_est_cost - optimized_est_cost) / standard_est_cost) * 100
console.print(Panel(
Group(
"[bold]Standard Config:[/bold]",
Syntax(json.dumps(standard_config, indent=2), "json", theme="default", line_numbers=False),
f"[yellow]Estimated Cost: ${standard_est_cost:.3f}[/yellow]",
"\n[bold]Optimized Config:[/bold]",
Syntax(json.dumps(optimized_config, indent=2), "json", theme="default", line_numbers=False),
f"[green]Estimated Cost: ${optimized_est_cost:.3f}[/green]",
f"\n[bold cyan]Estimated Savings: {savings_pct:.1f}%[/bold cyan]"
),
title="Cost Comparison",
border_style="dim cyan",
expand=False
))
# Run the optimized configuration
console.print("\n[bold cyan]Running Cost-Optimized Refinement:[/bold cyan]")
detail_console.print(f"\n[bold]Starting cost-optimized refinement for {selected_tool}...[/bold]")
# Add progress callback
optimized_config["progress_callback"] = display_refinement_progress
with Progress(
TextColumn("[bold blue]{task.description}"),
BarColumn(complete_style="green", finished_style="green"),
TaskProgressColumn(),
TimeElapsedColumn(),
console=console,
expand=True
) as progress:
task_id = progress.add_task("[cyan]Running cost-optimized refinement...", total=100)
# Execute the refinement
start_time = time.time()
try:
            call_task = asyncio.create_task(gateway.mcp.call_tool("refine_tool_documentation", optimized_config))
            # Advance the progress bar on a timer; optimized mode typically completes faster
            while not call_task.done():
                elapsed = time.time() - start_time
                progress.update(task_id, completed=min(95, elapsed * 3))
                await asyncio.sleep(0.5)
            result = await call_task
            progress.update(task_id, completed=100)
# Track cost if available
if isinstance(result, dict) and "total_refinement_cost" in result:
actual_cost = result.get("total_refinement_cost", 0.0)
tracker.add_generic_cost(
cost=actual_cost,
description=f"Cost-optimized refinement of {selected_tool}",
provider=optimized_config["refinement_model_config"]["provider"],
model=optimized_config["refinement_model_config"]["model"]
)
# Compare estimated vs actual cost
console.print("[bold cyan]Cost Analysis:[/bold cyan]")
console.print(f"Estimated Cost: ${optimized_est_cost:.3f}")
console.print(f"Actual Cost: ${actual_cost:.3f}")
console.print(f"Actual Savings vs. Standard: {((standard_est_cost - actual_cost) / standard_est_cost) * 100:.1f}%")
# Display the results
display_refinement_result(
result,
console=console,
visualization_level=SETTINGS["visualization_level"],
save_to_file=SETTINGS["save_results"],
output_dir=SETTINGS["output_dir"]
)
return result
except Exception as e:
progress.update(task_id, completed=100, description="[bold red]Refinement failed!")
logger.error(f"Error during cost-optimized refinement: {e}", emoji_key="error", exc_info=True)
console.print(f"[bold red]Error during cost-optimized refinement:[/bold red] {escape(str(e))}")
return None
async def demo_practical_testing(
gateway: Gateway,
tracker: CostTracker
):
"""Demonstrate practical testing with flawed examples."""
console.print(Rule("[bold cyan]Practical Testing with Flawed Tools[/bold cyan]", style="cyan"))
# Check if we have flawed example tools
created_tools = await create_flawed_example_tools(gateway.mcp)
if not created_tools:
logger.error("Failed to create flawed example tools", emoji_key="error")
console.print("[bold red]Error:[/bold red] Failed to create flawed example tools for demonstration.")
return
console.print(f"Created {len(created_tools)} flawed example tools for practical testing:\n" +
"\n".join([f"- [cyan]{name}[/cyan]" for name in created_tools]))
# Get details on the intentional flaws
flaws_table = Table(title="Intentional Documentation Flaws", box=box.ROUNDED)
flaws_table.add_column("Tool", style="cyan")
flaws_table.add_column("Flaw Type", style="yellow")
flaws_table.add_column("Description", style="white")
flaws_table.add_row(
"flawed_process_text",
"Ambiguous Description",
"Description is vague and doesn't explain parameters."
)
flaws_table.add_row(
"flawed_scrape_website",
"Missing Parameter Descriptions",
"Parameters in schema have no descriptions."
)
flaws_table.add_row(
"flawed_data_processor",
"Confusing Schema & Description Mismatch",
"Description calls the tool 'analyzer' but name is 'processor'."
)
flaws_table.add_row(
"flawed_product_search",
"Misleading Examples",
"Example shows incorrect parameter name 'sort_by' vs schema 'sort'."
)
flaws_table.add_row(
"flawed_calculator",
"Schema/Implementation Conflict",
"Clear description but possible schema type confusion."
)
console.print(flaws_table)
# Select a flawed tool to demonstrate refinement
selected_tool = created_tools[0] # Start with the first one
console.print(f"\nSelected tool for demonstration: [cyan]{selected_tool}[/cyan]")
# Show the original flawed tool definition
tool_list = await gateway.mcp.list_tools()
tool_def = next((t for t in tool_list if t.name == selected_tool), None)
if tool_def and hasattr(tool_def, "inputSchema") and hasattr(tool_def, "description"):
input_schema = getattr(tool_def, "inputSchema", {})
description = getattr(tool_def, "description", "")
console.print("[bold cyan]Original Flawed Tool Definition:[/bold cyan]")
console.print(Panel(
escape(description),
title="Original Description",
border_style="dim red",
expand=False
))
console.print(Panel(
Syntax(json.dumps(input_schema, indent=2), "json", theme="default", line_numbers=False),
title="Original Schema",
border_style="dim red",
expand=False
))
# Run refinement on the flawed tool
console.print("\n[bold cyan]Running Refinement on Flawed Tool:[/bold cyan]")
detail_console.print(f"\n[bold]Starting refinement for flawed tool {selected_tool}...[/bold]")
params = {
"tool_names": [selected_tool],
"max_iterations": 2,
"refinement_model_config": {
"provider": Provider.OPENAI.value,
"model": "gpt-4.1", # Use the best model for these challenging cases
"temperature": 0.2,
},
"validation_level": "full",
"enable_winnowing": True,
"progress_callback": display_refinement_progress,
}
with Progress(
TextColumn("[bold blue]{task.description}"),
BarColumn(complete_style="green", finished_style="green"),
TaskProgressColumn(),
TimeElapsedColumn(),
console=console,
expand=True
) as progress:
task_id = progress.add_task("[cyan]Refining flawed tool...", total=100)
# Execute the refinement
start_time = time.time()
try:
            call_task = asyncio.create_task(gateway.mcp.call_tool("refine_tool_documentation", params))
            # Advance the progress bar on a timer while the flawed-tool refinement runs
            while not call_task.done():
                elapsed = time.time() - start_time
                progress.update(task_id, completed=min(95, elapsed * 1.5))
                await asyncio.sleep(0.5)
            result = await call_task
            progress.update(task_id, completed=100)
# Track cost if available
if isinstance(result, dict) and "total_refinement_cost" in result:
tracker.add_generic_cost(
cost=result.get("total_refinement_cost", 0.0),
description=f"Flawed tool refinement of {selected_tool}",
provider=Provider.OPENAI.value,
model="gpt-4.1"
)
# Display the results
display_refinement_result(
result,
console=console,
visualization_level=SETTINGS["visualization_level"],
save_to_file=SETTINGS["save_results"],
output_dir=SETTINGS["output_dir"]
)
# Highlight identified flaws
refined_tools = result.get("refined_tools", [])
target_tool_result = next((t for t in refined_tools if t.get("tool_name") == selected_tool), None)
if target_tool_result:
identified_flaws = []
for iter_data in target_tool_result.get("iterations", []):
analysis = iter_data.get("analysis", {})
if analysis:
flaws = analysis.get("identified_flaw_categories", [])
for flaw in flaws:
if flaw not in identified_flaws:
identified_flaws.append(flaw)
if identified_flaws:
console.print("\n[bold cyan]Identified Documentation Flaws:[/bold cyan]")
flaw_details = {
"MISSING_DESCRIPTION": "Documentation is missing key information",
"AMBIGUOUS_DESCRIPTION": "Description is unclear or can be interpreted in multiple ways",
"INCORRECT_DESCRIPTION": "Description contains incorrect information",
"MISSING_SCHEMA_CONSTRAINT": "Schema is missing important constraints",
"INCORRECT_SCHEMA_CONSTRAINT": "Schema contains incorrect constraints",
"OVERLY_RESTRICTIVE_SCHEMA": "Schema is unnecessarily restrictive",
"TYPE_CONFUSION": "Parameter types are inconsistent or unclear",
"MISSING_EXAMPLE": "Documentation lacks necessary examples",
"MISLEADING_EXAMPLE": "Examples provided are incorrect or misleading",
"INCOMPLETE_EXAMPLE": "Examples are present but insufficient",
"PARAMETER_DEPENDENCY_UNCLEAR": "Dependencies between parameters are not explained",
"CONFLICTING_CONSTRAINTS": "Schema contains contradictory constraints",
"AGENT_FORMULATION_ERROR": "Documentation hinders LLM agent's ability to use the tool",
"SCHEMA_PREVALIDATION_FAILURE": "Schema validation issues",
"TOOL_EXECUTION_ERROR": "Issues with tool execution",
"UNKNOWN": "Unspecified documentation issue"
}
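                    # Local display glosses keyed by the flaw category names the refiner
                    # reports; unknown categories fall back to a generic message below.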
for flaw in identified_flaws:
console.print(f"- [bold yellow]{flaw}[/bold yellow]: {flaw_details.get(flaw, 'No description available')}")
return result
except Exception as e:
progress.update(task_id, completed=100, description="[bold red]Refinement failed!")
logger.error(f"Error during flawed tool refinement: {e}", emoji_key="error", exc_info=True)
console.print(f"[bold red]Error during flawed tool refinement:[/bold red] {escape(str(e))}")
return None
async def main():
"""Main entry point for the demo."""
try:
print("Starting demo...")
logger.debug("Starting demo...")
args = parse_arguments()
print(f"Args parsed: {args}")
logger.debug(f"Args parsed: {args}")
# Set up gateway
print("Setting up gateway...")
        gateway = await setup_gateway_and_tools(create_flawed_tools=args.create_flawed)
print("Gateway setup complete")
# Initialize cost tracker
tracker = CostTracker(limit=SETTINGS["cost_limit"])
# Check if the tool was successfully registered
print("Checking if tool is registered...")
        tool_list = await gateway.mcp.list_tools()
available_tools = [t.name for t in tool_list]
print(f"Available tools: {available_tools}")
if "refine_tool_documentation" in available_tools:
print("Tool is available, proceeding with demo")
logger.info("Tool successfully registered, proceeding with demo", emoji_key="success")
# Run the selected demo based on CLI arguments
print(f"Running demo: {args.demo}")
# Select a demo based on specified arguments
if args.demo == "single" or args.demo == "all":
print("Running single tool refinement demo")
result = await demo_single_tool_refinement(
gateway,
tracker,
target_tool=args.tool,
refinement_provider=args.provider,
refinement_model=args.model,
max_iterations=args.iterations
)
if result:
logger.success("Single tool refinement demo completed", emoji_key="success")
elif args.demo == "multi":
print("Running multi-tool refinement demo")
result = await demo_multi_tool_refinement(
gateway,
tracker,
target_tools=[args.tool] if args.tool else None,
refinement_provider=args.provider,
refinement_model=args.model,
max_iterations=args.iterations
)
if result:
logger.success("Multi-tool refinement demo completed", emoji_key="success")
elif args.demo == "custom-testing":
print("Running custom test generation demo")
result = await demo_custom_test_generation(
gateway,
tracker,
target_tool=args.tool,
refinement_provider=args.provider,
refinement_model=args.model,
max_iterations=args.iterations
)
if result:
logger.success("Custom test generation demo completed", emoji_key="success")
elif args.demo == "optimize":
print("Running cost optimization demo")
result = await demo_cost_optimization(
gateway,
tracker,
target_tool=args.tool
)
if result:
logger.success("Cost optimization demo completed", emoji_key="success")
elif args.demo == "all-tools":
print("Running all-tools refinement demo")
result = await demo_all_tools_refinement(
gateway,
tracker,
refinement_provider=args.provider,
refinement_model=args.model,
max_iterations=args.iterations
)
if result:
logger.success("All-tools refinement demo completed", emoji_key="success")
elif args.demo == "schema-focus":
print("Running schema-focused refinement demo")
result = await demo_schema_focused_refinement(
gateway,
tracker,
target_tool=args.tool,
refinement_provider=args.provider,
refinement_model=args.model
)
if result:
logger.success("Schema-focused refinement demo completed", emoji_key="success")
elif args.demo == "practical":
print("Running practical testing demo")
result = await demo_practical_testing(gateway, tracker)
if result:
logger.success("Practical testing demo completed", emoji_key="success")
elif args.demo == "model-comparison":
print("Running model comparison demo")
result = await demo_model_comparison(
gateway,
tracker,
target_tool=args.tool
)
if result:
logger.success("Model comparison demo completed", emoji_key="success")
elif args.demo == "all":
print("Running all demos")
console.print(Panel(
"Running all demos in sequence. This may take some time.",
title="ℹ️ Running All Demos",
border_style="cyan",
expand=False
))
# Run each demo in sequence
demos = [
demo_single_tool_refinement(gateway, tracker, target_tool=args.tool,
refinement_provider=args.provider,
refinement_model=args.model,
max_iterations=args.iterations),
demo_multi_tool_refinement(gateway, tracker,
refinement_provider=args.provider,
refinement_model=args.model,
max_iterations=args.iterations),
demo_custom_test_generation(gateway, tracker, target_tool=args.tool,
refinement_provider=args.provider,
refinement_model=args.model),
demo_cost_optimization(gateway, tracker, target_tool=args.tool),
demo_schema_focused_refinement(gateway, tracker, target_tool=args.tool,
refinement_provider=args.provider,
refinement_model=args.model),
demo_model_comparison(gateway, tracker, target_tool=args.tool)
]
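                # These coroutine objects are created eagerly but only execute when
                # awaited one at a time in the loop below.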
if args.create_flawed:
demos.append(demo_practical_testing(gateway, tracker))
for demo_coro in demos:
try:
await demo_coro
except Exception as e:
logger.error(f"Error running demo: {e}", emoji_key="error", exc_info=True)
console.print(f"[bold red]Error running demo:[/bold red] {escape(str(e))}")
logger.success("All demos completed", emoji_key="success")
else:
print("No valid demo specified")
console.print(Panel(
f"The specified demo '{args.demo}' is not recognized.\n"
"Available demos: all, single, multi, custom-testing, optimize, all-tools, schema-focus, practical, model-comparison",
title="⚠️ Invalid Demo Selection",
border_style="yellow",
expand=False
))
else:
print("Tool is not available")
# Tool not available, show error message
console.print(Panel(
"This demo requires the docstring_refiner tool to be properly registered.\n"
"Due to known issues with Pydantic definitions, the tool can't be registered in this demo.\n\n"
"Check that you have the correct version of the Ultimate MCP Server and dependencies installed.",
title="⚠️ Demo Requirements Not Met",
border_style="red",
expand=False
))
# Display cost summary
console.print(Rule("[bold green]Total Demo Cost Summary[/bold green]", style="green"))
tracker.display_costs(console=console)
logger.info("Docstring Refiner Demo completed successfully", emoji_key="success")
console.print(Rule("[bold green]Demo Complete[/bold green]", style="green"))
print("Demo completed successfully")
except Exception as e:
print(f"Error in main: {type(e).__name__}: {str(e)}")
import traceback
traceback.print_exc()
return 1
return 0
if __name__ == "__main__":
exit_code = asyncio.run(main())
sys.exit(exit_code)