## Server Configuration

Environment variables used to configure the server. All are optional and fall back to the defaults shown.
| Name | Required | Description | Default |
|---|---|---|---|
| INSPECT_LOG_DIR | No | Directory containing .eval log files | ./logs |
| INSPECT_LOGS_MCP_MAX_LIMIT | No | Maximum number of logs a single request may return | 500 |
| INSPECT_LOGS_MCP_DEFAULT_LIMIT | No | Number of logs returned when no limit is given | 50 |
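As a minimal sketch, the configuration above could be resolved like this. `load_config` is a hypothetical helper for illustration, not part of the server's API; only the variable names and defaults come from the table.

```python
import os

def load_config(env=None):
    """Resolve server settings from environment variables, falling back
    to the documented defaults. Hypothetical helper, for illustration only."""
    env = os.environ if env is None else env
    return {
        "log_dir": env.get("INSPECT_LOG_DIR", "./logs"),
        "max_limit": int(env.get("INSPECT_LOGS_MCP_MAX_LIMIT", "500")),
        "default_limit": int(env.get("INSPECT_LOGS_MCP_DEFAULT_LIMIT", "50")),
    }
```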
## Capabilities

Features and capabilities supported by this server.
| Capability | Details |
|---|---|
| tools | `{"listChanged": false}` |
| prompts | `{"listChanged": false}` |
| resources | `{"subscribe": false, "listChanged": false}` |
| experimental | `{}` |
## Tools

Functions exposed to the LLM to take actions.
| Name | Description |
|---|---|
| tool_list_logs | List available evaluation log files with metadata. Lists all `.eval` log files in the specified directory, sorted by date (newest first). Returns task name, model, status, sample count, and other metadata for each log.<br>**Args:**<br>• `log_dir` — directory containing log files; defaults to the `INSPECT_LOG_DIR` env var or `./logs`<br>• `limit` — maximum number of logs to return (default: `INSPECT_LOGS_MCP_DEFAULT_LIMIT` or 50; max: `INSPECT_LOGS_MCP_MAX_LIMIT` or 500)<br>• `offset` — number of logs to skip for pagination (default: 0) |
| tool_get_eval_summary | Get a comprehensive evaluation summary including header, results, and stats. Returns full metadata about an evaluation run: task info, model configuration, scoring results, token usage, duration, and revision info.<br>**Args:**<br>• `log_file` — path to the log file (absolute or relative to `log_dir`)<br>• `log_dir` — optional log directory for relative paths |
| tool_get_sample | Get detailed sample data including full conversation history. Returns the complete sample: input, target, all messages exchanged with the model, output, scores, and metadata.<br>**Args:**<br>• `log_file` — path to the log file (absolute or relative to `log_dir`)<br>• `sample_id` — sample ID to retrieve<br>• `epoch` — epoch number (default: 1)<br>• `log_dir` — optional log directory for relative paths<br>• `include_events` — include the event transcript (default: False; can be verbose) |
| tool_search_logs | Search and filter evaluation logs by various criteria. Supports filtering by task name, model, status, date range, and minimum sample count. Task and model filters support wildcards (e.g., `mind2web*`, `google/*`).<br>**Args:**<br>• `log_dir` — directory containing log files<br>• `task` — filter by task name (supports wildcards like `mind2web*`)<br>• `model` — filter by model name (supports wildcards like `google/*`)<br>• `status` — filter by status: `success`, `error`, or `cancelled`<br>• `date_from` — include logs from this date (ISO format: YYYY-MM-DD)<br>• `date_to` — include logs until this date (ISO format: YYYY-MM-DD)<br>• `min_samples` — minimum sample count<br>• `limit` — maximum results (default: `INSPECT_LOGS_MCP_DEFAULT_LIMIT` or 50; max: `INSPECT_LOGS_MCP_MAX_LIMIT` or 500) |
| tool_compare_runs | Compare metrics between two evaluation runs. Shows a side-by-side comparison including task/model info, sample counts, score differences, token usage differences, and duration.<br>**Args:**<br>• `log_file_a` — path to the first log file<br>• `log_file_b` — path to the second log file<br>• `log_dir` — optional log directory for relative paths |
| tool_get_aggregate_stats | Get aggregate statistics across multiple evaluation runs. Provides summary statistics grouped by task and model, including success rates, sample counts, token usage totals, and duration averages.<br>**Args:**<br>• `log_dir` — directory containing log files<br>• `task` — filter by task name (supports wildcards)<br>• `model` — filter by model name (supports wildcards)<br>• `date_from` — include logs from this date (ISO format)<br>• `date_to` — include logs until this date (ISO format) |
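The limit clamping and wildcard filtering described above can be sketched as follows. `resolve_limit` and `matches_filters` are illustrative helpers, not the server's actual code; the behavior is an assumption based on the tool descriptions (defaults of 50/500, `fnmatch`-style wildcards, exact status matching).

```python
import fnmatch

def resolve_limit(requested, default=50, maximum=500):
    """Clamp a requested result count to the documented default/max bounds.
    Hypothetical helper mirroring the limit behavior described above."""
    if requested is None:
        return default
    return max(1, min(requested, maximum))

def matches_filters(log, task=None, model=None, status=None):
    """Apply the wildcard-style filters described for tool_search_logs to one
    log's metadata dict (keys assumed here: 'task', 'model', 'status')."""
    if task is not None and not fnmatch.fnmatch(log["task"], task):
        return False
    if model is not None and not fnmatch.fnmatch(log["model"], model):
        return False
    if status is not None and log["status"] != status:
        return False
    return True
```

For example, `matches_filters(log, task="mind2web*")` matches any log whose task name begins with `mind2web`, consistent with the wildcard examples in the table.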
## Prompts

Interactive templates invoked by user choice.
| Name | Description |
|---|---|
| No prompts | |
## Resources

Contextual data attached and managed by the client.
| Name | Description |
|---|---|
| No resources | |