Glama
by bkuri

Server Quality Checklist

Profile completion: 50%. A complete profile improves this server's visibility in search results.
  • This repository includes a README.md file.

  • This repository includes a LICENSE file.

  • Latest release: v1.2.0

  • No tool usage detected in the last 30 days. Usage tracking helps demonstrate server value.

    Tip: use the "Try in Browser" feature on the server page to seed initial usage.

  • Add a glama.json file to provide metadata about your server.

  • This server provides 69 tools.
  • No known security issues or vulnerabilities reported.



  • Add related servers to improve discoverability.

Tool Scores

  • Behavior: 1/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of disclosure, yet it reveals no behavioral traits. It does not indicate computational cost, whether results are cached, side effects, or what the output schema contains. The description fails completely at this task.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
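Behavioral disclosure can come from MCP tool annotations as well as prose. A minimal sketch of the pattern, assuming the tool is read-only and deterministic (the annotation field names follow the MCP spec's ToolAnnotations; the tool name is taken from the sibling list in this review, and the description wording is illustrative, not the server's actual code):

```python
# Hypothetical sketch: MCP behavioral annotations paired with a description
# that states consequences in prose. All behavioral claims here are assumed
# for illustration.
tool = {
    "name": "pairs_backtest",  # name appears in this review's sibling list
    "description": (
        "Backtest a pairs trading strategy on two instruments. "
        "Read-only and deterministic for fixed inputs; results are not "
        "cached, and a full run is CPU-intensive."
    ),
    "annotations": {
        "readOnlyHint": True,      # mutates no server state
        "destructiveHint": False,  # deletes or overwrites nothing
        "idempotentHint": True,    # same inputs -> same result
        "openWorldHint": False,    # operates on local data only
    },
}
```

Even when annotations are present, the description should still spell out cost and caching, since agents may not surface annotation hints to the model.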

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    At four words, the description is technically brief, but this represents under-specification rather than virtuous conciseness. For a tool with 9 parameters, including nested objects, it is far too short.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the complexity (9 params, nested objects, 3 required fields) and lack of schema descriptions, the description is radically incomplete. It offers no hint about what constitutes valid input for the required object parameters.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 1/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%. The description provides zero compensation for this gap, leaving all 9 parameters undocumented—particularly the three required object parameters (pair, backtest_result_1, backtest_result_2) which have 'additionalProperties: true' and no schema constraints.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
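Closing the 0% coverage gap means attaching a description to every property. A hedged sketch for this tool, assuming the parameter names quoted above (pair, backtest_result_1, backtest_result_2); the description text, constraints, and example values are assumptions, not the server's actual schema:

```python
# Illustrative input schema with per-parameter descriptions.
input_schema = {
    "type": "object",
    "required": ["pair", "backtest_result_1", "backtest_result_2"],
    "properties": {
        "pair": {
            "type": "object",
            "description": "The two instruments traded as a pair "
                           "(field names assumed for illustration).",
        },
        "backtest_result_1": {
            "type": "object",
            "description": "Full result object from a prior backtest run "
                           "for the first leg; passed through unchanged.",
        },
        "backtest_result_2": {
            "type": "object",
            "description": "Backtest result for the second leg, same shape "
                           "as backtest_result_1.",
        },
    },
}

# Coverage metric as used in this review: fraction of documented properties.
coverage = sum(
    1 for p in input_schema["properties"].values() if p.get("description")
) / len(input_schema["properties"])
```

With every property documented, coverage rises from 0% to 100%, and the free-text description can focus on behavior and tool selection instead.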

    Purpose: 2/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description 'Backtest pairs trading strategies' essentially restates the tool name (tautology). While it identifies the domain (pairs trading), it fails to distinguish this tool from siblings like backtest_comprehensive, backtest_monte_carlo, or backtesting_run.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives. Given the numerous backtesting siblings (backtest_comprehensive, backtesting_run, etc.), the absence of selection criteria is a critical gap.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
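The "use X instead of Y when Z" pattern can be folded directly into the description string. A hypothetical rewrite for this tool (the sibling names come from this review; the selection guidance itself is an assumption about how the tools divide responsibility):

```python
# Illustrative description embedding explicit selection criteria.
description = (
    "Backtest a pairs trading strategy (two-leg, market-neutral) from two "
    "per-leg backtest results. Use this only for pair strategies; use "
    "backtest_comprehensive for single-strategy multi-metric runs, "
    "backtest_monte_carlo to stress-test a finished result, and "
    "backtesting_run for a plain single backtest."
)
```

One sentence of guidance like this is usually enough for an agent to pick between overlapping siblings without trial and error.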

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Mentions the analytical goal (detect overfitting) but lacks critical behavioral details: no explanation of the rolling window mechanism (in-sample/out-of-sample), computational intensity, or whether results are cached. No annotations exist to provide this context, so the description bears full responsibility.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While brief (single sentence), it is underspecified rather than efficiently concise. With zero front-loading of critical constraints or parameter logic, the brevity represents a lack of information rather than disciplined editing.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 1/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite having an output schema, the description is inadequate for a complex 10-parameter analytical tool. The complete absence of input parameter documentation (0% schema coverage) and lack of behavioral context leave the agent without sufficient information to construct valid invocations.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 1/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0% and the description fails to compensate, remaining silent on all 10 parameters. Critical parameters like 'in_sample_period', 'out_sample_period', 'step_forward', and 'param_space' receive no explanation despite being essential to the walk-forward methodology.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
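For the walk-forward tool, the parameter names quoted above suggest window semantics that an agent cannot infer from types alone. A hedged sketch of the missing documentation (the names are quoted in this review; the units, wording, and example values are assumptions):

```python
# Illustrative per-parameter docs for a walk-forward analysis tool.
properties = {
    "in_sample_period": {
        "type": "integer",
        "description": "Bars used to fit parameters in each window "
                       "(unit assumed).",
    },
    "out_sample_period": {
        "type": "integer",
        "description": "Bars held out to evaluate the fitted parameters.",
    },
    "step_forward": {
        "type": "integer",
        "description": "Bars to advance both windows between iterations.",
    },
    "param_space": {
        "type": "object",
        "description": "Mapping of parameter name to candidate values "
                       "searched in-sample, e.g. {'lookback': [10, 20]} "
                       "(shape assumed).",
    },
}
```

Documenting the relationship between the three window parameters (fit, evaluate, advance) is the key piece: it is exactly the non-obvious interaction a schema cannot express on its own.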

    Purpose: 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States the domain-specific technique (walk-forward analysis) and goal (detect overfitting), but uses the weak verb 'Perform' and fails to distinguish this tool from siblings like 'optimization_run' or 'optimization_analyze', which also handle strategy optimization.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 1/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to select this tool over alternatives such as 'backtest_monte_carlo', 'optimization_run', or 'optimization_analyze'. No mention of prerequisites or specific use cases requiring walk-forward validation.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions 'transitions' implying change detection, it fails to explain the detection methodology (despite 'detection_method' being a parameter), computational complexity, or whether the operation is deterministic. The mention of 'transitions' provides minimal value beyond the tool name.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 2/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While the single sentence is not verbose, it is inappropriately brief for a 6-parameter analytical tool. The description suffers from underspecification rather than efficient communication—no front-loaded critical context is provided, and the sentence does not earn its place given the schema complexity it must support.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given that the tool accepts complex nested backtest data and has 6 configuration parameters with 0% inline documentation, the description is inadequate. Although an output schema exists (reducing the need to describe return values), the complete absence of input parameter guidance and the lack of domain context (pairs trading) leave significant gaps.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 1/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage across all 6 parameters, the description completely fails to compensate. It does not explain the required 'backtest_results' input format (array of objects), what 'n_regimes' controls, available 'detection_method' options beyond the 'hmm' default, or the implications of the boolean flags. This is a critical gap for a tool with complex nested inputs.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
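The regime-detection gaps called out above can be closed with three short property descriptions. A sketch using only facts quoted in this review (the array input, the 'hmm' default); the wording is assumed, and the other accepted detection_method values are deliberately left undocumented here because the review could not determine them:

```python
# Illustrative docs for the regime-detection tool's known parameters.
properties = {
    "backtest_results": {
        "type": "array",
        "items": {"type": "object"},
        "description": "One result object per completed backtest run; "
                       "generate these with a backtest tool first.",
    },
    "n_regimes": {
        "type": "integer",
        "description": "Number of distinct market regimes to fit.",
    },
    "detection_method": {
        "type": "string",
        "default": "hmm",
        "description": "Detection algorithm; defaults to 'hmm' (hidden "
                       "Markov model). Other accepted values are not "
                       "documented in this review.",
    },
}
```

Note that an honest "other values undocumented" note is still more useful to an agent than silence: it tells the agent the safe default.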

    Purpose: 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Identify') and resource ('market regimes and transitions'), but fails to contextualize this within the pairs trading domain implied by the tool name 'pairs_regimes'. Crucially, it does not differentiate from the sibling tool 'backtest_analyze_regimes', leaving ambiguity about which regime detection tool to use.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like 'backtest_analyze_regimes' or 'pairs_correlation'. It does not mention prerequisites (e.g., that backtest_results must be generated first) or when regime detection is appropriate versus other analysis methods.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full responsibility for behavioral disclosure. It states 'Decompose' but fails to explain the methodology (regression-based? PCA?), computational intensity, whether results are cached, or what the decomposition produces beyond implying factor loadings and residuals via the parameter names.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While the single sentence avoids verbosity, it is underweight for a tool with 6 undocumented parameters, nested objects, and an output schema. Critical context is missing not because of efficient editing but because of insufficient specification.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the complexity (financial factor analysis, statistical decomposition, 6 parameters with 0% schema coverage), the description is inadequate. It does not leverage the existence of an output schema to justify brevity regarding returns, nor does it explain the input requirements sufficiently.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 1/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0% for 6 complex parameters including nested objects (backtest_result, factor_returns). The description completely fails to compensate, mentioning none of the parameters (e.g., include_residuals, confidence_level, analysis_period) or their semantic relationships.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description identifies the core action ('Decompose') and domain ('systematic factors'), but fails to contextualize it within the pairs trading ecosystem suggested by the tool name and sibling tools like pairs_backtest. It also omits that this operates on backtest results despite requiring a 'backtest_result' parameter.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus similar analysis tools like backtest_analyze_regimes or pairs_regimes. No prerequisites are mentioned (e.g., that a backtest must be run first), nor are there any exclusion criteria or alternative suggestions.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full burden but discloses nothing about computational intensity (10k simulations default), side effects (storage of results), idempotency, or required preprocessing steps for the 'backtest_result' parameter.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is appropriately brief, but suffers from under-specification rather than efficient information density. It wastes the opportunity to front-load critical constraints or prerequisites.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a complex statistical tool with nested objects and multiple configuration options, the description is inadequate. While an output schema exists (reducing the need for return value documentation), the lack of parameter semantics or behavioral context leaves significant gaps.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 1/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, and the description adds no meaning for any of the 7 parameters (e.g., what 'block_size' controls, valid 'resample_method' values, or the structure of required 'backtest_result'). Complete failure to compensate for schema deficiency.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the core action ('Generate Monte Carlo simulations') and domain ('risk analysis'), but uses vague qualifiers ('comprehensive') and critically fails to distinguish from sibling tool 'backtest_monte_carlo', which likely performs similar functionality on backtest data.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus alternatives like 'backtest_monte_carlo', 'risk_stress_test', or 'risk_var'. Given the overlapping sibling names, explicit selection criteria are essential but absent.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It does not clarify whether this is a read-only operation, what computational resources it consumes (e.g., Monte Carlo simulations), or that it aggregates multiple risk analyses. The agent cannot determine safety characteristics from the description alone.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is concise, but given the complexity (6 parameters, numerous siblings, nested objects), it is underspecified. The sentence does not earn its place by conveying sufficient information for agent decision-making.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite having an output schema that excuses return value description, the tool has 6 undocumented parameters (0% schema coverage) and operates in a crowded namespace of risk tools. The description inadequately addresses this complexity, leaving critical gaps in understanding the tool's scope and inputs.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 1/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, requiring the description to compensate for 6 parameters including complex toggles (include_monte_carlo, include_var_analysis) and a required nested object (backtest_result). The description mentions none of these parameters or their semantics.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states the tool generates risk assessments and recommendations, providing a verb and resource. However, it fails to distinguish this tool from specific risk analysis siblings (risk_var, risk_stress_test, risk_monte_carlo) despite this being an aggregating report that can include those analyses.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this comprehensive report tool versus the individual risk analysis tools (risk_var, risk_stress_test, etc.). The description does not mention the required backtest_result input or prerequisites.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It fails to disclose which calculation methods are available (historical, parametric, Monte Carlo), the computational cost implications of monte_carlo_sims=10000, or whether results are cached.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is not verbose, but it is severely under-specified for a 5-parameter quantitative tool that requires complex nested objects. Given the schema complexity, conciseness becomes inadequacy.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite having an output schema (reducing description burden for returns), the description inadequately explains input requirements. The mandatory 'backtest_result' parameter—accepting arbitrary additionalProperties—desperately needs documentation that is absent.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 1/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 0% and the description adds no parameter documentation. It critically fails to explain the required 'backtest_result' object structure, the valid values for 'method' (defaults to 'all'), or the semantics of the confidence_levels and time_horizons arrays.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States the specific financial metric (Value at Risk) and mentions multiple methods, but fails to differentiate from sibling tool 'risk_monte_carlo' which likely overlaps in functionality. The phrase 'multiple methods' is vague without enumeration.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus alternatives like 'risk_monte_carlo', 'risk_stress_test', or 'risk_analyze_portfolio'. Given the crowded risk tool namespace, explicit selection criteria are needed.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full disclosure burden. It mentions 'concurrently' but fails to disclose execution characteristics: whether it performs a Cartesian product of symbols, timeframes, and hyperparameters; how failures are handled (if one backtest fails, do the others continue?); typical duration; or caching behavior.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is front-loaded and non-redundant, but severely undersized given the tool's complexity (7 parameters, batch operation, 0% schema coverage). The brevity harms utility rather than demonstrating efficiency.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    While an output schema exists (reducing the need to describe return values), the description is incomplete on the input side. With 5 required parameters and 0% schema coverage, it should explain parameter semantics, valid values, and the batch execution model, but provides none of this.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 1/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The schema has 0% description coverage (no parameter descriptions), and the description completely fails to compensate: no explanation of the hyperparameters array structure, date formats, concurrent_limit implications, or how symbols and timeframes combine. This is a critical gap for a 7-parameter tool.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses specific verb 'Run' and resource 'backtests', and includes 'concurrently' which distinguishes it from sequential siblings like 'backtesting_run'. However, it fails to clarify the relationship to 'optimization_run' or explain that 'strategy' is singular (testing one strategy across multiple parameters/symbols) versus comparing different strategies.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance on when to use this vs. 'optimization_run' (single optimization) or 'backtest_optimize_parameters'. Missing prerequisites (e.g., strategy must exist in system) and no mention of resource intensity or when to reduce concurrent_limit.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full disclosure burden. 'Download' implies an external API call and data storage, but critical behavioral details are omitted: whether new data overwrites existing candles or appends to them, rate limiting, storage location, and idempotency. An output schema exists, but the description does not explain the download's side effects.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The eight-word sentence is appropriately front-loaded with the action verb. However, such extreme brevity is inappropriate given the zero schema documentation and lack of annotations; more explanatory content is needed to compensate for the missing structured metadata.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite having an output schema (reducing need to describe return values), the description remains incomplete for a 4-parameter data import tool. Fails to explain the data lifecycle (where candles are stored, how they're accessed by other tools like backtesting_run), which is essential context given the sibling tool ecosystem.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The schema has 0% description coverage (4 undocumented parameters). The description implies an 'exchange' parameter, but fails to specify date formats for start_date/end_date, symbol formatting conventions, or the semantics of a null end_date (an open-ended range?). The schema gap places a heavy burden on the description, which it does not meet.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
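The candle-download gaps above are exactly the kind that one line of documentation per parameter resolves. A hedged sketch, using the parameter names quoted in this review; the date format, symbol convention, and null-end_date semantics are assumptions flagged as such:

```python
# Illustrative docs for the candle-download tool's parameters.
properties = {
    "exchange": {
        "type": "string",
        "description": "Exchange identifier; list valid ids with "
                       "backtesting_exchanges (assumed relationship).",
    },
    "symbol": {
        "type": "string",
        "description": "Trading pair, e.g. 'BTC-USDT' (format assumed).",
    },
    "start_date": {
        "type": "string",
        "description": "Inclusive range start, 'YYYY-MM-DD' "
                       "(format assumed).",
    },
    "end_date": {
        "type": ["string", "null"],
        "description": "Inclusive range end; null means download up to "
                       "the present (semantics assumed).",
    },
}
```

Spelling out what a null end_date means is the highest-value line here, since it is the one behavior an agent cannot safely guess.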

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States specific action 'Download' and resource 'candle data' with scope 'from exchange via Jesse REST API'. Distinguishes from sibling backtesting tools by specifying data import functionality vs. analysis or execution. However, could explicitly mention this is for backtesting data preparation.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this tool versus alternatives like backtesting_exchanges (which likely lists available markets). No mention of prerequisites such as exchange connectivity validation or whether data persistence is permanent/temporary.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. While 'Read' implies a safe, non-destructive operation, the description fails to disclose error behavior (e.g., what happens if the strategy doesn't exist), return format, or whether this accesses files or a database.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely brief (four words) with zero redundancy. However, it may be overly terse given the lack of schema documentation and annotations, failing to provide sufficient information density for an agent to use the tool confidently.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite having an output schema (reducing the need to describe return values), the description is inadequate due to complete lack of parameter documentation and behavioral context. A tool reading source code should clarify the identifier format and access permissions.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 0% (the 'name' parameter lacks a description). The description implies the parameter refers to a strategy name through context, but does not explicitly document what constitutes a valid name or how to obtain one. Minimal compensation for the schema deficiency.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
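
    The pattern flagged here recurs across the server: 0% schema description coverage forces the prose description to carry everything. A minimal sketch of the fix, in Python for illustration — the tool and parameter names follow this review, but the description text and the coverage helper are hypothetical:

```python
# Bare schema as the review describes it: the 'name' property has no description.
bare_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

# Hypothetical improved schema: the description states the value's provenance
# and constraints instead of restating the parameter name.
improved_schema = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": (
                "Strategy name as returned by backtesting_list_strategies; "
                "must refer to an existing strategy."
            ),
        }
    },
    "required": ["name"],
}

def description_coverage(schema: dict) -> float:
    """Fraction of schema properties that carry a 'description' key."""
    props = schema.get("properties", {})
    if not props:
        return 0.0
    return sum(1 for p in props.values() if "description" in p) / len(props)

print(description_coverage(bare_schema))      # 0.0
print(description_coverage(improved_schema))  # 1.0
```

    This is roughly how the "schema coverage" percentages cited throughout this checklist can be computed.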

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Read') and resource ('strategy source code'), clearly distinguishing this tool from siblings like 'backtesting_run' or 'strategy_create'. However, it omits domain context that would clarify these are trading strategies.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus 'strategy_metadata' or 'backtesting_list_strategies'. Does not mention that 'backtesting_list_strategies' may be needed first to obtain valid names.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. While 'Analyze' implies a read-only operation, the description fails to specify what constitutes 'opportunities' (e.g., correlation breakdowns, mean-reversion signals), computational intensity, or the specific correlation methodology used (Pearson, Spearman, etc.).

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The single sentence is efficiently written with no wasted words, earning its place. However, it is inappropriately sized for a 6-parameter tool with complex inputs; the brevity creates information gaps rather than effective front-loading of critical details.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of 6 parameters (one required and complex), zero schema coverage, and numerous similar sibling tools, the description is incomplete. It does not explain the input data format, output structure (despite an output schema existing), or how this analysis fits into the broader pairs trading workflow.

    Parameters: 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the description must compensate but minimally does so. It mentions 'correlations' which provides context for correlation_threshold, but offers no guidance on the complex required parameter backtest_results (an array of objects), lookback_period units, or the purpose of include_heatmap/rolling flags.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses specific verbs ('Analyze', 'identify') and resources ('cross-asset correlations', 'pairs trading opportunities') that clearly convey the tool's function. It implies distinction from sibling tools like pairs_backtest and pairs_factors by focusing specifically on correlation analysis rather than backtesting or factor analysis, though it does not explicitly name alternatives.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit guidance is provided on when to use this tool versus the numerous sibling alternatives (e.g., pairs_backtest for validation, pairs_factors for factor-based selection). The description lacks prerequisites, such as requiring existing backtest data, or exclusion criteria.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, so the description carries the full burden. It mentions the return type (Dict with orders list) but fails to disclose whether this is a read-only operation, how errors are handled, or whether it returns all orders or only active ones.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses a structured docstring format (Args/Returns), which is efficient, but the content is minimal and the 'Session ID' text merely restates the parameter name. Could add more value without sacrificing brevity.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Fails to leverage sibling tools context (e.g., trading_live_sessions) to explain session_id provenance. Given simple single-parameter structure, should clarify order types returned and link to session management tools.

    Parameters: 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema has 0% description coverage. Description provides only 'Session ID' under Args, which merely repeats the parameter name without explaining format, constraints, or that this ID likely comes from trading_live_sessions.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description provides a clear verb (Get) and resource (orders) with scope (live trading session). It distinguishes from siblings like trading_live_equity or trading_live_logs by specifying orders, though it doesn't explicitly clarify the difference from trading_live_check.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this versus alternatives like trading_live_check or trading_live_status. No mention of prerequisites such as needing an active session from trading_live_sessions.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, so description carries full burden. It mentions return type ('Dict with update status') and implies constraints ('safe modifications'), but fails to disclose critical mutation behaviors: whether this affects running sessions, what 'safe' specifically means, idempotency, or error handling. For an update operation, this lack of safety/disclosure context is problematic.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses clear docstring structure (Args/Returns) with the main purpose front-loaded in the first sentence. Appropriately brief for a 2-parameter tool. However, 'limited to safe modifications' consumes space without adding actionable detail, slightly weakening the value density.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Inadequate for a mutation tool with zero annotations and zero schema coverage. The description claims to update 'session parameters' (plural) but the schema reveals only 'notes' is modifiable—mismatch suggests incomplete specification. Missing: side effects on active sessions, validation rules, and relationship to the paper trading lifecycle.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the Args section provides basic compensation by documenting both 'session_id' and 'notes'. However, it adds minimal semantic value beyond naming—no format guidance for session_id, no explanation of notes' purpose or behavior when null. Sufficient to prevent total ambiguity but barely adequate.
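
    A hedged rewrite of the docstring addressing the gaps flagged above: naming 'notes' as the only mutable field and giving the session_id's provenance. The wording is illustrative, not the server's actual text, and the reference to trading_paper_start as the ID source is an assumption drawn from the sibling tools named in this review:

```python
# Hypothetical improved description for trading_paper_update; the original
# Args/Returns docstring format is kept, only the content is expanded.
IMPROVED_DESCRIPTION = """\
Update the notes attached to an existing paper trading session.
Only 'notes' can be modified; other session parameters are fixed
once the session is created.

Args:
    session_id: ID of a paper trading session, as returned by
        trading_paper_start.
    notes: Replacement notes text. Omit to leave notes unchanged.

Returns:
    Dict with update status.
"""

# The rewrite now names the mutable field and the ID's provenance.
assert "Only 'notes' can be modified" in IMPROVED_DESCRIPTION
assert "trading_paper_start" in IMPROVED_DESCRIPTION
```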

    Purpose: 3/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States the basic action (update session parameters) but omits critical domain context. Given the tool name 'trading_paper_update' and siblings like 'trading_paper_start', the description should explicitly mention 'paper trading session' rather than the generic 'session parameters'. The phrase 'limited to safe modifications' hints at constraints but remains vague.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no explicit guidance on when to use this tool versus siblings like 'trading_paper_start' or 'trading_paper_stop'. The 'safe modifications' caveat is too vague to guide proper usage. No mention of prerequisites (e.g., session must exist) or error conditions.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. While 'Estimate' and 'Analyzes' imply a read-only analytical operation, the description does not explicitly confirm safety, disclose side effects, mention computational cost, or clarify whether this modifies any strategy state.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses a structured docstring format (Args/Returns) that efficiently conveys input and output expectations. The two summary sentences front-load the core purpose effectively. The docstring convention is slightly unusual for an MCP description but wastes no words.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a single-parameter analysis tool. The description conceptually covers the return value ('Impact analysis with testing recommendations'), compensating for lack of output schema details. However, missing safety annotations and usage context prevent a higher score.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the description compensates by documenting the single parameter as 'Description of the proposed change.' This is minimal (essentially restating the parameter name) but meets the baseline requirement for coverage where the schema fails.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Estimate[s] the impact of a proposed strategy change' and analyzes 'benefits, risks, and testing methodology.' However, it does not differentiate from similar siblings like `strategy_refine`, `strategy_suggest_improvements`, or `optimization_analyze`.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance provided on when to use this tool versus the numerous alternatives (e.g., `strategy_optimize_pair_selection`, `optimization_analyze`). No prerequisites or conditions are mentioned despite the presence of many related strategy and optimization tools.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but fails to state whether this is a read-only operation, whether it triggers new backtests or reads existing results, or what constitutes valid 'strategy' identifiers. The mention of 'Returns: Comparative analysis' hints at output but lacks operational context.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a structured Args/Returns format that is appropriately sized and front-loaded with the core purpose. However, the parameter descriptions ('First strategy name') are somewhat tautological, slightly reducing the value of each sentence.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a two-parameter tool with an existing output schema, the description covers basic functionality but leaves significant gaps regarding domain specifics—particularly what qualifies as a valid strategy name and whether the comparison requires prior backtesting data. The presence of an output schema excuses detailed return value explanation.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the Args section compensates by labeling 'strategy1' as 'First strategy name' and 'strategy2' as 'Second strategy name.' While this confirms the parameters represent strategy identifiers, it offers minimal domain context (e.g., whether names are case-sensitive, must exist in a specific registry, or follow a particular format).

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Compare[s] two strategies to identify best practices' and 'Analyzes performance differences,' providing specific verbs and resources. However, it does not explicitly differentiate from similar sibling tools like 'backtest_compare_timeframes' or 'strategy_analyze_optimization_impact,' leaving potential ambiguity about which comparison tool to select.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., 'strategy_analyze_optimization_impact' for single-strategy analysis), nor does it mention prerequisites such as whether strategies must exist or have been backtested prior to comparison.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, so the description carries the full burden. It discloses only the return type ('Dict with log entries') and fails to mention whether this retrieves real-time streaming logs or historical batches, how pagination works, or the performance implications of fetching 'all' logs.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Front-loaded one-sentence purpose followed by structured Args/Returns sections. While the docstring format is slightly redundant given schema/output_schema exist, it is justified here due to 0% schema coverage. No extraneous text.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a 2-parameter tool with output schema, but lacks important trading-specific context: whether logs are persistent, if this affects session performance, or the relationship between live vs paper trading logs (given trading_live_paper exists as sibling).

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 0%, requiring description to compensate. The Args section provides critical enum values for log_type ('all', 'info', 'error', 'warning') not present in the schema. However, session_id description ('Session ID') is tautological and provides no format guidance.
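
    The enum values buried in the Args prose could instead live in the schema itself, where agents consume them mechanically. A sketch assuming the parameter names cited in this review; the schema fragment is illustrative, not the server's actual definition:

```python
# Hypothetical inputSchema fragment for a trading_live_logs-style tool:
# the enum and default for log_type become machine-readable rather than
# prose-only, and session_id's provenance is stated in its description.
log_params = {
    "type": "object",
    "properties": {
        "session_id": {
            "type": "string",
            "description": "ID of a live session, e.g. from trading_live_sessions.",
        },
        "log_type": {
            "type": "string",
            "enum": ["all", "info", "error", "warning"],
            "default": "all",
            "description": "Severity filter applied to returned log entries.",
        },
    },
    "required": ["session_id"],
}

print(log_params["properties"]["log_type"]["enum"])
```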

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Clear verb-resource combination ('Get logs from a live trading session') that identifies the specific resource. Distinguishes from other trading_live_* siblings (orders, equity, status) by specifying 'logs', but does not differentiate from logs_analyze_history or logs_strategy_performance.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance on when to use this tool versus alternatives like logs_analyze_history. No mention of prerequisites (e.g., requiring an active session_id from trading_live_sessions) or when not to use (e.g., for backtesting logs).

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It partially compensates by listing specific returned metrics (total_return, sharpe_ratio, max_drawdown, win_rate), but fails to disclose operational traits such as whether the operation is read-only, computationally expensive, or requires specific session states.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a structured docstring format (Args/Returns) that efficiently organizes information. The first sentence front-loads the purpose, and there is minimal redundancy or unnecessary verbosity despite the structured sections.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter tool with an existing output schema, the description adequately covers the basic contract by listing representative metrics. However, it lacks completeness regarding error conditions (e.g., invalid session_id) and behavioral constraints (e.g., session lifecycle requirements).

    Parameters: 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the description must fully compensate for the lack of parameter documentation. It provides only the tautological label 'Session ID' for session_id, without clarifying the expected format, how to obtain it (e.g., from trading_paper_start), or that it specifically refers to a paper trading session ID.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Get[s] calculated performance metrics for a paper trading session,' specifying the verb, resource, and domain. It distinguishes itself from sibling tools like trading_paper_trades or trading_paper_equity_history by emphasizing 'calculated performance metrics' (aggregated statistics) rather than raw trade lists or equity curves.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like trading_paper_equity_history or trading_paper_status, nor does it specify prerequisites such as whether the session must be active or completed. No workflow context is provided.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full responsibility for behavioral disclosure. It states what the tool does functionally but omits critical operational details: whether results are persisted to disk, computational intensity/duration expectations, caching behavior, or side effects. 'Tests performance' describes output, not behavior.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a clear docstring-style structure with Args and Returns sections. It is appropriately sized without redundant prose. The key information (parameter formats and examples) is easily scannable, though the Returns section is vague.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of an output schema, the description appropriately avoids detailing return values. Parameter documentation is complete despite zero schema coverage. However, for a complex analytical tool with many siblings, the description lacks necessary context about tool selection criteria and operational constraints.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 0% description coverage, making the Args section in the description essential. It successfully documents all four parameters, providing crucial semantic context including the date format (YYYY-MM-DD) and a concrete example for the trading pair ('BTCUSDT'), effectively compensating for the schema's lack of metadata.
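
    Documented formats like these (dates as YYYY-MM-DD, a pair such as 'BTCUSDT') are exactly the kind of constraint an agent can pre-validate before calling the tool. A hedged client-side sketch; the pair pattern and helper name are assumptions, only the formats come from the description quoted above:

```python
import re
from datetime import date

# Uppercase alphanumeric symbol such as 'BTCUSDT' (pattern is an assumption).
PAIR_RE = re.compile(r"^[A-Z0-9]{5,12}$")

def validate_backtest_args(pair: str, start: str, end: str) -> bool:
    """Check the documented YYYY-MM-DD dates and pair format before calling."""
    try:
        s, e = date.fromisoformat(start), date.fromisoformat(end)
    except ValueError:
        return False  # malformed or impossible calendar date
    return bool(PAIR_RE.match(pair)) and s <= e

print(validate_backtest_args("BTCUSDT", "2024-01-01", "2024-06-30"))  # True
print(validate_backtest_args("BTCUSDT", "2024-13-01", "2024-06-30"))  # False
```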

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the core action ('Run a comprehensive backtest') and what the tool produces (performance tests, metrics, analysis). However, it fails to differentiate from siblings like 'backtesting_run' or 'backtest_monte_carlo', leaving ambiguity about what makes this specific tool 'comprehensive'.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    There is no guidance on when to use this tool versus the numerous backtesting alternatives (backtesting_run, pairs_backtest, etc.), nor any mention of prerequisites like requiring imported candles or existing strategy definitions. The description assumes the user knows when to select this specific variant.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Without annotations, the description carries the full burden and explains the simulation methodology ('tests many outcome variations') and return format ('confidence intervals'). However, it omits operational details like whether results are persisted, execution duration, or if this is a read-only computation versus a job submission.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a docstring format with Args/Returns sections that efficiently organizes information, but contains redundancy between 'Perform Monte Carlo simulation for strategy robustness analysis' and 'Tests many outcome variations to assess robustness.'

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the simple 2-parameter schema and presence of an output schema, the description provides sufficient context for basic invocation. However, it misses constraints on input values (e.g., strategy name format, pair syntax) and operational context that would be expected given the lack of annotations.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the description provides basic semantic labels ('Strategy to analyze', 'Trading pair') that add meaning beyond the raw parameter names. However, it lacks format specifications, examples, or constraint details that would fully compensate for the schema deficiency.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool performs Monte Carlo simulation specifically for strategy robustness analysis, using specific verbs and identifying the domain. While it doesn't explicitly contrast with sibling tool risk_monte_carlo, the 'backtest' prefix and 'strategy' focus provide adequate differentiation.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like backtest_comprehensive or risk_monte_carlo, nor does it mention prerequisites such as requiring an existing strategy or historical data availability.

  • Behavior: 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but fails to mention critical traits: computational cost, execution duration, whether results are cached, or side effects. While it mentions 'sensitivity analysis' in the returns section, it does not explain what behavioral characteristics the agent should expect (e.g., long-running operation, resource-intensive backtesting).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
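The annotation gap flagged here is concrete and fixable. Below is a minimal sketch of the MCP tool annotations that would disclose this behavior up front. The field names follow the MCP ToolAnnotations specification; the specific values (and the cost wording in the comment) are illustrative assumptions about how a long-running Monte Carlo tool would behave, not documented facts about this server.

```python
# Hypothetical annotations for a long-running, read-only analysis tool.
# Field names come from the MCP ToolAnnotations spec; values are assumptions.
monte_carlo_annotations = {
    "title": "Monte Carlo Strategy Robustness",
    "readOnlyHint": True,       # analysis only: no orders placed, nothing persisted
    "destructiveHint": False,   # cannot delete or overwrite existing data
    "idempotentHint": True,     # same inputs set up the same simulation
    "openWorldHint": True,      # fetches historical market data
}

# Even with annotations, cost belongs in the prose description too,
# e.g. "CPU-intensive; typically runs 30-120 seconds" (illustrative).
```

Annotations are hints, not guarantees, so the description should still restate the behavioral consequences in plain prose.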

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a structured Args/Returns format that efficiently conveys parameter information. The first two sentences are slightly redundant ('Find... through systematic testing' vs 'Tests... to identify'), but overall it avoids verbosity while covering the necessary parameter documentation given the schema gaps.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    The description adequately covers the input parameters (compensating for the schema) and acknowledges the return value (sensitivity analysis), which is acceptable since an output schema exists. However, for a computationally intensive optimization tool with no annotations, it lacks important contextual details like execution time expectations, error conditions, or differentiation from similar optimization utilities in the same server.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given the schema has 0% description coverage, the description compensates effectively by documenting all 4 parameters with clear meanings and helpful examples (e.g., 'period', 'threshold' for param_name; '10-100' for param_range). This adds significant value beyond the raw schema, though it could further clarify format constraints for the range parameter.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
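To make the "0% schema description coverage" finding concrete, here is a sketch of what per-parameter descriptions would look like for this optimization tool. The parameter names and examples ('period', 'threshold', '10-100') come from the review; the description text, the pattern constraint, and the required list are illustrative assumptions, and only a subset of the tool's four parameters is shown.

```python
# Sketch of non-zero schema description coverage for the optimization tool.
# Parameter names follow the review; descriptions/constraints are assumptions.
input_schema = {
    "type": "object",
    "properties": {
        "strategy_name": {
            "type": "string",
            "description": "Name of an existing strategy to optimize.",
        },
        "param_name": {
            "type": "string",
            "description": "Strategy parameter to sweep, e.g. 'period' or 'threshold'.",
        },
        "param_range": {
            "type": "string",
            "description": "Inclusive range to test, formatted 'start-end', e.g. '10-100'.",
            "pattern": r"^\d+-\d+$",
        },
    },
    "required": ["strategy_name", "param_name", "param_range"],
}
```

With descriptions living in the schema, the prose description no longer has to carry the full parameter-documentation burden.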

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool finds optimal parameter values through systematic testing and identifies best settings. It specifies the domain (parameter optimization) and method (systematic testing/range testing). However, it does not explicitly distinguish this from sibling optimization tools like 'optimization_run' or 'optimization_walk_forward', which would help clarify its specific niche.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus the numerous sibling optimization tools (optimization_run, optimization_walk_forward, strategy_analyze_optimization_impact, etc.). It lacks 'when-to-use' or 'when-not-to-use' criteria, and does not mention prerequisites like requiring an existing strategy or historical data availability.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description must carry the full burden of behavioral disclosure and succeeds partially by specifying the analytical dimensions (correlation, concentration, diversification) and that it returns recommendations. However, it omits operational characteristics such as whether the analysis is read-only, computationally expensive, or requires real-time market data. The safety profile and idempotency are not addressed.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a standard docstring format with Args and Returns sections that structure the information clearly. However, the initial sentence 'Analyze risk profile of a trading portfolio' is somewhat generic and could front-load the specific risk dimensions. The content is not verbose but could be more information-dense.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool has a single parameter and an output schema exists, the description adequately covers basic functionality but leaves gaps in sibling differentiation and input specifications. The mention of 'recommendations' in the return description adds useful context beyond the schema, though operational safety characteristics are missing given the lack of annotations.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The schema has 0% description coverage, and while the description compensates by identifying the parameter as 'List of trading pairs to analyze,' it lacks critical details such as expected string format (e.g., 'BTC-USD'), case sensitivity, or list size constraints. The semantic meaning is present but insufficient for proper usage given the schema's lack of documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description specifies the tool analyzes risk profiles using clear verbs and identifies the resource as a trading portfolio. It distinguishes itself from sibling risk tools by specifying it evaluates correlation, concentration, and diversification rather than other risk metrics like VaR or drawdown. However, it does not explicitly contrast with similar risk analysis tools such as risk_var or risk_stress_test to guide selection.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like risk_var, risk_stress_test, or risk_report. It does not mention prerequisites such as portfolio composition requirements or data availability. No exclusion criteria or usage boundaries are specified.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries full burden for behavioral disclosure. While it mentions the confirmation safety mechanism, it fails to state whether deletion is permanent/irreversible, what happens if the strategy doesn't exist, or whether associated data (logs, backtests) are also removed.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
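The disclosure gaps listed above (permanence, missing-strategy behavior, scope of deletion) can all live in the docstring. The sketch below shows the shape; the behavioral claims in it are assumptions made to illustrate the format, not documented facts about this server's strategy_delete.

```python
# Hedged sketch of a strategy_delete docstring with full behavioral disclosure.
# Permanence, error handling, and deletion scope are illustrative assumptions.
def strategy_delete(name: str, confirm: bool = False) -> dict:
    """Permanently delete a strategy. This cannot be undone.

    Deletes the strategy definition only; associated backtest results and
    logs are (assumed here to be) left in place. Returns an error dict if
    no strategy named `name` exists.

    Args:
        name: Name of the strategy to delete.
        confirm: Must be True to actually delete. Defaults to False,
            in which case the call is a no-op preview.

    Returns:
        Dict with 'deleted' (bool) and 'name'.
    """
    if not confirm:
        return {"deleted": False, "name": name}
    # ... real deletion would happen here ...
    return {"deleted": True, "name": name}
```

The confirm-defaults-to-False pattern doubles as a dry run, which is worth stating explicitly so agents do not interpret the no-op response as a failure.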

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses a clear docstring format with Args and Returns sections. The one-line summary is front-loaded. The Returns section is minimal but acceptable since an output schema exists. Structure is readable though slightly verbose for a two-parameter tool.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Covers the essential purpose, parameters, and return type for a deletion tool. However, given this is a destructive operation with no annotations and no indication of reversibility or side effects, the description leaves significant behavioral gaps that could lead to unsafe usage.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the description adequately compensates by documenting both parameters: 'name' specifies it is the strategy name to delete, and 'confirm' explains the safety requirement (must be True) and default value (False).

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool deletes a strategy with a specific verb and resource. However, it does not distinguish from sibling tool 'strategy_create_cancel', which may have different semantics (canceling creation vs. deleting existing).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description documents the 'confirm' parameter requirement (must be True to actually delete), but provides no guidance on when to use this tool versus alternatives like 'strategy_create_cancel', or prerequisites such as whether the strategy must be stopped first.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions the return value ('Detailed improvement recommendations with expected impact'), it fails to disclose whether this is a read-only analysis, what data sources it queries (backtest vs live), computational cost, or any side effects. The phrase 'Analyzes current performance' is vague about the data source.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description follows a clear structured format with a summary sentence, an Args section, and a Returns section. It is appropriately sized at three sentences plus structured metadata. No sentences feel wasted; the Args/Returns sections duplicate information that should ideally live in the schema but does not, making their inclusion here necessary rather than redundant.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given that an output schema exists (indicated by context signals), the description adequately covers the return value concept ('Detailed improvement recommendations'). However, for a 2-parameter analytical tool with no annotations, it lacks context about data requirements (does it need existing backtests?) and scope limitations, leaving gaps in the agent's understanding of prerequisites.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the description effectively compensates by documenting both parameters in the 'Args' section: 'strategy_name: Name of the strategy to analyze' and 'pair: Trading pair to test on (e.g., 'BTCUSDT')'. It provides a concrete example for the pair parameter, adding value beyond the raw schema types.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Get[s] specific improvement suggestions for a trading strategy' and 'Analyzes current performance and recommends concrete changes.' It identifies the verb (suggest/analyze) and resource (trading strategy) clearly. However, it does not explicitly differentiate from similar siblings like 'strategy_refine' or 'strategy_analyze_optimization_impact', leaving some ambiguity about when to choose this over alternatives.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like 'strategy_refine' or 'strategy_optimize_pair_selection'. It does not mention prerequisites (e.g., whether the strategy needs prior backtesting data or live trading history) or conditions where this tool might not be appropriate.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of disclosure but only mentions the return type ('Dict with cancellation status'). It fails to clarify side effects (what happens to open orders?), reversibility, or that this is a destructive/high-risk operation for live trading.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The docstring format with explicit 'Args' and 'Returns' sections is clear and appropriately sized. No sentences are wasted, though the structured format is necessary only because the schema lacks descriptions.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a 2-parameter tool, the description adequately covers inputs and return type. However, given the high-stakes nature of live trading cancellation, the omission of behavioral warnings, side effects, or error conditions leaves significant gaps.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the description successfully compensates by documenting both parameters in the Args section: session_id is identified as the 'Session ID to cancel' and paper_mode clarifies the boolean logic ('True if paper trading session, False if live').

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the specific action ('Cancel') and resource ('running live/paper trading session'). However, it does not explicitly differentiate from siblings like 'trading_paper_stop' or mention when to prefer this over other session management tools.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., 'trading_paper_stop'), nor does it mention prerequisites such as requiring an active session or the implications of cancelling live vs paper trading.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. While it mentions the return type ('Dict with session details'), this is redundant with the existing output schema. It fails to disclose error behaviors (e.g., session not found), rate limits, or whether this is a read-only operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a structured docstring format (Args/Returns) that is front-loaded with the core purpose. While slightly verbose for MCP conventions, it efficiently conveys information without redundant prose.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given a single parameter, no annotations, and an existing output schema, the description covers the minimum viable information by explaining the parameter and referencing the return structure. However, it lacks richness regarding possible status values, error conditions, or session lifecycle context.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 0% description coverage. The description compensates by documenting the single parameter in the Args section: 'session_id: Session ID to check'. This adds essential semantic meaning absent from the schema, though it could further specify the ID format or source.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Get[s] status of a specific live trading session', providing a specific verb and resource. However, it does not explicitly differentiate from similar siblings like 'trading_live_check' or 'trading_live_sessions'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., 'trading_live_sessions' to list all sessions or 'trading_live_check' for health checks). There are no prerequisites or exclusion criteria mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries full disclosure burden. While it states the code is not saved, it fails to describe what validation actually entails, error response formats, execution time expectations, or side effects like temporary resource usage.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is extremely concise at four words with no redundant content. However, given the lack of annotations and schema descriptions, this brevity leaves significant documentation gaps that could be addressed with one additional sentence on validation behavior.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of an output schema, the description appropriately omits return value details. For a single-parameter tool, it covers the core persistence distinction, but remains incomplete regarding validation mechanics and parameter specifics given the 0% schema coverage.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the description must compensate entirely. It successfully maps the 'code' parameter to 'strategy code', providing crucial semantic context. However, it omits format specifications, language constraints, or examples that would fully compensate for the schema deficiency.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the verb (Validate) and resource (strategy code), and distinguishes from persistence-oriented siblings like strategy_create or backtesting_run via 'without saving'. However, it lacks specificity on what 'validate' entails (syntax check, compilation, or dry-run simulation).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The phrase 'without saving' implicitly guides usage toward pre-persistence validation scenarios, suggesting when to use this versus tools that persist strategies. However, it lacks explicit 'when to use' guidance or named alternatives for comparison.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
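The explicit "use X instead of Y when Z" guidance the rubric keeps asking for is cheap to add. An illustrative rewrite of the validate tool's description follows; the sibling tool names (strategy_create, backtesting_run) come from the review, but the routing advice itself is an assumption about how these tools relate.

```python
# Illustrative description with explicit when-to-use routing guidance.
# Sibling tool names are from the review; the advice is an assumption.
VALIDATE_DESCRIPTION = (
    "Validate strategy code without saving. Performs a syntax and interface "
    "check only; no backtest is run and nothing is persisted. "
    "Use this before strategy_create to catch errors early. "
    "Use backtesting_run instead when you need performance results. "
    "Do not use it to inspect an already-saved strategy."
)
```

Two or three routing sentences like these are usually enough to lift a Usage Guidelines score without bloating the description.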

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure but omits critical traits: it does not specify the time range (last 7 days vs. calendar week), whether the report is persisted to storage, computational cost, or idempotency. While it mentions a return value, it lacks safety profile information (read-only vs. destructive) that annotations would typically provide.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description consists of exactly two sentences with efficient front-loading: the first states the action and target, the second describes the return value. There is no redundant or wasted text; every sentence earns its place in the minimal structure.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool has zero parameters and an output schema exists, the description adequately covers the basic contract. However, it remains incomplete regarding operational context: it fails to define the temporal scope ('weekly' is ambiguous) or filtering criteria for included backtests, leaving gaps that could cause incorrect usage despite the simple interface.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema contains zero parameters, so description coverage is trivially 100%. According to scoring guidelines, zero-parameter tools receive a baseline score of 4. The description correctly omits parameter discussion since none exist, aligning with the schema structure.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states a clear verb ('Generate') and resource ('weekly backtest activity report'), distinguishing it from sibling tools like 'logs_analyze_history' and 'logs_strategy_performance' through the specific 'weekly' temporal scope. However, it lacks specificity regarding what constitutes 'activity' (e.g., runs, errors, performance metrics).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus sibling alternatives like 'logs_analyze_history' or 'logs_strategy_performance', nor does it mention prerequisites such as existing backtest data requirements. No explicit when-to-use or when-not-to-use conditions are present.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, so description carries full burden. It discloses return format ('List of identified risks with severity levels') but omits operational context: data freshness, market coverage, caching behavior, rate limits, or whether this requires specific market data subscriptions.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    An extremely efficient two-sentence structure: the first sentence establishes purpose, the second discloses the return type. No redundant phrases or unnecessary elaboration given the simple tool signature.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a zero-parameter tool with existing output schema, but incomplete given the dense sibling ecosystem. Lacks differentiation from similar monitoring tools ('monitor_daily_scan') and analytical risk tools ('risk_report').

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The schema defines zero parameters (an empty object), so coverage is trivially 100%. The baseline score applies, as there are no parameters requiring semantic clarification beyond the schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States a specific action (Identify) and resource (current market risks and warning signs). The term 'current' suggests a real-time monitoring scope, distinguishing it from historical analysis tools, though it could differentiate more explicitly from the sibling 'risk_' analysis tools.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this versus numerous risk-related siblings like 'risk_report', 'risk_var', or 'monitor_daily_scan'. No prerequisites or exclusions mentioned despite complex ecosystem of 60+ tools.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, so description carries full burden. It mentions return format ('List of trading opportunities with signals and confidence') but omits safety profile, side effects, performance characteristics, or whether this triggers actual trades versus analysis-only.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses efficient Args/Returns docstring format. Three sentences total with zero waste, front-loaded with purpose statement. Appropriate size for the tool's complexity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a 2-parameter tool with output schema (structured return data available elsewhere). Missing: differentiation from monitor_daily_scan, valid strategy enumeration, and behavioral safety details. Meets minimum viable threshold but has clear gaps.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Excellent compensation for 0% schema description coverage by documenting that both parameters expect 'Comma-separated list' format. However, it does not specify valid strategy values (e.g., referencing backtesting_list_strategies) or symbol format constraints beyond the defaults shown.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
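    The compensating docstring pattern praised here can be sketched as follows. A hypothetical example, assuming illustrative tool and parameter names (not the server's actual signature), showing how an Args section spells out the comma-separated format that a bare schema omits:

```python
# Hypothetical docstring of the kind the review describes: the Args
# section documents the comma-separated format that the empty schema
# leaves unstated. Names and defaults are illustrative assumptions.
def scan_opportunities(symbols: str = "BTC/USDT,ETH/USDT",
                       strategies: str = "rsi,macd") -> list:
    """Scan for trading opportunities across symbols and strategies.

    Args:
        symbols: Comma-separated list of trading pairs
            (e.g. "BTC/USDT,ETH/USDT").
        strategies: Comma-separated list of strategy names; see
            backtesting_list_strategies for valid values.

    Returns:
        List of trading opportunities with signals and confidence.
    """
    # The comma-separated parsing is the caller-visible contract.
    return [(s.strip(), st.strip())
            for s in symbols.split(",") for st in strategies.split(",")]

print(scan_opportunities("BTC/USDT, ETH/USDT", "rsi"))
```

Note how pointing at backtesting_list_strategies for valid values closes exactly the gap the review identifies.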

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Clear verb 'Scan' with specific resource 'trading opportunities' and scope 'across symbols and strategies'. However, it does not differentiate from sibling tool 'monitor_daily_scan' which appears to have similar functionality.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit guidance on when to use this tool versus alternatives like 'monitor_daily_scan' or the backtesting suite. No prerequisites or conditions mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It discloses what risk dimensions are evaluated (liquidation, margin, safe levels) and mentions the return type includes 'recommendations.' However, it omits whether this is read-only, computationally expensive, requires specific market data access, or has side effects.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a standard docstring format with clear Args and Returns sections. It is appropriately sized at four lines of actual content, front-loaded with the core purpose, and avoids unnecessary verbosity while documenting the parameters that the schema fails to describe.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given that an output schema exists (indicated by context signals), the brief mention of return values is acceptable. The two parameters are documented, but given the tool's complexity (financial risk assessment) and lack of annotations, the description lacks guidance on prerequisites, valid input ranges, or behavioral constraints that would make it fully complete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, requiring the description to compensate. It provides 'Leverage multiplier (e.g., 2.0 for 2x)' for the leverage parameter, which is helpful, but only offers 'Strategy using leverage' for strategy_name, which is minimally informative. It partially compensates for the schema deficiency but doesn't provide comprehensive parameter documentation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Assess[es] the risks of trading with leverage' and specifies it evaluates 'liquidation risk, margin requirements, and safe levels.' This distinguishes it from sibling tools like risk_analyze_portfolio or risk_var by focusing specifically on leverage-related risks.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description explains what the tool evaluates but provides no explicit guidance on when to use this tool versus alternatives like risk_stress_test or risk_analyze_portfolio. There are no prerequisites, exclusions, or workflow guidance mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses pagination defaults (limit: 50, offset: 0) and return type (Dict with sessions list), but lacks safety information (read-only vs destructive), rate limits, or what constitutes a 'session' lifecycle.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The information is present but uses docstring formatting (Args:, Returns:) which is less ideal than natural language for MCP contexts. The content is not overly verbose, but the structure could be more front-loaded and conversational.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple 2-parameter list operation with an output schema available, the description is minimally adequate. However, given the complex sibling ecosystem (40+ trading tools), it should clarify filtering capabilities (paper vs. real sessions, date ranges), none of which are mentioned.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the description effectively compensates by documenting both parameters (limit and offset) with their default values in the Args section. It provides sufficient semantic meaning that the bare schema lacks.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
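    The documented limit/offset defaults imply a client-side paging loop. A minimal sketch, assuming a hypothetical `fetch_sessions` stand-in for the real tool call (the 120-session store is fabricated purely for illustration):

```python
# Hypothetical pagination over a list endpoint with the documented
# defaults (limit=50, offset=0). fetch_sessions is a stub standing in
# for the real tool call; the 120-session store is an assumption.
def fetch_sessions(limit: int = 50, offset: int = 0) -> dict:
    total = 120  # pretend the server holds 120 sessions
    ids = list(range(offset, min(offset + limit, total)))
    return {"sessions": [{"id": i} for i in ids], "total": total}

def all_sessions(limit: int = 50) -> list:
    """Walk offsets until a short page signals the end."""
    out, offset = [], 0
    while True:
        page = fetch_sessions(limit=limit, offset=offset)["sessions"]
        out.extend(page)
        if len(page) < limit:
            return out
        offset += limit

print(len(all_sessions()))  # → 120
```

Whether the real endpoint also returns a `total` field for bounds checking is exactly the kind of detail the description leaves unstated.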

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states 'Get list of live trading sessions' with a specific verb and resource. However, it does not clarify whether 'live' means 'real money' (vs paper) or 'currently active', nor does it distinguish from siblings like trading_live_status or trading_live_paper, leaving some ambiguity about scope.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this versus the many sibling tools (e.g., trading_live_paper, trading_live_real, trading_live_status). The description does not indicate if this lists paper vs real sessions, active vs historical, or how it relates to session management workflows.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It documents the return value structure (Dict with id, type, status, etc.), which aids response handling, but omits safety traits (a read-only assumption), rate limits, and pagination behavior beyond the limit parameter.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses efficient docstring format (Args/Returns). Three compact sections with no redundant filler. Returns clause is information-dense but readable.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a 2-parameter list tool. Output schema exists per context signals, yet description helpfully enumerates return fields. Missing: definition of 'recent' timeframe, enumeration of valid job_type values, and permission requirements.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Effectively compensates for 0% schema description coverage by documenting both parameters: job_type (optional filter) and limit (max count, default 50). Clarifies optionality and defaults not present in schema descriptions.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States a clear action (List) and resource (jobs), though 'recent' is ambiguous regarding time window. Does not explicitly differentiate from sibling status-checking tools like backtesting_status or trading_live_status.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no guidance on when to use this generic job list versus specific status tools (e.g., backtesting_status, strategy_create_status). No prerequisites or exclusion criteria mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Without annotations, the description carries the burden of behavioral disclosure. It adds valuable scope context ('across all backtests') and aggregation behavior ('aggregated by strategy'), but fails to disclose performance characteristics, caching behavior, or whether this operation is read-only/safe.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is appropriately brief for a zero-parameter tool, with two distinct sentences covering action and return value. Minor inefficiency in the 'Returns:' formatting with indentation, but no redundant or wasted content.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the potential complexity of aggregating performance across all backtests, the description is minimally sufficient. While it leverages the existence of an output schema to omit return value details, it omits important context about computational cost, result freshness, and the inability to filter by specific backtests or date ranges.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema contains zero parameters, establishing a baseline of 4 per the scoring rules. The description appropriately does not fabricate parameter semantics where none exist.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the verb ('Analyze'), resource ('strategy performance'), and scope ('across all backtests'). It implicitly distinguishes from siblings like 'logs_analyze_history' (general logs) and 'backtest_comprehensive' (execution vs. analysis) through its specific focus on performance aggregation.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like 'strategy_compare_strategies' or 'logs_analyze_history'. It does not mention prerequisites, required state, or exclusions.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses return value composition ('Daily report with opportunities, risks, and recommendations') and internal behavior ('Scans multiple symbols and strategies'). However, it lacks critical operational context: whether this modifies state, execution duration expectations, rate limiting, or if results are cached/persisted.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a structured docstring format (Args/Returns) that efficiently organizes information. The main narrative is brief (two sentences) though slightly redundant ('identify trading opportunities' vs 'find signals'). The Args section is necessary given the schema deficiency, earning its place.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's complexity (multi-strategy scanning, comprehensive reporting) and lack of annotations, the description provides baseline adequacy. It leverages the existence of an output schema (per context signals) by not over-explaining return structures, yet misses operational context like execution time or differentiation from monitor_scan_opportunities that would make it complete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 0% description coverage (properties lack descriptions), but the description effectively compensates via the Args section, clarifying that 'symbols' is a 'Comma-separated list.' This adds essential formatting context missing from the schema. With only one parameter and zero schema coverage, this compensation is adequate.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states a clear purpose with specific verb ('Run') and resource ('daily market scan') to identify trading opportunities. It distinguishes from siblings like monitor_get_risks or monitor_get_sentiment by covering multiple aspects (opportunities, risks, recommendations). However, it fails to clarify how this differs from monitor_scan_opportunities, which appears to have overlapping functionality.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like monitor_scan_opportunities or monitor_get_sentiment. It does not mention prerequisites (e.g., market data availability), scheduling expectations (despite 'daily' in the name), or when to prefer this comprehensive scan over targeted monitoring tools.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the Bayesian method and that it 'tests many parameter combinations,' implying iterative behavior. However, it fails to disclose operational characteristics like whether this is asynchronous, computationally expensive, or state-modifying—critical gaps for a 14-parameter optimization tool.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The second sentence ('Automatically tests many parameter combinations...') largely repeats the first sentence's meaning without adding new information. The example is valuable but formatted as a dense text block with dashes rather than clear structure. Overall adequate but contains redundancy.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite having an output schema (reducing the need to describe returns), the description is incomplete given the tool's complexity. With 14 parameters including trading-specific ones (leverage, fee, exchange_type) and 0% schema coverage, the description should document critical parameters or data formats beyond just the param_space example.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, requiring heavy compensation. The description provides an example illustrating param_space structure (ranges as arrays) and metric naming, but leaves 12 other parameters (symbol, timeframe, dates, exchange, leverage, fees, etc.) completely undocumented with no format hints or domain context.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
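    The param_space convention described (ranges as arrays, plus a target metric) can be sketched as follows. All parameter and strategy names here are hypothetical illustrations, not values confirmed by the server:

```python
# Hypothetical param_space for a Bayesian optimizer of the kind the
# review describes: each key is a strategy hyperparameter, each value
# an array of candidates or a [low, high] range. Names are assumptions.
param_space = {
    "rsi_period": [7, 14, 21],        # discrete candidates
    "stop_loss_pct": [0.5, 5.0],      # continuous range, in percent
    "take_profit_pct": [1.0, 10.0],
}

request = {
    "strategy": "rsi_reversal",       # assumed strategy name
    "symbol": "BTC/USDT",
    "timeframe": "1h",
    "param_space": param_space,
    "metric": "sharpe_ratio",         # objective to maximize
    "n_trials": 50,
}

# Sanity check before submitting: every space entry must be a list.
assert all(isinstance(v, list) for v in request["param_space"].values())
print(sorted(request["param_space"]))
```

A description that stated even this much about the remaining 12 parameters (formats for dates, the exchange enumeration, fee units) would close most of the gap noted above.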

    Purpose: 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool auto-optimizes strategy hyperparameters using Bayesian optimization, providing specific verb, resource, and algorithmic method that distinguishes it from sibling optimization tools like backtest_optimize_parameters or optimization_walk_forward.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides a concrete example showing param_space structure and metric selection, which offers implicit guidance. However, it lacks explicit when-to-use guidance comparing this to sibling optimization tools (e.g., optimization_batch, optimization_walk_forward) or prerequisites for the strategy parameter.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses what the tool examines ('recovery timelines') and returns ('improvement suggestions'), but omits safety characteristics (read-only vs. destructive), performance costs, or side effects.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The docstring-style format (summary + Args + Returns) is well-structured and front-loaded. The first two sentences are slightly redundant ('Analyze... patterns' vs 'Examines... timelines'), but overall it efficiently conveys necessary information without excessive verbosity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter analysis tool, the description adequately covers the input and mentions the output structure. However, given zero schema coverage and no annotations, it lacks necessary details like valid strategy_name formats or whether the analysis is run against live or backtested data.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the Args section compensates by documenting the strategy_name parameter as 'Strategy to analyze.' While basic, this adds essential meaning absent from the raw schema, though it lacks format constraints or examples.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool analyzes 'historical drawdowns and recovery patterns' and 'when and why losses occur,' providing a specific verb and resource. However, it does not explicitly differentiate from siblings like risk_stress_test or risk_analyze_portfolio within the text itself.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives like risk_stress_test or backtest_analyze_regimes, nor are there prerequisites mentioned (e.g., whether the strategy requires prior backtesting).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It adds valuable behavioral context by stating the tool returns 'risk mitigation strategies' (not just raw data), but fails to disclose computational cost, whether this requires pre-loaded portfolio data, or if the operation is read-only vs. state-modifying.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is well-structured with clear Args/Returns sections and front-loaded purpose. The 'etc.' in the second sentence is slightly vague, but overall there is minimal waste and the docstring format appropriately compensates for the poor schema documentation.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    While the Args/Returns sections provide basic coverage for the 2 parameters and mention of output format, the description lacks prerequisites (e.g., whether portfolio data must be pre-loaded) and behavioral context expected for a complex financial analysis tool with no annotations.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the Args section compensates effectively by documenting both parameters: 'pairs' are identified as 'Trading pairs' and 'scenario' includes concrete examples ('market crash', 'flash crash'). It could be improved by specifying the format/syntax for trading pairs (e.g., 'BTC-USD' vs 'BTC/USD').

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'stress test[s] a portfolio under extreme market scenarios' with specific examples (market crashes, liquidity crises). This distinguishes it from sibling risk tools like risk_var or risk_monte_carlo by emphasizing 'extreme' scenario analysis rather than general statistical risk measurement.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no explicit guidance on when to use this tool versus the numerous sibling risk tools (risk_var, risk_monte_carlo, risk_analyze_portfolio, etc.). While examples imply usage for crash scenarios, there is no 'when to use' or 'when not to use' guidance.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It partially compensates by detailing the return dictionary structure (version, test_count, certified status, etc.), which helps predict output. However, it lacks information about side effects, error handling (e.g., strategy not found), caching behavior, or whether this is a read-only operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a clean docstring format with distinct Args and Returns sections. Every sentence earns its place: one for purpose, one for parameter documentation, and one for return value structure. No redundancy or fluff is present.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool has only one parameter and an output schema exists (per context signals), the description adequately covers the parameter and provides return value details. However, with zero annotations and no mention of error conditions or authentication requirements, there are gaps in operational context for an agent deciding whether and how to invoke the tool.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, requiring the description to compensate. The Args section provides 'Strategy name' for the 'name' parameter, which supplies basic semantic meaning absent from the schema. However, it lacks format constraints, examples, or clarification on whether this accepts display names, IDs, or file paths.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Get[s] metadata and version info for a strategy' using a specific verb and resource. However, it does not explicitly differentiate from sibling tools like backtesting_read_strategy or strategy_analyze_optimization_impact, which could help the agent select the correct tool for metadata versus code inspection.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives (e.g., when to check metadata before deployment versus when to run backtest_analyze_regimes). There are no prerequisites, conditions, or exclusions mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the return value format ('Ranked list of pairs with performance metrics') but fails to indicate whether this is a read-only analysis or modifies strategy configuration, and says nothing about computational cost or side effects. It mentions that analysis occurs but lacks the safety-profile details critical for a tool named 'optimize'.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The docstring format with explicit Args and Returns sections is appropriate and well-structured. The first two sentences are slightly redundant ('Find the best...' vs 'recommend the best ones'), but overall each section earns its place by compensating for the schema's lack of descriptions.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter tool with an implied output schema, the description covers the essential input-output contract. However, it lacks context on valid strategy_name values, expected performance metric types, or error conditions. It meets minimum viability but leaves operational gaps given the complexity of trading strategy optimization.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The schema has 0% description coverage. The Args section compensates by defining strategy_name as 'Strategy to optimize pair selection for', adding necessary semantic context missing from the raw schema. While it doesn't specify valid strategy name formats or where to retrieve them, it adequately documents the single required parameter.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Find[s] the best trading pairs for a given strategy' using specific verbs (Find, Analyzes) and resource (trading pairs). It sufficiently distinguishes from siblings like backtest_optimize_parameters (which optimizes parameters, not pair selection) and pairs_correlation (which analyzes correlation, not performance optimization).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no explicit guidance on when to use this tool versus alternatives like pairs_backtest, optimization_run, or strategy_analyze_optimization_impact. There are no 'when-not-to-use' exclusions or prerequisites mentioned beyond the Args section implying a strategy must exist first.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the three-step workflow (reads, refines, re-validates) and return structure (Dict with status, iterations, changes), but fails to clarify critical mutation semantics: whether it overwrites the original strategy, creates a new version, or what happens if re-validation fails.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
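One way to close the mutation-semantics gap flagged above is to state the write behavior directly in the docstring. This is a minimal sketch: the overwrite/no-backup semantics and failure handling described here are assumptions made for illustration, not behavior confirmed by the server.

```python
from typing import Optional

def strategy_refine(name: str, feedback: str,
                    focus_area: Optional[str] = None,
                    max_iterations: int = 3) -> dict:
    """Refine an existing strategy based on feedback.

    Overwrites the strategy in place; no backup copy is kept. If
    re-validation fails, the last passing iteration is kept and the
    failure is reported in the returned changes list.

    Args:
        name: Strategy to refine.
        feedback: Natural-language guidance for the refinement.
        focus_area: Optional aspect to prioritize (e.g. "risk management").
        max_iterations: Upper bound on refine/re-validate cycles.

    Returns:
        Dict with status, iterations, and changes.
    """
    # Stub: a real implementation would read, refine, and re-validate here.
    return {"status": "ok", "iterations": 0, "changes": []}
```

Two added sentences are enough for an agent to judge whether calling the tool is safe to do without a prior backup.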

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is well-structured with clear sections (summary, workflow, Args, Returns) and appropriately concise. The repetition of workflow steps adds clarity without excessive verbosity, though the Args/Returns sections are necessary only due to the schema's lack of descriptions.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    While the description covers the basic operation, parameters, and return values, it lacks completeness for a mutation tool: no safety warnings, no error handling details, and no clarification of the 're-validates' step (whether it invokes backtesting_validate or performs internal validation).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the Args section effectively compensates by documenting all four parameters (name, feedback, focus_area, max_iterations) with their purposes and default values. This provides essential semantic context missing from the structured schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Refine[s] existing strategy based on feedback,' using a specific verb and resource. It distinguishes from siblings like strategy_create (new vs. existing) and strategy_delete through the 'refine' operation, though it doesn't explicitly differentiate from strategy_suggest_improvements.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description lacks explicit guidance on when to use this tool versus alternatives like strategy_suggest_improvements or strategy_create. While it implies a workflow (reads → refines → re-validates), there are no 'use when' conditions, prerequisites, or exclusions stated.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It adds value by disclosing return value contents ('session status, equity, positions, and metrics'), but fails to state whether this is read-only, what happens if the session_id is invalid, or rate limiting concerns.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses a clear docstring format with Args and Returns sections. Every sentence earns its place: the opening defines purpose, the Args section documents the single parameter, and Returns clarifies output structure. Slightly formal but efficient.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter status tool with an output schema present, describing the return dictionary fields provides adequate context. However, missing error handling behavior and session lifecycle context (e.g., can query stopped sessions?) leaves gaps for an agent operating without prior domain knowledge.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, requiring the description to compensate. The Args section provides 'session_id: Session ID to check', which successfully adds semantic meaning beyond the bare schema. However, it lacks format details or guidance on how to obtain valid session IDs.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Get[s] current status of a paper trading session' with specific verb and resource. It distinguishes from live trading siblings via the 'paper' modifier, though it could better differentiate from related paper tools like 'trading_paper_list' or 'trading_paper_metrics'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit guidance on when to use this tool versus alternatives (e.g., 'trading_paper_list' to discover sessions first), nor prerequisites such as requiring an active session ID. The description provides operational syntax but no strategic usage context.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the return structure (Dict with total_trades and trades list) and pagination behavior, which is valuable. However, it lacks information on read-only safety, rate limits, or whether the session must be active/stopped to query trades.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a standard docstring format (Args/Returns) that is appropriately structured and front-loaded with the purpose. While the Args section repeats default values already present in the schema, it is necessary given the 0% schema description coverage. No sentences are wasted.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool with 3 simple parameters and an output schema (indicated in context), the description adequately covers inputs and return type. However, it lacks detail on what fields constitute a 'trade' object in the returned list and omits error handling scenarios (e.g., invalid session_id).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, so the description must compensate. The Args section documents all three parameters: session_id (though tautological as 'Session ID'), limit (with default behavior), and offset (explicitly noting pagination). This successfully fills the schema documentation gap despite the weak session_id description.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
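The pagination contract critiqued here can be made self-explanatory in a few docstring lines. In this sketch the default of 50 and the stand-in data are assumptions for illustration; only the parameter names (session_id, limit, offset) and the total_trades/trades return shape come from the review above.

```python
def trading_paper_trades(session_id: str, limit: int = 50, offset: int = 0) -> dict:
    """Get trades executed in a paper trading session.

    Args:
        session_id: Session ID returned by trading_paper_start.
        limit: Maximum number of trades to return per call (default 50).
        offset: Number of trades to skip; increase by limit to page
            through long trade histories.

    Returns:
        Dict with total_trades and a trades list for the requested page.
    """
    trades = [{"id": i} for i in range(5)]  # stand-in data for the sketch
    return {"total_trades": len(trades),
            "trades": trades[offset:offset + limit]}
```

Naming the producing tool for session_id and spelling out the paging rule removes the tautological 'Session ID' description and tells the agent exactly when pagination ends.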

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The first sentence 'Get trades executed in a paper trading session' provides a clear verb (Get) and resource (trades) with scope (paper trading). It distinguishes from siblings like trading_paper_equity_history or trading_paper_metrics by focusing specifically on executed trades, though it could more explicitly differentiate from trading_live_orders.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like trading_paper_list (likely lists sessions vs trades) or trading_live_orders. There is no mention of prerequisites, such as needing an active session_id from trading_paper_start, or when pagination is necessary.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully identifies the three market regimes tested (trending, ranging, volatile) and mentions that the output includes 'adaptation suggestions.' However, it lacks information about execution time, whether results are cached, side effects, or whether this is a read-only analysis operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a clear docstring structure with distinct Args and Returns sections. The content is appropriately front-loaded with the purpose statement. The second sentence ('Tests performance...') adds specific regime details that somewhat overlap with the first sentence but provides necessary specificity about the regimes tested.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool has only two simple string parameters and an output schema exists (per context signals), the description provides adequate completeness. It explains the input parameters, the specific analysis scope (three regime types), and the nature of the return value (performance analysis with adaptation suggestions) without needing to detail the full output structure.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the description compensates by providing basic semantic definitions for both parameters: strategy_name is the 'Strategy to analyze' and pair is the 'Trading pair.' While minimal, these descriptions provide necessary context missing from the schema, though they could be enhanced with format examples or constraints.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool analyzes strategy performance across specific market regimes (trending, ranging, volatile). It distinguishes itself from sibling backtest tools by focusing on regime-specific analysis rather than timeframes, parameters, or comprehensive testing. However, it doesn't explicitly differentiate from similar tools like pairs_regimes or backtest_comprehensive.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like backtest_comprehensive, backtest_compare_timeframes, or pairs_regimes. There are no stated prerequisites, exclusions, or selection criteria to help the agent choose this specific analysis type.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the return structure ('Dict with equity_curve containing timestamp, equity, drawdown data'), but omits safety profile (read-only vs. destructive), rate limits, or error handling behavior.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses a standard docstring format (Args/Returns) that is appropriately front-loaded with the core action. The Args section is necessary given 0% schema coverage, though 'Session ID' wastes a line. Returns section efficiently summarizes output structure.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a 2-parameter retrieval tool, the description adequately covers the output format, even though an output schema already exists. However, it lacks an enumeration of valid 'resolution' values (e.g., '1h', '1d') and omits error scenarios (e.g., behavior on an invalid session_id).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 0%, requiring the description to compensate. It provides minimal context for 'session_id' (tautological: 'Session ID') but adds meaningful domain context for 'resolution' (applies to 'equity curve data'). It meets baseline viability but does not fully compensate for the schema gap.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
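Enumerating the accepted resolutions in the docstring (and validating against them) would fully close the gap noted above. The value set "1m"/"5m"/"1h"/"1d" below is an assumed example, not the server's actual list:

```python
VALID_RESOLUTIONS = ("1m", "5m", "1h", "1d")  # assumed set, for illustration

def trading_paper_equity_history(session_id: str, resolution: str = "1h") -> dict:
    """Get equity curve data for a paper trading session.

    Args:
        session_id: Session ID returned by trading_paper_start.
        resolution: Sampling interval for the curve; one of "1m", "5m",
            "1h", or "1d".

    Returns:
        Dict with equity_curve: a list of timestamp/equity/drawdown points.
    """
    if resolution not in VALID_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {VALID_RESOLUTIONS}")
    return {"equity_curve": []}  # stub payload
```

Rejecting unknown resolutions with a clear error also addresses the missing error-scenario documentation for this tool.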

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description opens with a specific verb ('Get') + resource ('equity curve data') + scope ('paper trading session'), clearly distinguishing it from siblings like 'trading_paper_trades' (individual trades) and 'trading_paper_metrics' (summary statistics).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this versus alternatives like 'trading_paper_metrics' or 'trading_paper_list', nor are prerequisites (like obtaining a valid session_id) mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It discloses that the tool returns a 'list of strategy names,' providing basic output context. However, it omits details about caching behavior, rate limiting, or explicit safety confirmation (read-only nature), which are relevant given the lack of annotations.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Three sentences with zero waste: purpose is front-loaded, followed by usage guidance, then return value context. Every sentence earns its place with no redundancy or filler content.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of an output schema, the description appropriately avoids detailing return structures. However, the complete lack of parameter documentation (in both schema and description) creates a gap for a tool that nominally 'lists all' but actually filters based on the undocumented 'include_test_strategies' flag.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage for the 'include_test_strategies' boolean parameter, the description fails to compensate by explaining what this parameter does (e.g., whether to include experimental or test strategies in the list). This leaves the single optional parameter effectively undocumented.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
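Documenting the flag takes one Args entry. In this sketch the function name, the production/test split, and the stand-in registry are all hypothetical; only the 'include_test_strategies' parameter name comes from the review above.

```python
def backtesting_list_strategies(include_test_strategies: bool = False) -> list:
    """List all available trading strategies.

    Args:
        include_test_strategies: If True, also return experimental/test
            strategies; by default only production strategies are listed.
    """
    # name -> is_test flag; stand-in registry for the sketch
    registry = {"trend_follower": False, "scratch_test": True}
    return [name for name, is_test in registry.items()
            if include_test_strategies or not is_test]
```

With the flag documented, the description's claim to 'list all' strategies is no longer misleading about the default filtering.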

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'List[s] all available trading strategies' with a specific verb and resource. It partially distinguishes from siblings like 'backtesting_read_strategy' by emphasizing it returns names for use with other functions, though it could explicitly contrast with single-strategy retrieval tools.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 4/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Explicitly states 'Use this as the FIRST step to see what strategies are available to test,' providing clear sequencing guidance. It also mentions downstream tools ('backtest(), analyze_results(), etc.') indicating the workflow context, though it lacks explicit 'when not to use' guidance.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the data source ('Jesse database') and return value ('Backtest history with metrics and trends'), but fails to mention safety characteristics (read-only nature), rate limits, or whether this queries a local vs remote database.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is efficiently structured with a clear first sentence stating purpose, followed by concise Args and Returns sections. Every sentence earns its place with no redundant information.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's simplicity (single optional parameter) and the existence of an output schema, the description is appropriately complete. It documents the parameter (compensating for the bare schema) and identifies the specific database context ('Jesse').

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Despite 0% schema description coverage, the Args section compensates effectively by documenting the 'days' parameter semantics ('Number of days to analyze') and its default value (30), providing sufficient context for invocation.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the action ('Analyze') and resource ('recent backtest history from Jesse database'), distinguishing it from sibling tools like 'logs_strategy_performance' or 'backtest_comprehensive' by focusing specifically on historical backtest data rather than live performance or running new tests.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    There is no guidance on when to prefer this tool over alternatives like 'backtest_analyze_regimes' or 'logs_weekly_report', nor any mention of prerequisites such as requiring existing backtest data in the Jesse database.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses return content ('Fear & Greed score and rating with historical context'), which adds value beyond the schema. However, it lacks operational details such as data freshness, caching behavior, rate limits, or whether this is a read-only operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is appropriately compact with two efficient sentences. It front-loads the action ('Get current market sentiment') and uses a Returns section to structure the output description without verbosity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's simplicity (no parameters) and the presence of an output schema, the description provides sufficient context. It identifies the data source (Fear & Greed Index) and hints at the return structure, which is adequate for a stateless data retrieval operation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema contains zero parameters, which establishes a baseline score of 4. The description correctly implies no configuration is needed for the basic sentiment retrieval, making this appropriate for the schema structure.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Get[s] current market sentiment from Fear & Greed Index' with a specific verb and resource. However, it does not explicitly differentiate from sibling monitoring tools like monitor_get_risks or monitor_daily_scan, though the specific 'Fear & Greed Index' reference provides implicit scoping.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives. It does not indicate whether this should be used for real-time trading decisions, historical analysis, or how it compares to monitor_scan_opportunities or other market data tools.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It successfully documents the return format (recommendations with sizing and cost-benefit analysis) but fails to disclose safety properties (read-only vs. destructive), authentication requirements, or side effects.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The docstring format (Args/Returns) is well-structured and appropriately sized. The first two sentences are slightly redundant ('Recommend' vs 'Suggests'), but overall every section earns its place by conveying distinct information.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a 2-parameter tool with simple strings and documented returns, the description is minimally adequate. However, it omits constraints (valid risk_type values beyond examples), error conditions, or data freshness expectations that would make it fully complete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, requiring the description to fully compensate. The Args section excellently documents both parameters with clear examples (e.g., 'BTCUSDT', 'downside', 'volatility'), providing necessary semantic context that the schema lacks.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool recommends hedging strategies and suggests instruments/positioning, using specific verbs and resources. However, it does not explicitly differentiate from sibling risk analysis tools (e.g., risk_stress_test, risk_analyze_portfolio) beyond the hedging focus.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no explicit guidance on when to use this tool versus other risk assessment siblings, nor does it mention prerequisites or exclusions. Usage must be inferred solely from the functional description.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It states that no real money is used and that a session_id/status is returned, but omits critical operational details: whether the session runs asynchronously (likely, given the `jobs_list` sibling), how long it persists, and how to monitor or stop it with related tools.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Well-structured with Args and Returns sections. Minor redundancy between 'simulated, no real money' opening and 'simulates trades without using real funds' second sentence. Otherwise efficient with no wasted words.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given 7 parameters and session complexity, description covers basics adequately. However, missing operational context crucial for a session-starting tool: session lifecycle, connection to monitoring siblings (`trading_live_status`, `trading_live_cancel`), or runtime behavior expectations.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Excellent compensation for 0% schema description coverage. Description documents all 7 parameters with semantics (e.g., 'must exist in strategies directory'), format examples ('BTC-USDT', '1h'), and source context ('ID of stored exchange API key in Jesse').

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Clearly states it starts a simulated paper trading session with specific verb and resource. Mentions 'before going live' implying distinction from live trading siblings. However, fails to clarify relationship to sibling `trading_paper_start`, leaving ambiguity about which paper trading tool to use.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides implicit guidance ('Use this to test strategies before going live') establishing the simulation context. Lacks explicit when-not-to-use guidance or comparison to alternatives like `trading_live_real` or `trading_paper_start`.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It successfully discloses the intellectual purpose (detecting luck/overfitting) and references return values ('Statistical significance assessment'), but omits operational details like computational cost, caching behavior, or whether this modifies existing backtest records.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The docstring format with Args/Returns sections is structurally clear and appropriately sized. The content is front-loaded with the core purpose in the first sentence, though the Args/Returns sections in the description text are somewhat redundant given the structured schema fields.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a 2-parameter analysis tool with an output schema present, the description provides adequate context. It covers the validation purpose, parameter meanings, and return type. Missing only prerequisite context (e.g., requires existing backtest results) to be fully complete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the description compensates by providing basic semantics: 'strategy_name: Strategy to validate' and 'pair: Trading pair.' While functional, it lacks format examples, constraints, or enumeration details that would make parameter usage unambiguous.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool validates 'backtest results are statistically significant,' specifying the statistical focus. However, it doesn't explicitly differentiate from similar siblings like 'backtesting_validate' or 'backtest_monte_carlo,' leaving some ambiguity about when to choose this specific validation method.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The phrase 'Ensures results are real, not luck or overfitting' implies when to use this tool (when concerned about overfitting), but lacks explicit guidance on prerequisites (e.g., requires completed backtests) or comparison to alternative validation approaches available in siblings.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden but only discloses that the operation cancels a job and returns a status dict. It omits critical behavioral details: idempotency, error conditions (invalid job_id), whether cancellation is graceful or immediate, and whether partial strategy data persists.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses a structured docstring format (Args/Returns) that is front-loaded and efficient. Every line serves a purpose, though the formal structure is slightly more verbose than prose for a single-parameter tool.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
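    The Args/Returns docstring format the rubric praises here can be sketched as follows. The function body is a hypothetical stub for illustration, not the server's implementation:

```python
def jobs_cancel(job_id: str) -> dict:
    """Cancel an async strategy creation job.

    Args:
        job_id: Job identifier to cancel.

    Returns:
        Dict with job_id and resulting status.
    """
    # Stub for illustration; the real tool would talk to a job queue and
    # should additionally document idempotency and invalid-id behavior.
    return {"job_id": job_id, "status": "cancelled"}
```

    The structured sections are what let the description compensate for an undescribed schema; the missing pieces the rubric flags (error handling, graceful vs. immediate cancellation) would slot naturally into the same docstring.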

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a simple cancellation tool with one parameter and an output schema (mentioned in context signals). However, lacking annotations, it should disclose more about error handling and state management to be fully complete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Despite 0% schema description coverage, the Args section fully compensates by clearly defining job_id as the 'Job identifier to cancel', providing necessary semantic context the schema lacks.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description provides a specific verb (Cancel) and resource (async strategy creation job), clearly distinguishing it from siblings like strategy_create (which initiates jobs) and strategy_delete (which removes completed strategies).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description implies usage context (async jobs) but lacks explicit when-to-use guidance, such as specifying this is for stopping pending/running jobs or clarifying behavior if the job is already complete.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full disclosure burden. It adequately describes the return value ('Dict with equity curve data points'), but fails to disclose safety characteristics (read-only vs. destructive), error conditions (invalid session_id), rate limits, or whether 'real-time' implies streaming vs. polling behavior.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a standard docstring format (Args/Returns) that efficiently organizes information. The opening sentence states the purpose immediately, followed by structured parameter and return value documentation. There is no redundant or wasted text.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    While the description adequately covers the three parameters and mentions the return type (sufficient given the output schema exists), it lacks critical contextual details for a financial trading tool: no safety hints, no error handling guidance, and no distinction between paper and live trading sessions despite the complex sibling tool landscape.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the description fully compensates by documenting all three parameters in the Args section: 'session_id' is identified as the session identifier, and the time range parameters are explained with units ('milliseconds') and optionality. This provides essential semantic context missing from the raw schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description opens with a clear verb ('Get') and specific resource ('real-time equity curve (P&L)'), clarifying the scope ('for a session'). The term 'real-time' helps distinguish from historical analysis tools, though it could explicitly clarify whether this applies to both paper and real-money sessions given the sibling tool naming.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides no guidance on when to use this tool versus alternatives like 'trading_paper_equity_history' or backtesting analysis tools. It omits prerequisites (e.g., requiring an active session) and fails to indicate whether this is for monitoring active trades or post-session analysis.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It successfully documents the return value ('Dict with number of entries cleared per cache') and explains the null behavior, but fails to explicitly disclose that this is a destructive operation with data loss implications or mention performance impacts.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The Args/Returns structure is highly scannable. The opening sentence is front-loaded with purpose. Every line earns its place—no redundant text. The format efficiently conveys parameter constraints and return type in minimal space.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter tool with straightforward functionality, the description is complete. It documents the input parameter, valid values, default behavior, and return structure. Given the absence of annotations, it adequately covers the behavioral contract despite missing explicit safety warnings.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The schema has 0% description coverage, but the description excellently compensates by documenting the valid enum-like values ('strategy_list, backtest') and explaining the default null behavior ('clears all caches'). This adds critical semantic meaning beyond the raw schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
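    The enum values and default-null behavior the description documents could also live in the schema itself, which is what the 0% coverage figure flags as missing. A sketch as a JSON Schema fragment expressed in Python (values mirror the review; the exact schema is illustrative):

```python
# Schema fragment for the cache-clearing tool carrying the enum values
# and default that the prose description currently has to convey alone.
cache_clear_schema = {
    "type": "object",
    "properties": {
        "cache_name": {
            "type": ["string", "null"],
            "enum": ["strategy_list", "backtest", None],
            "default": None,
            "description": "Cache to clear; null (the default) clears all caches.",
        },
    },
}
```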

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the action ('Clear cache(s)') and purpose ('to free memory or force fresh data'). It distinguishes itself from read-only cache operations like 'cache_stats' by emphasizing the clearing action, though it could more explicitly contrast with sibling tools.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides implied usage context ('free memory or force fresh data') explaining when to use it, but lacks explicit guidance on when NOT to use it or comparisons to alternatives like 'cache_stats'. The Args section effectively documents the parameter behavior.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the return structure ('Dict with sessions list and total count'), which adds necessary context, but omits read-only/safety characteristics, pagination behavior, or rate limiting considerations.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Extremely efficient two-sentence structure. The first sentence states the action and target; the second documents the return value. No redundant or filler text present.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a simple list operation with zero parameters. The return value description compensates somewhat for lack of annotations. Could be improved by noting pagination limits or the maximum number of sessions returned, but sufficient for basic invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has zero parameters, so description coverage is trivially 100%. The description implicitly confirms there are no filtering options by stating 'List all', which aligns with the empty schema. The baseline score of 4 applies to zero-parameter tools.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States specific verb 'List' and resource 'paper trading sessions' with scope 'all', making the core function unambiguous. However, it does not explicitly differentiate from siblings like 'trading_paper_status' (which likely checks a specific session) or 'trading_live_sessions', relying instead on naming conventions.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides no explicit guidance on when to use this versus alternatives (e.g., 'trading_live_sessions' for live trading, or 'trading_paper_status' for specific session details). Usage must be inferred from the name and description alone.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the specific timeframes tested (1m through 1d) and that an optimal recommendation is returned. However, it lacks safety disclosure (read-only vs. destructive), execution duration expectations, and the side effects typical of backtesting operations.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses efficient docstring format (Args/Returns). Every sentence earns its place: first sentence establishes purpose, second enumerates specific timeframes, Args section handles parameter semantics. No redundancy or wasted words.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given that the tool has an output schema (per context signals), the brief Returns section is adequate. With only 2 parameters and 0% schema coverage, the description successfully documents both. It could improve by noting this is a read-only analysis operation, but it is sufficient for selection and invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema has 0% description coverage. The description compensates by defining both parameters: 'strategy_name: Strategy to test' and 'pair: Trading pair'. While it clarifies semantic meaning, it omits format guidance (e.g., pair syntax like 'BTC-USD') and how to discover valid strategy names.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Description opens with specific verb 'Compare' + resource 'strategy performance' + scope 'across different timeframes'. The explicit enumeration of tested timeframes (1m, 5m, 15m, 1h, 4h, 1d) effectively distinguishes this from siblings like backtest_comprehensive or backtest_run.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit guidance on when to use this tool versus alternatives like backtest_comprehensive, backtest_run, or backtest_optimize_parameters. The description states what it does but not the selection criteria or prerequisites (e.g., whether strategy must be validated first).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full disclosure burden. While 'Get' implies a read-only operation, the description lacks explicit safety guarantees, idempotency statements, or performance characteristics (e.g., cached vs. real-time). It compensates partially by documenting the return structure ('Dict with exchanges list...').

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is appropriately front-loaded with the core action in the first sentence. The 'Returns' block adds useful structure, though sentences 1 and 2 are slightly redundant ('Get list' vs 'Returns a list'). Overall efficient with no extraneous content.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's simplicity (zero parameters), the presence of an output schema, and the clear documentation of return values in the description, the definition is complete. It adequately covers what the tool does, when to use it, and what it returns.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The tool has zero parameters and the schema description coverage is effectively 100% (empty schema). With zero parameters, the baseline score is 4 as no additional semantic explanation is required in the description.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Get[s] list of supported exchanges in Jesse' with a specific verb and resource. While it implies distinction from action-oriented siblings like 'backtesting_run' by focusing on validation, it does not explicitly differentiate from similar lookup tools.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines4/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides clear guidance: 'Use this to validate exchange names before creating trading workflows or backtests.' This establishes the specific prerequisite context for invocation, though it does not explicitly name alternatives to avoid.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It explains what metrics are returned (execution time, candles/second), but omits operational details such as whether the benchmark creates temporary resources, requires specific permissions, or has side effects on the system.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a clear structured format with distinct Args and Returns sections. It is appropriately sized with minimal redundancy, though listing defaults in text that also exist in the schema's default fields is slightly repetitive (but acceptable given the lack of schema descriptions).

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's low complexity (4 optional parameters, simple types, no nesting), the description is sufficiently complete. It documents all inputs and describes the return dictionary's contents, which is adequate given that an output schema exists (per context signals) to define the exact structure.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Excellent compensation for the 0% schema description coverage. The Args section provides clear semantic meaning for all 4 parameters (symbol, timeframe, days, exchange), explaining what each represents along with their default values, effectively documenting parameters missing from the structured schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool runs a 'benchmark backtest' to measure 'execution time' and 'candles/second performance', distinguishing it from sibling tools like backtesting_run that likely perform actual strategy evaluation. However, it could more explicitly contrast with the numerous other backtest-related siblings.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description states it is 'Useful for understanding how long different backtests will take', providing implied context for when to use it. However, it lacks explicit when-not-to-use guidance or clear differentiation from the regular backtesting_run tool for users who might confuse benchmarking with strategy testing.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses that the tool returns performance metrics (total_return, sharpe_ratio, etc.) and covers single execution rather than batch runs. However, it omits operational details: computational cost, idempotency, caching behavior, and whether results are persisted to storage.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Well-structured with clear sections: purpose, returns, use cases, and example workflow. Front-loaded with the core action. Slightly redundant in listing return metrics since output schema exists, but the example flow adds concrete value without excessive length.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given 14 undocumented parameters, the description is insufficient for parameter usage despite strong workflow context. However, for an output-schema-backed tool, the listed key metrics and the A/B testing workflow example provide adequate (though not comprehensive) coverage of the tool's ecosystem role.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The schema has 0% description coverage across 14 parameters, so the description must compensate. It fails to do so, referring only generically to 'specified parameters'. While the A/B example shows the 'strategy' parameter's syntax, no semantic explanation is given for the required params (symbol, timeframe, dates) or optional flags like 'include_trades'.
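
    To make the gap concrete, here is a hedged sketch of the pre-call validation an agent is left to improvise; the parameter names mirror those cited above, but the accepted values are assumptions, not the server's documented contract:

```python
# Hypothetical required/optional parameters for a backtest_run-style tool.
# Names follow the review ('strategy', 'symbol', 'timeframe', dates,
# 'include_trades'); valid timeframe codes are assumptions.
REQUIRED = {"strategy", "symbol", "timeframe", "start_date", "end_date"}
KNOWN_TIMEFRAMES = {"1m", "5m", "15m", "1h", "4h", "1D"}

def validate_backtest_args(args: dict) -> list[str]:
    """Collect problems an agent could catch before invoking the tool."""
    problems = [f"missing required parameter: {name}"
                for name in sorted(REQUIRED - args.keys())]
    tf = args.get("timeframe")
    if tf is not None and tf not in KNOWN_TIMEFRAMES:
        problems.append(f"unrecognized timeframe: {tf!r}")
    return problems

call = {
    "strategy": "SMACrossover",  # strategy class name, per the A/B example
    "symbol": "BTC-USDT",
    "timeframe": "1h",
    "start_date": "2024-01-01",
    "end_date": "2024-06-30",
    "include_trades": False,     # optional flag the description never explains
}
print(validate_backtest_args(call))  # []
```

    With documented parameter semantics, none of this guesswork would be needed.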

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Opens with specific verb 'Run' and resource 'backtest on a strategy'. The qualifier 'single' effectively distinguishes it from siblings like 'backtest_comprehensive', 'backtest_optimize_parameters', and 'backtest_monte_carlo'. The purpose is immediately clear and scoped.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Excellent 'Use this to:' section with four specific scenarios including A/B testing. Explicitly names successor tools ('analyze_results', 'monte_carlo', 'risk_report') to establish workflow sequence. Provides concrete 3-step example flow. Clear when-to-use vs alternatives.
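
    The A/B workflow praised here ultimately reduces to comparing the returned metric dicts; a minimal sketch (the metric keys follow the description, the values are invented):

```python
def pick_better(results: dict[str, dict]) -> str:
    """Name of the strategy with the higher risk-adjusted return."""
    return max(results, key=lambda name: results[name]["sharpe_ratio"])

# Two hypothetical backtest outputs, keyed by strategy name.
runs = {
    "SMACrossover":  {"total_return": 0.31, "sharpe_ratio": 1.4},
    "RSIMeanRevert": {"total_return": 0.27, "sharpe_ratio": 1.9},
}
print(pick_better(runs))  # RSIMeanRevert
```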

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It compensates partially by disclosing the return structure ('Dict with cache status...'), but fails to mention safety profile (read-only), side effects, or rate limits.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Two sentences total. Purpose is front-loaded in the first sentence. The 'Returns:' section is appropriately placed and every sentence conveys distinct information (operation purpose and output structure).

    Completeness 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given zero parameters and the existence of an output schema (per context signals), the description is complete. It adequately covers the tool's simple function and return value without requiring additional behavioral context beyond what is provided.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has zero parameters. Per guidelines, 0 params = baseline 4. No parameter semantics are needed or provided.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Get') and resource ('cache statistics') and details specific metrics included ('hit/miss ratio and cache sizes'). The action clearly distinguishes it from sibling 'cache_clear' (read stats vs. destructive clear).
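
    The promised hit/miss ratio is simple derived arithmetic; a sketch of the shape such a stats payload might take (field names are assumptions, not the tool's actual output schema):

```python
def cache_summary(hits: int, misses: int, sizes: dict[str, int]) -> dict:
    """Assemble a cache-stats payload of the kind the description promises."""
    total = hits + misses
    return {
        "hit_ratio": hits / total if total else None,  # avoid division by zero
        "total_lookups": total,
        "cache_sizes": sizes,
    }

print(cache_summary(90, 10, {"candles": 1200}))
```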

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit guidance on when to use this tool versus alternatives (e.g., when to check stats vs. clearing cache), nor any prerequisites or conditions mentioned. Only states what it does, not when to do it.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It successfully documents what analysis is returned (trade metrics, risk summary) but fails to disclose safety characteristics (read-only vs destructive), performance characteristics, or side effects. It adds value by describing the analysis content but lacks operational behavioral traits.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Well-structured with clear bullet points for output categories and a numbered workflow. Front-loaded with purpose and input/output contract. Minor fluff ('deep insights', 'understand what happened') prevents a 5, but overall efficient with no wasted sentences.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the moderate complexity (5 parameters, nested objects, an existing output schema), the description provides strong workflow integration and sibling context but insufficient parameter documentation. The output schema exists, so a detailed return-value explanation isn't required, but with 0% schema coverage on inputs, the description should have documented the optional flags.

    Parameters 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the description must compensate significantly. It only mentions 'backtest result dict' (covering the required parameter) but completely omits the 4 optional parameters: analysis_type, include_trade_analysis, include_correlation, and include_monte_carlo. There is no mapping between the described analysis features and the boolean flags that control them.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool extracts insights from backtest results, specifying the input (backtest result dict) and output (trade metrics, performance breakdown, risk metrics). It effectively distinguishes itself from sibling tools like 'optimization_run' or 'backtest_monte_carlo' by positioning itself as the analytical step that follows backtesting.

    Usage Guidelines 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Excellent explicit guidance with 'Use this after backtest()' and a numbered workflow (1. backtest -> 2. analyze_results -> 3. monte_carlo -> 4. risk_report). This clearly establishes when to use the tool versus alternatives and prerequisites, directly addressing the common workflow sequence.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It successfully explains the simulation nature (no real funds), but omits critical behavioral details: whether the session runs asynchronously/backgrounded, persistence characteristics, lifecycle management (relation to trading_paper_stop), or error handling behavior.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The docstring format (Args/Returns) is well-structured and front-loaded with the core action. The content is efficient with no wasted sentences, though the Returns section is partially redundant given the tool has a formal output schema (per context signals).

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    While parameter documentation is thorough, the description lacks operational context for this complex 9-parameter tool: no mention of session management, prerequisites for strategy existence, or interaction with the broader trading session lifecycle (siblings like trading_paper_list/status/stop). The Returns summary is adequate given the formal output schema exists.

    Parameters 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the Args section fully compensates by documenting all 9 parameters with concrete examples (e.g., 'SMACrossover', 'BTC-USDT', '1h') and noting optionality. This adds substantial semantic value beyond the bare schema types and defaults.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description opens with 'Start a paper trading session' providing a specific verb and resource. It clearly distinguishes the scope by explaining it 'simulates trades without using real funds' and is for testing 'before going live', implicitly differentiating it from sibling live trading tools like trading_live_real.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    It provides basic usage context ('Use this to test strategies before going live'), but lacks explicit guidance on when to choose this over similar siblings like backtest_run (offline testing) or trading_live_paper (paper trading via live exchange connection). No prerequisites are mentioned (e.g., whether the strategy must exist or candles must be imported).

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the external service dependency (ntfy.sh) and return structure (status and timestamp), but omits critical behavioral traits like rate limits, authentication requirements, failure modes, or network timeout behavior for the external API call.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses a structured docstring format (Args/Returns) that efficiently organizes information. All four sentences earn their place. The format is slightly verbose but maximally clear, with no redundant or tautological statements.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the presence of an output schema (per context signals), the description appropriately provides only a high-level summary of returns ('status and timestamp') rather than duplicating schema details. All parameters are documented despite zero schema coverage, making this complete for a simple 3-parameter external API wrapper.

    Parameters 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Despite 0% schema description coverage, the Args section comprehensively documents all 3 parameters. It adds crucial semantic meaning beyond the schema: message content purpose, valid priority enum values (default, low, high, urgent), and the comma-separated format for tags. This fully compensates for the schema's lack of descriptions.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description explicitly states 'Send a notification alert via ntfy.sh' - providing a specific verb (Send), resource (notification alert), and external service context (ntfy.sh). It clearly distinguishes itself from the 60+ sibling trading/backtesting tools by being the sole alerting utility.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit 'when to use' instructions or alternatives are listed, though the tool's purpose is clear from context among trading-focused siblings. The description also lacks guidance on when to prefer this over other alerting mechanisms, and omits prerequisites such as ntfy.sh topic configuration.

  • Behavior 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden and discloses key behaviors: the async/sync execution modes, the iterative refinement process, and the exact return dictionary structure. It does not mention error states or the file-system side effects implied by the 'path' return value.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Well-structured with clear sections (summary, async behavior, Args, Returns). Information is front-loaded with behavioral modes early. The Args/Returns format is slightly verbose but necessary given the lack of schema annotations.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a complex 10-parameter tool with mutation behavior and async capabilities, the description is comprehensive. It covers inputs, operational modes, and outputs (despite the existence of an output schema, the Returns section ensures clarity). Minor gap in prerequisite/permission documentation.

    Parameters 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the Args section comprehensively compensates by documenting all 10 parameters with clear semantics, default values (e.g., risk_per_trade 0.02 = 2%), and optionality. This is essential for agent usage given the schema provides no descriptions.
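
    As a worked example of the one default the Args section quantifies (risk_per_trade 0.02 = 2%), standard fixed-fractional position sizing looks like this; this is illustrative arithmetic, not necessarily how the server computes it:

```python
def position_size(equity: float, risk_per_trade: float,
                  entry: float, stop: float) -> float:
    """Units to buy so that hitting the stop loses about risk_per_trade of equity."""
    risk_amount = equity * risk_per_trade  # capital at risk, e.g. 2%
    per_unit_loss = abs(entry - stop)      # adverse move per unit
    return risk_amount / per_unit_loss

# 10,000 equity at 2% risk, entry 100, stop 95: 200 at risk over a 5-point stop
print(position_size(10_000, 0.02, 100.0, 95.0))  # 40.0
```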

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool creates a strategy with 'iterative refinement' (specific verb + resource). The 'Ralph Wiggum loop' reference, while jargon, implies a specific iterative process that distinguishes this from simple creation. However, the cultural reference slightly obscures clarity for those unfamiliar with the term.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description provides operational guidance on when to use async_mode (returns job_id immediately) versus synchronous execution (progress logging). However, it lacks explicit differentiation from siblings like 'strategy_refine' or 'strategy_delete' to guide tool selection.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the burden of disclosing behavior. It documents the return structure (stopped_at, duration, final_metrics) but omits side effects like whether the session can be resumed, if data is persisted, or cleanup actions.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses a standard docstring format with Args and Returns sections. The first sentence states the core purpose immediately, and there is no redundant or wasted text.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter tool with output schema information provided in the Returns section, the description is adequately complete. A reference to trading_paper_start or trading_paper_list for obtaining the session_id would improve it further.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, so the description compensates by documenting the single parameter in the Args section ('Session ID to stop'). While minimal, it provides sufficient semantic context for a straightforward identifier parameter.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description opens with a specific verb ('Stop') + resource ('paper trading session') + outcome ('return final metrics'), clearly distinguishing it from siblings like trading_paper_start or trading_paper_status.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The docstring structure and return value description imply usage (call this to terminate a session), but there is no explicit 'when to use' guidance, prerequisites (e.g., session must be active), or contrast with trading_paper_update.

  • Behavior 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full disclosure burden. It compensates by detailing the exact return structure (enabled, rate_per_second, available_tokens, etc.), providing transparency into what the tool produces. It does not explicitly state safety (read-only nature) but this is strongly implied by 'Get'.
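
    The disclosed fields (enabled, rate_per_second, available_tokens) suggest a token-bucket limiter. A self-contained sketch of that mechanism, with a status payload shaped like the one described (the server's actual implementation is unknown):

```python
class TokenBucket:
    """Minimal token-bucket limiter; time is passed in explicitly for clarity."""

    def __init__(self, rate_per_second: float, capacity: float):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def _refill(self, now: float) -> None:
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, now: float) -> bool:
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def status(self, now: float) -> dict:
        self._refill(now)
        return {"enabled": True, "rate_per_second": self.rate,
                "available_tokens": self.tokens}

bucket = TokenBucket(rate_per_second=2.0, capacity=2.0)
print(bucket.try_acquire(0.0), bucket.try_acquire(0.0), bucket.try_acquire(0.0))
# True True False
print(bucket.status(0.5))  # one token refilled after half a second
```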

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Two distinct sections: single-sentence purpose declaration followed by structured 'Returns:' documentation. No filler text. Information density is high with zero redundancy.

    Completeness 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given zero parameters and the presence of an output schema (per context signals), the tool is inherently low-complexity. The description adequately covers both invocation (purpose) and result structure, which is complete for a simple status-monitoring utility.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Input schema has zero parameters. Per scoring rules, 0 params establishes baseline 4. The description correctly implies no inputs are needed by omitting parameter discussion and leading directly with the action.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses specific verb 'Get' with clear resource 'rate limiter status' and scope 'Jesse API calls'. It clearly distinguishes from siblings which focus on trading, backtesting, and risk operations rather than API infrastructure monitoring.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit guidance on when to invoke this tool versus alternatives, nor any mention of prerequisites or triggering conditions (e.g., 'use before bulk operations'). The description assumes the agent knows when to check rate limits.

  • Behavior 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description effectively discloses the async polling pattern ('Poll') and provides detailed return value structure including the conditional 'result (when complete)' field, giving clear expectations about response shape and completion behavior.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description uses a clean, structured format with clear 'Args:' and 'Returns:' sections. Every line earns its place; no redundancy or filler text. The one-sentence purpose statement is immediately followed by structured technical details.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter polling tool, the description is adequately complete: it explains the input, the async behavior, and the output structure (acting as the output schema documentation). It could mention error states or polling intervals, but the core functionality is fully covered.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 0%, but the description compensates by explaining that 'job_id' is a 'Job identifier from strategy_create', providing necessary semantic context about the parameter's origin and nature that the schema lacks.
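
    The poll-until-complete contract implied here can be sketched as a loop over a stubbed status call; the stub, its field names, and the returned path all stand in for the real tool, not its actual behavior:

```python
import itertools
import time

# Stubbed server side: yields increasing progress, completing at 100.
_progress = itertools.chain([25, 60, 100])

def strategy_create_status(job_id: str) -> dict:
    pct = next(_progress)
    state = {"job_id": job_id,
             "status": "complete" if pct == 100 else "running",
             "progress": pct}
    if pct == 100:
        state["result"] = {"path": "strategies/SMACrossover.py"}  # hypothetical
    return state

def wait_for_job(job_id: str, interval: float = 0.0) -> dict:
    """Poll until the job reports complete, then return its result payload."""
    while True:
        state = strategy_create_status(job_id)
        if state["status"] == "complete":
            return state["result"]
        time.sleep(interval)  # back off between polls

result = wait_for_job("job-42")
print(result)  # {'path': 'strategies/SMACrossover.py'}
```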

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the specific action ('Poll async job progress') and scope ('for strategy creation'), using precise verbs that distinguish it from the sibling 'strategy_create' tool which actually creates the strategy.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description implies the workflow by stating the job_id comes from 'strategy_create', indicating when to use the tool. However, it lacks explicit guidance on when NOT to use it (e.g., vs. generic 'jobs_list') or alternatives for checking status.

  • Behavior 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It discloses the return structure ('Dict with available boolean and optional error message'), which is critical behavioral info. The verb choice implies read-only behavior, though the description could state its safety characteristics explicitly.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Two clear statements: purpose declaration followed by return specification. No redundant or wasted text. Information is front-loaded with the action verb.

    Completeness 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a simple zero-parameter check tool. Documents what the tool does, what it returns (available boolean, error message), and distinguishes itself from trading operations. Given the low complexity, nothing essential is missing.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Zero parameters present, which establishes a baseline of 4 per scoring rules. Description correctly focuses on behavior and return values rather than parameters.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Specific verb 'Check' + resource 'jesse-live plugin' + clear scope 'installed and available'. Distinct from siblings like trading_live_real or trading_live_status which perform actual trading operations or check session state, whereas this checks plugin installation.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Usage is implied by the name and sibling context (prerequisite check before using other trading_live_* tools), but lacks explicit guidance like 'call this before trading_live_real' or 'use when you need to verify plugin availability'.

  • Behavior 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, so description carries full burden. It discloses return values (connection status, version, strategies count) and implies read-only safety through 'Check' semantics. Could enhance by noting idempotency or error states, but adequately covers the core behavior for a health-check tool.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Three sentences with zero waste: purpose statement, return value disclosure, and usage guideline. Front-loaded with the core action and perfectly sized for the tool's simplicity.

    Completeness 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given zero parameters and existence of output schema (context signals), the description provides sufficient context. It explains what the tool does, what it returns at a high level, and the operational context (verify reachability), making it complete for a health check endpoint.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema contains zero parameters (an empty object), giving 100% coverage. The rubric's baseline score of 4 applies to parameterless tools, and the description correctly omits parameter discussion since none exist.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Specific verb 'Check' paired with clear resource 'Jesse REST API health and connection status'. Distinguishes clearly from execution-oriented siblings like backtesting_run or backtest_comprehensive by focusing on connectivity/health rather than operations.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Explicitly states when to use: 'before running backtests or optimizations'. Provides clear sequencing guidance relative to sibling tools, helping the agent understand this is a prerequisite check rather than an action tool.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 5/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations are provided, so the description carries the full burden. It comprehensively discloses destructive financial risk ('funds are at risk of total loss'), prerequisites (backtesting/paper trading requirements), and return values (session_id and status, or error details). The warning symbols and risk section provide critical behavioral context beyond the schema.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Excellent structure with front-loaded warning symbols, clear section headers (RISKS, REQUIREMENTS, Args, Returns), and zero wasted words. Every sentence serves a critical purpose for a high-stakes financial tool, with risks prominently displayed before usage requirements.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the high complexity (9 parameters, real money at risk, destructive potential) and lack of annotations, the description is remarkably complete. It covers safety warnings, prerequisites, all parameter semantics, and return value structure, providing sufficient context for safe tool invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, but the description compensates perfectly by documenting all 9 parameters with semantic meaning: strategy location ('must exist in strategies directory'), symbol/timeframe formats (examples provided), confirmation exact string requirement, and permission level options ('paper_only', 'confirm_required', 'full_autonomous').

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description explicitly states 'Start LIVE trading with REAL MONEY' with specific verb and resource. It clearly distinguishes from sibling tools like 'trading_live_paper' by emphasizing real funds versus simulation, and contrasts with backtesting tools by focusing on live execution.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Explicitly lists four requirements including 'Successful paper trading first' and 'Thoroughly backtested strategy,' which clearly indicates when to use this tool versus siblings like 'trading_live_paper' or 'backtesting_run'. The confirmation parameter requirement further enforces appropriate usage guardrails.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

GitHub Badge

Glama performs regular codebase and documentation scans to:

  • Confirm that the MCP server is working as expected.
  • Confirm that there are no obvious security issues.
  • Evaluate tool definition quality.

Our badge communicates server capabilities, safety, and installation instructions.

Card Badge

jesse-mcp MCP server

Copy to your README.md:

Score Badge

jesse-mcp MCP server

Copy to your README.md:

How to claim the server?

If you are the author of the server, you simply need to authenticate using GitHub.

However, if the MCP server belongs to an organization, you first need to add a glama.json file to the root of your repository.

{
  "$schema": "https://glama.ai/mcp/schemas/server.json",
  "maintainers": [
    "your-github-username"
  ]
}

Then, authenticate using GitHub.

Browse examples.

How to make a release?

A "release" on Glama is not the same as a GitHub release. To create a Glama release:

  1. Claim the server if you haven't already.
  2. Go to the Dockerfile admin page, configure the build spec, and click Deploy.
  3. Once the build test succeeds, click Make Release, enter a version, and publish.

This process allows Glama to run security checks on your server and enables users to deploy it.

How to add a LICENSE?

Please follow the instructions in the GitHub documentation.

Once GitHub recognizes the license, the system will automatically detect it within a few hours.

If the license does not appear on the server after some time, you can manually trigger a new scan using the MCP server admin interface.

How to sync the server with GitHub?

Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.

To manually sync the server, click the "Sync Server" button in the MCP server admin interface.

How is the quality score calculated?

The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).

Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool is scored 1–5 across six dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The server-level definition quality score is 60% of the mean TDQS plus 40% of the minimum TDQS, so a single poorly described tool pulls the whole score down.

Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).

Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
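Using the weights stated above, the calculation can be sketched in Python. This is an illustrative reconstruction from the published formula, not Glama's actual implementation; the function and key names are invented for the sketch:

```python
def tool_definition_quality(scores: dict) -> float:
    """Weighted mean of the six per-tool dimensions (each scored 1-5)."""
    weights = {
        "purpose": 0.25, "usage": 0.20, "behavior": 0.20,
        "parameters": 0.15, "conciseness": 0.10, "completeness": 0.10,
    }
    return sum(weights[dim] * scores[dim] for dim in weights)

def server_definition_quality(tdqs_list: list) -> float:
    """60% mean TDQS + 40% minimum TDQS across all tools."""
    mean = sum(tdqs_list) / len(tdqs_list)
    return 0.6 * mean + 0.4 * min(tdqs_list)

def overall_score(tdqs_list: list, coherence: float) -> float:
    """70% Tool Definition Quality + 30% Server Coherence."""
    return 0.7 * server_definition_quality(tdqs_list) + 0.3 * coherence

def tier(score: float) -> str:
    """Map an overall score to its letter tier."""
    for cutoff, grade in ((3.5, "A"), (3.0, "B"), (2.0, "C"), (1.0, "D")):
        if score >= cutoff:
            return grade
    return "F"
```

For example, a server with one perfectly described tool (TDQS 5.0), one poorly described tool (TDQS 1.0), and a coherence of 4.0 scores 0.7 × (0.6 × 3.0 + 0.4 × 1.0) + 0.3 × 4.0 = 2.74, a C tier — the minimum-TDQS term is what makes a single weak tool so costly.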


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bkuri/jesse-mcp'
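The same endpoint can be called from Python using only the standard library. This is a minimal sketch; the response schema is not documented on this page, so the payload is returned as raw parsed JSON rather than assuming any particular fields:

```python
import json
import urllib.request

def server_api_url(owner: str, repo: str) -> str:
    # Build the Glama MCP directory API URL for a single server.
    return f"https://glama.ai/api/mcp/v1/servers/{owner}/{repo}"

def fetch_server_metadata(owner: str, repo: str) -> dict:
    # Performs a plain GET request; the keys of the returned dict
    # are whatever the API provides and are not assumed here.
    with urllib.request.urlopen(server_api_url(owner, repo)) as resp:
        return json.load(resp)

# Example (requires network access):
# print(json.dumps(fetch_server_metadata("bkuri", "jesse-mcp"), indent=2))
```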

If you have feedback or need assistance with the MCP directory API, please join our Discord server.