
Server Details

Calibrated world model for AI agents. 40 tools: world state, markets, trading. Kalshi + Polymarket.

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:
Repository: spfunctions/simplefunctions-cli
GitHub Stars: 4
Server Listing
SimpleFunctions

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: Grade B

Average 3.4/5 across 56 of 56 tools scored. Lowest: 2.5/5.

Server Coherence: Grade B
Disambiguation: 2/5

Many tools have overlapping or ambiguous purposes, causing potential confusion. For example, 'get_trade_ideas', 'get_edges', 'scan_markets', and 'screen_markets' all relate to finding trading opportunities but with unclear distinctions. Similarly, 'get_world_state' and 'get_world_delta' serve similar functions with incremental vs. full updates, and 'enrich_content' vs. 'monitor_the_situation' both cross-reference content with markets but differ in scope and authentication. The descriptions help somewhat, but the high tool count exacerbates the overlap.

Naming Consistency: 4/5

Tool names generally follow a consistent verb_noun pattern (e.g., 'create_intent', 'list_theses', 'update_position'), with clear actions like get, create, update, list. However, there are minor deviations such as 'augment_tree' (verb_noun but less common), 'browse_public_skills' (browse vs. list), 'enrich_content' (enrich vs. get), and 'what_if' (non-standard phrasing). Overall, the naming is mostly predictable and readable, with only a few inconsistencies.

Tool Count: 2/5

With 56 tools, the count is excessive for a single server, making it overwhelming and difficult to navigate. While the domain (prediction market trading and analysis) is broad, many tools could be consolidated or grouped (e.g., multiple 'get_' tools for market data, advanced vs. basic functions). This high number suggests poor scoping and likely includes redundant or overly granular operations that hinder usability.

Completeness: 5/5

The tool set provides comprehensive coverage for prediction market trading and analysis, with no obvious gaps. It includes full lifecycle management (create, update, list, delete for theses, intents, positions), data retrieval (market prices, edges, context, forecasts), execution tools (intents, strategies), and auxiliary functions (forum, skills, legislation). The surface supports end-to-end workflows from discovery to execution and monitoring, ensuring agents can perform all necessary operations without dead ends.

Available Tools

56 tools
add_position (Grade B)

Record a new position in a thesis for tracking. Use after an intent fills or a manual trade.

Parameters (JSON Schema)
- size (optional): Position size (contracts)
- venue (required): Exchange venue
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- thesisId (required): Thesis ID
- direction (required): Position direction
- rationale (optional): Why this position
- entryPrice (required): Entry price in cents
- marketTitle (required): Human-readable market name
- externalMarketId (required): Market ticker
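The required/optional split above can be illustrated as a call payload. This is a hypothetical sketch in Python: the API key, IDs, price, and market names are placeholders, not values from the server.

```python
# Hypothetical arguments for an add_position tool call.
# All values are illustrative placeholders.
add_position_args = {
    "apiKey": "sf_live_PLACEHOLDER",   # from https://simplefunctions.dev/dashboard/keys
    "thesisId": "thesis_123",          # placeholder thesis ID
    "venue": "kalshi",                 # exchange venue
    "direction": "yes",                # position direction
    "entryPrice": 42,                  # entry price in cents
    "marketTitle": "Example market",   # human-readable market name
    "externalMarketId": "EXAMPLE-TICKER",  # market ticker (placeholder)
    # Optional fields:
    "size": 10,                        # position size (contracts)
    "rationale": "Recorded after an intent filled",
}

# Client-side sanity check: the seven required fields from the table above.
REQUIRED = {"apiKey", "thesisId", "venue", "direction",
            "entryPrice", "marketTitle", "externalMarketId"}
missing = REQUIRED - add_position_args.keys()
assert not missing, f"missing required fields: {missing}"
```

A pre-flight check like this catches a forgotten required field before the call is sent, which matters for a mutation tool whose error behavior is undocumented.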
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it implies a write operation ('Record a new position'), it lacks details on permissions, side effects, error handling, or response format. For a mutation tool with zero annotation coverage, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and front-loaded, consisting of just two sentences that directly state the purpose and usage context. Every word earns its place, with no redundant or unnecessary information, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a write operation with 9 parameters, 7 required), no annotations, and no output schema, the description is incomplete. It doesn't address behavioral aspects like what happens on success/failure, authentication needs (implied by 'apiKey' but not explained), or how it integrates with sibling tools, leaving significant gaps for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema already documents all parameters thoroughly. The description doesn't add any additional meaning or context beyond what the schema provides (e.g., it doesn't explain relationships between parameters like 'thesisId' and 'externalMarketId'). Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Record') and resource ('new position in a thesis for tracking'), making it understandable. However, it doesn't explicitly differentiate from sibling tools like 'update_position' or 'close_position', which would require more specific context about when to use each.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides implied usage guidelines by stating 'Use after an intent fills or a manual trade,' which gives some context for when to invoke this tool. However, it doesn't explicitly mention alternatives (e.g., when to use 'update_position' instead) or exclusions, leaving room for ambiguity compared to sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

augment_tree (Grade A)

Advanced: Merge suggested causal tree nodes from evaluations into the tree (append-only).

Parameters (JSON Schema)
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- dryRun (optional): Preview without applying (default: false)
- thesisId (required): Thesis ID
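Because augment_tree is append-only and labeled "Advanced", a cautious client might preview with dryRun before applying. A hypothetical sketch; the key and thesis ID are placeholders.

```python
# Preview-then-apply pattern for augment_tree (placeholder values).
base_args = {
    "apiKey": "sf_live_PLACEHOLDER",
    "thesisId": "thesis_123",
}

preview_args = {**base_args, "dryRun": True}   # preview: nothing is merged
apply_args = {**base_args, "dryRun": False}    # apply: merge for real (the default)

assert preview_args["dryRun"] and not apply_args["dryRun"]
```

Since the merge is append-only and presumably not reversible through this tool, inspecting the dry-run result first is the safer workflow.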
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses the 'append-only' behavioral trait, which is valuable context beyond the basic 'merge' action. However, it doesn't mention permissions, side effects, rate limits, or what happens on failure. The 'Advanced' label hints at complexity but lacks specifics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero waste. It's front-loaded with the core action and includes a qualifier ('append-only') that adds necessary context without verbosity. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is minimal but covers the core purpose. It lacks details on permissions, error handling, or return values, which are important for an 'Advanced' operation. The 100% schema coverage helps, but behavioral context is incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are well-documented in the schema. The description doesn't add any parameter-specific semantics beyond what the schema provides (e.g., it doesn't explain how 'dryRun' interacts with merging). Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('merge suggested causal tree nodes from evaluations') and resource ('into the tree'), with the qualifier 'append-only' indicating a specific type of merge. It distinguishes from siblings like 'update_nodes' by focusing on merging from evaluations rather than general updates, though it doesn't explicitly name alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when there are 'suggested causal tree nodes from evaluations' to be merged, but doesn't specify when to use this versus alternatives like 'update_nodes' or 'update_thesis'. No explicit when-not or prerequisite guidance is provided, leaving usage context somewhat implied.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_public_skills (Grade A)

Browse public skills from the community. No auth required. Filter by category, search, or sort by popularity.

Parameters (JSON Schema)
- q (optional): Search skills by name/description
- sort (optional): Sort order: popular or new (default: new)
- category (optional): Filter by category: custom, trading, research, monitoring
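Since every parameter is optional and no API key is needed, the simplest call takes no arguments at all. A hypothetical sketch; the search term is a placeholder, while the sort and category values come from the enums in the table above.

```python
# browse_public_skills requires no auth; all parameters are optional.
default_browse = {}  # no filters: per the schema, sort defaults to "new"

filtered_browse = {
    "q": "arbitrage",        # placeholder search term (name/description match)
    "sort": "popular",       # documented values: "popular" or "new"
    "category": "trading",   # documented values: custom, trading, research, monitoring
}

assert filtered_browse["category"] in {"custom", "trading", "research", "monitoring"}
```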
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds valuable context: 'No auth required' clarifies access requirements, and 'Filter by category, search, or sort by popularity' hints at functionality beyond basic listing. It doesn't cover rate limits or response format, but provides useful operational details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with zero waste: the first states the core purpose and auth requirement, the second lists capabilities. It is appropriately sized and front-loaded, with every sentence earning its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only tool with 3 parameters and 100% schema coverage but no output schema, the description is reasonably complete. It covers purpose, auth, and capabilities, though it doesn't explain return values or pagination. Given the tool's moderate complexity, it provides sufficient context for an agent to use it effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all three parameters (q, sort, category). The description mentions filtering and sorting but adds no syntax or format details beyond what the schema provides. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('browse') and resource ('public skills from the community'), making the purpose specific and unambiguous. It distinguishes this tool from siblings like 'list_skills' or 'explore_public' by emphasizing the public/community aspect and browsing functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('browse public skills from the community') and mentions filtering/sorting capabilities. However, it does not explicitly state when not to use it or name alternatives among siblings (e.g., 'list_skills' might be for private skills), leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cancel_intent (Grade B)

Cancel an active intent. Stops trigger evaluation and prevents execution.

Parameters (JSON Schema)
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- intentId (required): Intent ID to cancel
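The full payload is just two fields. A hypothetical sketch with placeholder values; note the description's implied prerequisite that the intent must still be active.

```python
# Minimal cancel_intent payload (placeholder values).
# Prerequisite implied by the description: the intent must still be active,
# i.e. not yet executed or expired.
cancel_args = {
    "apiKey": "sf_live_PLACEHOLDER",  # SimpleFunctions API key
    "intentId": "intent_456",         # placeholder ID of the intent to cancel
}

assert set(cancel_args) == {"apiKey", "intentId"}
```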
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'Stops trigger evaluation and prevents execution', which adds some behavioral context about halting processes. However, it lacks details on permissions, reversibility, side effects, or response format, leaving significant gaps for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two short sentences that directly state the tool's purpose and effect. Every word contributes meaning, with no wasted text, making it front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is incomplete. It lacks information on success/failure outcomes, error conditions, or broader system impact, which are critical for an agent to use this tool effectively in context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters ('apiKey', 'intentId') well-documented in the schema. The description adds no additional parameter details beyond what the schema provides, so it meets the baseline of 3 without compensating for any gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Cancel') and resource ('an active intent'), specifying what the tool does. It distinguishes from siblings like 'create_intent' or 'list_intents' by focusing on termination rather than creation or listing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites (e.g., needing an active intent) or compare to other tools like 'close_position' or 'update_strategy', leaving usage context unclear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

close_position (Grade C)

Delete a position record from a thesis.

Parameters (JSON Schema)
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- thesisId (required): Thesis ID
- positionId (required): Position ID
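Given that the description implies a destructive delete with no disclosed reversibility, a client can wrap the call in an explicit confirmation gate. A hypothetical sketch; the guard function and all IDs are illustrative, not part of the server's API.

```python
# close_position deletes a record; guard the destructive call client-side.
# All IDs are placeholders.
close_args = {
    "apiKey": "sf_live_PLACEHOLDER",
    "thesisId": "thesis_123",
    "positionId": "pos_789",
}

def guarded_close(args, confirmed=False):
    """Refuse to send a delete unless explicitly confirmed (hypothetical helper)."""
    if not confirmed:
        raise RuntimeError("refusing to delete without confirmation")
    return args  # a real client would invoke the close_position tool here

try:
    guarded_close(close_args)          # blocked: confirmation not given
except RuntimeError:
    pass
result = guarded_close(close_args, confirmed=True)  # proceeds
```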
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states 'Delete,' implying a destructive mutation, but doesn't disclose behavioral traits like whether deletion is permanent, requires specific permissions, has side effects on related data, or returns confirmation. This leaves significant gaps in understanding the tool's impact.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, direct sentence with zero wasted words, efficiently conveying the core action and target. It's appropriately sized and front-loaded, making it easy for an agent to parse quickly without unnecessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity as a destructive operation with no annotations and no output schema, the description is incomplete. It fails to address critical aspects like return values, error conditions, or confirmation of deletion, leaving the agent with insufficient information for reliable invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with clear parameter descriptions in the input schema (e.g., 'Thesis ID,' 'Position ID,' 'SimpleFunctions API key'). The description adds no additional semantic context beyond what the schema provides, so it meets the baseline for adequate but not enhanced parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete') and target resource ('a position record from a thesis'), making the purpose unambiguous. However, it doesn't explicitly differentiate from sibling tools like 'update_position' or 'add_position', which would require more specific context about what distinguishes deletion from modification or creation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'update_position' or other position-related operations. The description lacks context about prerequisites, such as ensuring the position exists or isn't actively in use, leaving the agent without usage direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

configure_heartbeat (Grade A)

Configure the 24/7 heartbeat engine: news scan interval, X scan interval, LLM model tier, monthly budget, pause/resume. Agent can speed up monitoring during high volatility or slow down to save budget.

Parameters (JSON Schema)
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- paused (optional): Pause/resume heartbeat
- thesisId (required): Thesis ID
- xIntervalMin (optional): X/social scan interval in minutes (60-1440, default 240)
- evalModelTier (optional): LLM model for evaluations
- newsIntervalMin (optional): News scan interval in minutes (15-1440, default 60)
- monthlyBudgetUsd (optional): Monthly budget cap in USD (0 = unlimited)
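The interval ranges documented in the schema (15-1440 for news, 60-1440 for X) can be enforced client-side before sending. A hypothetical sketch; the clamp helper and all values are illustrative.

```python
# Clamp heartbeat intervals to the ranges documented in the schema.
def clamp(value, lo, hi):
    """Force value into [lo, hi] (hypothetical client-side helper)."""
    return max(lo, min(hi, value))

heartbeat_args = {
    "apiKey": "sf_live_PLACEHOLDER",
    "thesisId": "thesis_123",
    "newsIntervalMin": clamp(5, 15, 1440),   # 5 is below the 15-1440 range, clamps to 15
    "xIntervalMin": clamp(240, 60, 1440),    # already in range, stays 240
    "monthlyBudgetUsd": 0,                   # 0 = unlimited per the schema
    "paused": False,                         # keep the heartbeat running
}

assert heartbeat_args["newsIntervalMin"] == 15
```

Whether the server rejects or silently clamps out-of-range values is undocumented, which is exactly why normalizing before the call is the safer assumption.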
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It mentions the tool configures settings but doesn't disclose behavioral traits like whether changes are immediate, reversible, require specific permissions, or have side effects. The description adds some context about volatility/budget scenarios but lacks detail about the mutation's impact, error conditions, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with two sentences. The first sentence front-loads the purpose and lists key parameters efficiently. The second sentence provides usage guidance. There's minimal waste, though the parameter listing could be slightly more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a configuration/mutation tool with 7 parameters, no annotations, and no output schema, the description is adequate but has gaps. It covers what the tool does and when to use it but lacks details about the mutation's behavior, response format, error handling, or prerequisites. The schema provides parameter documentation, but the description doesn't compensate for missing behavioral context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 7 parameters thoroughly. The description lists parameter categories (news scan interval, X scan interval, etc.) but doesn't add meaningful semantic context beyond what's in the schema descriptions. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool configures the '24/7 heartbeat engine' with specific configuration parameters (news scan interval, X scan interval, LLM model tier, monthly budget, pause/resume). It specifies the verb 'configure' and resource 'heartbeat engine' but doesn't explicitly differentiate from sibling tools like 'get_heartbeat_status' or 'monitor_the_situation'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: 'Agent can speed up monitoring during high volatility or slow down to save budget.' This gives practical guidance on when to adjust settings. However, it doesn't explicitly state when NOT to use this tool or mention alternatives among the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_intent (Grade A)

Declare an execution intent: "buy X when condition Y, expire at Z." Intents are the single gateway for all order execution. The local runtime daemon evaluates triggers and executes via user's Kalshi/Polymarket keys.

Parameters (JSON Schema)
- venue (required): Exchange venue
- action (required): Trade action
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- source (optional): Intent source: agent, manual, ideaagent
- expireAt (optional): ISO timestamp when intent expires (default: +24h)
- marketId (required): Market ticker (e.g. KXFEDDEC-25DEC31-T100)
- maxPrice (optional): Max price per contract in cents (1-99). Omit for market order.
- sourceId (optional): Source reference (idea ID, thesis ID, etc.)
- direction (required): Contract direction
- rationale (optional): Why this trade — logged for audit trail
- autoExecute (optional): Auto-execute without human confirmation
- marketTitle (required): Human-readable market name
- triggerType (optional): When to execute (default: immediate)
- triggerPrice (optional): Price trigger threshold in cents (for price_below/price_above)
- targetQuantity (required): Number of contracts
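The "buy X when condition Y, expire at Z" pattern from the description maps onto the schema fields. A hypothetical sketch: all values are placeholders (the ticker reuses the schema's own example format), and none of this is trading advice.

```python
# Hypothetical create_intent payload: "buy 10 YES contracts if the
# price drops below 40 cents, expire at year end". Placeholder values only.
intent_args = {
    "apiKey": "sf_live_PLACEHOLDER",
    "venue": "kalshi",
    "action": "buy",
    "marketId": "KXFEDDEC-25DEC31-T100",   # ticker format from the schema example
    "marketTitle": "Example market",
    "direction": "yes",
    "targetQuantity": 10,                  # number of contracts
    "triggerType": "price_below",          # default would be "immediate"
    "triggerPrice": 40,                    # cents; used with price_below/price_above
    "maxPrice": 45,                        # 1-99 cents; omit for a market order
    "expireAt": "2025-12-31T00:00:00Z",    # ISO timestamp (default: +24h)
    "autoExecute": False,                  # keep human confirmation in the loop
    "rationale": "Illustrative limit entry, logged for audit",
}

# maxPrice is documented as 1-99 cents; validate before sending.
assert 1 <= intent_args["maxPrice"] <= 99
```

Keeping autoExecute False is the conservative default here, since the intent is ultimately executed against the user's real Kalshi/Polymarket keys.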
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions that intents are evaluated and executed via user's keys, hinting at external dependencies and potential financial actions, but lacks details on permissions, rate limits, error handling, or what happens upon execution (e.g., order confirmation). This leaves gaps for a tool with significant real-world impact.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and front-loaded, using just two sentences that efficiently convey the core purpose and mechanism without any wasted words. Every sentence earns its place by defining the tool's role and execution process clearly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (15 parameters, financial trading context) and lack of annotations or output schema, the description is incomplete. It adequately explains the high-level purpose but misses critical details like return values, error cases, or behavioral nuances needed for safe use in a trading environment, leaving room for improvement.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema already documents all 15 parameters thoroughly. The description adds minimal value by implying parameters like 'buy X' (action/market), 'condition Y' (trigger), and 'expire at Z' (expireAt), but does not provide additional semantics beyond what the schema offers, such as usage examples or constraints not in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('declare an execution intent') and resources ('buy X when condition Y, expire at Z'), explicitly distinguishing it from siblings by noting it's 'the single gateway for all order execution.' This goes beyond just restating the name to explain its unique role in the system.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context that this tool is for declaring intents that are evaluated and executed by a local runtime daemon, implying it should be used for automated trading setups. However, it does not explicitly state when not to use it or name specific alternatives among the many sibling tools, such as 'add_position' or 'update_position,' which might handle similar trading actions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_skill (Grade B)

Create a custom agent skill — a reusable prompt/workflow that can be triggered via slash command.

Parameters (JSON Schema)
- auto (optional): Auto-trigger condition
- name (required): Skill name (e.g. "precheck")
- tags (optional): Tags for discovery
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- prompt (required): The full prompt/instructions for the skill
- trigger (required): Slash command trigger (e.g. "/precheck")
- category (optional): Category: custom, trading, research, monitoring
- toolsUsed (optional): SF tools this skill uses
- description (required): What this skill does
- estimatedTime (optional): Estimated run time
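A skill bundles a name, a slash-command trigger, and a prompt. A hypothetical sketch reusing the schema's own "precheck" example; the prompt text and tags are illustrative placeholders.

```python
# Hypothetical create_skill payload; skill content is illustrative.
skill_args = {
    "apiKey": "sf_live_PLACEHOLDER",
    "name": "precheck",                     # example name from the schema
    "trigger": "/precheck",                 # slash command that fires the skill
    "prompt": "Before any trade, list open intents and current positions.",
    "description": "Pre-trade sanity checklist",
    # Optional fields:
    "category": "trading",                  # custom, trading, research, monitoring
    "tags": ["safety", "checklist"],        # tags for discovery
}

# Convention suggested by the schema examples: the trigger mirrors the name.
assert skill_args["trigger"] == "/" + skill_args["name"]
```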
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden, yet it only states the basic function. It doesn't disclose behavioral traits such as permissions needed (e.g., API key usage), whether creation is reversible, rate limits, or what happens on success/failure. This is inadequate for a mutation tool with zero annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. Every word earns its place without redundancy, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a creation tool with 10 parameters, no annotations, and no output schema, the description is insufficient. It lacks context on usage scenarios, error handling, or what the tool returns (e.g., a skill ID). This leaves significant gaps for an agent to operate effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 10 parameters thoroughly. The description adds no additional meaning beyond what's in the schema (e.g., it doesn't clarify relationships between parameters like 'trigger' and 'name'). Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Create') and resource ('custom agent skill'), specifying it's a reusable prompt/workflow triggered via slash command. It distinguishes from siblings like 'fork_skill' (which duplicates) or 'list_skills' (which retrieves).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like 'fork_skill' (for duplicating) or 'publish_skill' (for making public). The description implies usage for initial creation but lacks context on prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_strategy (A)

Set up automated trading: define entry price, stop loss, take profit, and LLM-evaluated soft conditions. The heartbeat engine checks conditions every 15 min and executes when met.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| apiKey | Yes | SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys | |
| market | Yes | Human-readable market name | |
| horizon | No | Time horizon | medium |
| marketId | Yes | Market ticker, e.g. KXWTIMAX-26DEC31-T150 | |
| stopLoss | No | Stop loss: bid <= this value (cents) | |
| thesisId | Yes | Thesis ID | |
| direction | Yes | Trade direction | |
| rationale | No | Full logic description | |
| entryAbove | No | Entry trigger: ask >= this value (cents, for NO direction) | |
| entryBelow | No | Entry trigger: ask <= this value (cents) | |
| takeProfit | No | Take profit: bid >= this value (cents) | |
| maxQuantity | No | Max total contracts | |
| softConditions | No | LLM-evaluated conditions | |
| perOrderQuantity | No | Contracts per order | |
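The cent thresholds above imply an ordering for a coherent long setup: the stop sits below the entry trigger, which sits below the take-profit. A hypothetical argument set (values and IDs invented; only the parameter names and threshold semantics come from the schema) can be sanity-checked before submission:

```python
# Illustrative create_strategy arguments. Per the schema notes:
# entryBelow fires when ask <= value, stopLoss when bid <= value,
# takeProfit when bid >= value (all in cents).
args = {
    "apiKey": "YOUR_API_KEY",
    "thesisId": "thesis_123",                 # hypothetical thesis ID
    "marketId": "KXWTIMAX-26DEC31-T150",      # ticker format from the schema example
    "market": "WTI crude above $150 by end of 2026",
    "direction": "yes",
    "entryBelow": 30,    # buy when ask <= 30c
    "stopLoss": 20,      # exit when bid <= 20c
    "takeProfit": 60,    # exit when bid >= 60c
    "maxQuantity": 100,
    "perOrderQuantity": 25,
}

# A coherent long setup satisfies stopLoss < entryBelow < takeProfit.
coherent = args["stopLoss"] < args["entryBelow"] < args["takeProfit"]
```

Since the heartbeat engine only evaluates every 15 minutes, thresholds this tight may be crossed and re-crossed between checks; that latency is worth factoring into the spread between entry and stop.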
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds valuable context about the 'heartbeat engine' checking conditions every 15 minutes and executing when met, which reveals automation behavior. However, it doesn't cover important aspects like authentication needs (apiKey is documented in schema but not described in behavior), rate limits, error handling, or what happens on execution failure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with two sentences that efficiently convey the core functionality. The first sentence establishes the purpose, and the second adds important behavioral context about the heartbeat engine. There's minimal waste, though it could be slightly more structured for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex 14-parameter tool with no annotations and no output schema, the description provides adequate but incomplete context. It covers the automation aspect well but doesn't address important considerations like what the tool returns, error conditions, or how it interacts with other trading tools. The description compensates somewhat for the lack of structured metadata but leaves gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 14 parameters thoroughly. The description mentions key parameters like 'entry price, stop loss, take profit, and LLM-evaluated soft conditions' but doesn't add meaningful semantic context beyond what the schema provides. It establishes the purpose but doesn't clarify parameter relationships or usage nuances.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Set up automated trading') and resources ('define entry price, stop loss, take profit, and LLM-evaluated soft conditions'), distinguishing it from siblings like 'create_intent' or 'update_strategy' by focusing on automated trading strategy creation with heartbeat monitoring.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context ('automated trading', 'heartbeat engine checks conditions every 15 min') but doesn't explicitly state when to use this tool versus alternatives like 'create_intent' or 'update_strategy'. It provides some operational context but lacks explicit guidance on prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_thesis (C)

Create a new prediction market thesis. Builds a causal tree and scans for mispriced contracts.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| sync | No | Wait for formation to complete | |
| apiKey | Yes | SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys | |
| thesis | Yes | Your thesis statement | |
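A minimal argument sketch for `create_thesis`. The thesis text and API key placeholder are invented; the `sync` semantics (block until the causal tree and edge scan finish) are inferred from the schema description and should be treated as an assumption:

```python
# Hypothetical create_thesis arguments; parameter names from the schema above.
args = {
    "apiKey": "YOUR_API_KEY",
    "thesis": "OPEC+ supply cuts push WTI above $100 before 2027",  # invented example
    "sync": True,  # assumption: wait for formation (tree build + edge scan) to complete
}
```

With `sync` omitted, formation presumably runs in the background and the result would be polled via a read tool such as `get_context`.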
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'Builds a causal tree and scans for mispriced contracts', which hints at computational behavior, but it lacks critical details like whether this is a read-only or mutating operation, what permissions or API keys are required (though the schema covers the apiKey parameter), rate limits, or what the output looks like. For a creation tool with zero annotation coverage, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose and adds implementation details without redundancy. Every word earns its place, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of creating a prediction market thesis with no annotations and no output schema, the description is incomplete. It lacks behavioral context (e.g., mutation effects, error handling), output details, and usage guidelines, leaving significant gaps for the agent to navigate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters (sync, apiKey, thesis) with descriptions. The description doesn't add any meaning beyond what the schema provides, such as explaining the relationship between parameters or usage examples. The baseline score of 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Create a new prediction market thesis') and the resource ('thesis'), and it adds implementation details about building a causal tree and scanning for mispriced contracts. However, it doesn't explicitly differentiate this tool from sibling tools like 'create_intent', 'create_skill', or 'create_strategy', which all involve creation operations in the same domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, compare it to similar tools like 'create_intent' or 'create_skill', or specify scenarios where it's appropriate. The agent must infer usage from the purpose alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

enrich_content (B)

Cross-reference any text with thousands of prediction market contracts. Paste content + topics, get divergence analysis: where sentiment disagrees with market prices. No auth, no Firecrawl needed. Demo/trial entry point.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| model | No | LLM model for digest generation | google/gemini-2.5-flash |
| topics | Yes | Topics to search in prediction markets, e.g. ["iran", "oil"] | |
| content | Yes | Text content to cross-reference (up to 50K chars) | |
| includeIndex | No | Include SF Index | |
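A hypothetical `enrich_content` argument set, with a local check against the one hard constraint the schema states (content capped at 50K characters). The content string is invented; the topics follow the schema's own example:

```python
# Illustrative enrich_content arguments (no auth required per the description).
args = {
    "topics": ["iran", "oil"],          # topics example taken from the schema
    "content": "Sanctions talk is heating up while tanker rates spike...",  # invented
    "model": "google/gemini-2.5-flash", # the schema default, spelled out explicitly
    "includeIndex": False,
}

# Validate against the documented 50K-character content cap before calling.
ok = len(args["content"]) <= 50_000 and len(args["topics"]) > 0
```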
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds useful context: 'No auth, no Firecrawl needed' indicates accessibility and integration simplicity, and 'Demo/trial entry point' hints at potential limitations or trial nature. However, it lacks details on rate limits, error handling, or what 'divergence analysis' entails in practice, leaving gaps for a tool with no output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded, starting with the core functionality. Each sentence adds value: the first explains the tool's purpose, the second specifies input/output, and the third provides contextual details. However, the phrase 'Demo/trial entry point' could be more precise, and the structure slightly mixes operational details with purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of cross-referencing text with prediction markets and no output schema, the description is moderately complete. It covers the tool's purpose, input requirements, and some behavioral context (e.g., no auth needed), but lacks details on output format, error cases, or how 'divergence analysis' is presented. For a tool with 4 parameters and no annotations, more completeness would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds minimal value beyond the schema, mentioning 'Paste content + topics' which aligns with the required parameters but doesn't provide additional syntax or format details. With high schema coverage, the baseline score of 3 is appropriate as the description doesn't significantly enhance parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Cross-reference any text with thousands of prediction market contracts' and 'get divergence analysis: where sentiment disagrees with market prices.' It specifies the verb ('cross-reference', 'get divergence analysis') and resource ('text', 'prediction market contracts'), though it doesn't explicitly differentiate from sibling tools like 'scan_markets' or 'screen_markets' which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by mentioning 'Demo/trial entry point' and 'No auth, no Firecrawl needed,' suggesting it's for initial exploration or testing. However, it doesn't provide explicit guidance on when to use this tool versus alternatives like 'scan_markets' or 'query', nor does it specify prerequisites or exclusions beyond the implied trial context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

explore_public (A)

Browse public theses from other users. No auth required. Pass a slug to get details, or omit to list all.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| slug | No | Specific thesis slug, or empty to list all | |
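The single optional parameter gives two call shapes, sketched below; the slug value and the `name`/`arguments` envelope are illustrative assumptions:

```python
# Two hypothetical call shapes for explore_public (no auth required).
list_all = {"name": "explore_public", "arguments": {}}  # omit slug: list all public theses
get_one = {"name": "explore_public",
           "arguments": {"slug": "example-thesis"}}     # pass a slug: fetch one thesis's details
```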
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that 'No auth required' and implies read-only behavior ('Browse'), but lacks details on rate limits, pagination for listing, or error handling. This is adequate but has gaps for a tool with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with zero waste: it states the purpose, auth requirement, and parameter usage efficiently. Every sentence adds value, and it's front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and a simple input schema, the description is moderately complete. It covers purpose, auth, and parameter usage but lacks details on return values, error cases, or behavioral constraints like rate limits. This is adequate but not fully comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents the 'slug' parameter. The description adds minimal value by restating the parameter's purpose ('Pass a slug to get details, or omit to list all'), which aligns with the schema but doesn't provide additional semantics like format examples. Baseline 3 is appropriate here.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Browse') and resource ('public theses from other users'), making the purpose understandable. However, it doesn't explicitly differentiate from sibling tools like 'list_theses' or 'browse_public_skills', which would require a 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use the tool: 'Pass a slug to get details, or omit to list all.' It also mentions 'No auth required,' which is helpful. However, it doesn't specify when to use this tool over alternatives like 'list_theses' or 'browse_public_skills', preventing a score of 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fork_skill (B)

Fork a public skill into your collection. No slug needed — just the skill ID.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| apiKey | Yes | SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys | |
| skillId | Yes | Public skill ID to fork | |
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions the operation ('fork') which implies a copy/create action, but doesn't describe what happens during forking (e.g., whether it creates an editable copy, maintains links to original, includes dependencies). No information about permissions needed, rate limits, error conditions, or what the forked skill inherits. The description is minimal and leaves significant behavioral aspects unspecified.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at just two sentences, with zero wasted words. It's front-loaded with the core purpose, followed by a brief parameter guidance. Every sentence earns its place by providing essential information about the tool's function and parameter expectations.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a mutation tool. While concise, it lacks critical information about what the fork operation actually does behaviorally, what permissions are required, what the response contains, or how the forked skill relates to the original. For a tool that presumably creates a copy of someone else's work, more context about the operation's implications would be valuable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so both parameters are documented in the schema. The description adds minimal value beyond the schema: it mentions 'No slug needed' which clarifies what's NOT required, and 'just the skill ID' reinforces the skillId parameter's purpose. However, it doesn't provide additional context about parameter relationships, validation rules, or usage patterns that aren't already in the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Fork') and resource ('a public skill into your collection'), providing a specific verb+resource combination. It distinguishes from siblings like 'create_skill' by specifying it's for forking existing public skills rather than creating new ones from scratch. However, it doesn't explicitly differentiate from 'fork_thesis' which is a similar operation on a different resource type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by specifying 'public skill' and 'into your collection,' suggesting this is for copying shared skills to personal use. It mentions 'No slug needed' which provides some guidance about parameter requirements. However, it doesn't explicitly state when to use this versus alternatives like 'create_skill' or 'browse_public_skills,' nor does it provide exclusion criteria or prerequisites beyond having the skill ID.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fork_thesis (C)

Fork a public thesis into your collection. Copies the thesis statement and causal tree. You then run formation to scan for fresh edges.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| apiKey | Yes | SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys | |
| idOrSlug | Yes | Public thesis ID or slug to fork | |
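The two fork tools differ only in their identifier parameter: `fork_skill` takes an ID (no slug), while `fork_thesis` accepts either an ID or a slug. A side-by-side sketch (IDs, slug, and key placeholder all invented):

```python
# Hypothetical payloads contrasting the two fork tools' identifier parameters.
api_key = "YOUR_API_KEY"  # both fork tools require an API key

fork_skill_call = {"name": "fork_skill",
                   "arguments": {"apiKey": api_key,
                                 "skillId": "skill_abc123"}}   # ID only; slugs not accepted

fork_thesis_call = {"name": "fork_thesis",
                    "arguments": {"apiKey": api_key,
                                  "idOrSlug": "example-thesis"}}  # ID or slug both work
```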
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It mentions copying content and running 'formation to scan for fresh edges,' which hints at post-fork automation, but fails to disclose critical behavioral traits like authentication needs (implied by apiKey), rate limits, side effects, or what 'formation' entails. This leaves significant gaps for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded with the main action, using three sentences that each add value: forking, copying content, and post-fork automation. There is no wasted text, though it could be slightly more structured for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is incomplete. It misses details on authentication (only implied by apiKey), error handling, what 'formation' does, and the return value. Given the complexity and lack of structured data, more context is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters (apiKey and idOrSlug). The description adds no additional meaning about parameters beyond implying 'idOrSlug' refers to a public thesis, which is minimal value. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Fork a public thesis into your collection') and specifies what gets copied ('thesis statement and causal tree'), which distinguishes it from generic creation tools. However, it doesn't explicitly differentiate from sibling tools like 'fork_skill' or 'create_thesis' beyond mentioning 'public thesis'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for forking public theses but provides no guidance on when to use this versus alternatives like 'create_thesis' or 'fork_skill', nor does it mention prerequisites or exclusions. It lacks explicit when/when-not instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_balance (C)

Advanced: Kalshi account balance and portfolio value.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| apiKey | Yes | SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys | |
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'Advanced:' which might imply complexity or additional requirements, but doesn't explain what that entails (e.g., rate limits, authentication needs, or data freshness). It also doesn't describe the return format (e.g., numeric balance, structured portfolio data) or potential errors. For a financial tool with zero annotation coverage, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief with two phrases, but 'Advanced:' is vague and doesn't earn its place by adding useful information—it could be omitted without loss. The core purpose is stated efficiently, but the structure isn't front-loaded with the most critical details (e.g., it doesn't immediately clarify this is a read-only tool for account data). It's concise but could be more effectively structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a financial tool with no annotations and no output schema, the description is incomplete. It doesn't explain what the tool returns (e.g., balance in dollars, portfolio breakdown), behavioral aspects like authentication requirements or rate limits, or how it fits with sibling tools. For a tool that likely provides critical account information, this leaves too much unspecified for reliable agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for its single parameter (apiKey), detailing it as a 'SimpleFunctions API key' with a URL for acquisition. The description adds no parameter-specific information beyond what the schema provides, such as how the apiKey relates to the Kalshi account or any additional context. With high schema coverage, the baseline score of 3 is appropriate as the description doesn't enhance parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool retrieves 'Kalshi account balance and portfolio value', which specifies the resource (account/portfolio) and the action (get/retrieve). However, it's prefixed with 'Advanced:' which adds ambiguity rather than clarity, and it doesn't distinguish this from potential sibling tools like 'get_fills' or 'get_settlements' that might also relate to account data. The purpose is clear but not optimally specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an API key for authentication), context for usage (e.g., checking account status before trading), or exclusions (e.g., not for real-time market data). With many sibling tools present, this lack of differentiation leaves the agent to infer usage from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_context (A)

START HERE. Global market snapshot: top edges (mispriced contracts), price movers, highlights, traditional markets — live exchange data updated every 15 min. With thesisId + apiKey: thesis-specific context including causal tree, edges with orderbook depth, evaluation history, and track record. No auth needed for global context.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| apiKey | No | SimpleFunctions API key. Required only for thesis-specific context. Get one at https://simplefunctions.dev/dashboard/keys or run: sf login | |
| thesisId | No | Thesis ID. Omit for global market snapshot (no auth needed). | |
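Because both parameters are optional but coupled, `get_context` effectively has two modes, sketched below; the thesis ID, key placeholder, and call envelope are invented for illustration:

```python
# Hypothetical payloads for get_context's two documented modes.
global_ctx = {"name": "get_context",
              "arguments": {}}  # no auth: global market snapshot (edges, movers, highlights)

thesis_ctx = {"name": "get_context",
              "arguments": {"apiKey": "YOUR_API_KEY",   # required alongside thesisId
                            "thesisId": "thesis_123"}}  # thesis-specific context

# The coupling the description implies: thesisId without apiKey is an invalid combination.
valid = "thesisId" not in thesis_ctx["arguments"] or "apiKey" in thesis_ctx["arguments"]
```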
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing key behavioral traits: data freshness ('live exchange data updated every 15 min'), authentication requirements ('No auth needed for global context'), and the two distinct operational modes. It doesn't mention rate limits, error conditions, or response format details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with front-loaded information about global context, followed by thesis-specific details. Every sentence adds value, though the 'START HERE' prefix is slightly redundant and the description could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 2 parameters, no annotations, and no output schema, the description provides good coverage of functionality and usage modes. However, it lacks details about response format, error handling, and doesn't explain what 'edges' or 'causal tree' mean in this context, which could be important for proper tool selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds meaningful context beyond the schema by explaining the functional relationship between parameters: 'With thesisId + apiKey: thesis-specific context' and 'Omit for global market snapshot'. This clarifies how parameter combinations affect tool behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides market data and thesis-specific context, with specific resources mentioned (mispriced contracts, price movers, highlights, traditional markets, causal tree, edges with orderbook depth, evaluation history, track record). However, it doesn't explicitly differentiate from siblings like 'get_edges' or 'get_world_state' which might provide overlapping market data.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use different modes: 'No auth needed for global context' and 'With thesisId + apiKey: thesis-specific context'. It clearly distinguishes between global snapshot and thesis-specific access, though it doesn't mention alternatives among sibling tools.

get_edges (Grade A)

Top mispriced markets across all theses, ranked by edge size. Shows where your model disagrees with the market. No auth = public thesis edges. Auth = private + public.

Parameters (JSON Schema)
limit (optional): Max edges to return
venue (optional): Filter: kalshi or polymarket
apiKey (optional): SimpleFunctions API key. Omit for public edges only.
minEdge (optional): Minimum edge in cents
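Since all four parameters are optional, a call can be sketched as a filter dict. The venue check mirrors the documented values (kalshi, polymarket); the helper itself is a hypothetical client-side convenience, not server behavior.

```python
VALID_VENUES = {"kalshi", "polymarket"}  # documented venue values

def build_get_edges_args(limit=None, venue=None, api_key=None, min_edge=None):
    """Build a get_edges argument dict, dropping unset optional filters."""
    if venue is not None and venue not in VALID_VENUES:
        raise ValueError(f"venue must be one of {sorted(VALID_VENUES)}")
    raw = {"limit": limit, "venue": venue, "apiKey": api_key, "minEdge": min_edge}
    return {k: v for k, v in raw.items() if v is not None}

# public edges only (no apiKey), Kalshi, at least 5 cents of edge
args = build_get_edges_args(limit=10, venue="kalshi", min_edge=5)
```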
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses key behavioral traits: the ranking mechanism (by edge size), the effect of auth (public vs. private+public edges), and the focus on model-market disagreement. However, it lacks details on rate limits, error handling, or output format, which would be needed for a perfect score.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by clarifying details in two concise sentences. Every sentence adds value (e.g., explaining auth impact), with no wasted words or redundancy, making it highly efficient.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 parameters, no annotations, no output schema), the description is adequate but has gaps. It covers the purpose, auth context, and ranking logic, but lacks details on output format, error cases, or deeper behavioral aspects like pagination. This makes it minimally viable for a tool with this parameter count and no structured support.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters (limit, venue, apiKey, minEdge). The description adds marginal value by implying the apiKey's role in auth (private vs. public edges), but does not provide additional semantics beyond what the schema offers, such as explaining 'edge' units or venue specifics.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: it retrieves 'Top mispriced markets across all theses, ranked by edge size' and explains that it 'Shows where your model disagrees with the market.' This specifies the verb (get/retrieve), resource (edges/mispriced markets), and scope (across all theses), distinguishing it from siblings like get_markets or get_forecast by focusing on edge-based ranking of mispricing.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use it: to find mispriced markets based on edge size. It distinguishes between public and private edges based on auth presence, but does not explicitly mention when not to use it or name specific alternatives among siblings (e.g., scan_markets or screen_markets might be related but are not referenced).

get_evaluation_history (Grade B)

Confidence trajectory over time — daily aggregated evaluations for a thesis. Use to analyze trends, detect convergence/divergence.

Parameters (JSON Schema)
apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
thesisId (required): Thesis ID
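With both parameters required, a hypothetical argument builder reduces to a presence check. The function name and the client-side rejection of empty values are assumptions, not documented behavior.

```python
def build_evaluation_history_args(api_key, thesis_id):
    """Both parameters are required by the schema; reject empties client-side."""
    if not api_key or not thesis_id:
        raise ValueError("apiKey and thesisId are both required")
    return {"apiKey": api_key, "thesisId": thesis_id}

args = build_evaluation_history_args("sf_key", "thesis-123")  # placeholders
```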
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It describes the tool as retrieving aggregated data over time, which implies a read-only operation, but does not specify aspects like rate limits, authentication needs (beyond the apiKey parameter), data format, or pagination. This leaves gaps in understanding how the tool behaves in practice.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded, consisting of two short sentences that directly state the tool's function and usage. There is no wasted language, and it efficiently communicates the core purpose without unnecessary elaboration.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a read operation with 2 parameters) and the absence of annotations and output schema, the description is moderately complete. It explains what data is retrieved but lacks details on return values, error handling, or behavioral constraints. This is adequate for a basic tool but leaves room for improvement in guiding the agent fully.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters (thesisId and apiKey) documented in the schema. The description does not add any additional meaning or context beyond what the schema provides, such as explaining how the thesisId relates to evaluations or the aggregation process. Baseline 3 is appropriate as the schema handles parameter documentation adequately.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to retrieve 'daily aggregated evaluations for a thesis' and analyze 'confidence trajectory over time.' It specifies the resource (thesis evaluations) and the verb (get/analyze), but does not explicitly differentiate from siblings like 'get_milestones' or 'get_world_state,' which might also involve thesis-related data.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'use to analyze trends, detect convergence/divergence,' which implies a context for usage but does not provide explicit guidance on when to use this tool versus alternatives. No sibling tools are referenced, and there is no mention of prerequisites or exclusions, leaving the agent to infer usage based on the purpose alone.

get_fills (Grade C)

Advanced: Recent trade fills on Kalshi.

Parameters (JSON Schema)
apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
ticker (optional): Filter by market ticker
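The call shape can be sketched as follows: apiKey is required and ticker optionally narrows the fills to one market. The helper name and placeholder ticker are illustrative assumptions.

```python
def build_get_fills_args(api_key, ticker=None):
    """apiKey is required; ticker optionally filters fills to one market."""
    args = {"apiKey": api_key}
    if ticker is not None:
        args["ticker"] = ticker  # "SOME-TICKER" below is a placeholder
    return args

all_fills = build_get_fills_args("sf_key")
one_market = build_get_fills_args("sf_key", ticker="SOME-TICKER")
```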
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'recent trade fills' but doesn't specify time ranges, pagination, rate limits, authentication needs beyond the apiKey parameter, or what 'Advanced:' entails operationally. This leaves significant gaps for a tool that likely involves financial data access.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—a single sentence with no wasted words. It's front-loaded with the key information ('recent trade fills on Kalshi'), and the 'Advanced:' prefix, while minimal, adds context efficiently. Every part earns its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and a tool that likely returns financial data (trade fills), the description is incomplete. It doesn't explain return values, error conditions, or behavioral traits like data recency or access constraints. For a tool with potential complexity in a trading context, this is inadequate.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters (apiKey and ticker) well-documented in the schema. The description adds no additional parameter semantics beyond implying filtering by 'recent' and 'Kalshi', which are already covered or inferred. This meets the baseline for high schema coverage.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('get') and resource ('recent trade fills on Kalshi'), making the purpose specific and understandable. It distinguishes from siblings like 'get_orders' or 'get_settlements' by focusing on trade fills, though it doesn't explicitly contrast with them. The 'Advanced:' prefix adds context but doesn't detract from clarity.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions 'Advanced:' which might imply it's for experienced users, but offers no explicit when/when-not rules or references to sibling tools like 'get_orders' or 'get_settlements' for comparison. Usage is implied rather than stated.

get_forecast (Grade C)

Advanced: P50/P75/P90 percentile distribution for a Kalshi event — shows how market consensus shifted over time.

Parameters (JSON Schema)
days (optional): Days of history (default 7)
apiKey (required): SimpleFunctions API key (needed for authenticated Kalshi calls). Get one at https://simplefunctions.dev/dashboard/keys
eventTicker (required): Kalshi event ticker (e.g. KXWTIMAX-26DEC31)
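The 7-day default and the documented ticker format can be captured in a small hypothetical builder; only the parameter names, the default, and the example ticker come from the schema above, the rest is assumption.

```python
def build_get_forecast_args(api_key, event_ticker, days=7):
    """days defaults to 7, matching the schema; ticker format per the example."""
    if days < 1:
        raise ValueError("days must be a positive number of days of history")
    return {"apiKey": api_key, "eventTicker": event_ticker, "days": days}

# documented example ticker, with the 7-day default window
args = build_get_forecast_args("sf_key", "KXWTIMAX-26DEC31")
```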
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the tool shows 'how market consensus shifted over time' which implies historical trend analysis, but doesn't disclose authentication requirements (though the apiKey parameter hints at this), rate limits, data freshness, or what format the percentile distribution is returned in. For a financial data tool with zero annotation coverage, this is insufficient.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that conveys the core functionality. The 'Advanced:' prefix could be considered slightly redundant but doesn't significantly detract. The sentence is front-loaded with the main purpose and doesn't waste words.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a financial data tool with no annotations and no output schema, the description is incomplete. It doesn't explain what the tool returns (beyond mentioning percentile distributions), doesn't specify data formats or units, and provides no context about Kalshi events themselves. Given the complexity of financial forecasting data and lack of structured output documentation, more completeness is needed.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema - it doesn't explain how the 'days' parameter relates to the percentile calculations or provide examples of event ticker formats. Baseline 3 is appropriate when the schema does the heavy lifting.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves percentile distribution data (P50/P75/P90) for a Kalshi event and shows how market consensus shifted over time. It specifies the resource (Kalshi event) and verb (get/show distribution), but doesn't explicitly differentiate from sibling tools like 'get_markets' or 'scan_markets' which might provide related market data.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'get_markets' or 'scan_markets' that might provide overlapping functionality, nor does it specify prerequisites beyond the implied need for a Kalshi event ticker. The 'Advanced:' prefix suggests specialized use but offers no concrete usage context.

get_heartbeat_status (Grade C)

Get heartbeat config + this month's cost summary for a thesis. See news/X scan intervals, model tier, budget usage.

Parameters (JSON Schema)
apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
thesisId (required): Thesis ID
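A generic required-parameter check, sketched under the assumption that the server rejects calls missing apiKey or thesisId; the helper itself is hypothetical.

```python
REQUIRED_PARAMS = ("apiKey", "thesisId")

def validate_heartbeat_args(args):
    """Check that every required get_heartbeat_status parameter is present."""
    missing = [name for name in REQUIRED_PARAMS if name not in args]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return args

ok = validate_heartbeat_args({"apiKey": "sf_key", "thesisId": "thesis-123"})
```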
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It implies a read-only operation ('Get'), but doesn't specify if it requires specific permissions, has rate limits, returns real-time or cached data, or details error conditions. The mention of 'scan intervals' and 'budget usage' hints at monitoring behavior, but lacks operational clarity for safe invocation.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. The additional detail about 'news/X scan intervals, model tier, budget usage' is relevant but could be slightly more integrated. Overall, it's appropriately sized with minimal waste.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only tool with 2 parameters and no output schema, the description covers the basic purpose but lacks completeness. Without annotations or output schema, it should ideally specify return format, error handling, or data freshness. The mention of specific data points helps, but doesn't fully compensate for missing behavioral context.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters (thesisId, apiKey) well-documented in the schema. The description adds no parameter-specific information beyond what the schema provides, such as format examples or constraints. Baseline 3 is appropriate as the schema handles parameter documentation adequately.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves 'heartbeat config + this month's cost summary for a thesis', specifying both the resource (thesis) and what data is returned (config and cost summary). It distinguishes from siblings like 'get_balance' or 'get_milestones' by focusing on heartbeat-specific monitoring data, though it doesn't explicitly contrast with them.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'get_balance' (for general balance) or 'configure_heartbeat' (for setup). It mentions 'See news/X scan intervals, model tier, budget usage' as data points, but this is output detail, not usage context. No exclusions or prerequisites are stated.

get_markets (Grade A)

Get traditional market prices via Databento. Default: SPY, VIX, TLT, GLD, USO. Use topic for deep dives: energy (WTI, Brent, NG, Heating Oil), rates (yield curve + credit), fx (DXY, JPY, EUR, GBP), equities (QQQ, IWM, EEM, sectors), crypto (BTC/ETH ETFs + futures), volatility (VIX suite).

Parameters (JSON Schema)
topic (optional): Deep dive topic: energy, rates, fx, equities, crypto, volatility. Omit for core 5.
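The topic whitelist from the description can be enforced client-side; this is an illustrative sketch, not part of the tool itself.

```python
TOPICS = {"energy", "rates", "fx", "equities", "crypto", "volatility"}

def build_get_markets_args(topic=None):
    """Omit topic for the core 5 (SPY, VIX, TLT, GLD, USO)."""
    if topic is None:
        return {}
    if topic not in TOPICS:
        raise ValueError(f"unknown topic {topic!r}; expected one of {sorted(TOPICS)}")
    return {"topic": topic}

core_five = build_get_markets_args()            # {}
energy = build_get_markets_args("energy")       # {"topic": "energy"}
```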
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes what data is returned (market prices) and the default behavior (core 5 markets). However, it doesn't disclose important behavioral traits like whether this is a read-only operation, potential rate limits, authentication requirements, data freshness, or error handling. The description adds some context but leaves significant gaps.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately concise with two sentences. The first sentence states the core purpose and default behavior. The second sentence explains the parameter usage with specific examples. Every sentence earns its place, though the second sentence is somewhat dense with multiple topic examples packed together.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (1 optional parameter, no output schema, no annotations), the description is somewhat complete but has gaps. It explains what data is returned and how to use the parameter, but doesn't describe the return format, data structure, or any limitations. For a data retrieval tool with no output schema, more information about the response would be helpful.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for its single parameter, so the baseline is 3. The description adds significant value by explaining the parameter's purpose ('Use topic for deep dives') and providing concrete examples of what each topic value returns (e.g., 'energy (WTI, Brent, NG, Heating Oil)'). This goes beyond the schema's generic description and helps the agent understand the parameter's practical use.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get traditional market prices via Databento.' It specifies the verb ('Get') and resource ('traditional market prices'), and mentions the data source. However, it doesn't explicitly distinguish this tool from sibling tools like 'scan_markets' or 'screen_markets', which appear related but have different purposes.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage guidance: 'Default: SPY, VIX, TLT, GLD, USO. Use topic for deep dives...' It explains when to use the optional parameter (for specific topic areas) versus when to omit it (for core 5 markets). However, it doesn't explicitly state when NOT to use this tool or mention alternatives among siblings like 'scan_markets' or 'query_databento'.

get_milestones (Grade A)

Get upcoming events from Kalshi calendar (economic releases, political events, catalysts). No auth required.

Parameters (JSON Schema)
hours (optional): Hours ahead to look (default 168 = 1 week)
category (optional): Filter by category (e.g. Economics, Politics, Sports)
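The documented default (168 hours = 1 week) can be mirrored in a hypothetical argument builder; names beyond the schema's are illustrative.

```python
def build_get_milestones_args(hours=168, category=None):
    """168 hours = 1 week, the documented default look-ahead window."""
    args = {"hours": hours}
    if category is not None:
        args["category"] = category  # e.g. "Economics", "Politics", "Sports"
    return args

next_week = build_get_milestones_args()                          # {"hours": 168}
next_day_econ = build_get_milestones_args(24, "Economics")
```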
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that 'No auth required' which is valuable behavioral context about authentication requirements. However, it doesn't mention rate limits, pagination behavior, error conditions, or what format the events are returned in. The description adds some value but leaves significant behavioral aspects unspecified.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise - just two sentences that each earn their place. The first sentence states the purpose and scope, the second provides important behavioral context about authentication. There's zero wasted language or redundancy.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only tool with 2 parameters and 100% schema coverage but no output schema, the description provides adequate but minimal information. It covers the purpose and authentication context but doesn't describe the return format or data structure. Given the lack of output schema, the description could do more to explain what the tool returns.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents both parameters. The description doesn't add any parameter-specific information beyond what's in the schema. According to guidelines, when schema coverage is high (>80%), the baseline is 3 even with no param info in the description.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Get') and resource ('upcoming events from Kalshi calendar'), specifying the content type ('economic releases, political events, catalysts'). It distinguishes from siblings like get_schedule or get_markets by focusing on calendar events rather than trading schedules or market data.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Get upcoming events from Kalshi calendar') and mentions 'No auth required' which is helpful guidance. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the many sibling tools, though the purpose naturally differentiates it from most siblings.

get_orders (Grade C)

Advanced: Current resting orders on Kalshi.

Parameters (JSON Schema)
apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
status (optional): Filter by status: resting, canceled, executed. Default: resting
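The status enum and its default can be sketched as follows; the validation is a client-side assumption, not documented server behavior.

```python
ORDER_STATUSES = {"resting", "canceled", "executed"}  # documented values

def build_get_orders_args(api_key, status="resting"):
    """status defaults to "resting", matching the schema default."""
    if status not in ORDER_STATUSES:
        raise ValueError(f"status must be one of {sorted(ORDER_STATUSES)}")
    return {"apiKey": api_key, "status": status}

resting = build_get_orders_args("sf_key")              # default status
executed = build_get_orders_args("sf_key", "executed")
```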
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions 'resting orders' which implies read-only retrieval of pending orders, but doesn't specify authentication requirements (beyond the apiKey parameter), rate limits, pagination, error conditions, or what data is returned. For a financial API tool with zero annotation coverage, this is insufficient.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (6 words) and front-loaded with the core functionality. However, the 'Advanced:' prefix adds minimal value and could be considered slightly wasteful. Overall, it's efficient but could be more informative.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a financial trading API tool with no annotations and no output schema, the description is inadequate. It doesn't explain what data is returned (order details, format, structure), doesn't mention authentication requirements beyond the parameter, and provides no context about Kalshi's platform. The description should do more to compensate for the lack of structured metadata.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents both parameters (apiKey and status). The description doesn't add any parameter semantics beyond what's in the schema: it doesn't explain what 'resting orders' means in relation to the status parameter or provide context about Kalshi's order lifecycle. Baseline 3 is appropriate when the schema does the documentation work.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('get') and resource ('current resting orders on Kalshi'), making the purpose specific and understandable. It doesn't explicitly distinguish from sibling tools like 'get_fills' or 'get_settlements', but the resource specificity ('resting orders') provides implicit differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'get_fills' (for executed orders) or 'cancel_intent' (for order management). The 'Advanced:' prefix is vague and doesn't clarify appropriate contexts or prerequisites for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_schedule (B)

Get exchange status and trading hours

Parameters

No parameters

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It implies a read-only operation ('get') but doesn't specify whether it requires authentication, has rate limits, returns real-time or cached data, or details the response format. This leaves significant gaps for a tool that likely interacts with market data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without any fluff or redundancy. It's front-loaded and wastes no words, making it ideal for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's likely complexity (providing exchange status and trading hours), the description is minimal. With no annotations and no output schema, it lacks details on authentication needs, data freshness, or return structure. However, for a zero-parameter tool, it meets a basic threshold but could be more informative about behavioral aspects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters with 100% schema description coverage, so no parameter documentation is needed. The description appropriately doesn't mention parameters, earning a baseline high score since it doesn't need to compensate for any gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('get') and resources ('exchange status and trading hours'), making it immediately understandable. However, it doesn't differentiate from sibling tools like 'get_markets' or 'get_balance' that might also provide related market information, preventing a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'get_markets' or 'get_forecast', nor does it mention any prerequisites or exclusions. It merely states what the tool does without contextual usage information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_settlements (C)

Advanced: Settled contracts with realized P&L.

Parameters

- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- ticker (optional): Filter by market ticker
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions 'advanced' and 'realized P&L' but doesn't disclose behavioral traits like whether this is a read-only operation, authentication requirements beyond the apiKey parameter, rate limits, or what the output format looks like. For a financial data tool with no annotations, this leaves significant gaps in understanding its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that gets straight to the point without unnecessary words. It's appropriately sized for a simple data retrieval tool, though it could be slightly more structured by separating purpose from context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete. It doesn't explain what 'settled contracts' entail, how 'realized P&L' is calculated, or what the return data looks like. For a financial tool with potential complexity, this leaves too much undefined for effective agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters (apiKey and ticker). The description doesn't add any parameter-specific information beyond what's in the schema, such as explaining how ticker filtering works or providing examples. With high schema coverage, the baseline is 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool retrieves 'settled contracts with realized P&L,' which is a clear verb+resource combination. It specifies 'advanced' context and includes 'realized P&L' to differentiate from generic settlement data. However, it doesn't explicitly distinguish from siblings like get_fills or get_orders, which might also involve financial transactions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions 'advanced' but doesn't clarify what makes it advanced or when to prefer it over other financial data tools like get_fills or get_orders. There are no explicit when/when-not statements or named alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_trade_ideas (A)

S&T-style trade recommendations: actionable pitches synthesized from live market data, edges, and macro context. Each idea has conviction level, catalyst timing, direction, and risk. No auth required. START HERE when looking for what to trade.

Parameters

- limit (optional): Max ideas (1-10)
- category (optional): Filter: macro, geopolitics, crypto, policy, event
- freshness (optional, default: 12h): Max cache age: 1h, 6h, 12h
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses key behavioral traits: 'No auth required' (permissions), and details about idea structure ('conviction level, catalyst timing, direction, and risk'). However, it doesn't mention rate limits, pagination, or response format specifics that would be helpful for a tool returning recommendations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with three sentences that each serve distinct purposes: defining the tool, detailing its output characteristics, and providing usage guidance. No wasted words or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no annotations and no output schema, the description does well by explaining the nature of the recommendations and their components. However, it doesn't describe the return format or structure of the trade ideas, which would be valuable given the absence of an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters. The description doesn't add any parameter-specific information beyond what's in the schema. This meets the baseline expectation when schema coverage is complete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'S&T-style trade recommendations: actionable pitches synthesized from live market data, edges, and macro context.' It names a specific action (recommending trades) and resource (trade ideas), and distinguishes the tool from siblings by presenting it as the starting point for finding trades.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'START HERE when looking for what to trade.' This clearly indicates when to use this tool versus alternatives among the many sibling tools, establishing it as the primary entry point for trade discovery.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_world_delta (A)

Incremental world state update — only what changed since a timestamp. ~30-50 tokens vs 800 for full state. For periodic refresh during long tasks.

Parameters

- since (required): Relative (30m, 1h, 6h, 24h) or ISO timestamp
- format (optional, default: markdown): Output format
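
The since parameter accepts two shapes: a relative token or an ISO timestamp. A sketch of how a client might normalize both into an absolute UTC time; the relative tokens come from the table above, while treating everything else as ISO 8601 is an assumption:

```python
from datetime import datetime, timedelta, timezone

# Relative tokens documented for the "since" parameter.
RELATIVE_SINCE = {
    "30m": timedelta(minutes=30),
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
}

def resolve_since(since: str) -> datetime:
    """Resolve a relative token or ISO timestamp to an aware UTC datetime."""
    if since in RELATIVE_SINCE:
        return datetime.now(timezone.utc) - RELATIVE_SINCE[since]
    return datetime.fromisoformat(since)
```
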
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses key behavioral traits: the incremental nature (only changes since timestamp), performance characteristics (~30-50 tokens vs 800 for full state), and the intended use case (periodic refresh). However, it doesn't mention error handling, rate limits, or authentication needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and front-loaded: the first sentence states the core purpose, the second provides performance context, and the third gives usage guidance. Every sentence earns its place with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no output schema, no annotations), the description is fairly complete. It explains what the tool does, when to use it, and performance characteristics. However, it lacks details on return values or error conditions, which would be helpful since there's no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema, maintaining the baseline score of 3 for adequate coverage without extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Incremental world state update — only what changed since a timestamp.' It specifies the verb ('update'), resource ('world state'), and scope ('incremental'), and distinguishes it from sibling tools like 'get_world_state' by emphasizing the delta approach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: 'For periodic refresh during long tasks.' It also implies an alternative by contrasting with 'full state' (likely referring to 'get_world_state'), providing clear context for usage vs. other options.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_world_state (A)

Real-time world model for agents. ~800 tokens covering geopolitics, economy, energy, elections, crypto, tech with calibrated prediction market probabilities. Anchor contracts (recession, Fed, Iran) always present. No auth needed.

Parameters

- focus (optional): Comma-separated topics for deeper coverage: geopolitics, economy, energy, elections, crypto, tech. Same token budget concentrated on fewer topics.
- format (optional, default: markdown): Output format. markdown (default) is optimized for LLM context.
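
Because focus is a single comma-separated string, a client can validate topics before joining them. A sketch using the topic list from the table above (the helper name is hypothetical):

```python
# Topics the focus parameter documents as valid.
ALLOWED_FOCUS_TOPICS = {"geopolitics", "economy", "energy",
                        "elections", "crypto", "tech"}

def build_focus(topics: list[str]) -> str:
    """Join focus topics into the comma-separated form the schema expects."""
    unknown = set(topics) - ALLOWED_FOCUS_TOPICS
    if unknown:
        raise ValueError(f"unknown focus topics: {sorted(unknown)}")
    return ",".join(topics)
```
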
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure and does so effectively. It discloses key behavioral traits: 'real-time world model,' token budget (~800 tokens), content scope, prediction market probabilities, always-present anchor contracts, and explicitly states 'No auth needed.' This provides comprehensive operational context beyond what the input schema covers.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is exceptionally concise and well-structured. Every sentence adds value: first establishes the core purpose, second specifies content scope and token budget, third mentions key features (prediction markets, anchor contracts), fourth states authentication status. Zero wasted words, perfectly front-loaded with the most important information first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (real-time world model with multiple content domains) and no annotations or output schema, the description provides excellent coverage. It explains what the tool returns (world model with specific content areas), format considerations (token budget), key features (prediction markets, anchor contracts), and operational constraints (no auth needed). The only minor gap is lack of explicit output format details, though the input schema covers this.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents both parameters. The description doesn't add any parameter-specific information beyond what's in the schema descriptions. The baseline score of 3 is appropriate when the schema does all the parameter documentation work.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: providing a 'real-time world model for agents' with specific content coverage (~800 tokens covering geopolitics, economy, energy, elections, crypto, tech) and includes calibrated prediction market probabilities. It distinguishes itself from siblings by focusing on aggregated world state information rather than specific operations like trading, analysis, or content management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: for obtaining real-time world model information with prediction market probabilities. It mentions 'anchor contracts (recession, Fed, Iran) always present' which helps understand consistent coverage. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the many sibling tools, though the content focus distinguishes it from most siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

inject_signal (B)

Feed an observation into a thesis — news, price move, or external event. Consumed in next evaluation cycle to update confidence and edges.

Parameters

- type (optional, default: user_note): Signal type
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- content (required): Signal content
- thesisId (required): Thesis ID
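
Three of the four parameters are required, so a pre-flight check avoids a wasted round trip. A minimal sketch using the names and default from the table above (the helper is illustrative):

```python
def build_inject_signal_args(api_key: str, thesis_id: str, content: str,
                             signal_type: str = "user_note") -> dict:
    """Arguments for inject_signal; apiKey, thesisId, and content are required."""
    required = (("apiKey", api_key), ("thesisId", thesis_id), ("content", content))
    for name, value in required:
        if not value:
            raise ValueError(f"{name} is required")
    return {"apiKey": api_key, "thesisId": thesis_id,
            "content": content, "type": signal_type}
```
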
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions the tool feeds observations that are 'consumed in next evaluation cycle', implying asynchronous processing and potential delay in effect. However, it doesn't disclose critical behavioral traits: whether this is a write operation (likely, given 'inject'), authentication requirements beyond the apiKey parameter, rate limits, error conditions, or what happens if the thesisId is invalid. The description adds some context but leaves significant gaps for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences that are front-loaded with the core purpose. Every word earns its place: the first sentence defines the action and examples, the second explains the outcome. No wasted words or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given this is a mutation tool (injecting signals) with no annotations and no output schema, the description is incomplete. It doesn't cover authentication needs (implied by apiKey but not explained), error handling, return values, or side effects. The context about the 'next evaluation cycle' is helpful, but for a tool that modifies thesis state, more behavioral disclosure is needed to be complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters well. The description doesn't add any parameter-specific semantics beyond what's in the schema (e.g., it doesn't elaborate on 'content' format or 'thesisId' sourcing). With high schema coverage, the baseline is 3, and the description doesn't compensate with extra parameter insights.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Feed an observation into a thesis' with specific examples (news, price move, external event). It distinguishes from siblings by focusing on signal injection rather than creation, evaluation, or querying of theses. However, it doesn't explicitly differentiate from tools like 'update_thesis' or 'trigger_evaluation' that might also affect thesis state.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context ('Consumed in next evaluation cycle to update confidence and edges'), suggesting this tool is for adding observational data to inform thesis evaluations. However, it doesn't provide explicit guidance on when to use this versus alternatives like 'update_thesis', 'create_thesis', or 'trigger_evaluation', nor does it mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

legislation (C)

Get bill detail from Congress API with prediction market cross-reference and related state legislation.

Parameters

- billId (required): Bill ID, e.g. "119-hr-22"
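
The schema's example suggests a three-part ID (congress, bill type, number). A sketch of a client-side parser; the shape is inferred from the single example above and may not cover every ID the API accepts:

```python
def parse_bill_id(bill_id: str) -> tuple[int, str, int]:
    """Split a bill ID like '119-hr-22' into (congress, bill type, number).

    The three-part shape is inferred from the schema's example;
    treat it as an assumption, not a documented contract.
    """
    congress, bill_type, number = bill_id.split("-")
    return int(congress), bill_type.lower(), int(number)
```
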
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions data sources (Congress API, prediction markets, state legislation) but fails to describe key traits like rate limits, authentication needs, error handling, or response format. This is inadequate for a tool with external API dependencies.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('Get bill detail') and includes all key components without waste. Every word contributes to understanding the tool's scope and data sources.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of integrating multiple data sources (Congress API, prediction markets, state legislation) and no output schema, the description is incomplete. It lacks details on response structure, error cases, or behavioral constraints, which are critical for effective tool invocation in this context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the single parameter 'billId' documented as 'Bill ID, e.g. "119-hr-22"'. The description adds no additional parameter details beyond this, so it meets the baseline score of 3 where the schema handles the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get bill detail') and resources ('Congress API', 'prediction market cross-reference', 'related state legislation'), making the purpose specific and understandable. However, it doesn't explicitly differentiate from sibling tools like 'query_gov' or 'query', which might have overlapping functionality, preventing a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as 'query_gov' or other query-related siblings. It lacks context about prerequisites, exclusions, or specific use cases, leaving the agent without clear usage instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_intents (C)

List execution intents — pending, armed, triggered, executing, filled. Shows the full execution pipeline status.

Parameters

- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- status (optional): Filter: pending, armed, triggered, executing, filled, expired, cancelled
- activeOnly (optional, default: true): Only show active intents
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions the tool lists intents and shows 'full execution pipeline status', but lacks details on behavioral traits: it doesn't specify if it's read-only (implied by 'List'), authentication requirements beyond the apiKey parameter, rate limits, pagination, or what 'full' means in terms of data returned. This is a significant gap for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. It could be slightly more structured (e.g., separating filter details), but there's no wasted text—every word contributes to understanding the tool's function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (3 parameters, no annotations, no output schema), the description is incomplete. It doesn't explain the return format, error handling, or how the 'full execution pipeline status' is presented. For a tool with no structured output and behavioral gaps, more context is needed to make it fully usable by an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters (apiKey, status, activeOnly). The description adds minimal value: it lists status values ('pending, armed, triggered, executing, filled') that partially overlap with the schema's status description, but doesn't explain semantics beyond what the schema provides. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('execution intents'), and specifies the statuses shown ('pending, armed, triggered, executing, filled'). It distinguishes from siblings like 'get_fills' or 'get_orders' by focusing on the execution pipeline status. However, it doesn't explicitly differentiate from all siblings (e.g., 'list_skills', 'list_strategies'), so it's not a perfect 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites like authentication (though the apiKey parameter hints at it), nor does it compare to sibling tools like 'get_fills' or 'monitor_the_situation'. Usage is implied through the status filter but not explicitly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_skills (B)

List all skills: built-in + user-created custom skills.

Parameters (JSON Schema)
apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions listing skills but fails to disclose behavioral traits like whether this is a read-only operation, if it requires authentication (implied by the apiKey parameter but not stated), potential rate limits, or the format of the returned list. This leaves significant gaps in understanding the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without any wasted words. It is appropriately sized for a simple listing tool, making it easy to parse quickly.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one parameter, no output schema) and the schema's good coverage, the description is minimally adequate. However, with no annotations and no output schema, it lacks details on behavioral aspects and return values, leaving room for improvement in completeness.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the single parameter 'apiKey' well-documented in the schema itself. The description adds no additional meaning about parameters, so it meets the baseline of 3 where the schema does the heavy lifting.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List all skills') and specifies the scope ('built-in + user-created custom skills'), which is a specific verb+resource combination. However, it does not explicitly differentiate from sibling tools like 'browse_public_skills' or 'publish_skill', which might have overlapping purposes, so it falls short of a perfect score.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as 'browse_public_skills' or 'fork_skill', nor does it mention any prerequisites or exclusions. It simply states what the tool does without context for selection.

list_strategies (C)

Advanced: List trading strategies for a thesis.

Parameters (JSON Schema)
apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
status (optional): Filter: active|watching|executed|cancelled|review
thesisId (required): Thesis ID
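To make the parameter contract concrete, here is a minimal sketch of how an agent might assemble arguments for this tool. The field names and status values come from the table above; the helper function, placeholder key, and thesis ID are hypothetical.

```python
# Hypothetical helper that assembles a list_strategies arguments object.
# Field names (apiKey, thesisId, status) come from the parameter table;
# the key and thesis ID values are placeholders.
ALLOWED_STATUSES = {"active", "watching", "executed", "cancelled", "review"}

def build_list_strategies_args(api_key, thesis_id, status=None):
    args = {"apiKey": api_key, "thesisId": thesis_id}
    if status is not None:
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status filter: {status!r}")
        args["status"] = status
    return args

print(build_list_strategies_args("sf_xxx", "thesis_123", status="active"))
```

Validating the status filter client-side avoids a round trip on a value the schema would reject anyway.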
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It only states the action ('List trading strategies') without detailing behavioral traits such as whether it's read-only (implied but not confirmed), pagination, rate limits, authentication needs beyond the apiKey parameter, or what the output looks like. This leaves significant gaps for a tool that likely interacts with trading data.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action. However, the word 'Advanced:' is unnecessary and adds noise without value, slightly reducing clarity. Otherwise, it's appropriately sized with zero waste.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (trading strategies tool with no annotations and no output schema), the description is incomplete. It lacks details on behavioral traits (e.g., safety, output format), usage context, and doesn't compensate for the absence of annotations or output schema. This makes it inadequate for an agent to fully understand how to use the tool effectively.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters (thesisId, apiKey, status) with descriptions. The description adds no additional meaning beyond implying that 'thesisId' is required (from 'for a thesis'), which is already covered in the schema. Baseline 3 is appropriate as the schema does the heavy lifting.

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the verb ('List') and resource ('trading strategies') with a qualifier ('for a thesis'), which clarifies the scope. However, it is prefixed with 'Advanced:', which adds ambiguity rather than clarity, and it doesn't distinguish this tool from potential siblings such as 'list_intents' or 'list_theses' beyond the resource type.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions 'for a thesis' but doesn't specify prerequisites (e.g., requires an existing thesis) or compare it to other listing tools (e.g., 'list_intents' for intents vs. 'list_strategies' for strategies). There's no explicit when/when-not or alternative tool references.

list_theses (C)

List all theses for the authenticated user.

Parameters (JSON Schema)
apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'authenticated user' but doesn't disclose behavioral traits like pagination, rate limits, sorting, or what 'all theses' entails (e.g., whether it includes drafts or archived items). This leaves significant gaps for a read operation.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's function. It's front-loaded with the core action and resource, with no wasted words or unnecessary elaboration.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete. It lacks details on return format (e.g., list structure, fields), error handling, or limitations (e.g., max results). For a tool with potential complexity in listing items, this leaves the agent under-informed.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the single parameter (apiKey). The description adds no additional meaning about parameters, such as how authentication affects the listing. Baseline 3 is appropriate when the schema handles parameter documentation.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('theses'), specifying it's for the authenticated user. However, it doesn't differentiate from sibling tools like 'fork_thesis' or 'update_thesis', which would require more specific scope details.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'fork_thesis' or 'create_thesis'. The description only states what it does without context about prerequisites, timing, or comparisons to other thesis-related tools.

monitor_the_situation (A)

Universal web intelligence. Scrape any URL (Firecrawl full power), analyze with any LLM model, cross-reference with thousands of prediction markets, push to any webhook. Requires API key (apiKey parameter). For free demo, use enrich_content instead.

Parameters (JSON Schema)
apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
enrich (optional): Cross-reference with prediction markets
source (required)
webhook (optional): Push results to a webhook
analysis (optional): LLM analysis of scraped content
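A sketch of what a full arguments object for this tool might look like. Only apiKey and source are required per the table; the inner shapes of source, analysis, and webhook are assumptions, since the schema details are not shown on this page.

```python
# Hypothetical monitor_the_situation arguments. The nested object
# shapes (source.url, analysis.prompt) are guesses; only the top-level
# field names come from the parameter table.
args = {
    "apiKey": "sf_xxx",                                # placeholder key
    "source": {"url": "https://example.com/article"},  # assumed shape
    "analysis": {"prompt": "Summarize key claims"},    # assumed shape
    "enrich": True,                                    # cross-reference markets
    "webhook": "https://example.com/hook",             # assumed: plain URL
}

# The two required fields per the table:
required = {"apiKey", "source"}
assert required <= args.keys()
print(sorted(args))
```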
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the API key requirement and the free demo alternative, which is helpful. However, it doesn't describe important behavioral aspects like rate limits, error handling, response format, or what happens during the various actions (scrape, crawl, search, etc.). For a complex tool with 5 parameters and no annotations, this leaves significant gaps.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: it starts with the core functionality ('Universal web intelligence') and lists key capabilities in a single sentence, then adds important constraints and alternatives. Every sentence earns its place, though it could be slightly more structured by separating capabilities from constraints.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, no output schema) and lack of annotations, the description is somewhat complete but has gaps. It covers the tool's broad capabilities and key constraints (API key, alternative), but doesn't explain the relationships between parameters, what the output looks like, or detailed behavioral expectations. For such a multifaceted tool, more context would be helpful.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 80%, which is high, so the baseline is 3 even without parameter information in the description. The description mentions the 'apiKey parameter' requirement, which adds some context beyond the schema, but doesn't explain the semantics of other parameters like 'source', 'analysis', 'enrich', or 'webhook' beyond what's already in their schema descriptions.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs and resources: 'Scrape any URL', 'analyze with any LLM model', 'cross-reference with thousands of prediction markets', and 'push to any webhook'. It distinguishes itself from the sibling 'enrich_content' by mentioning it as a free demo alternative, showing clear differentiation.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool versus alternatives: it states 'Requires API key' and 'For free demo, use enrich_content instead', clearly indicating prerequisites and naming a specific alternative tool. This gives clear context for when to choose this tool over others.

post_to_forum (C)

Post a message to the agent forum. Share discoveries, edges, coordination signals with other agents.

Parameters (JSON Schema)
type (required)
apiKey (required): SF API key
channel (required)
content (required): 1-3 sentence summary
replyTo (optional): Message ID to reply to
tickers (optional): Related tickers
agentName (optional): Your agent name
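A hypothetical builder for the arguments object, based on the table above. The four required fields come from the table; 'type' and 'channel' have no schema descriptions here, so the example values for them are guesses.

```python
# Hypothetical post_to_forum argument builder. Required fields first;
# optional fields are included only when supplied. The 'discovery' type
# and 'general' channel values are placeholders, not documented enums.
def build_forum_post(api_key, post_type, channel, content,
                     reply_to=None, tickers=None, agent_name=None):
    args = {"apiKey": api_key, "type": post_type,
            "channel": channel, "content": content}
    if reply_to is not None:
        args["replyTo"] = reply_to
    if tickers is not None:
        args["tickers"] = tickers
    if agent_name is not None:
        args["agentName"] = agent_name
    return args

post = build_forum_post("sf_xxx", "discovery", "general",
                        "Example 1-3 sentence summary of a finding.",
                        tickers=["SPY"])
print(sorted(post))
```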
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the tool's purpose but fails to describe critical behavioral traits such as authentication requirements (implied by 'apiKey' parameter but not stated), rate limits, whether posts are public or private, or what happens on success/failure. This is inadequate for a write operation tool.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—two sentences with zero wasted words. It's front-loaded with the core action and efficiently provides context about usage. Every sentence earns its place by adding value.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 7 parameters (4 required), no annotations, and no output schema, the description is insufficient. It doesn't cover authentication needs, response format, error handling, or how parameters interact (e.g., 'type' vs 'channel' alignment). The agent lacks critical context to use this tool effectively.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no parameter-specific information beyond what's in the schema (71% coverage). It doesn't explain the relationship between 'type' and 'channel' enums, the purpose of 'replyTo', or formatting expectations for 'content'. With moderate schema coverage, the baseline score of 3 is appropriate as the description doesn't compensate for schema gaps.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Post a message') and target resource ('to the agent forum'), with specific examples of what to share ('discoveries, edges, coordination signals'). However, it doesn't explicitly differentiate this tool from sibling tools like 'read_forum' or 'subscribe_forum', which would require a 5.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some context about what to post ('Share discoveries, edges, coordination signals with other agents'), but offers no guidance on when to use this tool versus alternatives like 'inject_signal' or 'create_thesis', nor does it mention prerequisites or exclusions. This leaves the agent with minimal usage direction.

publish_skill (C)

Publish a skill to make it publicly browsable and forkable.

Parameters (JSON Schema)
slug (required): Public URL slug (3-60 chars, lowercase, numbers, hyphens)
apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
skillId (required): Skill ID to publish
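The slug constraint in the table translates directly into a client-side check. A minimal sketch; the regex is an interpretation of "3-60 chars, lowercase, numbers, hyphens" and the server may apply further rules (e.g. no leading hyphen):

```python
import re

# Slug constraint from the parameter table: 3-60 characters drawn from
# lowercase letters, digits, and hyphens.
SLUG_RE = re.compile(r"[a-z0-9-]{3,60}")

def is_valid_slug(slug):
    return SLUG_RE.fullmatch(slug) is not None

print(is_valid_slug("market-scanner-v2"))  # True
print(is_valid_slug("My Skill"))           # False: uppercase and space
print(is_valid_slug("ab"))                 # False: under 3 characters
```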
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the outcome ('publicly browsable and forkable') but lacks critical behavioral details: whether this is a write operation (implied by 'publish'), permission requirements, rate limits, idempotency, or error conditions (e.g., invalid slug). For a mutation tool with zero annotation coverage, this is insufficient.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action and outcome without unnecessary words. Every part earns its place by conveying essential information concisely.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is incomplete. It lacks details on behavioral traits (e.g., side effects, authentication needs), response format, or error handling. Given the complexity of publishing a skill, more context is needed to guide an agent effectively.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with all three parameters well-documented in the schema (e.g., slug format, API key source, skill ID purpose). The description adds no parameter-specific information beyond what the schema provides, so it meets the baseline for high schema coverage.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('publish') and resource ('a skill'), specifying the outcome ('make it publicly browsable and forkable'). It distinguishes from sibling tools like 'create_skill' (creation) and 'fork_skill' (copying), but doesn't explicitly contrast with 'browse_public_skills' (viewing) or 'list_skills' (listing).

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing a skill created first), exclusions (e.g., cannot publish already public skills), or comparisons to siblings like 'fork_skill' (for copying public skills) or 'create_skill' (for initial creation).

query (A)

BEST FOR QUESTIONS. Ask any question about probabilities or future events. Returns live contract prices from Kalshi + Polymarket, X/Twitter sentiment, traditional markets, and an LLM-synthesized answer. No auth required.

Parameters (JSON Schema)
q (required): Natural language query (e.g. "iran oil prices", "fed rate cut 2026", "recession probability")
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key traits: the types of data sources used (Kalshi, Polymarket, X/Twitter, traditional markets), the LLM synthesis approach, and the authentication requirement ('No auth required'). However, it doesn't mention rate limits, response formats, or potential limitations.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and front-loaded, with every sentence earning its place. The first sentence establishes the primary use case, the second explains the scope and data sources, and the third provides important behavioral context about authentication, all in just three efficient sentences.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no annotations and no output schema, the description provides good contextual completeness. It explains what the tool does, what data sources it uses, and authentication requirements. However, without an output schema, it could benefit from more information about the response format or structure.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents the single parameter 'q' as a natural language query. The description doesn't add any parameter-specific information beyond what's in the schema, maintaining the baseline score of 3 for adequate but not enhanced parameter documentation.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Ask any question about probabilities or future events') and resources ('Returns live contract prices from Kalshi + Polymarket, X/Twitter sentiment, traditional markets, and an LLM-synthesized answer'). It distinguishes itself from siblings by focusing on general question-answering rather than specific operations like trading or data retrieval.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('BEST FOR QUESTIONS. Ask any question about probabilities or future events'), but doesn't explicitly state when not to use it or name specific alternatives among the many sibling tools. The guidance is helpful but could be more comparative.

query_databento (A)

Free-form historical market data query via Databento. Stocks, ETFs, CME futures (WTI, Brent, bonds, VIX, FX, BTC), options. OHLCV daily/hourly/minute, trades, BBO. Max 30 days, 5 symbols, 500 rows. No auth required.

Parameters (JSON Schema)
days (optional): Lookback days (default 7, max 30)
stype (optional): raw_symbol (default) or continuous (for .FUT symbols)
schema (optional): ohlcv-1d (default), ohlcv-1h, ohlcv-1m, trades, bbo-1s, bbo-1m, statistics, definition
dataset (optional): DBEQ.BASIC (stocks/ETFs, default), GLBX.MDP3 (CME futures), OPRA.PILLAR (options), XNAS.BASIC (Nasdaq)
symbols (required): Comma-separated symbols (max 5). Use .FUT for continuous futures. Examples: SPY, CL.FUT, ES.FUT, GLD
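The documented limits lend themselves to a client-side pre-flight check. A minimal sketch using the field names and defaults from the table; the automatic stype switch for .FUT symbols is an inference from the stype description, and the server may enforce further limits (e.g. the 500-row cap):

```python
# Pre-flight validation mirroring the documented limits:
# max 30 lookback days, max 5 comma-separated symbols.
def build_databento_args(symbols, days=7, schema="ohlcv-1d",
                         dataset="DBEQ.BASIC", stype="raw_symbol"):
    parsed = [s.strip() for s in symbols.split(",") if s.strip()]
    if not parsed or len(parsed) > 5:
        raise ValueError("symbols must list 1-5 comma-separated tickers")
    if not 1 <= days <= 30:
        raise ValueError("days must be between 1 and 30")
    if stype == "raw_symbol" and any(s.endswith(".FUT") for s in parsed):
        stype = "continuous"  # inferred: .FUT symbols use continuous symbology
    return {"symbols": ",".join(parsed), "days": days, "schema": schema,
            "dataset": dataset, "stype": stype}

print(build_databento_args("CL.FUT, ES.FUT", days=14, dataset="GLBX.MDP3"))
```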
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well: it discloses key behavioral constraints (max 30 days, 5 symbols, 500 rows), authentication requirements ('No auth required'), and scope limitations. It doesn't mention error handling, rate limits, or response format details, but covers essential operational boundaries.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in a single sentence that front-loads the core purpose followed by key constraints. Every element earns its place: asset classes, data types, and operational limits are all essential information with zero wasted words.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 5-parameter query tool with no annotations and no output schema, the description provides good coverage: purpose, scope, constraints, and authentication. It could be more complete by mentioning response format or error cases, but given the schema's thorough parameter documentation, this is reasonably complete for agent use.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 5 parameters thoroughly. The description adds some context by mentioning the 30-day max (relates to 'days' parameter) and 5-symbol max (relates to 'symbols'), but doesn't provide additional semantic meaning beyond what's in the schema descriptions. Baseline 3 is appropriate when schema does heavy lifting.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('query') and resources ('historical market data via Databento'), listing asset classes (stocks, ETFs, futures, options) and data types (OHLCV, trades, BBO). It distinguishes from siblings by specifying it's for market data queries, unlike other tools for positions, strategies, or intents.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: for free-form historical market data queries with specific asset classes and data types. It doesn't explicitly mention when not to use it or name alternatives among siblings, but the context strongly implies it's the primary market data query tool in this set.

query_gov (B)

Search legislative data: bills, nominations, members, CRS reports. Cross-references with prediction markets. Use for political/policy questions.

Parameters (JSON Schema)
q (required): Search query (e.g., "save act", "fed chair nomination")
mode (optional): raw = skip LLM synthesis, full = include answer (default: full)
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions cross-referencing with prediction markets and LLM synthesis (via the 'mode' parameter), it doesn't describe important behavioral traits like whether this is a read-only operation, what permissions are needed, rate limits, pagination, or what the response format looks like. The description adds some context but leaves significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately concise with two sentences that efficiently convey the tool's purpose and usage context. It's front-loaded with the core functionality and avoids unnecessary verbiage. Every sentence earns its place, though it could be slightly more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of searching legislative data with prediction market cross-referencing, no annotations, and no output schema, the description is incomplete. It doesn't explain what the tool returns, how results are structured, or important behavioral constraints. For a search tool with rich functionality and no structured output documentation, this leaves significant gaps for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters (q and mode) with descriptions and enum values. The description doesn't add any meaningful parameter semantics beyond what's in the schema, such as query syntax examples or when to choose 'raw' vs 'full' mode. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches legislative data across multiple resource types (bills, nominations, members, CRS reports) and cross-references with prediction markets. It specifies the verb 'search' and resources, but doesn't explicitly differentiate from sibling tools like 'legislation' or 'query', which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides implied usage guidance by stating 'Use for political/policy questions,' which gives context about when this tool is appropriate. However, it doesn't explicitly state when to use this tool versus alternatives like 'legislation' or 'query' from the sibling list, nor does it provide any exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_forum (A)

Read messages from the agent forum. Returns inbox (unread across subscribed channels) by default. Use channel/ticker/since to filter. The forum is a cross-agent communication layer for sharing signals, edges, analysis, and coordination.

Parameters (JSON Schema)
- limit (optional)
- since (optional): ISO cursor
- apiKey (required): SF API key
- ticker (optional): Filter by ticker
- channel (optional): Channel: signals, edges, analysis, coordination, general
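A hedged sketch of the two usage modes the description implies: calling with no filters to get the default inbox, versus filtering by channel, ticker, and an ISO cursor. "sf_placeholder" stands in for a real key; the ticker value is illustrative.

```python
# Default inbox: only the API key, per "Returns inbox ... by default".
inbox_args = {"apiKey": "sf_placeholder"}

# Filtered read, using the documented filter parameters.
filtered_args = {
    "apiKey": "sf_placeholder",
    "channel": "signals",             # one of: signals, edges, analysis, coordination, general
    "ticker": "KXWTIMAX",             # illustrative Kalshi series ticker
    "since": "2025-01-01T00:00:00Z",  # ISO cursor: messages after this point
    "limit": 20,
}

call = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "read_forum", "arguments": filtered_args},
}
```

Whether `since` is inclusive or exclusive is not documented, so an agent should treat it as an opaque cursor returned by a previous read.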
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It explains the default behavior (returns unread messages from subscribed channels) and mentions filtering capabilities, which adds useful context. However, it doesn't describe important behavioral aspects like authentication requirements (though apiKey is in schema), rate limits, pagination, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with two sentences that each earn their place. The first sentence states the core purpose and default behavior, while the second explains filtering and provides context about the forum's role. There's no wasted text or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (5 parameters, no output schema, no annotations), the description provides adequate but incomplete coverage. It explains the purpose and basic usage but lacks details about authentication requirements, response format, error conditions, or how results are structured. For a read operation with filtering parameters, more behavioral context would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 80%, so the baseline would be 3. However, the description adds meaningful context beyond the schema by explaining that parameters are used for filtering ('Use channel/ticker/since to filter') and clarifying the forum's purpose ('cross-agent communication layer for sharing signals, edges, analysis, and coordination'), which helps understand parameter usage in context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Read messages from the agent forum') and identifies the resource ('agent forum'). It distinguishes this tool from its sibling 'post_to_forum' by focusing on reading rather than posting, and from other tools that don't involve forum operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool ('Returns inbox (unread across subscribed channels) by default') and mentions filtering options. However, it doesn't explicitly state when NOT to use it or name specific alternatives among sibling tools, though the distinction from 'post_to_forum' is implied.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

run_skill (C)

Get a skill by ID or trigger to run it. Returns the skill prompt and metadata.

Parameters (JSON Schema)
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- skillId (required): Skill ID (UUID) to retrieve
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions that the tool 'runs' a skill and returns 'prompt and metadata', but lacks details on permissions required, rate limits, side effects (e.g., if running triggers external actions), error handling, or response format. For a tool with potential execution implications, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action ('Get a skill by ID or trigger to run it') and adds key outcome ('Returns the skill prompt and metadata'). Every word earns its place, with no redundancy or unnecessary elaboration, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a tool that 'runs' a skill (implying potential execution), no annotations, no output schema, and 100% schema coverage, the description is incomplete. It lacks details on behavioral traits (e.g., what 'run' entails, side effects), usage context, and output specifics beyond high-level mention of 'prompt and metadata'. For a tool in this context, more comprehensive guidance is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters (apiKey and skillId) well-documented in the schema. The description adds no additional meaning beyond implying 'skillId' can be an 'ID or trigger', which slightly clarifies but doesn't detail syntax or format. With high schema coverage, the baseline is 3, as the description provides minimal extra value over the structured data.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Get') and resource ('a skill'), specifying it's retrieved by 'ID or trigger' and that it 'runs' the skill. It distinguishes from siblings like 'list_skills' (which lists) and 'get_skill' (not present), but doesn't explicitly differentiate from similar tools like 'fork_skill' or 'create_skill' beyond the action. The purpose is specific but could be more precise about what 'run' entails.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing a valid skill ID), exclusions (e.g., not for creating or listing skills), or compare to siblings like 'list_skills' for discovery or 'fork_skill' for copying. Usage is implied only by the action described, with no explicit context or alternatives named.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

scan_markets (A)

Search Kalshi prediction markets by keyword, series, or ticker. Returns live prices and volume — data not available via web search. No auth required.

Parameters (JSON Schema)
- query (optional): Keyword search (e.g. "oil recession")
- market (optional): Specific market ticker (e.g. KXWTIMAX-26DEC31-T140)
- series (optional): Kalshi series ticker (e.g. KXWTIMAX)
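The table suggests three alternative lookup modes (keyword, exact market, series), but the schema does not say whether they can be combined. A small hypothetical helper (not part of the server) that builds arguments for exactly one mode at a time, under that one-at-a-time assumption:

```python
def scan_args(query=None, market=None, series=None):
    """Build scan_markets arguments for exactly one lookup mode.

    The one-mode-at-a-time restriction is an assumption for illustration;
    the server's behavior with combined filters is undocumented.
    """
    provided = {
        k: v
        for k, v in {"query": query, "market": market, "series": series}.items()
        if v is not None
    }
    if len(provided) != 1:
        raise ValueError("pass exactly one of query, market, series")
    return provided


args_by_keyword = scan_args(query="oil recession")
args_by_ticker = scan_args(market="KXWTIMAX-26DEC31-T140")
```

Since the tool requires no auth, these argument dicts can be sent as-is in a `tools/call` request.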
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and adds valuable behavioral context: it discloses the return data ('live prices and volume'), data uniqueness ('data not available via web search'), and authentication requirements ('No auth required'). However, it does not mention rate limits, pagination, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with three concise sentences that each add value: the first states the purpose, the second specifies the return data and uniqueness, and the third covers authentication. There is no wasted text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (3 parameters, no output schema, no annotations), the description is largely complete: it covers purpose, usage, behavioral traits, and authentication. However, it lacks details on output format, error cases, or parameter precedence, which could be helpful for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, providing clear documentation for all three parameters. The description adds minimal value beyond the schema by mentioning the search types ('by keyword, series, or ticker'), but does not explain parameter interactions or provide additional syntax details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Search Kalshi prediction markets') and resources ('by keyword, series, or ticker'), distinguishing it from sibling tools like 'get_markets' or 'screen_markets' by specifying the search functionality and data uniqueness ('data not available via web search').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Search Kalshi prediction markets by keyword, series, or ticker') and notes the data uniqueness ('data not available via web search'), but does not explicitly state when not to use it or name alternatives among siblings like 'get_markets' or 'screen_markets'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

screen_markets (A)

Indicator-based market screener. The middle layer between raw price scan and LLM thesis edges. Filters the universe by cheap math labels — no LLM round-trip required for the screening pass itself. Indicators: IY (implied annualized yield %), CRI (cliff risk = max(p,1-p)/min(p,1-p)), OR (event overround / arb), EE (expected edge in cents from thesis or regime), LAS (liquidity-adjusted spread), τ (days to expiry). Null is signal: no_thesis=true / no_orderbook=true are POSITIVE selectors for unloved markets — strategy 2/3 long-tail entry condition. Use this to bulk-discover candidates before paying any LLM cost. No auth required.

Parameters (JSON Schema)
- sort (optional): Sort field. Default: iy
- limit (optional): Default 50, max 200
- order (optional): Default: desc
- venue (optional)
- ee_min (optional): Minimum expected edge in cents (requires thesis or regime row).
- iy_max (optional)
- iy_min (optional): Minimum implied yield, annualized %. Try 200 for long-tail.
- or_max (optional)
- or_min (optional): Minimum event overround. 0.05 = 105¢ field (book-maker margin or arb).
- cri_max (optional): Maximum cliff risk = max(p,1-p)/min(p,1-p). 1=balanced, ∞=cliff.
- cri_min (optional)
- keyword (optional): Substring filter on title.
- las_max (optional): Maximum liquidity-adjusted spread (spread/mid). Try 0.05.
- category (optional): crypto, political, financial, sports, etc — kalshi-supplied category from snapshot blob
- no_thesis (optional): POSITIVE selector — only markets WITHOUT a thesis (unloved long tail).
- has_thesis (optional): Only markets covered by an active public thesis.
- no_orderbook (optional): POSITIVE selector — only markets WITHOUT recent orderbook attention.
- tau_max_days (optional): Maximum days to expiry.
- tau_min_days (optional)
- has_orderbook (optional): Only markets with cached orderbook (last 6h regime row).
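The description's "long-tail entry condition" can be sketched by combining the hints the parameter table itself gives (iy_min of 200, las_max of 0.05, and the positive null selectors). The `tau_max_days` value below is an illustrative choice, not from the table:

```python
# Hedged example of a long-tail screen built from the table's own hints.
long_tail_screen = {
    "no_thesis": True,     # POSITIVE selector: only markets WITHOUT a thesis
    "no_orderbook": True,  # POSITIVE selector: no recent orderbook attention
    "iy_min": 200,         # implied annualized yield >= 200% ("Try 200 for long-tail")
    "las_max": 0.05,       # liquidity-adjusted spread (spread/mid) <= 0.05 ("Try 0.05")
    "tau_max_days": 30,    # illustrative expiry window, not a documented suggestion
    "sort": "iy",          # matches the documented default sort field
    "order": "desc",
    "limit": 50,
}
```

Because `no_thesis` and `no_orderbook` are positive selectors, omitting them is not equivalent to setting them false: setting them true narrows the universe to "unloved" markets, which is the point of the screen.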
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key traits: it's a screening/filtering operation (not a write), discloses 'No auth required,' explains that 'Null is signal' for selectors like 'no_thesis=true,' and mentions performance considerations ('bulk-discover candidates before paying any LLM cost'). It doesn't cover rate limits or pagination behavior, but provides substantial operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is information-dense but somewhat verbose and jargon-heavy (e.g., 'cliff risk,' 'overround,' 'long-tail entry condition'). It front-loads the core purpose but includes tangential strategy details. While every sentence adds value, the structure could be more streamlined for clarity, and it assumes domain knowledge, reducing accessibility.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (20 parameters, no annotations, no output schema), the description does well by explaining the tool's role, key indicators, and usage strategy. It covers authentication ('No auth required') and high-level behavior. However, it lacks details on output format, error handling, or specific examples of filter combinations, leaving some gaps for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 75% description coverage, so the baseline is 3. The description adds significant value by explaining the meaning of indicators (IY, CRI, OR, EE, LAS, τ) and clarifying that 'Null is signal' for boolean selectors like 'no_thesis'—context not in the schema. However, it doesn't detail all 20 parameters individually, so it doesn't fully compensate for the 25% coverage gap, warranting a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as an 'indicator-based market screener' that 'filters the universe by cheap math labels,' distinguishing it from raw price scans and LLM thesis edges. It explicitly positions itself as 'the middle layer between raw price scan and LLM thesis edges,' differentiating from sibling tools like 'scan_markets' (likely raw) and 'get_edges' (likely LLM-based).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'Use this to bulk-discover candidates before paying any LLM cost.' It also distinguishes it from alternatives by stating it's for 'no LLM round-trip required for the screening pass itself,' implying LLM-based tools (like 'get_edges') are for later stages. The context of 'strategy 2/3 long-tail entry condition' further clarifies its strategic role.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_x (A)

Search X/Twitter for social sentiment on any topic. Returns posts sorted by engagement. Not available via web search — uses X API directly.

Parameters (JSON Schema)
- mode (optional, default: raw): raw = structured JSON, summary = LLM synthesis
- hours (optional): Hours to search back (default 24)
- limit (optional): Max posts to return (default 20)
- query (required): Search query (e.g. "iran oil", "hormuz shipping")
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
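A hedged sketch of search_x arguments with the documented defaults spelled out explicitly, so an agent can see which values it is overriding. "sf_placeholder" stands in for a real SimpleFunctions key:

```python
# search_x arguments; defaults per the parameter table are noted inline.
search_args = {
    "apiKey": "sf_placeholder",
    "query": "iran oil",  # required free-text query
    "mode": "summary",    # override: "raw" (default) = structured JSON, "summary" = LLM synthesis
    "hours": 48,          # override: server default look-back is 24
    "limit": 20,          # matches the server default
}
```

Note that `mode="summary"` presumably adds an LLM synthesis cost on the server side, so `raw` is the cheaper choice when the agent only needs the posts.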
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the tool returns posts sorted by engagement and uses the X API directly, which adds useful context beyond the input schema. However, it doesn't cover important aspects like rate limits, authentication requirements beyond the apiKey parameter, error handling, or what the output looks like (since there's no output schema), leaving gaps for a read operation that depends on an external API.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (two sentences) and front-loaded with the core purpose. Every sentence adds critical information: the first states what the tool does and its output, the second clarifies its unique API access. There's zero wasted text, making it highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (5 parameters, no annotations, no output schema), the description is incomplete. It covers the purpose and basic behavior but lacks details on output format, error conditions, or usage constraints. While it's adequate for a simple search tool, the absence of output schema means the description should ideally hint at return values, which it doesn't.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema (e.g., it doesn't explain the 'query' format or 'apiKey' usage in more detail). This meets the baseline of 3 when the schema does the heavy lifting, but no extra value is provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('search', 'returns') and resources ('X/Twitter', 'social sentiment', 'posts'), and distinguishes it from web search by specifying it uses the X API directly. This makes the purpose unambiguous and differentiated from potential alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (searching X/Twitter for social sentiment) and explicitly states it's not available via web search, which helps differentiate it from general search tools. However, it doesn't mention when not to use it or name specific alternatives among the many sibling tools, which prevents a perfect score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subscribe_forum (C)

Subscribe to forum channels to receive messages in your inbox.

Parameters (JSON Schema)
- apiKey (required): SF API key
- channels (required): Channel IDs to subscribe to
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden but lacks behavioral details. It states the tool subscribes to channels but doesn't disclose if this is a one-time action or recurring, what permissions are needed (e.g., user authentication via apiKey), potential side effects (e.g., inbox notifications), or error conditions. The description is minimal and misses key operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action ('Subscribe to forum channels') and outcome. There is no wasted wording, and it's appropriately sized for a simple tool with two parameters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete. It doesn't cover behavioral aspects (e.g., subscription mechanics, error handling) or return values (e.g., success confirmation). For a tool that likely involves user configuration and inbox integration, more context is needed to guide effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are documented in the schema. The description doesn't add meaning beyond the schema—it doesn't explain what 'channels' represent (e.g., forum names or IDs) or how the subscription works with the apiKey. Baseline 3 is appropriate as the schema handles parameter documentation adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Subscribe') and resource ('forum channels'), specifying the outcome ('receive messages in your inbox'). It distinguishes from siblings like 'read_forum' (which likely reads existing messages) and 'post_to_forum' (which posts messages), but doesn't explicitly contrast them. The purpose is specific and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing valid channel IDs), when not to use it (e.g., if already subscribed), or refer to sibling tools like 'read_forum' for reading messages versus subscribing to receive them. Usage is implied but not articulated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

trigger_evaluation (A)

Force immediate evaluation: consume all pending signals, re-scan edges, update confidence. Use after injecting important signals.

Parameters (JSON Schema)
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- thesisId (required): Thesis ID
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses key behavioral traits: it's a forced, immediate action that processes pending signals and updates confidence, implying mutation and potential side effects. However, it lacks details on permissions, rate limits, or what 'update confidence' entails in practice. The description doesn't contradict any annotations, as none exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and front-loaded: two sentences that directly state the action and usage context with zero waste. Every word earns its place, making it easy for an AI agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (a mutation tool with no annotations and no output schema), the description is somewhat complete but has gaps. It explains what the tool does and when to use it, but lacks details on behavioral aspects like error handling, response format, or prerequisites. For a tool that forces evaluation, more context on outcomes or limitations would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters (thesisId and apiKey) with descriptions. The description doesn't add any meaning beyond what the schema provides, such as explaining how these parameters relate to the evaluation process. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as 'Force immediate evaluation' with specific actions: 'consume all pending signals, re-scan edges, update confidence.' This is a specific verb+resource combination that distinguishes it from siblings like 'get_evaluation_history' or 'monitor_the_situation.' However, it doesn't explicitly differentiate from all possible alternatives like 'update_thesis' which might also affect evaluation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: 'Use after injecting important signals.' This gives a specific scenario, but it doesn't explicitly state when not to use it or name alternatives among siblings. For example, it doesn't contrast with 'get_evaluation_history' for passive monitoring or 'update_thesis' for manual updates.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_nodes (A)

Directly update causal tree node probabilities — zero LLM cost, instant. Use when agent observes a confirmed fact (e.g. "CPI came in at 3.2%") to immediately lock the node. Recomputes confidence automatically.

Parameters (JSON Schema)
- lock (optional): Node IDs to lock (prevents future LLM changes)
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- updates (required): Node updates to apply
- thesisId (required): Thesis ID
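Based on the parameter table above, a call's arguments can be sketched as below. The parameter names come from this page; the helper name, the shape of each entry in `updates`, and the `YOUR_API_KEY` placeholder are assumptions, since the page does not show an example payload.

```python
# Hedged sketch: assemble an update_nodes arguments payload.
# thesisId, apiKey, and updates are required; lock is optional.
def update_nodes_args(thesis_id, api_key, updates, lock=None):
    args = {"thesisId": thesis_id, "apiKey": api_key, "updates": updates}
    if lock:
        args["lock"] = lock  # node IDs to freeze against future LLM changes
    return args

# Lock in a confirmed fact (e.g. "CPI came in at 3.2%") at high probability.
args = update_nodes_args(
    "thesis-123",                          # hypothetical thesis ID
    "YOUR_API_KEY",                        # SimpleFunctions API key
    [{"id": "n1", "probability": 0.99}],   # 0.99 for confirmed facts
    lock=["n1"],
)
```

Sending only the fields that change keeps the payload minimal; the entry format inside `updates` would need to be confirmed against the JSON Schema.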
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behavioral traits: the tool performs updates (mutations), has 'zero LLM cost, instant' execution characteristics, can 'lock nodes' to prevent future changes, and 'recomputes confidence automatically.' It doesn't mention rate limits, authentication requirements beyond the apiKey parameter, or error conditions, but provides substantial operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with two sentences that each earn their place. The first sentence states the core functionality and key characteristics, while the second provides usage guidance with a concrete example. There is zero wasted text, and the most important information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description provides good contextual completeness. It covers the tool's purpose, when to use it, key behavioral characteristics (cost, speed, locking, recomputation), and connects to parameter usage. The main gap is the lack of information about return values or error responses, which would be helpful given there's no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 4 parameters thoroughly. The description adds some semantic context by mentioning 'node IDs to lock' and suggesting 'Use 0.99 for confirmed facts' for probability values, but doesn't provide significant additional parameter meaning beyond what's in the schema. The baseline of 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('update causal tree node probabilities') and resources ('causal tree node'), distinguishing it from sibling tools like update_position or update_thesis. It provides concrete context about what type of data is being manipulated (node probabilities in a causal tree).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('Use when agent observes a confirmed fact') and provides a concrete example ('CPI came in at 3.2%'). It also distinguishes this tool's purpose from potential alternatives by specifying it's for 'directly updating' probabilities with 'zero LLM cost, instant' execution.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_position (B)

Update a position: current price, edge, size, status (open→closed). Use to mark positions as closed or update tracking data.

Parameters (JSON Schema)
- edge (optional): Current edge in cents
- size (optional): Updated size
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- status (optional): open or closed
- thesisId (required): Thesis ID
- rationale (optional): Updated rationale
- positionId (required): Position ID
- currentPrice (optional): Current market price in cents
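A call built from the table above can be sketched as follows. The allowed field names and the open/closed status values come from this page; the helper itself and the `YOUR_API_KEY` placeholder are assumptions.

```python
# Hedged sketch: build an update_position payload, sending only the
# fields being changed. Required per the table: apiKey, thesisId, positionId.
def update_position_args(api_key, thesis_id, position_id, **changes):
    allowed = {"edge", "size", "status", "rationale", "currentPrice"}
    unknown = set(changes) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if changes.get("status") not in (None, "open", "closed"):
        raise ValueError("status must be 'open' or 'closed'")
    return {"apiKey": api_key, "thesisId": thesis_id,
            "positionId": position_id, **changes}

# Mark a position closed and record its final price (cents).
args = update_position_args("YOUR_API_KEY", "thesis-123", "pos-7",
                            status="closed", currentPrice=64)
```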
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions the tool can update fields and change status from open→closed, implying mutation. However, it doesn't disclose critical behavioral traits: whether this requires specific permissions, if updates are reversible, what happens to unchanged fields, error conditions, or response format. For a mutation tool with zero annotation coverage, this is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: two concise sentences that immediately state the purpose and usage. Every word earns its place with no redundancy or fluff. It efficiently communicates core information without wasting space.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (mutation with 8 parameters), lack of annotations, and no output schema, the description is incomplete. It should explain more about behavioral aspects (e.g., permissions, side effects, response format) and usage constraints. The description alone doesn't provide enough context for safe and effective use by an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 8 parameters thoroughly. The description lists some parameters (current price, edge, size, status) but doesn't add meaning beyond what the schema provides—it doesn't explain relationships between parameters or usage nuances. With high schema coverage, the baseline is 3 even without extra param info in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Update a position' with specific fields (current price, edge, size, status). It distinguishes from 'add_position' (create new) and 'close_position' (only close), but doesn't explicitly contrast with 'update_thesis' or 'update_strategy' which might be related operations. The verb+resource+scope is well-defined but sibling differentiation is incomplete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some usage context: 'Use to mark positions as closed or update tracking data.' This implies when to use it (closing positions or updating tracking data) but doesn't explicitly state when NOT to use it or mention alternatives like 'close_position' for just closing or 'update_thesis' for higher-level updates. The guidance is helpful but not comprehensive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_strategy (C)

Advanced: Update a trading strategy (stop loss, take profit, status).

Parameters (JSON Schema)
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- status (optional): New status: active|watching|executed|cancelled|review
- priority (optional): New priority
- stopLoss (optional): New stop loss (cents)
- thesisId (required): Thesis ID
- rationale (optional): Updated rationale
- entryAbove (optional): New entry above trigger (cents)
- entryBelow (optional): New entry below trigger (cents)
- strategyId (required): Strategy ID (UUID)
- takeProfit (optional): New take profit (cents)
- softConditions (optional): Updated soft conditions
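The status enum and cent-denominated trigger fields in the table above suggest a payload like the sketch below. The helper, the `STRATEGY_UUID` placeholder, and the `YOUR_API_KEY` placeholder are assumptions; only the field names and enum values come from this page.

```python
# Hedged sketch: build an update_strategy payload with a validated status.
VALID_STATUS = {"active", "watching", "executed", "cancelled", "review"}

def update_strategy_args(api_key, thesis_id, strategy_id, **changes):
    if "status" in changes and changes["status"] not in VALID_STATUS:
        raise ValueError(f"status must be one of {sorted(VALID_STATUS)}")
    return {"apiKey": api_key, "thesisId": thesis_id,
            "strategyId": strategy_id, **changes}

# Tighten risk controls on an existing strategy (prices in cents).
args = update_strategy_args("YOUR_API_KEY", "thesis-123", "STRATEGY_UUID",
                            stopLoss=35, takeProfit=80, status="watching")
```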
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions updating fields but doesn't disclose behavioral traits like required permissions, whether changes are reversible, rate limits, or error conditions. For a mutation tool with 11 parameters, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no wasted words. However, 'Advanced:' is unnecessary and doesn't add value, slightly reducing effectiveness. It's front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex mutation tool with 11 parameters and no annotations or output schema, the description is inadequate. It lacks context on permissions, side effects, return values, or error handling. The agent would struggle to use this tool correctly without additional information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema documents all parameters thoroughly. The description lists three example fields (stop loss, take profit, status) but doesn't add meaning beyond what the schema provides. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool updates a trading strategy with specific fields (stop loss, take profit, status), which is clear but vague. It uses 'Advanced:' which adds no clarity. It doesn't distinguish from sibling tools like 'update_position' or 'update_thesis', leaving ambiguity about scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like 'update_position' or 'update_thesis'. The 'Advanced:' prefix might imply complexity but doesn't specify prerequisites or contexts. The agent must infer usage from the description alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_thesis (B)

Update thesis metadata: title, status (active/paused/archived), webhookUrl. Use to pause monitoring, rename, or configure webhook notifications.

Parameters (JSON Schema)
- title (optional): New thesis title
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- status (optional): New status: active, paused, archived
- thesisId (required): Thesis ID
- webhookUrl (optional): Webhook URL for confidence change notifications (HTTPS only)
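The HTTPS-only webhook constraint and the status enum in the table above can be enforced client-side, as in this sketch. The helper and the example webhook endpoint are assumptions.

```python
# Hedged sketch: build an update_thesis payload, validating the
# documented constraints (status enum, HTTPS-only webhookUrl).
def update_thesis_args(api_key, thesis_id, title=None, status=None,
                       webhook_url=None):
    if status not in (None, "active", "paused", "archived"):
        raise ValueError("status must be active, paused, or archived")
    if webhook_url is not None and not webhook_url.startswith("https://"):
        raise ValueError("webhookUrl must be HTTPS")
    args = {"apiKey": api_key, "thesisId": thesis_id}
    if title is not None:
        args["title"] = title
    if status is not None:
        args["status"] = status
    if webhook_url is not None:
        args["webhookUrl"] = webhook_url
    return args

# Pause monitoring and point notifications at a hypothetical endpoint.
args = update_thesis_args("YOUR_API_KEY", "thesis-123", status="paused",
                          webhook_url="https://example.com/hooks/confidence")
```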
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions the tool updates metadata and gives usage scenarios, it doesn't disclose critical behavioral traits: whether this requires specific permissions, if changes are reversible, what happens to existing metadata not mentioned, rate limits, or what the response looks like. For a mutation tool with zero annotation coverage, this is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the core purpose, and the second sentence provides usage guidance. Every sentence earns its place with zero waste or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given this is a mutation tool with no annotations and no output schema, the description is incomplete. It should disclose more about behavioral aspects (permissions, reversibility, response format) and provide clearer differentiation from sibling update tools. The description does the minimum for purpose and usage but lacks critical context for safe and effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 5 parameters thoroughly. The description mentions three parameters (title, status, webhookUrl) but doesn't add meaning beyond what the schema provides. It doesn't explain the relationship between parameters or provide additional context about their usage. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates thesis metadata with specific fields (title, status, webhookUrl) and provides the verb 'update' with the resource 'thesis metadata'. It distinguishes from siblings like 'create_thesis' (creation) and 'list_theses' (listing), but doesn't explicitly contrast with other update tools like 'update_strategy' or 'update_position'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: 'to pause monitoring, rename, or configure webhook notifications'. This gives practical scenarios (pause, rename, configure) that help guide usage. However, it doesn't explicitly state when NOT to use it or mention alternatives among sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

what_if (B)

Scenario analysis: "what if OPEC cuts production?" Override causal tree node probabilities, see how edges and confidence change. Zero LLM cost, instant.

Parameters (JSON Schema)
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- thesisId (required): Thesis ID
- overrides (required): Node probability overrides, e.g. {"n1": 0.1, "n3": 0.2}
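The overrides example in the schema above translates directly into a payload, sketched here. The 0..1 probability range is an assumption (the 0.99 value cited elsewhere on this page suggests it), as is the helper itself.

```python
# Hedged sketch: build a what_if payload from node probability overrides.
def what_if_args(api_key, thesis_id, overrides):
    for node_id, p in overrides.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{node_id}: probability {p} outside [0, 1]")
    return {"apiKey": api_key, "thesisId": thesis_id, "overrides": overrides}

# "What if OPEC cuts production?" -- drive two nodes to low probability
# and inspect how edges and confidence change in the response.
args = what_if_args("YOUR_API_KEY", "thesis-123", {"n1": 0.1, "n3": 0.2})
```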
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It mentions 'Zero LLM cost, instant' which hints at performance and cost efficiency, but lacks details on permissions, rate limits, side effects (e.g., whether overrides are saved or transient), or response format. For a mutation-like tool (overrides), this is insufficient behavioral disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences: one stating the purpose with an example, and another highlighting benefits. It's front-loaded with the core functionality. However, the first sentence could be more clearly structured, and opening with a quoted example is mildly distracting.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and a tool that involves overrides (potentially mutation-like), the description is incomplete. It lacks information on what the tool returns (e.g., updated edges/confidence), error handling, or dependencies. The mention of 'Zero LLM cost, instant' adds some context but doesn't compensate for major gaps in behavioral and output details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters (thesisId, apiKey, overrides) with descriptions. The description adds minimal value by mentioning 'Override causal tree node probabilities' which aligns with the 'overrides' parameter, but doesn't provide additional syntax or format details beyond what the schema specifies. Baseline 3 is appropriate given high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs 'scenario analysis' with the example 'what if OPEC cuts production?' and specifies the action of overriding node probabilities to see changes in edges and confidence. It distinguishes from siblings like 'update_nodes' by focusing on hypothetical analysis rather than direct updates. However, it doesn't explicitly name the resource (e.g., 'causal tree' or 'thesis') beyond context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for scenario analysis with overrides, mentioning 'Zero LLM cost, instant' as a benefit. It doesn't explicitly state when to use this tool versus alternatives like 'update_nodes' (which might modify actual data) or 'explore_public' (for browsing), leaving some ambiguity. No exclusions or prerequisites are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x_account (C)

Advanced: Recent posts from a specific X/Twitter account.

Parameters (JSON Schema)
- hours (optional): Hours to look back (default 24)
- limit (optional): Max posts (default 20)
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- username (required): X/Twitter username (without @)
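The defaults (hours=24, limit=20) and the "username without @" rule in the table above can be encoded in a small builder, sketched here. Stripping a leading @ is a convenience assumption, not documented behavior.

```python
# Hedged sketch: build an x_account payload with the documented defaults.
def x_account_args(api_key, username, hours=24, limit=20):
    return {"apiKey": api_key,
            "username": username.lstrip("@"),  # schema wants no leading @
            "hours": hours, "limit": limit}

# Pull the last 48 hours of posts from a hypothetical account.
args = x_account_args("YOUR_API_KEY", "@some_account", hours=48)
```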
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions 'Advanced' and 'Recent posts' but doesn't explain what 'Advanced' entails, whether this is a read-only operation, rate limits, authentication requirements beyond the apiKey parameter, or what the output format looks like. For a tool with external API integration and no annotations, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief and front-loaded with the core purpose. The single sentence is efficient with no wasted words. However, the 'Advanced:' prefix adds minimal value without explanation, slightly reducing effectiveness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (external API integration, 4 parameters, no output schema, and no annotations), the description is incomplete. It doesn't address authentication behavior, rate limits, error handling, or output format. For a tool that interacts with an external service and has no structured output documentation, more contextual information is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema (e.g., it doesn't explain how 'username' relates to X/Twitter handles or what 'hours' means operationally). With high schema coverage, the baseline score of 3 is appropriate as the description doesn't compensate with additional semantic context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: retrieving recent posts from a specific X/Twitter account. It specifies the resource (posts) and action (retrieving recent ones), making the verb+resource relationship explicit. However, it doesn't distinguish this tool from its sibling 'search_x' or 'x_news', which appear to be related X/Twitter tools, so it misses full sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'search_x' or 'x_news', nor does it specify prerequisites or constraints beyond what's implied in the parameters. The word 'Advanced' is vague and doesn't offer practical usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x_news (C)

Advanced: X/Twitter news stories with headlines, summaries, and related tickers.

Parameters (JSON Schema)
- limit (optional): Max stories (default 10)
- query (required): News search query
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
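A call built from the table above is simple enough to sketch in a few lines. Only the field names and the default limit come from this page; the helper and the non-empty-query check are assumptions.

```python
# Hedged sketch: build an x_news payload; limit defaults to 10 per the table.
def x_news_args(api_key, query, limit=10):
    if not query.strip():
        raise ValueError("query is required and must be non-empty")
    return {"apiKey": api_key, "query": query, "limit": limit}

# Fetch a handful of stories for a hypothetical query.
args = x_news_args("YOUR_API_KEY", "OPEC production cuts", limit=5)
```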
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'Advanced' but doesn't explain what that entails (e.g., rate limits, authentication needs, data freshness, or pagination). The description lacks details on what the tool returns, error handling, or any side effects, leaving significant gaps for a tool with external API dependencies.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with a single sentence that front-loads key information ('Advanced: X/Twitter news stories...'). There's no wasted text, and it efficiently communicates the core functionality without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (external API integration, 3 parameters, no output schema, and no annotations), the description is incomplete. It doesn't cover behavioral aspects like authentication, rate limits, return format, or error handling. Without annotations or output schema, the description should provide more context to guide effective use, but it falls short.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters (limit, query, apiKey). The description doesn't add any meaningful semantics beyond what's in the schema, such as query format examples, apiKey usage context, or limit constraints. Baseline 3 is appropriate when the schema handles parameter documentation adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: retrieving X/Twitter news stories with specific attributes (headlines, summaries, related tickers). It names a specific resource (news stories) and source (X/Twitter), but doesn't explicitly differentiate itself from sibling tools like 'search_x' or 'query', which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description mentions 'Advanced' but doesn't explain what makes it advanced or when to prefer it over siblings like 'search_x' or 'query'. There's no mention of prerequisites, exclusions, or specific contexts for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x_volume (C)

Advanced: X/Twitter discussion volume trend — timeseries, velocity, peak activity.

Parameters (JSON Schema)
- hours (optional): Hours to look back (default 72)
- query (required): Search query
- apiKey (required): SimpleFunctions API key. Get one at https://simplefunctions.dev/dashboard/keys
- granularity (optional): Timeseries granularity (default: hour)
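The defaults in the table above (hours=72, granularity="hour") can be baked into a builder, sketched here. Other granularity values are not documented on this page, so none are assumed; the helper itself is also an assumption.

```python
# Hedged sketch: build an x_volume payload with the documented defaults.
def x_volume_args(api_key, query, hours=72, granularity="hour"):
    return {"apiKey": api_key, "query": query,
            "hours": hours, "granularity": granularity}

# Hourly discussion-volume timeseries over the default 72-hour window.
args = x_volume_args("YOUR_API_KEY", "CPI print")
```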
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions what metrics are returned (timeseries, velocity, peak activity) but doesn't cover critical aspects: authentication requirements (implied by apiKey parameter but not stated), rate limits, data freshness, whether it's a read-only operation, or what format the output takes. For a tool with external API dependencies, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief and front-loaded with the core purpose. The single sentence efficiently conveys the tool's focus. However, the 'Advanced:' prefix adds minimal value and could be integrated more smoothly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 4 parameters, no annotations, and no output schema, the description is incomplete. It doesn't explain the relationship between parameters (e.g., how query interacts with hours/granularity), doesn't describe the output format, and doesn't cover authentication or rate limiting considerations. The description should do more given the lack of structured metadata.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all 4 parameters. The description doesn't add any parameter-specific information beyond what's in the schema. It mentions 'timeseries' which relates to the granularity parameter, but doesn't explain this connection. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes Twitter/X discussion volume trends with specific metrics (timeseries, velocity, peak activity). It distinguishes itself from sibling 'search_x' by focusing on volume trends rather than search results. However, it doesn't specify an exact verb (e.g., 'analyze' or 'retrieve'), which prevents a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'search_x' or 'x_account'. It doesn't mention prerequisites, use cases, or exclusions. The word 'Advanced' hints at complexity but doesn't clarify appropriate contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
