ByteRay AI
Server Details
AI-augmented binary vulnerability analysis with 38 MCP tools for taint tracing and zero-day hunting
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 3.7/5 across 37 of 39 tools scored. Lowest: 3.1/5.
Tools are generally well-distinguished by granularity and direction (e.g., get_call_tree for multi-level traversal vs who_calls/what_calls for immediate neighbors, read_block vs get_code_blocks for content vs metadata). However, the subtle distinctions between the various trace_* and read_* tools, plus the overlap between get_call_tree and who_calls/what_calls, could cause occasional misselection.
Excellent consistent snake_case convention throughout. All tools follow a clear verb_noun pattern (get_, list_, read_, search_, trace_, check_, find_, ask_, goto_, inspect_, open_, resolve_, upload_, navigate_, who_, what_). Even interrogative forms like who_calls and what_calls maintain consistency with each other and the overall schema.
With 39 tools, the server significantly exceeds the 25-tool threshold for 'too many.' While binary analysis is a complex domain requiring multiple views (assembly, IL, pseudocode, SSA), the surface area is heavy enough to risk overwhelming context windows and complicating tool selection, despite the domain justification.
Provides comprehensive coverage of static binary analysis workflows: loading (upload/open), triage (binary_info, file_hash), discovery (list_functions, search_*, find_attack_surface), navigation (goto_address, inspect_address), multi-layer code views (read_assembly, read_pseudocode, read_ssa_form), control flow (get_control_flow, navigate_cfg), and data flow tracing (trace_*, check_sanitization). Minor gaps include no session cleanup/close_binary, no export/save functionality, and no annotation capabilities.
Available Tools
39 tools
ask_ai (Grade: A)
Returns session context for the binary (metadata, security flags, findings count). By default only returns context so the calling LLM can reason about it directly. Set use_ai=true to additionally call the Anthropic API for a threat-model analysis (requires ANTHROPIC_API_KEY; costs extra tokens).
| Name | Required | Description | Default |
|---|---|---|---|
| use_ai | No | Set to true to call the Anthropic API for AI analysis. Default false returns session context only (recommended when an LLM is already driving the conversation, to avoid burning extra API tokens). | false |
| question | Yes | Your question about the binary or analysis | |
| session_id | Yes | Session ID | |
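For orientation, here is a minimal sketch of what a `tools/call` request to ask_ai could look like over MCP's JSON-RPC transport. The parameter names come from the table above; the session ID and question are placeholders, and the envelope assumes the standard MCP request shape.

```python
import json

def make_ask_ai_call(session_id: str, question: str, use_ai: bool = False) -> dict:
    """Build an MCP tools/call request for ask_ai.

    use_ai defaults to False so only session context is returned for the
    driving LLM to reason about, avoiding extra Anthropic API token costs
    (per the tool description).
    """
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "ask_ai",
            "arguments": {
                "session_id": session_id,
                "question": question,
                "use_ai": use_ai,
            },
        },
    }

payload = make_ask_ai_call("sess-123", "What is the attack surface of this binary?")
print(json.dumps(payload, indent=2))
```

Leaving `use_ai` at its default keeps the call cheap; set it to `True` only when you actually want the server-side Anthropic analysis.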
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full disclosure burden. It successfully warns about external API costs and authentication requirements ('requires ANTHROPIC_API_KEY') for the AI mode. It could improve by mentioning error behavior when the key is missing, and any rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two well-structured sentences. The first establishes default behavior and return value; the second covers the optional AI mode with its requirements. Zero waste—every word conveys essential information about functionality, cost, or authentication.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of output schema, the description adequately explains what data is returned (metadata, security flags, findings count) and the behavioral difference between modes. It covers the complexity of the optional AI integration sufficiently, though output format specifics would further improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing a baseline of 3. The description adds value by specifying what session context contains (metadata, security flags, findings count) and highlighting the ANTHROPIC_API_KEY requirement for use_ai mode, which isn't mentioned in the parameter schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns 'session context for the binary (metadata, security flags, findings count)' and distinguishes this from low-level binary inspection siblings by noting it provides context 'so the calling LLM can reason about it directly.' It also clarifies the optional AI analysis mode.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
While it doesn't explicitly contrast with specific sibling tools, it provides clear guidance on when to use use_ai=true vs false, noting cost implications ('costs extra tokens') and recommending default mode 'when an LLM is already driving the conversation, to avoid burning extra API tokens.'
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_sanitization (Grade: A)
Analyze code around an address for sanitization and validation patterns. Detects bounded copy functions (strncpy, snprintf), input validation (isalnum, isdigit), encoding/escaping, parameterized queries, path canonicalization, numeric conversion, bounds checks (sizeof comparisons), null termination, and return value checking. Returns confidence level and evidence list.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address within the code to check for sanitization patterns | |
| session_id | Yes | Session ID | |
| context_blocks | No | Number of surrounding basic blocks to include in the analysis (default 2) | 2 |
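To make the detection categories concrete, here is a toy local approximation of the pattern classes check_sanitization reportedly looks for. The server's actual analysis is not public, so the regexes, category names, and confidence thresholds below are all illustrative assumptions.

```python
import re

# Illustrative pattern classes mirroring the tool description's list
# (bounded copies, input validation, bounds checks); not the server's
# real detection logic.
SANITIZATION_PATTERNS = {
    "bounded_copy": re.compile(r"\b(strncpy|snprintf|strlcpy)\s*\("),
    "input_validation": re.compile(r"\b(isalnum|isdigit|isalpha)\s*\("),
    "bounds_check": re.compile(r"\bsizeof\s*\("),
}

def scan_pseudocode(code: str) -> dict:
    """Return evidence per category and a crude confidence level."""
    evidence = {name: pat.findall(code) for name, pat in SANITIZATION_PATTERNS.items()}
    hits = sum(len(v) for v in evidence.values())
    confidence = "high" if hits >= 3 else "medium" if hits >= 1 else "none"
    return {"confidence": confidence, "evidence": evidence}

sample = """
if (isdigit(buf[0])) {
    strncpy(dst, src, sizeof(dst) - 1);
}
"""
result = scan_pseudocode(sample)
print(result["confidence"])  # three pattern hits -> "high"
```

The real tool presumably works on decompiled basic blocks rather than raw text, which is what the `context_blocks` parameter controls.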
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses what patterns are detected and that it returns a 'confidence level and evidence list', but omits operational details like whether the analysis is read-only, performance characteristics, or error handling for invalid addresses.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two well-structured sentences with zero waste. First sentence establishes purpose; second enumerates specific detection targets and return format. Information is front-loaded and density is high.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of an output schema, the description partially compensates by stating it returns a 'confidence level and evidence list'. For a 3-parameter analysis tool, the description adequately covers the detection scope, though explicit mention of the return structure format would improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description mentions 'code around an address' which loosely references the address and context_blocks parameters, but adds no specific syntax details, validation rules, or usage examples beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool analyzes code for 'sanitization and validation patterns' with specific examples (strncpy, snprintf, isalnum, etc.). It effectively distinguishes from general code inspection siblings like inspect_address or read_assembly by specifying the security-focused detection scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
While the implied usage is clear (use when checking for security sanitization), there is no explicit guidance on when to prefer this over similar analysis tools like inspect_address or trace_data_flow, nor are prerequisites (like having opened a binary) stated.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
find_attack_surface (Grade: A)
Scan the binary for potential source and sink functions. Categorizes them by vulnerability type (command injection, buffer overflow, format string, path traversal, etc.). Includes firmware-specific sources/sinks. Returns suggestions for where to focus analysis.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session ID from open_binary | |
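A sketch of the kind of source/sink categorization the description implies. The sink table below is a small hypothetical sample keyed to the vulnerability classes the description names; the server's actual source/sink database is not published.

```python
# Hypothetical sink table keyed to the categories the description lists.
SINK_CATEGORIES = {
    "command_injection": {"system", "popen", "execve"},
    "buffer_overflow": {"strcpy", "sprintf", "gets"},
    "format_string": {"printf", "syslog"},
    "path_traversal": {"fopen", "open"},
}

def categorize_imports(imports):
    """Group imported function names by the vulnerability class they suggest."""
    findings = {}
    for name in imports:
        for category, sinks in SINK_CATEGORIES.items():
            if name in sinks:
                findings.setdefault(category, []).append(name)
    return findings

print(categorize_imports(["system", "strcpy", "memcpy", "fopen"]))
```

Functions with no sink match (like `memcpy` here) simply produce no finding; the tool's "suggestions for where to focus analysis" would be built on top of a mapping like this.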
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It effectively discloses the analytical behavior (categorization by vulnerability type, firmware-specific handling) but omits operational traits: it does not explicitly state the tool is read-only/non-destructive, mention performance characteristics, or describe the return value structure (only that 'suggestions' are returned).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of four efficient sentences with zero redundancy. It is front-loaded with the core action ('Scan the binary...'), followed by categorization details, specific domain handling (firmware), and output purpose. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of output schema, the description partially compensates by stating the tool returns 'suggestions for where to focus analysis' and listing vulnerability categories covered. However, it does not describe the return format (e.g., JSON structure, function addresses, severity ratings) or explain session lifecycle implications, leaving gaps for a complex security analysis tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage (session_id is fully documented), establishing a baseline score of 3. The description does not mention parameters, but the schema adequately documents the single required session identifier without needing additional semantic context from the description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool scans binaries for source/sink functions and categorizes by vulnerability type (command injection, buffer overflow, etc.). It specifies the scope includes firmware-specific sources/sinks. However, it does not explicitly distinguish this from security-related siblings like `check_sanitization` or `trace_data_flow`.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides implied usage guidance through 'Returns suggestions for where to focus analysis,' indicating this is a reconnaissance/triage tool. However, it lacks explicit prerequisites (beyond the schema's session_id reference), when-not-to-use guidance, or named alternatives for different analysis depths.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_binary_info (Grade: A)
Get cached triage data for an already-opened binary. Includes security posture (PIE, NX, canary, RELRO), compiler hints, firmware detection (VxWorks, UEFI, bare-metal).
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session ID from open_binary | |
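As a sketch of how an agent might act on the security posture fields, here is a checksec-style summary. The key names (`pie`, `nx`, `canary`, `relro`) are assumptions based on the features the description lists; the real response shape is undocumented.

```python
def summarize_security(info: dict) -> list:
    """Flag weak hardening settings in a get_binary_info-style result.

    Field names are assumed from the description (PIE, NX, canary, RELRO);
    the actual response schema is not published.
    """
    warnings = []
    if not info.get("pie"):
        warnings.append("No PIE: code loads at a fixed base, easing ROP.")
    if not info.get("nx"):
        warnings.append("No NX: stack/heap may be executable.")
    if not info.get("canary"):
        warnings.append("No stack canary: classic overflow to return address.")
    if info.get("relro") != "full":
        warnings.append("Partial/no RELRO: GOT overwrites possible.")
    return warnings

flags = {"pie": True, "nx": True, "canary": False, "relro": "partial"}
for w in summarize_security(flags):
    print(w)
```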
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses the 'cached' nature of the data (important behavioral trait) and enumerates specific data categories returned. However, it omits safety declarations (read-only assumption), error conditions (invalid session handling), or idempotency characteristics that annotations would typically cover.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. The first sentence establishes the core operation and prerequisite (cached data, opened binary). The second sentence elaborates specific data categories (PIE, NX, canary, RELRO, VxWorks, UEFI) without redundancy. Information is front-loaded and dense.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite lacking an output schema, the description comprehensively enumerates the specific data categories returned (security features, compiler hints, firmware types). For a single-parameter getter, this is sufficient, though it could marginally improve by indicating the return structure (e.g., 'returns a dictionary with keys...').
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% (session_id fully documented as 'Session ID from open_binary'). The description reinforces this workflow requirement with 'already-opened binary' but does not add parameter-specific details (format, length constraints, examples) beyond what the schema already provides, warranting the baseline score.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'Get' with clear resource 'cached triage data' and scope 'already-opened binary'. It effectively distinguishes from sibling 'open_binary' by emphasizing the binary must already be opened, and from other analysis tools by specifying it returns metadata (security posture, compiler hints) rather than code or structures.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The phrase 'already-opened binary' provides clear context that this requires a prior session, reinforced by the schema parameter description referencing 'open_binary'. However, it lacks explicit 'when not to use' guidance or named alternatives for different analysis needs (e.g., distinguishing from get_file_hash or list_sections).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_call_tree (Grade: A)
Build a multi-level call tree. "callers" walks upward (who calls this function, and who calls them). "callees" walks downward (what does this function call, and what do they call). Use to answer "how does main reach system?" or "what vulnerable functions does this handler call?".
| Name | Required | Description | Default |
|---|---|---|---|
| depth | No | How many levels deep to traverse (default 3, max 5) | 3 |
| address | Yes | Function address to start from | |
| direction | No | Direction: "callers" (who calls this?) or "callees" (what does this call?) | |
| session_id | Yes | Session ID |  |
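A small argument-builder sketch showing the documented parameter interactions: `direction` takes one of two values, and `depth` has a default of 3 and a documented maximum of 5 (clamping client-side is an assumption; the server may reject rather than clamp).

```python
def call_tree_args(session_id, address, direction="callees", depth=3):
    """Build arguments for get_call_tree, clamping depth to the
    documented range (default 3, max 5)."""
    if direction not in ("callers", "callees"):
        raise ValueError("direction must be 'callers' or 'callees'")
    return {
        "session_id": session_id,
        "address": address,
        "direction": direction,
        "depth": max(1, min(depth, 5)),
    }

# Walk upward from a suspected sink to answer "how does main reach system?".
args = call_tree_args("sess-123", "0x401200", direction="callers", depth=9)
print(args["depth"])  # clamped to 5
```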
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It explains the traversal directions effectively but omits critical behavioral details: whether it handles cyclic call graphs, the output format/structure, traversal algorithm (BFS vs DFS), or computational limits beyond the depth parameter's max 5 constraint in the schema.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly structured sentences with zero waste: first defines the action, second explains the key parameter values with directional metaphors, third gives concrete use cases. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of annotations and output schema, the description adequately covers selection criteria but leaves gaps regarding the return data structure and explicit differentiation from similar sibling tools (who_calls/what_calls). For a complex tree-traversal operation, it should disclose cycle-handling or memory usage concerns.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage (baseline 3), the description adds valuable conceptual semantics. It maps 'callers' to 'upward' traversal and 'callees' to 'downward', and clarifies the recursive nature ('who calls them') that the schema's parameter descriptions don't convey.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool as building a 'multi-level call tree' and specifies the two traversal modes (callers/callees). It implies distinction from single-level siblings like 'who_calls' and 'what_calls' through the 'multi-level' and 'walks upward/downward' phrasing, though it doesn't explicitly name those alternatives.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides concrete usage scenarios ('how does main reach system?', 'what vulnerable functions does this handler call?') that clarify when to use each direction. However, it lacks explicit guidance on when NOT to use this tool (e.g., for single-level queries where 'who_calls' might suffice) or performance considerations for deep trees.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_code_blocks (Grade: B)
Get basic blocks of a function: start/end addresses, block size, outgoing edges (branch targets), incoming edges.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address (decimal or hex string like "0x401000") | |
| session_id | Yes | Session ID | |
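The schema accepts addresses as decimal or hex strings like "0x401000". A client-side normalizer for that convention is a one-liner in Python, since `int(s, 0)` auto-detects the `0x` prefix (whether the server also accepts bare integers is an assumption):

```python
def parse_address(addr) -> int:
    """Normalize the address forms the schema accepts: an int,
    a decimal string, or a hex string like "0x401000"."""
    if isinstance(addr, int):
        return addr
    return int(addr, 0)  # base 0 auto-detects the 0x prefix

assert parse_address("0x401000") == 0x401000
assert parse_address("4198400") == 4198400
assert parse_address(0x401000) == 4198400
```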
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses what data is returned (block structure and edges) but omits behavioral traits like error handling (what if address isn't a function?), side effects, caching behavior, or whether this triggers analysis.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence that is front-loaded with the action verb 'Get' and efficiently enumerates the returned fields. No redundant or wasted language.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema exists, the description partially compensates by listing the fields returned (addresses, edges, size). However, it lacks structural context (is the return value a list or object?) and omits edge cases for this moderately complex binary analysis operation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage for both parameters (session_id and address), establishing a baseline score of 3. The description adds no additional parameter semantics, such as noting that the address parameter accepts hex strings or that it represents the function's entry point.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies the resource (basic blocks), scope (of a function), and specific attributes returned (start/end addresses, edges, size). However, it doesn't explicitly differentiate from siblings like 'get_function_detail' or 'read_block' despite listing CFG-specific details that help imply the distinction.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this versus alternatives like 'get_control_flow' or 'read_block', nor does it mention prerequisites such as requiring an active session or that the address must point to a function start.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_control_flow (Grade: A)
Get a control flow graph diagram (Mermaid or DOT format) for a function. Shows how execution flows through blocks, branches, and loops.
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output format: "mermaid" or "dot" (default "mermaid") | "mermaid" |
| address | Yes | Function address | |
| session_id | Yes | Session ID | |
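To illustrate what a Mermaid-format CFG looks like, here is a toy renderer that turns basic-block edges (as get_code_blocks reports them) into a flowchart. The node-naming scheme and exact diagram shape are assumptions; only the output formats themselves come from the schema.

```python
def blocks_to_mermaid(edges) -> str:
    """Render (src, dst) basic-block edges as a Mermaid flowchart,
    roughly the kind of diagram get_control_flow returns when
    format="mermaid". Node labels here are hex block addresses."""
    lines = ["graph TD"]
    for src, dst in edges:
        lines.append(f"    B{src:x} --> B{dst:x}")
    return "\n".join(lines)

diagram = blocks_to_mermaid([
    (0x401000, 0x401010),  # fall-through edge
    (0x401000, 0x401020),  # conditional branch
    (0x401020, 0x401010),  # loop back-edge target
])
print(diagram)
```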
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It successfully discloses output format options (Mermaid/DOT) and content (execution flow through blocks/branches), but omits safety traits (read-only status), error conditions, or return structure since no output schema exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two efficient sentences with zero waste. Front-loaded with output formats and purpose; second sentence adds value by describing the semantic content (blocks, branches, loops) rather than repeating parameter names.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Appropriate for the complexity: explains what the output contains (diagram showing control flow elements) despite lacking output schema. Would benefit from explicit statement about return type (string vs. file path) or read-only safety given zero annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing baseline 3. Description reinforces the 'function' context for the address parameter and mentions format options, but does not add substantial semantic detail beyond what the schema already provides (e.g., explaining session_id context or format selection criteria).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool 'Get[s] a control flow graph diagram' with specific output formats (Mermaid/DOT) and target (a function). The mention of 'blocks, branches, and loops' distinguishes it from sibling tools like get_call_tree or read_assembly.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies usage through specificity of output formats (diagram generation vs. raw data), but lacks explicit when-to-use guidance compared to siblings like navigate_cfg or get_code_blocks. No explicit exclusions or alternatives named.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_file_hash (Grade: A)
Compute and return file hashes: SHA-256, SHA-1, MD5. Also returns file size.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session ID from open_binary | |
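Since the tool reports standard digests, its output can be cross-checked against a local computation. A sketch using Python's `hashlib` (the result dict's key names are an assumption; the server's response schema is undocumented):

```python
import hashlib
import os

def local_file_hashes(path: str) -> dict:
    """Compute the same digests get_file_hash reports (SHA-256, SHA-1,
    MD5) plus file size, locally, so a server result can be verified."""
    digests = {name: hashlib.new(name) for name in ("sha256", "sha1", "md5")}
    with open(path, "rb") as fh:
        # Stream in chunks so large binaries don't load fully into memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            for d in digests.values():
                d.update(chunk)
    return {
        **{name: d.hexdigest() for name, d in digests.items()},
        "size": os.path.getsize(path),
    }
```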
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Without annotations, the description carries the burden of behavioral disclosure. It successfully specifies what gets returned (hashes and file size) but omits safety characteristics (read-only status), performance implications (full file read), or error conditions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste: first states the primary operation and specific algorithms, second adds the supplementary return value. Information is front-loaded and density is appropriate for the tool's simplicity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter utility tool with no output schema, the description adequately covers the return values (hashes and size). Minor gap: could clarify this operates on the binary associated with the session, though this is implied by the tool name and sibling context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage for the single session_id parameter, the schema adequately documents inputs. The description adds no parameter semantics, but per rubric guidelines, high schema coverage establishes a baseline of 3 without requiring compensatory description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description uses specific verb 'Compute' with clear resource 'file hashes' and enumerates exact algorithms (SHA-256, SHA-1, MD5). No sibling tool performs hashing, making the purpose distinct in the binary analysis toolkit.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Description provides no guidance on when to use this tool versus alternatives like get_binary_info, nor does it mention prerequisites. While the schema parameter description references 'Session ID from open_binary', the tool description itself lacks explicit workflow guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_function_detail (Grade: A)
Detailed info for one function: name, address, size, type signature, parameters, stack frame layout, calling convention, whether it can return, whether it is a thunk.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address (decimal or hex string like "0x401000") | |
| session_id | Yes | Session ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It compensates by enumerating all returned fields (type signature, stack frame layout, calling convention, thunk status), effectively substituting for the missing output schema. Does not mention error conditions (e.g., invalid address) or side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence, front-loaded with purpose ('Detailed info for one function'), followed by a colon-separated list of return values. No filler words; every element specifies concrete data returned.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of output schema, the description achieves completeness by exhaustively listing all function attributes returned (name through thunk status). For a 2-parameter read-only analysis tool, this level of output documentation is sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear descriptions for both 'session_id' and 'address'. The tool description mentions 'address' as a returned field but does not add semantic context to the input parameter beyond the schema's 'Address (decimal or hex...)' definition. Baseline 3 appropriate given schema completeness.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Specific framing ('Detailed info') and resource ('one function') with explicit scope limitation. The enumerated attributes (stack frame, calling convention, thunk status) clearly distinguish this from sibling list_functions (bulk listing) and inspect_address (general address inspection).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context that this retrieves comprehensive metadata for a single function (implied by 'one function' and detailed field list), distinguishing it from bulk listing operations. Lacks explicit 'when-not' guidance or named alternatives, but the scope is unambiguous.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_references_from (A, Inspect)
Find all code and data references FROM an address. Answers "what does this instruction reference?"
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address (decimal or hex string like "0x401000") | |
| session_id | Yes | Session ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description must carry the full disclosure burden. It specifies that both code and data references are found, but fails to indicate whether this is a read-only operation, what format results are returned in, or any limitations on reference depth/count. For a binary analysis tool, the absence of safety/scope details is a significant gap.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of two efficient sentences with zero redundancy. The first sentence states the core action immediately, while the second provides an intuitive framing question. Every word serves a purpose in conveying the tool's directional nature.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of an output schema and annotations, the description is minimally adequate for a two-parameter tool. However, it lacks information about return value structure (critical for a reference-finding tool) and does not explicitly contrast with 'get_references_to' despite the clear sibling relationship. It meets baseline needs but leaves operational gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the structured fields already document both 'address' (including format hints) and 'session_id'. The description adds semantic context that the address represents an instruction location ('what does this instruction reference'), but does not elaborate on parameter interactions or provide examples beyond what the schema already offers.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool finds 'all code and data references FROM an address' and explicitly frames the question as 'what does this instruction reference?' The capitalized 'FROM' effectively distinguishes this tool from its sibling 'get_references_to', establishing a clear directional scope for the operation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
While the description implies usage context (analyzing what an instruction references), it lacks explicit guidance on when to use this versus the sibling 'get_references_to' tool. It does not state prerequisites (e.g., requiring an active session) or exclude inappropriate use cases.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_references_to (A, Inspect)
Find all code and data references TO an address. Answers "who reads/writes/calls this?"
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address (decimal or hex string like "0x401000") | |
| session_id | Yes | Session ID | |
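Because this tool and get_references_from differ only in direction, an agent can build both argument payloads from the same address. A sketch (the tool names match the server; the `build_args` helper and the session ID are illustrative):

```python
def build_args(session_id: str, address: str) -> dict:
    """Shared argument payload for the two reference-direction tools."""
    return {"session_id": session_id, "address": address}

args = build_args("sess-123", "0x401000")
# Outgoing references: "what does this instruction reference?"
outgoing = {"name": "get_references_from", "arguments": args}
# Incoming references: "who reads/writes/calls this?"
incoming = {"name": "get_references_to", "arguments": args}
print(outgoing["name"], incoming["name"])
```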
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It discloses that the tool finds references (read operation implied), but does not describe the return format, whether results include indirect references, performance characteristics, or error behaviors.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. The first sentence front-loads the core action and scope, while the second provides immediate usage context through the question format. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
While the 2 parameters are fully documented in the schema, the tool lacks both annotations and an output schema. The description does not compensate by explaining what the reference results contain (addresses, instructions, types), leaving a gap for a data-retrieval tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents both 'address' (including format examples) and 'session_id'. The description does not add semantic information beyond the schema, warranting the baseline score for high-coverage schemas.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Find') and identifies the exact resource ('code and data references TO an address'). The capitalization of 'TO' effectively distinguishes it from sibling tool 'get_references_from', and the quoted question 'who reads/writes/calls this?' provides clear operational context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The framing 'Answers who reads/writes/calls this?' implies when to use the tool (to find accessors of an address), but it does not explicitly differentiate from similar siblings like 'what_calls', 'who_calls', or 'get_references_from', nor does it state prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_strings (A, Inspect)
Get all strings found in the binary with addresses and types. Supports filter, min_length, offset/limit for pagination.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max results | |
| filter | No | Filter string (substring match) | |
| offset | No | Pagination offset | |
| min_length | No | Minimum string length | 4 |
| session_id | Yes | Session ID | |
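The offset/limit pair supports a standard pagination loop. A sketch with a stubbed `call_tool` function standing in for the real transport (the stub, its data, and the session ID are assumptions; the argument names match the schema):

```python
def call_tool(name, arguments):
    """Stub transport: pretend the server holds 5 strings, served in pages."""
    data = [{"address": hex(0x402000 + i), "value": f"str{i}", "type": "ascii"}
            for i in range(5)]
    off, lim = arguments["offset"], arguments["limit"]
    return data[off:off + lim]

results, offset, limit = [], 0, 2
while True:
    page = call_tool("get_strings", {
        "session_id": "sess-123",
        "min_length": 4,      # schema default
        "offset": offset,
        "limit": limit,
    })
    results.extend(page)
    if len(page) < limit:     # a short page means we have everything
        break
    offset += limit

print(len(results))  # 5
```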
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description must carry the full burden. It discloses pagination behavior (offset/limit) and filtering, but fails to clarify what 'types' refers to (ASCII, UTF-16, etc.), performance characteristics for large binaries, or the return data structure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste: first establishes purpose and return attributes, second lists capabilities. Every word earns its place with no redundancy or fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the 5 parameters with complete schema documentation and no output schema, the description adequately covers the tool's function. However, it could improve by describing the return format (list of objects with address/type fields) since no output schema exists to document this.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description mentions the parameters explicitly (filter, min_length, offset/limit) which reinforces their grouping by function, but adds minimal semantic detail beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the core action (Get all strings) and key attributes returned (addresses and types). However, it doesn't explicitly distinguish when to use this versus the sibling tool 'search_strings', though the 'get all' vs 'search' naming implies different use cases.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions capabilities (filter, min_length, pagination) which implies usage patterns, but lacks explicit guidance on when to use this tool versus alternatives like 'search_strings', or prerequisites like requiring an open session.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_variables (B, Inspect)
Get all variables for a function: parameters, local stack variables, register variables.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address (decimal or hex string like "0x401000") | |
| session_id | Yes | Session ID | |
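The schema accepts the address as a decimal value or a hex string like "0x401000", so a client may want to normalize before calling. A sketch (the `normalize_address` helper is illustrative, not part of the server):

```python
def normalize_address(value) -> str:
    """Accept an int, a decimal string, or a 0x-prefixed hex string;
    emit a canonical hex string."""
    if isinstance(value, int):
        return hex(value)
    text = value.strip().lower()
    base = 16 if text.startswith("0x") else 10
    return hex(int(text, base))

print(normalize_address(4198400))     # "0x401000"
print(normalize_address("0x401000"))  # "0x401000"
print(normalize_address("4198400"))   # "0x401000"
```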
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It fails to indicate whether this is a read-only operation, if it triggers automatic analysis, expected performance characteristics, or the structure/format of the returned variable data.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core action and resource. Every word earns its place with zero redundancy or filler content.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter tool without output schema or annotations, the description adequately covers the basic intent but leaves significant gaps regarding return value structure, error conditions (e.g., invalid function address), and domain-specific requirements.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
While the schema has 100% coverage (baseline 3), the description adds crucial semantic context that the 'address' parameter should specify a function address, not just any arbitrary address. This constraint is not explicit in the schema description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get') and target resource ('variables for a function'), enumerating specific variable types (parameters, locals, registers). However, it does not explicitly differentiate from similar sibling tools like 'trace_variable' or 'get_function_detail'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives (e.g., 'trace_variable' for tracking changes vs. 'get_variables' for static inspection), nor does it mention prerequisites like requiring the binary to be analyzed first.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
goto_address (A, Inspect)
Navigate to an address and show what is there (section, function, symbol) plus the code at that location. Combines inspect_address and code viewing in one call. The "goto" operation for reverse engineers.
| Name | Required | Description | Default |
|---|---|---|---|
| view | No | Code view: "pseudocode", "assembly", or "ssa" | "pseudocode" |
| address | Yes | Address to navigate to | |
| session_id | Yes | Session ID | |
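Since `view` is an enum with a default, a client can validate it before sending. A sketch (the `goto_args` helper and session ID are illustrative; the allowed values come from the schema):

```python
VALID_VIEWS = ("pseudocode", "assembly", "ssa")

def goto_args(session_id: str, address: str, view: str = "pseudocode") -> dict:
    """Build goto_address arguments, rejecting unknown code views early."""
    if view not in VALID_VIEWS:
        raise ValueError(f"view must be one of {VALID_VIEWS}, got {view!r}")
    return {"session_id": session_id, "address": address, "view": view}

print(goto_args("sess-123", "0x401000"))
```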
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full disclosure burden. It explains the compound nature of the operation (navigation + inspection + code retrieval) and what metadata it returns (sections, functions, symbols). However, it omits safety details like whether this modifies session state, error behavior for invalid addresses, or whether the operation is idempotent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: sentence 1 defines core functionality, sentence 2 provides sibling differentiation, sentence 3 establishes domain terminology. Information is front-loaded with the essential action and payload, followed by usage context. No redundancy with schema or structured fields.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 3-parameter navigation tool with 100% schema coverage and no output schema, the description adequately explains the return value conceptually (metadata + code). It leverages the 'goto' metaphor familiar to reverse engineers. Minor gap: could explicitly mention that 'view' parameter affects the code representation returned, though this is inferable from schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description adds semantic context that the 'address' represents a location containing sections/functions/symbols, but does not elaborate on parameter syntax, the session_id lifecycle, or provide examples for the view options (though the schema adequately documents these).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description provides specific verb ('Navigate'), resource ('address'), and exact scope ('show what is there (section, function, symbol) plus the code'). It explicitly distinguishes from sibling tools by stating it 'Combines inspect_address and code viewing in one call', clearly positioning it against the standalone inspection and code reading tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies when to use this tool by explaining it combines two operations (inspect_address and code viewing) into one call, suggesting efficiency benefits. However, it lacks explicit 'when-not-to-use' guidance (e.g., 'use inspect_address alone if you only need metadata'). The 'goto operation for reverse engineers' provides helpful domain context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
inspect_address (B, Inspect)
Inspect what exists at a given address: section, segment, function, symbol, string, data variable.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address (decimal or hex string like "0x401000") | |
| session_id | Yes | Session ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Without annotations, the description carries the full burden. It discloses what entity types it inspects for (section, segment, function, symbol, string, data variable), providing useful behavioral context. However, it omits whether it returns all matches or prioritized ones, error handling for invalid addresses, and whether the operation is read-only (implied by 'Inspect' but not explicit).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with zero waste. It front-loads the action ('Inspect') and uses the colon-separated list to maximize information density without verbosity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple 2-parameter input and 100% schema coverage, the description is minimally adequate. However, lacking an output schema, it should ideally describe the return structure (e.g., whether it returns a single classification or a list of overlapping entities) rather than just listing what it searches for.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents both parameters including the address format (decimal or hex string). The description mentions 'at a given address' but adds no semantic detail beyond what the schema already provides, warranting the baseline score.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the action ('Inspect') and target ('what exists at a given address'), and the colon-separated list of entity types (section, segment, function, etc.) clarifies the scope. However, it lacks explicit differentiation from siblings like `read_assembly` or `get_function_detail` that also operate on addresses.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
There is no guidance on when to use this tool versus more specific alternatives (e.g., `get_function_detail` when you already know the address contains a function). It does not state prerequisites or conditions for use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_exports (B, Inspect)
List all exported symbols with addresses.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session ID from open_binary | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It only adds that addresses are included in the output ('with addresses'), but omits safety characteristics (read-only vs destructive), performance expectations, return format, or error conditions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely efficient at six words. The single sentence is front-loaded with the verb and contains no redundant or filler text. Every word serves to clarify the tool's scope.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (single parameter) and 100% schema coverage, the description is minimally adequate. However, with no output schema provided, the description should ideally characterize the return structure (e.g., array of export objects) rather than just mentioning 'addresses'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage for the single session_id parameter, the schema fully documents the input. The description adds no additional parameter context, syntax help, or examples, meriting the baseline score of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States a specific action ('List') and resource ('exported symbols with addresses'). The term 'exported' distinguishes it from sibling tools like list_imports and list_symbols in the binary analysis domain. However, it lacks explicit contrast with these siblings in the text itself.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides no guidance on when to use this tool versus alternatives like list_symbols or list_imports. Does not mention prerequisites (e.g., requiring an open session) despite the session_id parameter implying dependency on open_binary.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_functions (A, Inspect)
List all functions with name, address, size, parameter count. Supports filter (name substring), offset/limit for pagination.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max results to return | 200 |
| filter | No | Optional name filter (substring match) | |
| offset | No | Pagination offset | 0 |
| session_id | Yes | Session ID from open_binary | |
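The filter is substring matching and the defaults are limit 200 / offset 0; the same semantics can be mimicked client-side to reason about what a call will return. A sketch over invented data (the function list is made up for illustration):

```python
functions = [
    {"name": "main",         "address": "0x401000", "size": 120, "params": 2},
    {"name": "parse_header", "address": "0x401200", "size": 340, "params": 3},
    {"name": "parse_body",   "address": "0x401400", "size": 560, "params": 4},
]

def list_functions(items, filter=None, offset=0, limit=200):
    """Mimic the documented semantics: substring filter, then offset/limit."""
    if filter:
        items = [f for f in items if filter in f["name"]]
    return items[offset:offset + limit]

print([f["name"] for f in list_functions(functions, filter="parse")])
# ['parse_header', 'parse_body']
```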
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully communicates what data is returned (name, address, size, parameter count) and mentions pagination/filtering behavior. However, it omits details about default sorting order, maximum limits, or whether results include library vs user-defined functions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two efficiently structured sentences with zero redundancy. The first sentence establishes core purpose and return values; the second adds operational capabilities (filtering/pagination). Information is front-loaded and every phrase earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite lacking an output schema, the description compensates by explicitly listing the four fields returned for each function. Given the 100% schema coverage and clear scope, the description is sufficient for a listing tool, though it could briefly mention this operates within an active binary analysis session.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage, establishing a baseline of 3. The description adds marginal value by semantically grouping 'offset/limit' as 'pagination' and confirming 'filter' performs substring matching, but largely restates schema information already provided.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a specific verb ('List') and resource ('functions'), clearly specifying the returned fields (name, address, size, parameter count). This distinguishes it from sibling tools like get_function_detail (single deep analysis) and other list operations (list_exports, list_imports) by emphasizing bulk function enumeration with metadata.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions pagination capabilities (offset/limit) and filtering, implying suitability for bulk operations. However, it lacks explicit guidance on when to use this versus get_function_detail or other navigation tools, and doesn't state the prerequisite of calling open_binary first (though implied by session_id parameter).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_imports (A, Inspect)
List all imported functions with library names and addresses. Shows what external APIs the binary uses.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session ID from open_binary | |
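The description promises library names and addresses but there is no output schema, so any result shape is an assumption. Assuming objects with `name`/`library`/`address` fields (invented for illustration), grouping imports by library is a natural follow-up step for attack-surface triage:

```python
from collections import defaultdict

# Invented sample result; the real field names are not documented.
imports = [
    {"name": "CreateFileW", "library": "kernel32.dll", "address": "0x404000"},
    {"name": "ReadFile",    "library": "kernel32.dll", "address": "0x404008"},
    {"name": "send",        "library": "ws2_32.dll",   "address": "0x404010"},
]

by_library = defaultdict(list)
for imp in imports:
    by_library[imp["library"]].append(imp["name"])

print(dict(by_library))
# {'kernel32.dll': ['CreateFileW', 'ReadFile'], 'ws2_32.dll': ['send']}
```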
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It partially discloses output structure by mentioning 'library names and addresses,' but does not explicitly state that this is a read-only operation, detail error conditions (e.g., invalid session_id), or describe the return format beyond the field names.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of two efficient sentences with zero redundancy. The first sentence front-loads the core action and data returned, while the second provides valuable conceptual context (external APIs) without wasting words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (single parameter) and lack of output schema, the description adequately covers the conceptual output by listing the data fields returned (names, addresses). It appropriately compensates for the missing output schema by indicating what information the user will receive.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage for the single 'session_id' parameter, establishing a baseline of 3. The description adds no additional parameter details (e.g., format constraints, where to obtain the ID), so it meets but does not exceed the baseline.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses the specific verb 'List' with the clear resource 'imported functions' and details the returned attributes ('library names and addresses'). It effectively distinguishes this tool from siblings like 'list_exports' and 'list_functions' by specifying 'imported' and emphasizing 'external APIs'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The phrase 'Shows what external APIs the binary uses' implies a use case (analyzing external dependencies), providing implicit context for when to invoke the tool. However, it lacks explicit guidance on when to prefer this over 'list_functions' for internal symbols or 'list_exports' for the binary's exposed interfaces.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_sections (B)
List binary sections: name, address range, size, semantics.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session ID from open_binary | |
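Because list_sections and list_segments are easy to confuse, a client-side helper can make the choice explicit. A minimal sketch (the helper and its argument names are hypothetical):

```python
def make_list_call(kind: str, session_id: str) -> dict:
    """Build tools/call params. kind is "sections" (file layout)
    or "segments" (runtime memory layout)."""
    if kind not in ("sections", "segments"):
        raise ValueError(f"unknown kind: {kind}")
    return {"name": f"list_{kind}", "arguments": {"session_id": session_id}}
```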
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It partially compensates by disclosing the return value structure (name, address range, size, semantics), which is crucial given the lack of an output schema. However, it omits safety information, error conditions, and side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence. Every word earns its place: verb, resource, and return value fields are specified with zero redundancy or filler.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter tool with full schema coverage, the description is minimally adequate. It compensates for the missing output schema by listing return fields. However, given the rich ecosystem of sibling listing tools, the lack of usage guidelines and differentiation from similar tools (e.g., list_segments) leaves a completeness gap.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description does not mention the session_id parameter, but the schema fully documents it as 'Session ID from open_binary'. No additional parameter semantics are provided in the description text.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb (List) and resource (binary sections), and specifies the returned attributes (name, address range, size, semantics). However, it does not explicitly differentiate from the sibling tool 'list_segments', which could cause confusion since sections and segments are distinct concepts in binary analysis.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like list_segments or list_functions, nor does it mention prerequisites. While the schema parameter description references 'open_binary', the main description text lacks explicit usage context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_segments (B)
List memory segments: address range, file offset, data length, read/write/execute permissions.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session ID from open_binary | |
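If the returned permissions are rendered as an "rwx"-style string (an assumption about the output format, which the description does not document), a client might decode them like this:

```python
def parse_perms(flags: str) -> dict:
    """Decode an "rwx"/"r-x"-style permission string into booleans."""
    return {
        "read": "r" in flags,
        "write": "w" in flags,
        "execute": "x" in flags,
    }
```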
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It compensates for the missing output schema by disclosing the return value fields (address range, offset, length, permissions), but omits other behavioral traits like performance characteristics, caching, or side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, dense sentence that front-loads the action and efficiently lists the specific data attributes returned. There is no redundant or wasteful text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of output schema, the description adequately compensates by detailing the structure of returned data. It successfully communicates the tool's purpose for a binary analysis context, though it could be strengthened by explicitly contrasting segments with sections (the sibling tool).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description does not explicitly mention the session_id parameter or add semantic context beyond what the schema already provides ('Session ID from open_binary').
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly specifies the action ('List') and resource ('memory segments'), and the enumerated attributes (address range, permissions) implicitly distinguish this from sibling tool 'list_sections' which deals with file sections rather than runtime memory layout.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus similar alternatives like 'list_sections', nor does it mention prerequisites (e.g., requiring 'open_binary' first), though the schema hints at this via the session_id parameter.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_symbols (A)
List all symbols, optionally filtered by type: function, import, data, external.
| Name | Required | Description | Default |
|---|---|---|---|
| type | No | Symbol type filter: "function", "import", "data", "external" | |
| session_id | Yes | Session ID | |
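Since the type filter is an enum, a client can validate it before calling rather than round-tripping an error. A hedged sketch (the helper name is invented; the enum values come from the schema above):

```python
SYMBOL_TYPES = {"function", "import", "data", "external"}

def make_list_symbols_args(session_id, type_filter=None):
    """Build list_symbols arguments; the optional filter must match the enum."""
    args = {"session_id": session_id}
    if type_filter is not None:
        if type_filter not in SYMBOL_TYPES:
            raise ValueError(f"invalid symbol type: {type_filter}")
        args["type"] = type_filter
    return args
```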
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so the description carries the full burden. It explains the filtering behavior but fails to disclose read-only safety (critical for binary analysis tools), return value structure, or pagination limits. The word 'List' implies a read-only operation, but explicit confirmation is absent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence that is front-loaded with the core action ('List all symbols') and appends the filtering qualification. Every word earns its place with zero redundancy or filler.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a simple enumeration tool with 100% schema coverage, but lacking output format description (no output schema exists). Given the complexity of binary analysis and the density of sibling tools, additional context about what constitutes a 'symbol' in this domain would improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing baseline 3. The description reinforces the enum values for the 'type' parameter in prose form and emphasizes the optional nature of filtering, adding marginal value beyond the structured schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific verb 'List' and resource 'symbols', and enumerates the four filterable types (function, import, data, external). This implicitly distinguishes it from siblings like list_functions or list_imports by showing this is the comprehensive superset tool, though it doesn't explicitly state when to prefer this over the specific variants.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies usage through the optional filter description ('optionally filtered by type'), but provides no explicit when-to-use guidance or alternatives. Given the many sibling enumeration tools (list_exports, list_functions, etc.), explicit guidance on when to use this general tool versus specific ones would be valuable.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
open_binary (A)
Open a binary file for analysis. If the binary was uploaded through Claude's web interface, provide content_base64 with the base64-encoded file bytes -- the server will save it locally and open it. Returns session ID, triage summary, file hashes, and suggests next analysis steps.
| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | File system path to the binary, OR just the filename when providing content_base64 | |
| content_base64 | No | Optional: base64-encoded binary file content. Provide this when the file is not on the server's filesystem (e.g. uploaded through Claude's web interface). The server will decode and save it locally before analysis. | |
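The path/content_base64 interaction can be sketched from the client side. A minimal example of preparing arguments when the file is not on the server (the helper name is hypothetical; the bytes are a stand-in):

```python
import base64

def make_open_binary_args(filename: str, data: bytes) -> dict:
    """When the file is not on the server, send just the filename as path
    plus the base64-encoded bytes as content_base64."""
    return {
        "path": filename,
        "content_base64": base64.b64encode(data).decode("ascii"),
    }

# "MZ" is the PE magic; the rest of the bytes are illustrative only.
args = make_open_binary_args("sample.exe", b"MZ\x90\x00")
```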
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden. It successfully discloses key behavioral traits: side effects ('server will save it locally'), return structure ('session ID, triage summary, file hashes'), and post-invocation workflow ('suggests next analysis steps'). Does not mention idempotency or permissions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly constructed sentences with zero waste. Front-loaded with purpose ('Open a binary file'), followed by input handling logic, and closing with return value documentation. Every clause conveys essential information not duplicated in structured fields.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Compensates well for missing output schema by detailing return values (session ID, hashes, suggestions). Addresses the 100% schema coverage adequately. Given this is a complex session-initialization tool with nearly 40 siblings, the description provides sufficient context to identify it as the correct entry point.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing baseline 3. The description adds significant value by explaining the 'why' and 'when' for content_base64 (web interface uploads) and clarifying the dual-use nature of 'path' (filesystem vs filename placeholder). This contextual guidance exceeds the raw schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific action ('Open') and resource ('binary file') clearly. Implicitly distinguishes itself as the session initialization entry point (mentioning 'Returns session ID') among the many analysis siblings, though it doesn't explicitly contrast with 'upload_binary'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear conditional guidance on parameter selection: explicitly states when to use 'content_base64' ('If the binary was uploaded through Claude's web interface') versus relying on 'path' for server-side files. Could improve by explicitly mentioning 'upload_binary' as an alternative for non-web uploads.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
read_assembly (A)
Read raw assembly code at an address or for an entire function. Returns instructions with addresses, hex bytes, and mnemonics.
| Name | Required | Description | Default |
|---|---|---|---|
| length | No | Number of instructions (omit for full function) | |
| address | Yes | Function or start address | |
| session_id | Yes | Session ID | |
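The "omit for full function" semantics of length are worth encoding explicitly on the client side, since sending null and omitting the key may behave differently. A sketch (helper name is hypothetical):

```python
def make_read_assembly_args(session_id, address, length=None):
    """Omit length entirely to disassemble the whole function,
    per the tool description; include it only for a bounded read."""
    args = {"session_id": session_id, "address": address}
    if length is not None:
        args["length"] = length
    return args
```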
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively compensates by describing the return structure ('instructions with addresses, hex bytes, and mnemonics') despite the lack of an output schema. It does not explicitly state read-only safety, though this is implied by the verb 'Read'.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of two highly efficient sentences. The first establishes purpose and scope; the second discloses return format. Every word earns its place with no redundancy or filler content.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the 100% input schema coverage and lack of output schema, the description adequately covers the tool's purpose, input behavior, and return structure. It is complete enough for agent selection and invocation, though explicit safety annotations (readOnlyHint) would have provided additional value.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, establishing a baseline of 3. The description reinforces the relationship between parameters ('at an address or for an entire function' maps to the address and length parameters), but does not add substantial semantic meaning beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states a specific verb ('Read'), resource ('raw assembly code'), and scope ('at an address or for an entire function'). The mention of 'raw assembly' and specific output format ('hex bytes, and mnemonics') effectively distinguishes it from siblings like read_pseudocode, read_lifted_il, and read_memory.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides implied usage guidance through the specificity of 'raw assembly' and the output format, suggesting when to use this versus high-level alternatives. However, it lacks explicit comparisons to similar siblings (e.g., read_block, inspect_address) or explicit 'when-not-to-use' guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
read_block (A)
Read code for a specific basic block (not the whole function). Shows the block containing the given address plus N neighbor blocks. Use this for large functions where read_pseudocode returns too much.
| Name | Required | Description | Default |
|---|---|---|---|
| view | No | Code view: "pseudocode" (default), "assembly", or "ssa" | |
| address | Yes | Address within the target basic block | |
| session_id | Yes | Session ID | |
| context_blocks | No | Number of neighbor blocks to include (default 1) | |
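A client-side sketch of building read_block arguments, with defaults mirroring the schema (the helper name is hypothetical; the view enum and default of one neighbor block come from the table above):

```python
VIEWS = ("pseudocode", "assembly", "ssa")

def make_read_block_args(session_id, address, view="pseudocode", context_blocks=1):
    """Defaults mirror the schema: pseudocode view, one neighbor block."""
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view}")
    return {
        "session_id": session_id,
        "address": address,
        "view": view,
        "context_blocks": context_blocks,
    }
```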
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Adds valuable behavioral context that it 'Shows the block containing the given address plus N neighbor blocks' (explaining the neighbor inclusion logic). However, lacks explicit read-only confirmation, error behavior, or return format details that annotations would typically cover.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly constructed sentences with zero waste. Front-loaded with core purpose, followed by behavioral details, then usage guidance. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Appropriate for a 4-parameter tool with 100% schema coverage and no output schema. Explains what content is returned (block + neighbors) and the view options are well-documented in schema. Minor gap in not describing the return data structure, but sufficient for tool selection.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing baseline 3. Description adds semantic meaning by mapping 'N neighbor blocks' to context_blocks parameter and clarifying that address should be 'within the target basic block', enhancing understanding beyond raw schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Specific verb 'Read' with clear resource 'code for a specific basic block'. Explicitly scopes to 'not the whole function' and distinguishes from sibling 'read_pseudocode' by contrasting use cases.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use guidance ('Use this for large functions where read_pseudocode returns too much') and implicitly defines when not to use it. Names the alternative tool explicitly.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
read_lifted_il (A)
Read low-level intermediate representation for a function. Closer to assembly but architecture-independent. Useful for understanding exact register and flag operations.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address (decimal or hex string like "0x401000") | |
| session_id | Yes | Session ID | |
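Since the address parameter accepts either decimal or a hex string like "0x401000", a client may want to normalize before sending. A sketch of that normalization (the function name is hypothetical):

```python
def normalize_address(address) -> int:
    """Accept a decimal int, a decimal string, or a hex string like "0x401000"."""
    if isinstance(address, str):
        text = address.strip().lower()
        return int(text, 16) if text.startswith("0x") else int(text)
    return int(address)
```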
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full disclosure burden. Adds valuable behavioral context about the IL being architecture-independent and register/flag-focused. However, fails to clarify read-only safety, output format, or performance characteristics expected for a binary analysis tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, zero waste. Front-loaded with core action, followed by differentiation ('Closer to assembly...'), then use case. Every sentence earns its place without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, description adequately covers the IL's characteristics but omits return format details. Sufficient for tool selection among 30+ siblings, though output structure description would strengthen completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% coverage with basic types. Description adds crucial semantic context that the 'address' parameter refers to a function address ('for a function'), not just any memory location, which helps the agent select valid inputs.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description uses specific verb 'Read' with clear resource 'low-level intermediate representation for a function'. Effectively distinguishes from siblings by stating it is 'Closer to assembly but architecture-independent', positioning it between read_assembly (architecture-specific) and read_pseudocode (higher level).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context on when to use ('Useful for understanding exact register and flag operations'), implying it's for low-level analysis. Lacks explicit 'when not to use' guidance or named alternatives, though the abstraction level comparison implicitly guides selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
read_memory (B)
Read raw bytes at a virtual address. Returns hex dump.
| Name | Required | Description | Default |
|---|---|---|---|
| length | No | Number of bytes to read (default 64, max 4096) | |
| address | Yes | Virtual address to read from | |
| session_id | Yes | Session ID | |
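The schema documents a default of 64 bytes and a maximum of 4096, so a client can clamp requests rather than risk a server-side error. A sketch under that assumption (out-of-range behavior on the server is not documented):

```python
DEFAULT_LENGTH = 64   # documented default
MAX_LENGTH = 4096     # documented maximum

def clamp_read_length(length=DEFAULT_LENGTH):
    """Keep the read_memory byte count within the documented bounds."""
    return max(1, min(int(length), MAX_LENGTH))
```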
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses the return format ('Returns hex dump') which compensates partially for the missing output schema, but fails to state whether the operation is read-only, safe, or what happens on invalid memory addresses.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two terse sentences totaling ten words. The first defines the action and target, the second the return value. Zero redundancy, appropriately front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a simple read operation with complete schema coverage, but gaps remain given no annotations and no output schema. Missing safety confirmation (read-only nature) and error behavior for invalid addresses.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage (address, length constraints, session_id), establishing baseline 3. The description adds no additional parameter semantics beyond the schema, such as address format expectations or typical length values.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Read') and resources ('raw bytes') and clearly distinguishes from sibling tools like read_assembly, read_pseudocode, and read_lifted_il by specifying 'raw bytes' and 'hex dump'. However, it doesn't explicitly contrast with read_block or inspect_address.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives (e.g., when to use read_memory vs read_assembly or inspect_address) and does not mention prerequisites like requiring an active session.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
read_pseudocode (A)
Read decompiled C-like pseudocode for a function. The most human-readable view of what the code does.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address (decimal or hex string like "0x401000") | |
| session_id | Yes | Session ID | |
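An MCP tools/call result carries a list of content items, so a client would typically pull out the text parts to get the pseudocode string. A sketch assuming the standard text-content shape (the sample pseudocode is invented):

```python
def extract_text(result: dict) -> str:
    """Join the text content items from an MCP tools/call result."""
    return "\n".join(
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    )

sample = {"content": [{"type": "text", "text": "int main(void) { return 0; }"}]}
```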
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full disclosure burden. While 'Read' implies a safe operation, the description lacks critical behavioral details: error handling (what if address isn't a function?), prerequisites (does binary need analysis first?), caching behavior, or return value structure. Insufficient for a tool with no safety annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, zero waste. First sentence defines the action and resource; second sentence provides value proposition distinguishing from alternatives. Perfectly front-loaded and appropriately sized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a simple read operation with well-documented schema. However, lacks output specification (no output schema exists to compensate), error condition documentation, or context about what 'decompiled' implies regarding analysis state. Sufficient but not comprehensive.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% coverage with adequate descriptions (session_id, address with format examples). Description adds no parameter-specific semantics, but with complete schema documentation, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Excellent specificity: 'Read decompiled C-like pseudocode for a function' provides exact verb and resource. The phrase 'most human-readable view' effectively distinguishes it from siblings like read_assembly, read_lifted_il, and read_ssa_form by positioning it as the high-level alternative.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides implicit guidance by positioning the tool as the 'most human-readable view,' suggesting use when clarity is prioritized over detail. However, lacks explicit when-to-use/when-not-to-use guidance or direct comparison statements versus read_assembly or other read_* siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
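Several tools in this family accept the same address parameter, documented as a decimal or hex string like "0x401000". As an illustration only (this helper is not part of the server), a client could canonicalize both spellings before making a call:

```python
def normalize_address(addr):
    """Accept an int, a decimal string, or a hex string like "0x401000"
    and return the canonical hex-string form for the address parameter."""
    if isinstance(addr, int):
        return hex(addr)
    # int(s, 0) infers the base from a "0x" prefix, else assumes decimal
    return hex(int(str(addr).strip(), 0))

print(normalize_address("0x401000"))  # 0x401000
print(normalize_address(4198400))     # 0x401000
```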
read_ssa_form (grade A)
Read SSA (Static Single Assignment) form for a function. Each variable version is unique, enabling precise tracking of where values are defined and used. Best for tracing data flow.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address (decimal or hex string like "0x401000") | |
| session_id | Yes | Session ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully explains the semantic nature of SSA data (unique versions, tracking definitions/uses), which is essential context for interpreting results. However, it omits operational details like whether the operation is read-only, caching behavior, or the structure/format of the returned SSA representation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The three-sentence structure is efficient and front-loaded: purpose, technical explanation, and use case. Each sentence earns its place without redundancy. The description avoids repeating parameter types or schemas, focusing instead on domain-specific context.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the technical complexity of SSA and the absence of an output schema, the description adequately explains the domain concept but leaves a significant gap regarding the return format (e.g., JSON structure, instruction format). For a specialized analysis tool among many siblings, it meets minimum viability but lacks richness about output structure.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
While the schema has 100% coverage documenting the address and session_id parameters, the description adds crucial semantic constraints by specifying this is 'for a function.' This implies the address parameter should target a function entry point rather than arbitrary memory, adding value beyond the schema's generic 'Address' description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool's action ('Read SSA form') and expands the technical acronym (Static Single Assignment). It explains the key characteristic of SSA (unique variable versions), which helps distinguish it from siblings like read_assembly or read_pseudocode. However, it does not explicitly differentiate from closely related tracing tools like trace_ssa_step.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The phrase 'Best for tracing data flow' provides a positive use case hint, suggesting when the tool is appropriate. However, it lacks explicit guidance on when NOT to use it (e.g., when trace_ssa_step is more appropriate) or prerequisites (like ensuring the address points to a valid function).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
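To make "each variable version is unique" concrete, the sketch below renames straight-line assignments into SSA form. This is a toy illustration of the SSA concept only; it says nothing about the server's actual IL or output format:

```python
def to_ssa(statements):
    """Toy SSA renaming for straight-line code: every assignment creates a
    fresh version (buf#1, buf#2, ...) and reads refer to the latest one."""
    version = {}  # variable name -> current version number
    ssa = []
    for target, op, operands in statements:
        reads = [f"{v}#{version[v]}" if v in version else v for v in operands]
        version[target] = version.get(target, 0) + 1
        ssa.append(f"{target}#{version[target]} = {op}({', '.join(reads)})")
    return ssa

prog = [
    ("buf", "alloc",  ["size"]),
    ("buf", "strcpy", ["buf", "input"]),  # redefinition gets a new version
    ("n",   "strlen", ["buf"]),
]
print("\n".join(to_ssa(prog)))
# buf#1 = alloc(size)
# buf#2 = strcpy(buf#1, input)
# n#1 = strlen(buf#2)
```

Because every version is written exactly once, asking "where does buf#2 flow?" has a single unambiguous answer, which is what makes def-use tracing precise.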
resolve_name (grade A)
Look up a function or symbol by name. Returns all matching addresses. Tries exact match first, then substring. Use this when you see a name like "system" or "vulnerable_cmd" and need its address.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Symbol or function name to look up (exact or substring match) | |
| session_id | Yes | Session ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It discloses the critical matching algorithm ('exact match first, then substring') and the return cardinality ('all matching addresses'). It lacks an explicit read-only declaration (though one is implied) and the error behavior when there are no matches, preventing a perfect score.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste: purpose, return behavior, matching algorithm, and usage trigger. Information is front-loaded with the core action in the first sentence. Every sentence provides unique value not redundant with schema or title.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a two-parameter lookup tool without output schema, description adequately covers return type ('addresses') and matching logic. Could improve by mentioning session prerequisite (implied by session_id) or addressing case sensitivity, but sufficient for correct invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing baseline 3. Description adds concrete security-relevant examples ('system', 'vulnerable_cmd') for the name parameter, illustrating expected input format and use case. The session_id parameter is adequately covered by the schema alone.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific verb 'Look up' and resource 'function or symbol by name', clearly distinguishing from siblings like list_functions (enumerates all) or get_function_detail (requires address). The phrase 'Returns all matching addresses' further clarifies the output distinguishes it from tools returning metadata.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicit 'Use this when...' clause provides concrete trigger conditions with examples ('system', 'vulnerable_cmd'). Clearly indicates this is the correct tool when possessing a symbolic name and needing an address, contrasting with address-based navigation tools like goto_address or inspect_address.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
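The documented matching order can be pictured with a small stand-in for the server's symbol table. The sketch assumes an exact match short-circuits the substring scan, which the description leaves open; the table contents and helper are hypothetical:

```python
def resolve_name_like(symbols, name):
    """Mimic the documented order: exact match first, then substring.
    `symbols` maps a symbol name to its list of addresses."""
    if name in symbols:
        return {name: symbols[name]}
    return {sym: addrs for sym, addrs in symbols.items() if name in sym}

table = {
    "system":         ["0x401020"],
    "do_system_call": ["0x4013a0"],
    "vulnerable_cmd": ["0x402000"],
}
print(resolve_name_like(table, "system"))  # exact hit wins, no substring scan
print(resolve_name_like(table, "cmd"))     # falls back to substring matching
```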
search_bytes (grade A)
Search for a byte pattern (hex string like "48 8b 05") in the binary. Returns all matching addresses.
| Name | Required | Description | Default |
|---|---|---|---|
| pattern | Yes | Hex byte pattern (e.g. "48 8b 05" or "488b05") | |
| session_id | Yes | Session ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses the return value ('Returns all matching addresses'), which is critical given that no output schema exists. However, with no annotations provided, the description misses the opportunity to disclose the search scope (entire binary?), the address format (hex vs. decimal), or performance characteristics on large binaries.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, zero waste. First sentence defines action and input format; second discloses return type. Information is front-loaded and appropriately sized for the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple 2-parameter schema with complete coverage and lack of output schema, the description adequately covers the essential contract (input pattern type and return addresses). Minor gap on search scope details (sections vs entire binary), but sufficient for agent selection.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% coverage with clear descriptions and examples for both parameters (session_id and pattern). The tool description reinforces the pattern format but adds no additional semantic meaning beyond what the schema already provides. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear specific verb ('Search') and resource ('binary'). The hex pattern example ('48 8b 05') effectively distinguishes this from siblings like search_constants (numeric values) and search_strings (text), specifying this is for raw byte sequences.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides implied usage through the hex pattern format specification, but lacks explicit guidance on when to prefer this over search_constants or search_strings. No 'when-not-to-use' guidance or explicit alternatives are named.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
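The two documented spellings of the pattern parameter ('48 8b 05' and '488b05') decode to the same byte sequence. A client-side sketch of parsing and scanning, with a toy buffer standing in for the binary (the helper is hypothetical, not the server's implementation):

```python
import binascii

def parse_byte_pattern(pattern):
    """Normalize both documented spellings ("48 8b 05", "488b05") to bytes."""
    compact = pattern.replace(" ", "")
    if len(compact) % 2:
        raise ValueError("hex pattern needs an even number of digits")
    return binascii.unhexlify(compact)

data = b"\x90\x48\x8b\x05\x10\x00\x00\x00"  # toy "binary"
needle = parse_byte_pattern("48 8b 05")
offsets = [i for i in range(len(data)) if data.startswith(needle, i)]
print(offsets)  # [1]
```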
search_constants (grade A)
Find all uses of a specific numeric constant value across the binary.
| Name | Required | Description | Default |
|---|---|---|---|
| value | Yes | Numeric constant value to find | |
| session_id | Yes | Session ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states 'Find all uses' but does not clarify what constitutes a 'use' (immediate values, references, etc.), whether the operation is read-only, performance characteristics of scanning the binary, or the return format.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
A single sentence of twelve words, front-loaded with the action verb. No filler words or redundant information. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple 2-parameter input schema with full coverage, the description is minimally sufficient. However, it lacks information about the return structure (no output schema exists) and safety profile (no annotations), leaving gaps an agent would need to infer from the verb 'Find'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing detailed descriptions for both 'value' (numeric constant to find) and 'session_id'. The description mentions the constant value but adds no additional semantic context (e.g., integer format, size constraints) beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description provides a specific verb ('Find'), clear resource ('numeric constant value'), and scope ('across the binary'). It effectively distinguishes from siblings like search_bytes (byte patterns) and search_strings (text strings) by specifying 'numeric constant'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage through specificity (constants vs strings/bytes), but provides no explicit guidance on when to use this over search_bytes (which could overlap for hex values) or what format constants should be in. No alternatives or exclusions are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
search_strings (grade B)
Search for a text pattern in disassembly output. Returns matching locations.
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to search for in disassembly | |
| session_id | Yes | Session ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'Returns matching locations' but fails to specify the format (addresses? line numbers? function names?), whether the search is case-sensitive, supports wildcards, or has performance implications on large binaries.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of exactly two sentences with zero waste: the first defines the action, the second defines the return value. It is appropriately front-loaded and sized for the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple 2-parameter schema with full coverage, the description is minimally adequate. However, with no output schema provided, the phrase 'matching locations' is insufficiently specific about the return structure (e.g., array of addresses vs. formatted strings).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing a baseline of 3. The description adds minimal semantic value beyond the schema: referring to 'text pattern' rather than just 'text' suggests pattern-matching capabilities, but this is ambiguous and not elaborated upon.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb (search), resource (text pattern), and scope (in disassembly output). The phrase 'in disassembly output' distinguishes it from sibling tools like search_bytes (binary patterns) and get_strings (binary strings), though it doesn't explicitly name these alternatives.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus siblings like search_bytes, search_constants, or get_strings. It does not mention prerequisites (e.g., requiring an open binary session) or constraints (case sensitivity, regex support).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
trace_data_flow (grade A)
Run forward taint analysis on a SINGLE function. Tracks how data flows from sources to sinks within that function. Accepts optional custom sources/sinks. Completes in seconds.
| Name | Required | Description | Default |
|---|---|---|---|
| sinks | No | Custom sink function names (uses defaults if omitted) | |
| sources | No | Custom source function names (uses defaults if omitted) | |
| session_id | Yes | Session ID | |
| function_address | Yes | Function address for targeted taint analysis | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Since no annotations exist, the description carries the full burden. It adds valuable timing information ('Completes in seconds') and directionality ('forward'), but omits the safety profile (read-only vs. destructive), state modifications, and the return value format. Adequate but incomplete behavioral disclosure for a security analysis tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences, zero waste. Front-loaded with core purpose ('Run forward taint analysis'), followed by mechanism ('Tracks how data flows'), parameter hint ('Accepts optional custom sources/sinks'), and performance characteristic ('Completes in seconds'). Every sentence earns its place with no redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Sufficient for correct invocation given complete schema coverage and clear purpose. However, lacks output description (no output schema exists) and safety annotations, leaving gaps in understanding what the tool returns and whether it modifies analysis state. Adequate but not comprehensive for a specialized security tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear descriptions for all 4 parameters. Description mentions 'Accepts optional custom sources/sinks' which aligns with schema nullability, but adds no semantic detail beyond schema (e.g., format expectations, examples of valid function names). Baseline 3 appropriate for high schema coverage with minimal description supplementation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description explicitly states 'Run forward taint analysis on a SINGLE function': specific verb (run forward taint analysis), resource (function), and scope (single function). The capitalized 'SINGLE' effectively distinguishes this from siblings like trace_variable or get_call_tree that may operate across scopes or on different analysis units.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides implied usage through functional description ('Tracks how data flows from sources to sinks'), but lacks explicit when-to-use guidance or differentiation from related analysis tools like check_sanitization or trace_ssa_step. No mention of prerequisites or when to prefer this over backward taint analysis.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
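Because sources and sinks are optional with server-side defaults, a caller only includes them to override. A sketch of building the arguments object (the argument names come from the table above; the helper itself is hypothetical):

```python
def taint_args(session_id, function_address, sources=None, sinks=None):
    """Build arguments for a trace_data_flow call; omitting sources/sinks
    lets the server fall back to its default lists."""
    args = {"session_id": session_id, "function_address": function_address}
    if sources is not None:
        args["sources"] = sources
    if sinks is not None:
        args["sinks"] = sinks
    return args

print(taint_args("sess-1", "0x401000"))  # server defaults apply
print(taint_args("sess-1", "0x401000",   # custom: recv() -> system() only
                 sources=["recv"], sinks=["system"]))
```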
trace_ssa_step (grade A)
Trace a variable's SSA def-use chain incrementally -- returns N operations (default 5) at a time. Use "forward" to see where data flows TO (from definition toward sinks) or "backward" to see where data came FROM (from usage toward sources). Use from_index to continue from a previous result. Each step shows SSA text, variables read/written, and flags for source/sink/sanitizer functions. Returns has_more=true with next_index when more operations exist.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address inside the function containing the variable | |
| direction | No | Trace direction: "forward" (where does data go?) or "backward" (where did data come from?) | |
| max_steps | No | Maximum operations to return per call (default 5, max 20) | |
| from_index | No | Resume from this operation index (0 = start, use next_index from previous call to continue) | |
| session_id | Yes | Session ID | |
| variable_name | Yes | Variable name (or SSA name like "buf#3") to trace | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden. It effectively explains the pagination mechanism (returns N operations, has_more/next_index) and output content (SSA text, source/sink/sanitizer flags). However, it omits explicit confirmation that this is a read-only analysis operation or safety properties, which is critical given the 'trace' terminology could be ambiguous in binary analysis contexts.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Excellent information density with zero waste. Five sentences cover: (1) core purpose and pagination, (2) direction semantics, (3) continuation mechanics, and (4)-(5) the content and structure of the returned data. Each sentence earns its place and builds logically on the previous.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Strong coverage for a complex SSA analysis tool with no output schema. Explains the returned data structure (has_more, next_index, flags) and the incremental nature of results. Could be improved by mentioning error conditions or confirming the read-only nature of the operation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage (baseline 3), the description adds significant semantic value: clarifies 'forward' traces toward sinks vs 'backward' toward sources, explains that 'from_index' uses next_index from previous calls, and notes the default value of 5 for max_steps which isn't in the schema constraints.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear specific verb ('Trace') and resource ('SSA def-use chain'), with the key differentiator 'incrementally' hinting at pagination behavior. However, it doesn't explicitly distinguish from siblings like 'trace_variable' or 'trace_data_flow' which likely perform similar analysis without the stepwise pagination.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear guidance on when to use 'forward' vs 'backward' directions and how to use 'from_index' for pagination. Lacks explicit comparison to sibling tools (e.g., when to choose this over 'trace_variable' for full traces) or prerequisites like requiring an active session.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
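The has_more/next_index contract implies a simple drain loop. The sketch below stubs out the actual MCP call; the field names follow the description, but the exact payload shape is an assumption since no output schema is published:

```python
def collect_trace(call_tool, base_args, max_total=100):
    """Drain a paginated trace_ssa_step: resume with from_index = next_index
    until has_more is false. `call_tool` stands in for the real client call."""
    ops, index = [], 0
    while len(ops) < max_total:
        page = call_tool({**base_args, "from_index": index})
        ops.extend(page["operations"])
        if not page.get("has_more"):
            break
        index = page["next_index"]
    return ops

def fake_call(args):
    """Stub server: 12 operations total, 5 per page (the documented default)."""
    start = args["from_index"]
    chunk = [f"op{i}" for i in range(start, min(start + 5, 12))]
    more = start + 5 < 12
    page = {"operations": chunk, "has_more": more}
    if more:
        page["next_index"] = start + 5
    return page

ops = collect_trace(fake_call, {"session_id": "s", "variable_name": "buf#3"})
print(len(ops))  # 12
```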
trace_variable (grade A)
Trace a variable through SSA def-use chains within a function. Given a variable name, returns where it is defined and every place it is used.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Function address containing the variable | |
| session_id | Yes | Session ID | |
| variable_name | Yes | Variable name to trace | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It explains the analytical behavior (def-use chain analysis) and implies read-only nature via 'returns,' but lacks explicit safety disclosure (read-only confirmation), error conditions, or performance notes.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. First sentence establishes purpose and scope; second explains input/output contract. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 3-parameter analysis tool without an output schema, the description adequately explains the conceptual return value ('where it is defined and every place it is used'). It could mention the behavior when the variable is not found, but it covers the essential contract.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing baseline 3. Description reinforces 'variable name' and 'function' context but adds no additional semantics about parameter formats, validation rules, or constraints beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Excellent specificity: 'Trace' (verb) + 'variable' (resource) + 'SSA def-use chains' (mechanism) + 'within a function' (scope). The mention of 'SSA def-use chains' effectively distinguishes this from siblings like trace_data_flow and trace_ssa_step.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides implied usage context: you use this when you need to track a variable's definitions and uses. However, it lacks explicit guidance on when to choose this over trace_data_flow or trace_ssa_step, despite those being available siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
upload_binary (grade A)
Upload a binary file for analysis. Use this when the binary is not already on the server's filesystem (e.g. when uploading through Claude's web interface). Send the file content as base64 and receive a local path that you can pass to open_binary.
| Name | Required | Description | Default |
|---|---|---|---|
| filename | Yes | Original filename of the binary (e.g. "firmware.bin") | |
| content_base64 | Yes | Base64-encoded binary file content | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description carries the full burden. It discloses the input format ('base64-encoded'), the side effect ('receive a local path'), and the transformation performed (remote content written to the server filesystem). Minor gap: it doesn't specify whether the path is temporary or persistent, or whether an existing file with the same name is overwritten.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences efficiently structured: purpose statement, usage condition, and behavioral outcome/workflow. No redundant words. Every sentence provides distinct value not replicated in structured fields.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter tool with no output schema, the description adequately explains the return value ('local path') and its intended use. It provides sufficient context for the agent to understand the upload-to-analysis workflow despite missing output schema annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% description coverage ('Original filename', 'Base64-encoded binary file content'), establishing baseline 3. The description reinforces the base64 encoding requirement but does not add significant semantic depth, constraints, or format details beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description states specific action ('Upload'), resource ('binary file'), and purpose ('for analysis'). It clearly distinguishes from siblings by specifying this is for binaries 'not already on the server's filesystem,' differentiating it from direct analysis tools like open_binary.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use condition ('when the binary is not already on the server's filesystem') with a concrete example ('uploading through Claude's web interface'). Also establishes the workflow relationship with sibling tool open_binary, indicating the output is meant to be passed there.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
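The upload-then-open handoff described above can be sketched as the JSON-RPC `tools/call` payload an MCP client would send. This is a minimal illustration, not the server's actual wire format: the tool name `upload_binary` and the argument keys `filename`/`content` are assumptions based on the schema descriptions quoted in the review.

```python
import base64
import json

def build_upload_call(filename: str, raw_bytes: bytes) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for a hypothetical
    upload tool whose schema expects base64-encoded binary content."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "upload_binary",  # assumed tool name
            "arguments": {
                "filename": filename,
                # schema: 'Base64-encoded binary file content'
                "content": base64.b64encode(raw_bytes).decode("ascii"),
            },
        },
    }

# First bytes of an ELF header, standing in for a real binary.
payload = build_upload_call("crackme.bin", b"\x7fELF\x02\x01\x01")
print(json.dumps(payload, indent=2))
```

The returned 'local path' would then be passed to open_binary, per the workflow relationship the description establishes.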
what_calls
Find all functions called by a given function. Shows the call tree downward.
| Name | Required | Description | Default |
|---|---|---|---|
| address | Yes | Address (decimal or hex string like "0x401000") | |
| session_id | Yes | Session ID | |
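The schema's 'decimal or hex string like "0x401000"' wording implies the client may normalize addresses before calling. A small sketch of that normalization, assuming the server ultimately wants an integer address:

```python
def normalize_address(address) -> int:
    """Accept a decimal int, a decimal string, or a hex string like
    "0x401000", and return the address as an int."""
    if isinstance(address, int):
        return address
    s = address.strip().lower()
    if s.startswith("0x"):
        return int(s, 16)
    return int(s, 10)

print(normalize_address("0x401000"))  # 4198400
print(normalize_address(4198400))    # 4198400
```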
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses that the tool returns a 'call tree' structure, which hints at the output format. However, it lacks details on whether this is recursive or direct-only, performance characteristics, or safety (read-only nature).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, zero waste. First sentence establishes purpose, second clarifies directionality and output structure. Appropriately sized and front-loaded for a simple two-parameter tool.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple schema (2 parameters, no nested objects) and lack of output schema, the description adequately covers the core functionality. Mentioning 'call tree' partially compensates for missing output schema documentation, though details on recursion depth or return format would improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
While the schema has 100% description coverage (baseline 3), the description adds valuable semantic context by referring to the address parameter as representing a 'given function,' clarifying that the input should be a function entry point rather than any arbitrary address.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool finds functions called by a given function using specific verbs ('Find', 'Shows'). The term 'downward' effectively hints at the directionality (callees vs callers), implicitly distinguishing it from sibling 'who_calls', though it doesn't explicitly name alternatives.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool versus siblings like 'who_calls' (callers) or 'get_call_tree' (potentially full trees). The 'downward' keyword provides implicit context but lacks explicit when-to-use or when-not-to-use instructions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
who_calls
Find all functions that call a given function. Essential for tracing how user input reaches dangerous functions.
| Name | Required | Description | Default |
|---|---|---|---|
| address | No | Function address (optional if name is provided) | |
| session_id | Yes | Session ID | |
| function_name | No | Function name (optional if address is provided) | |
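The 'optional if the other is provided' relationship between `address` and `function_name` is the kind of constraint an agent (or client wrapper) must enforce before calling. A hedged sketch of that validation, with an assumed argument-builder helper:

```python
def build_who_calls_args(session_id: str, address=None, function_name=None) -> dict:
    """Enforce the schema's mutual-fallback rule: at least one of
    address / function_name must be supplied alongside session_id."""
    if address is None and function_name is None:
        raise ValueError("who_calls requires either address or function_name")
    args = {"session_id": session_id}
    if address is not None:
        args["address"] = address
    if function_name is not None:
        args["function_name"] = function_name
    return args

# Querying callers by name rather than address:
print(build_who_calls_args("sess-1", function_name="strcpy"))
```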
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It indicates comprehensiveness ('all functions') and the analytical purpose, but omits critical behavioral details such as the return format (addresses vs names vs structures), performance characteristics on large binaries, or whether the operation requires specific analysis prerequisites beyond the session_id.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of two highly efficient sentences. The first front-loads the core functionality ('Find all functions that call a given function'), while the second adds the value proposition without redundancy. There is no extraneous text; every word serves the definition.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the 100% schema coverage and relatively simple 3-parameter structure, the description adequately covers the tool's intent. However, with no output schema and no annotations, the description could be strengthened by indicating what the tool returns (e.g., list of function addresses/names) or any prerequisite analysis states required in the session.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with clear documentation for session_id, address, and function_name parameters including their mutual exclusivity ('optional if X is provided'). The description does not add parameter-specific semantics beyond the implied 'given function' reference, meeting the baseline for high-coverage schemas.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool finds functions that call a given function using the specific verb 'Find' and identifies the resource (caller functions). It adds valuable security context ('tracing how user input reaches dangerous functions'). However, it fails to distinguish from the sibling tool 'what_calls', which appears synonymous without explicit differentiation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides an implied usage scenario via the security use case ('tracing how user input reaches dangerous functions'), suggesting when this tool is valuable. However, it lacks explicit guidance on when to use this versus similar siblings like 'what_calls', 'get_references_to', or 'trace_data_flow', and provides no 'when-not-to-use' exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
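The 'tracing how user input reaches dangerous functions' use case amounts to walking the call graph upward from a sink by chaining who_calls results. A toy sketch under stated assumptions: the `CALLERS` mapping stands in for real who_calls responses, and the function names are invented for illustration.

```python
from collections import deque

# Toy call graph standing in for who_calls responses:
# maps each function to the set of its direct callers.
CALLERS = {
    "strcpy": {"parse_header", "copy_name"},
    "parse_header": {"handle_request"},
    "copy_name": {"handle_request"},
    "handle_request": {"main"},
}

def trace_to_entry(sink: str) -> set:
    """Breadth-first walk upward from a dangerous sink, the way an
    agent might chain repeated who_calls invocations."""
    seen, queue = set(), deque([sink])
    while queue:
        fn = queue.popleft()
        for caller in CALLERS.get(fn, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

print(sorted(trace_to_entry("strcpy")))
# ['copy_name', 'handle_request', 'main', 'parse_header']
```

Every function in the result lies on some path from program entry to the sink, which is exactly the candidate set a taint-tracing agent would then inspect.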
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

```json
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
```

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail — every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control — enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management — store and rotate API keys and OAuth tokens in one place
Change alerts — get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption — public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics — see which tools are being used most, helping you prioritize development and documentation
Direct user feedback — users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.