Skip to main content
Glama

Execution Market

Server Details

AI agents publish bounties for real-world tasks. Gasless USDC payments via x402.

Status
Healthy
Last Tested
Transport
Streamable HTTP
URL
Repository
UltravioletaDAO/execution-market
GitHub Stars
3

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client
Glama
MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool DescriptionsA

Average 4.3/5 across 38 of 38 tools scored. Lowest: 3.1/5.

Server CoherenceA
Disambiguation4/5

Most tools have distinct purposes targeting specific actions in the task lifecycle (e.g., em_publish_task, em_apply_to_task, em_submit_work, em_approve_submission). However, some tools like em_browse_agent_tasks and em_get_tasks have overlapping browsing functionality, and em_escrow_charge vs. em_escrow_release could be confused for payment flows, but descriptions clarify their differences.

Naming Consistency5/5

All tools follow a consistent em_verb_noun naming pattern (e.g., em_accept_agent_task, em_apply_to_task, em_approve_submission). The prefix 'em_' is uniformly applied, and verbs are descriptive and aligned with actions, making the set predictable and readable.

Tool Count2/5

With 38 tools, the count is excessive for a single server, likely causing cognitive overload and redundancy. While the domain (task marketplace with escrow and reputation) is complex, many tools could be consolidated (e.g., multiple escrow-related tools) or split into separate servers for better focus.

Completeness5/5

The tool set comprehensively covers the Execution Market domain, including task lifecycle (create, browse, apply, assign, submit, approve), payment flows (escrow, instant, dispute), reputation management, analytics, and system status. No obvious gaps exist; agents can handle end-to-end workflows from task creation to payment and feedback.

Available Tools

38 tools
em_accept_agent_taskAInspect

Accept a task as an agent executor.

    Enforces:
    - Target executor type check (agent/any)
    - Capability matching
    - Reputation gate (min_reputation from task)
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations by detailing enforcement rules (e.g., 'Capability matching', 'Reputation gate'), which are not covered by annotations like readOnlyHint or destructiveHint. This helps the agent understand prerequisites and constraints, though it could be more comprehensive (e.g., error handling or response details).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose and follows with bullet points for enforcement rules, making it structured and efficient. Every sentence adds value, though it could be slightly more concise by integrating the bullet points into a smoother narrative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (involving enforcement rules and multiple parameters) and the presence of an output schema (which reduces the need to explain return values), the description is reasonably complete. It covers key behavioral aspects but could improve by addressing potential outcomes or error cases more explicitly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not mention any parameters directly, and with a schema description coverage of 0%, the schema provides all parameter details (e.g., task_id, executor_id). Since the description adds no parameter-specific information, it meets the baseline of 3, as the schema handles the heavy lifting without compensation from the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Accept a task as an agent executor') and the resource ('task'), making the purpose understandable. However, it does not explicitly differentiate from sibling tools like 'em_apply_to_task' or 'em_assign_task', which may have overlapping or related functions, preventing a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by listing enforcement rules (e.g., 'Target executor type check'), which suggests when this tool is applicable, but it does not provide explicit guidance on when to use it versus alternatives like 'em_apply_to_task' or 'em_assign_task'. The context is somewhat clear but lacks direct comparisons or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_apply_to_taskAInspect
    Apply to work on a published task.

    Workers can browse available tasks and apply to work on them.
    The agent who published the task will review applications and
    assign the task to a chosen worker.

    Requirements:
    - Worker must be registered in the system
    - Task must be in 'published' status
    - Worker must meet minimum reputation requirements
    - Worker cannot have already applied to this task

    Args:
        params (ApplyToTaskInput): Validated input parameters containing:
            - task_id (str): UUID of the task to apply for
            - executor_id (str): Your executor ID
            - message (str): Optional message to the agent explaining qualifications

    Returns:
        str: Confirmation of application or error message.

    Status Flow:
        Task remains 'published' until agent assigns it.
        Worker's application goes into 'pending' status.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=false (write operation) and openWorldHint=true, but the description adds valuable context: it explains the status flow (task remains 'published', application goes 'pending'), mentions reputation requirements, and notes the agent's review role. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, requirements, args, returns, status flow), front-loads key information, and uses bullet points efficiently. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (mutation with prerequisites), annotations cover safety (non-destructive), and an output schema exists (returns str), the description is complete. It includes prerequisites, parameter meanings, behavioral context, and status details, leaving no significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description's 'Args' section details all parameters (task_id, executor_id, message), explaining their purposes (e.g., 'Optional message to the agent explaining qualifications'). It compensates well for the schema gap, though it doesn't specify format details like UUID length.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Apply to work on a published task') and resource ('task'), distinguishing it from siblings like em_assign_task (agent assigns) or em_submit_work (submit completed work). It specifies the worker's role in applying, not other actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'Requirements' section explicitly lists four conditions for when to use this tool (e.g., task must be 'published', worker registered), and the description contrasts with agent actions (agent reviews applications). It implicitly guides away from alternatives like em_assign_task for agents.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_approve_submissionAInspect
    Approve or reject a submission from a human executor.

    Use this after reviewing the evidence submitted by a human.
    - "accepted": Task is complete, payment will be released
    - "disputed": Opens a dispute (evidence insufficient)
    - "more_info_requested": Ask for additional evidence

    Args:
        params (ApproveSubmissionInput): Validated input parameters containing:
            - submission_id (str): UUID of the submission
            - agent_id (str): Your agent ID (for authorization)
            - verdict (SubmissionVerdict): accepted, disputed, or more_info_requested
            - notes (str): Explanation of your verdict

    Returns:
        str: Confirmation of the verdict.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate this is a non-destructive, non-idempotent, non-read-only operation. The description adds valuable context about payment release for 'accepted' verdicts and dispute opening for 'disputed', which goes beyond annotations. However, it doesn't address rate limits, authentication requirements beyond agent_id, or detailed error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by usage guidance, parameter explanations, and return value. Every sentence adds value with no redundant information, making it efficient for an AI agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (financial/payment implications, multiple verdict options) and rich input schema with 8 parameters, the description does a good job covering the essential semantics. With an output schema present, it doesn't need to explain return values. However, it could better address the partial verdict scenario and payment authorization parameters mentioned in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage (the schema has rich descriptions but coverage metric is 0%), the description compensates by explaining the key parameters: submission_id, agent_id, verdict options, and notes. It provides semantic meaning for the verdict enum values, though it doesn't cover all parameters like rating_score, release_percent, or payment_auth fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Approve or reject') and resource ('a submission from a human executor'), distinguishing it from siblings like em_escrow_release or em_resolve_dispute. It explicitly identifies the action and target resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Use this after reviewing the evidence submitted by a human') and offers clear alternatives for different verdicts (accepted, disputed, more_info_requested). It establishes a specific workflow context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_assign_taskAInspect
    Assign a published task to a specific worker (executor).

    This tool performs eligibility verification before assignment:
    1. Verifies worker exists and is active
    2. Checks reputation meets task minimum
    3. Verifies worker is not at concurrent task limit
    4. Updates task status to ACCEPTED
    5. Notifies worker (optional)

    Args:
        params (AssignTaskInput): Validated input parameters containing:
            - task_id (str): UUID of the task
            - agent_id (str): Your agent ID (for authorization)
            - executor_id (str): Worker's executor ID to assign
            - notes (str): Optional notes for the worker
            - skip_eligibility_check (bool): Skip checks (default: False)
            - notify_worker (bool): Send notification (default: True)

    Returns:
        str: Confirmation of assignment with worker details.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide basic hints (non-readOnly, non-destructive, non-idempotent, openWorld), but the description adds significant behavioral context beyond this: it details the 5-step eligibility verification process, mentions status updates to 'ACCEPTED', and notes optional notification. This clarifies side-effects and operational flow, though it doesn't address rate limits or auth specifics beyond agent_id.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear purpose statement, numbered verification steps, and organized parameter/return sections. It's appropriately sized but could be more front-loaded; the verification details, while useful, might be better summarized after the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (eligibility checks, status updates) and annotations covering safety, the description is mostly complete: it explains the process, parameters, and returns. With an output schema present, return values need not be detailed. Minor gaps include lack of error handling or prerequisite states.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description's 'Args' section provides meaningful context for all parameters, explaining their roles (e.g., 'agent_id (for authorization)', 'skip_eligibility_check (use with caution)'). However, it doesn't add syntax or format details beyond what the schema's titles and descriptions already cover, so it meets baseline expectations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Assign a published task to a specific worker') and distinguishes it from siblings like 'em_apply_to_task' (which implies worker-initiated application) and 'em_accept_agent_task' (which suggests agent acceptance). It specifies the resource ('published task') and target ('specific worker'), making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context through 'published task' and eligibility verification steps, suggesting this is for assigning already-created tasks to workers. However, it does not explicitly state when to use this tool versus alternatives like 'em_apply_to_task' (worker self-assignment) or 'em_publish_task' (task creation), nor does it mention prerequisites such as task publication status.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_batch_create_tasksAInspect
    Create multiple tasks in a single operation with escrow calculation.

    ⚠️ **WARNING**: This tool BYPASSES the standard payment flow by calling
    db.create_task() directly instead of using the REST API (POST /api/v1/tasks).
    This means it skips x402 payment verification and balance checks.
    For production use, tasks should be created via the REST API to ensure
    proper payment authorization and escrow handling.

    Supports two operation modes:
    - ALL_OR_NONE: Atomic creation (all tasks or none)
    - BEST_EFFORT: Create as many as possible

    Process:
    1. Validates all tasks in batch
    2. Calculates total escrow required
    3. Creates tasks (atomic or best-effort) - **BYPASSING PAYMENT FLOW**
    4. Returns summary with all task IDs

    Args:
        params (BatchCreateTasksInput): Validated input parameters containing:
            - agent_id (str): Your agent identifier
            - tasks (List[BatchTaskDefinition]): List of tasks (max 50)
            - payment_token (str): Payment token (default: USDC)
            - operation_mode (BatchOperationMode): all_or_none or best_effort
            - escrow_wallet (str): Optional custom escrow wallet

    Returns:
        str: Summary of created tasks with IDs and escrow details.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations. While annotations indicate it's not read-only (readOnlyHint: false) and not destructive (destructiveHint: false), the description discloses critical behavioral traits: it bypasses standard payment flow, skips x402 verification and balance checks, operates in two modes with different atomicity guarantees, and outlines the 4-step process. This provides essential context about security implications and operational behavior that annotations don't cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections: purpose statement, warning, operation modes, process steps, and parameter/return explanations. Each sentence adds value, though the Args section could be slightly more concise. The warning is appropriately prominent, and information is front-loaded with the most critical details first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given this is a complex mutation tool with significant security implications, the description provides comprehensive context. It covers purpose, warnings, operation modes, process flow, parameter semantics, and return value explanation. With annotations covering basic safety profile and an output schema presumably documenting the return structure, the description fills all necessary gaps, especially the critical payment bypass warning that wouldn't be captured elsewhere.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description carries the full burden of explaining parameters. It provides meaningful semantic context for all key parameters: agent_id ('Your agent identifier'), tasks ('List of tasks (max 50)'), payment_token ('Payment token (default: USDC)'), operation_mode ('all_or_none or best_effort'), and escrow_wallet ('Optional custom escrow wallet'). While it doesn't detail nested properties of BatchTaskDefinition, it gives sufficient high-level understanding of what each parameter represents.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Create multiple tasks in a single operation with escrow calculation.' It specifies the verb ('create'), resource ('tasks'), scope ('multiple...in a single operation'), and key feature ('escrow calculation'). It distinguishes from sibling tools like em_publish_task by emphasizing batch creation and bypassing standard payment flows.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance with a warning section that states when NOT to use this tool ('For production use, tasks should be created via the REST API') and explains the alternative approach. It also clarifies the two operation modes (ALL_OR_NONE vs BEST_EFFORT) and their implications, giving clear context for when to choose each mode.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_browse_agent_tasksB
Read-onlyIdempotent
Inspect

Browse tasks available for agent execution.

ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=false, covering safety and idempotency. The description adds context about 'browsing' which suggests exploration/filtering rather than direct retrieval, aligning with the rich parameter schema for filtering. However, it doesn't mention pagination behavior (implied by limit/offset) or response format details beyond what the schema provides.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise with a single sentence that directly states the tool's purpose. No wasted words or unnecessary elaboration. The description is front-loaded and efficiently communicates the core function without structural issues.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has rich annotations (read-only, idempotent, non-destructive), a detailed input schema with 8 parameters, and an output schema exists (though not shown), the description is reasonably complete. It identifies the tool as a browsing mechanism for tasks, which aligns with the filtering parameters. However, it could better explain the relationship to other task-retrieval tools and the browsing vs. direct retrieval distinction.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the input schema is highly detailed with titles, defaults, constraints, and enums for category and response_format. The description doesn't add any parameter-specific information beyond 'browse,' which vaguely implies filtering/searching. With comprehensive schema documentation, the baseline is 3 even though the description adds minimal parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Browse tasks available for agent execution' clearly states the action (browse) and resource (tasks), but it's vague about scope and doesn't differentiate from sibling tools like em_get_tasks or em_get_my_tasks. It specifies 'for agent execution' which provides some context, but lacks specificity about what browsing entails versus other retrieval tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like em_get_tasks, em_get_my_tasks, or em_get_task. The description implies a browsing/searching function but doesn't clarify prerequisites, when this is the appropriate choice, or what distinguishes it from other task-retrieval tools in the sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_calculate_feeA
Read-onlyIdempotent
Inspect
Calculate the fee breakdown for a potential task.

Use this to preview how much workers will receive after platform fees.

Args:
    bounty_usd: Bounty amount in USD
    category: Task category

Returns:
    str: Fee breakdown details.
ParametersJSON Schema
NameRequiredDescriptionDefault
categoryYes
bounty_usdYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already provide strong behavioral hints: readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=false. The description adds some context by specifying this is for 'potential task' calculations and 'preview' purposes, which aligns with the read-only nature. However, it doesn't disclose additional behavioral traits like rate limits, authentication needs, or error conditions beyond what annotations cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized: it starts with a clear purpose statement, provides usage guidance, lists parameters with brief explanations, and specifies the return type. Every sentence earns its place without redundancy, and information is front-loaded for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no nested objects) and the presence of annotations and an output schema, the description is reasonably complete. It covers purpose, usage, parameters, and return type. However, it could be more comprehensive by explaining the relationship with sibling tools like 'em_get_fee_structure' or providing examples of fee breakdown outputs, though the output schema mitigates some of this need.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the schema provides no parameter descriptions. The description includes an 'Args' section that lists 'bounty_usd' and 'category' with brief explanations ('Bounty amount in USD', 'Task category'), adding meaningful semantics beyond the bare schema. However, it doesn't detail format constraints (e.g., numeric ranges for bounty_usd) or explain the enum values for category, leaving some gaps in parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Calculate the fee breakdown for a potential task' and 'preview how much workers will receive after platform fees.' It specifies the verb ('calculate'), resource ('fee breakdown'), and scope ('potential task'), but doesn't explicitly differentiate from sibling tools like 'em_get_fee_structure' which might provide different fee-related information.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: 'Use this to preview how much workers will receive after platform fees.' This indicates it's for pre-task planning or estimation. However, it doesn't explicitly mention when not to use it or name alternatives among the sibling tools, such as distinguishing from 'em_get_fee_structure' which might retrieve actual fee data rather than calculate projections.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_cancel_taskA
Destructive
Inspect
    Cancel a task you published (only if still in 'published' or 'accepted' status).

    Use this if you no longer need the task completed.

    Args:
        params (CancelTaskInput): Validated input parameters containing:
            - task_id (str): UUID of the task to cancel
            - agent_id (str): Your agent ID (for authorization)
            - reason (str): Reason for cancellation

    Returns:
        str: Confirmation of cancellation.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond what annotations provide: it specifies status constraints ('published' or 'accepted'), clarifies ownership ('you published'), and mentions authorization needs. While annotations already indicate destructiveHint=true and readOnlyHint=false, the description provides specific operational constraints that aren't captured in structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections: purpose statement, usage guidance, parameter summary, and return value. Each sentence earns its place, and the information is front-loaded with the most critical details (what the tool does and when to use it) appearing first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive operation with one complex parameter object, the description provides good context about preconditions, ownership, and usage. The existence of an output schema means the description doesn't need to detail return values. However, it could better address potential edge cases or error conditions given the destructive nature of the operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description provides a helpful summary of the three parameters (task_id, agent_id, reason) and their purposes. However, it doesn't add significant semantic detail beyond what's already evident from parameter names, and doesn't explain format requirements like UUID validation or length constraints that the schema specifies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Cancel a task you published') with precise conditions ('only if still in 'published' or 'accepted' status'), distinguishing it from sibling tools like em_get_task or em_publish_task. It explicitly identifies the resource (task) and scope (user's own published tasks).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('if you no longer need the task completed') and includes important preconditions ('only if still in 'published' or 'accepted' status'). This helps differentiate it from tools like em_resolve_dispute or em_escrow_refund that might handle task termination in other contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_check_escrow_stateA
Read-onlyIdempotent
Inspect
Query the on-chain escrow state for a task (Fase 2 mode only).

Returns the current escrow state from the AuthCaptureEscrow contract:
- capturableAmount: Funds available for release to worker
- refundableAmount: Funds available for refund to agent
- hasCollectedPayment: Whether initial deposit was collected

Args:
    task_id: UUID of the task to check

Returns:
    JSON with escrow state, or error if not in fase2 mode or no escrow found.
ParametersJSON Schema
NameRequiredDescriptionDefault
task_idYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true. The description adds valuable context about the specific contract (AuthCaptureEscrow) and the three specific return fields, which goes beyond what annotations provide. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured: purpose statement, return details, parameter explanation, and error conditions in four clear sections. Every sentence adds value with zero wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has comprehensive annotations, an output schema (implied by 'Returns: JSON with escrow state'), and only one parameter, the description provides complete context. It explains the specific return fields, error conditions, and usage constraints appropriate for this complexity level.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by explaining the single parameter ('task_id: UUID of the task to check'), adding semantic meaning beyond the schema's basic type information. It clarifies the format (UUID) and purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Query the on-chain escrow state') and resource ('for a task'), with explicit scope limitation ('Fase 2 mode only'). It distinguishes from siblings like 'em_escrow_status' by focusing on contract state rather than general status.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage context: 'Fase 2 mode only' and 'or error if not in fase2 mode or no escrow found.' This clearly defines when the tool is applicable versus when alternatives might be needed, though it doesn't name specific alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_check_identityA
Read-onlyIdempotent
Inspect
    Check if a wallet address has an ERC-8004 identity on-chain.

    Args:
        wallet_address: Ethereum wallet address (0x-prefixed)
        network: Network to check (default: "base")

    Returns:
        Identity status: registered/not_registered, agent_id if found.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
networkNobase
wallet_addressYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, idempotent, and open-world behavior. The description adds value by specifying the on-chain nature of the check and the return format (identity status and agent_id), which are not covered by annotations. No contradiction with annotations exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, followed by structured Args and Returns sections. Every sentence earns its place by clarifying parameters and outputs without redundancy, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity, rich annotations (covering safety and behavior), and the presence of an output schema (implied by Returns section), the description is largely complete. It explains the action, parameters, and return values adequately, though usage context could be enhanced.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by explaining both parameters: 'wallet_address' as an Ethereum address with 0x prefix and 'network' with its default value. This adds essential meaning beyond the bare schema, though it could detail network options or validation rules.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Check if a wallet address has an ERC-8004 identity on-chain'), identifies the resource (wallet address), and distinguishes it from siblings by focusing on identity verification rather than task management, escrow, or other operations. The verb 'check' is precise and the scope is well-defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites, exclusions, or related tools like 'em_register_identity' for identity creation, leaving the agent to infer usage from context alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_check_submissionA
Read-onlyIdempotent
Inspect
Check submissions for a task you published.

Use this to see if a human has submitted evidence for your task.
You can then use em_approve_submission to accept or reject.

Args:
    params (CheckSubmissionInput): Validated input parameters containing:
        - task_id (str): UUID of the task
        - agent_id (str): Your agent ID (for authorization)
        - response_format (ResponseFormat): markdown or json

Returns:
    str: Submission details or "No submissions yet".
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it explains the authorization requirement ('Your agent ID (for authorization)'), describes the possible return values ('Submission details or "No submissions yet"'), and clarifies it's for checking human submissions. While annotations already indicate read-only, idempotent, and non-destructive behavior, this additional context about authorization and return format is helpful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured: purpose statement first, usage guidance second, parameter documentation third, return value last. Every sentence earns its place with zero wasted words. The Args/Returns formatting is clear and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity, rich annotations (readOnlyHint, idempotentHint, etc.), and the presence of an output schema, the description provides complete context. It covers purpose, usage workflow, parameter meanings, and return behavior, making it fully self-contained for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description carries the full burden. It provides meaningful context for all parameters: task_id ('UUID of the task'), agent_id ('Your agent ID (for authorization)'), and response_format ('markdown or json'). This adds substantial value beyond the bare schema, though it doesn't explain UUID format constraints or default values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Check submissions') and resource ('for a task you published'), distinguishing it from siblings like em_approve_submission (which handles acceptance/rejection) and em_get_task (which retrieves task details). The opening sentence provides a complete purpose statement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('to see if a human has submitted evidence for your task') and provides a clear alternative ('You can then use em_approve_submission to accept or reject'). This gives perfect guidance on the workflow sequence and distinguishes it from related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_escrow_authorizeAInspect
    Lock a task bounty in escrow via the PaymentOperator contract.

    This is the first step for escrow-based payment strategies.
    Funds are locked on-chain and can later be released to the worker
    or refunded to the agent.

    The on-chain flow: Agent USDC -> PaymentOperator.authorize() -> Escrow contract

    Args:
        params: task_id, receiver wallet, amount, strategy, optional tier override

    Returns:
        Authorization result with transaction hash and payment info.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=false and destructiveHint=false, indicating a non-destructive write operation. The description adds valuable context beyond annotations by explaining the on-chain flow ('Agent USDC -> PaymentOperator.authorize() -> Escrow contract'), funds being locked, and the purpose as part of a payment strategy, though it doesn't detail rate limits or auth requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with the first sentence stating the core purpose, followed by context and details in clear, efficient sentences. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of an on-chain escrow operation, the description is complete enough: it explains the purpose, usage context, parameters, and return value ('Authorization result with transaction hash and payment info.'), and with an output schema present, it doesn't need to detail return values further.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description carries full burden. It lists parameters (task_id, receiver wallet, amount, strategy, optional tier override) and adds meaning by specifying 'task bounty' and 'worker wallet address', but does not fully explain each parameter's role or constraints beyond what the schema's properties and descriptions already detail.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Lock a task bounty in escrow') and resource ('via the PaymentOperator contract'), distinguishing it from sibling tools like 'em_escrow_release' or 'em_escrow_refund' by specifying it as the 'first step for escrow-based payment strategies'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides clear context by stating this is 'the first step for escrow-based payment strategies' and mentions later steps (release or refund), but does not explicitly name when not to use it or list specific alternatives among siblings like 'em_escrow_charge' or 'em_escrow_partial_release'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_escrow_chargeAInspect
    Make an instant payment to a worker without escrow.

    The on-chain flow: Agent USDC -> PaymentOperator.charge() -> Worker USDC (direct)

    Best for:
    - Micro-tasks under $5
    - Trusted workers with >90% reputation
    - Time-sensitive payments

    This is a single-step operation. Funds go directly to the worker.

    Args:
        params: task_id, receiver wallet, amount, optional tier

    Returns:
        Transaction result with hash and confirmation.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it explains the on-chain flow (Agent USDC -> PaymentOperator.charge() -> Worker USDC), notes it's a 'single-step operation', and describes the return value. Annotations provide readOnlyHint=false and destructiveHint=false, which align with the description's 'payment' action, so no contradiction. However, it doesn't mention rate limits or authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by usage guidelines, operational details, and parameter/return info. Every sentence adds value without redundancy, making it efficient and easy to parse for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (payment operation with blockchain context), the description is complete: it covers purpose, usage, behavior, parameters, and returns. With annotations providing safety hints and an output schema handling return values, the description fills all necessary gaps without over-explaining structured data.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description lists parameters (task_id, receiver wallet, amount, optional tier) and adds context like 'UUID of the task' and 'Worker wallet address', which complements the schema. With 0% schema description coverage, the description compensates well by providing semantic meaning, though it doesn't detail constraints like amount limits or tier values beyond the schema's properties.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verb ('Make an instant payment') and resource ('to a worker'), distinguishing it from sibling tools like escrow-based payment tools (e.g., em_escrow_release, em_escrow_refund). It explicitly mentions 'without escrow' and 'directly to the worker', making the scope unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Best for: micro-tasks under $5, trusted workers with >90% reputation, time-sensitive payments') and implicitly when not to use it (for escrow-based or larger payments). It differentiates from alternatives by highlighting the single-step, direct payment nature, though it doesn't name specific sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_escrow_disputeA
Destructive
Inspect
    Initiate a post-release dispute refund.

    WARNING: NOT FUNCTIONAL IN PRODUCTION. The protocol team has not yet
    implemented the required tokenCollector contract. This tool will fail.

    For dispute resolution, the recommended approach is to keep funds in
    escrow and use em_escrow_refund (refund-in-escrow) instead. This
    guarantees funds are available and under arbiter control.

    This tool is kept for future use when the protocol implements
    tokenCollector support.

    Args:
        params: task_id, optional amount to dispute

    Returns:
        Dispute result (will fail - tokenCollector not implemented).
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations. Annotations indicate destructiveHint=true and readOnlyHint=false, but the description clarifies that the tool is non-functional in production, will fail due to missing tokenCollector contract, and explains the recommended workaround. This provides crucial operational warnings not captured in annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized. It front-loads the purpose, immediately follows with critical warnings, provides alternative guidance, and explains future use. Every sentence serves a clear purpose, though the warning section could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (destructive operation with implementation issues), the description is remarkably complete. It covers purpose, current limitations, alternatives, and future use. With annotations providing safety hints and an output schema existing, the description focuses on the critical contextual gaps—especially the non-functional status—making it highly complete for agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description mentions parameters ('task_id, optional amount to dispute') but adds minimal semantic value beyond the input schema. The schema has 0% description coverage, but the description doesn't elaborate on parameter meanings or usage. It states 'Disputes full bounty if not specified' which slightly clarifies the amount_usdc parameter, but overall compensation for low schema coverage is limited.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Initiate a post-release dispute refund.' It specifies the verb ('Initiate'), resource ('post-release dispute refund'), and distinguishes it from sibling tools like em_escrow_refund, which is for refunds while funds are still in escrow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when NOT to use this tool: 'WARNING: NOT FUNCTIONAL IN PRODUCTION... This tool will fail.' It recommends an alternative: 'use em_escrow_refund (refund-in-escrow) instead.' It also explains the future use case: 'when the protocol implements tokenCollector support.'

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_escrow_partial_releaseAInspect
    Release a partial payment for proof-of-attempt and refund the remainder.

    This is a two-step operation:
    1. Release X% to the worker (reward for attempting the task)
    2. Refund (100-X)% to the agent

    Common use case: Worker attempted the task but couldn't fully complete it.
    Default is 15% release for proof-of-attempt.

    Args:
        params: task_id, release_percent (1-99, default 15%)

    Returns:
        Both transaction results with amounts.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it explains the two-step operation (release to worker, refund to agent), specifies the default percentage (15%), and clarifies the 'proof-of-attempt' rationale. While annotations provide some hints (non-readOnly, non-destructive, non-idempotent), the description enriches understanding of the specific financial transaction behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and concise: it starts with a clear purpose statement, explains the two-step operation in bullet points, provides usage context, and documents parameters and returns in labeled sections. Every sentence adds value with zero redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's financial transaction complexity and the presence of an output schema (which handles return values), the description is complete: it explains the operation's purpose, when to use it, behavioral details, parameter semantics, and return overview. No critical information is missing for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by explaining both parameters: 'task_id' is implied through context, and 'release_percent' is described with its purpose (percentage to release to worker), range (1-99), and default (15%). The description adds meaning about how these parameters drive the two-step operation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Release a partial payment for proof-of-attempt and refund the remainder') and distinguishes it from siblings like 'em_escrow_release' (full release) and 'em_escrow_refund' (full refund). It explicitly identifies the two-step operation and the target resources (worker payment, agent refund).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Common use case: Worker attempted the task but couldn't fully complete it') and includes a default behavior ('Default is 15% release for proof-of-attempt'). It distinguishes this from alternatives by describing its unique partial-release+refund mechanism compared to full release or refund tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_escrow_recommend_strategyA
Read-onlyIdempotent
Inspect
    Recommend the best payment strategy for a task based on its parameters.

    Uses the Execution Market Agent Decision Tree to select the optimal payment flow.
    When ERC-8004 on-chain reputation is available, it takes precedence.

    Decision logic:
    - High reputation (>90%) + micro amount (<$5) -> instant_payment
    - External dependency (weather, events) -> escrow_cancel
    - Quality review needed + high value (>=$50) -> dispute_resolution
    - Low reputation (<50%) + high value (>=$50) -> dispute_resolution
    - Default -> escrow_capture

    Args:
        params: Amount, reputation, and task characteristics

    Returns:
        Recommended strategy with explanation and tier timings.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=false. The description adds valuable behavioral context beyond this: it explains the decision logic (e.g., precedence of ERC-8004 reputation, specific thresholds for strategies) and mentions 'tier timings' in returns. This enhances understanding of how the tool behaves without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the purpose, followed by decision logic and parameter/return summaries. Every sentence adds value: the first states the purpose, the second adds context, the third details logic, and the last two cover inputs/outputs. There is no wasted text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (decision logic with multiple conditions), annotations (covering safety and idempotency), and the presence of an output schema (so returns need not be detailed), the description is complete. It explains the tool's purpose, usage context, behavioral logic, and summarizes parameters and returns adequately for an AI agent to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaning by summarizing parameters ('Amount, reputation, and task characteristics') and linking them to decision logic (e.g., 'High reputation (>90%) + micro amount (<$5)'). However, it does not detail all specific parameters like 'external_dependency' or 'requires_quality_review' explicitly, leaving some gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Recommend the best payment strategy for a task based on its parameters' and 'Uses the Execution Market Agent Decision Tree to select the optimal payment flow.' This is specific (verb+resource) and distinguishes it from sibling tools like em_escrow_authorize or em_escrow_release which perform actions rather than recommendations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: 'for a task based on its parameters' and 'Uses the Execution Market Agent Decision Tree.' It also implies usage by detailing decision logic scenarios. However, it does not explicitly state when NOT to use it or name specific alternatives among siblings, keeping it from a perfect score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_escrow_refundAInspect
    Refund escrowed funds back to the agent (cancel task).

    The on-chain flow: Escrow contract -> PaymentOperator.refundInEscrow() -> Agent USDC

    Use this when a task is cancelled before completion.
    Only works if funds are still in escrow (not yet released).

    Args:
        params: task_id, optional amount (defaults to full bounty)

    Returns:
        Transaction result with hash and gas used.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond what annotations provide. While annotations indicate this is not read-only (readOnlyHint: false) and not destructive (destructiveHint: false), the description clarifies the specific on-chain transaction flow, the precondition about funds needing to be in escrow, and that it returns a transaction result. This provides important implementation details that annotations don't cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections: purpose statement, technical flow, usage conditions, parameters, and returns. Every sentence adds value without redundancy. The formatting with clear sections makes it easy to parse while remaining concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's transactional nature and the presence of an output schema (which handles return values), the description provides complete context. It covers purpose, usage conditions, behavioral details, parameter semantics, and references the transaction result. For a tool with good annotations and output schema, this description fills all necessary gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates well by explaining both parameters: task_id is clearly identified, and amount_usdc is described as optional with default behavior ('defaults to full bounty'). The description adds meaningful context about parameter usage that the schema alone doesn't provide.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Refund escrowed funds back to the agent') and distinguishes it from siblings by specifying it's for canceling tasks before completion. It explicitly mentions the on-chain flow and contrasts with other escrow-related tools like em_escrow_release or em_escrow_partial_release by focusing on refunds rather than releases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('when a task is cancelled before completion') and when not to use it ('Only works if funds are still in escrow (not yet released)'). It clearly distinguishes this from other escrow operations by specifying the refund context rather than release or dispute scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_escrow_releaseAInspect
    Release escrowed funds to the worker after task approval.

    The on-chain flow: Escrow contract -> PaymentOperator.release() -> Worker USDC

    This is an irreversible operation. Once released, funds go directly
    to the worker's wallet. For dispute resolution after release,
    use em_escrow_dispute.

    Args:
        params: task_id, optional amount (defaults to full bounty)

    Returns:
        Transaction result with hash and gas used.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it discloses the irreversible nature of the operation, describes the on-chain flow, specifies that funds go directly to worker's wallet, and mentions transaction return format. Annotations cover basic hints but don't provide this operational detail.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Perfectly structured with clear sections: purpose statement, technical flow, behavioral warnings, parameter summary, and return value. Every sentence earns its place with zero redundancy. The information is front-loaded with the core action first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a blockchain transaction tool: covers purpose, irreversible nature, on-chain flow, parameter semantics, return format, and alternative tools. With output schema handling return values, the description focuses appropriately on behavioral context and usage guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates well by explaining both parameters: task_id context and amount behavior (optional, defaults to full bounty). It doesn't provide format details like UUID or numeric constraints, but adds meaningful semantic context missing from schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Release escrowed funds') and target ('to the worker after task approval'), distinguishing it from siblings like em_escrow_dispute and em_escrow_partial_release. It uses precise terminology and identifies the exact resource being manipulated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('after task approval') and when not to use (implies not for disputes). Provides a clear alternative for dispute resolution ('use em_escrow_dispute') and distinguishes from partial release by noting default full bounty release.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_escrow_statusA
Read-onlyIdempotent
Inspect
    Get the current escrow payment status for a task.

    Returns the payment state including:
    - Authorization status
    - Amount locked, released, and refunded
    - Transaction hashes
    - Current payment strategy

    Args:
        params: task_id

    Returns:
        Payment status details or "not found" if task has no escrow.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, destructiveHint=false, and idempotentHint=true, indicating a safe, non-destructive read operation. The description adds valuable behavioral context beyond annotations by specifying the return structure (e.g., authorization status, amounts, transaction hashes, payment strategy) and the 'not found' response for tasks without escrow, enhancing the agent's understanding of tool behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by bullet points for returns and clear sections for Args and Returns. Every sentence adds value without redundancy, making it efficient and easy to parse for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (payment status retrieval), the description is complete: it explains the purpose, parameter, return details, and edge cases ('not found'). With annotations covering safety and an output schema likely detailing the return structure, no additional information is needed for effective tool invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, but the description compensates by clearly explaining the single parameter ('task_id') in the Args section and specifying it as a UUID for identifying the task. This adds essential meaning beyond the schema's basic type constraints, though it could benefit from more detail on format or validation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Get the current escrow payment status') and resource ('for a task'), distinguishing it from siblings like 'em_check_escrow_state' or 'em_get_payment_info' by focusing on detailed payment status retrieval rather than general checks or broader info.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage ('Get the current escrow payment status for a task') and implies when to use it based on the need for detailed payment status. However, it does not explicitly state when not to use it or name alternatives among siblings, such as 'em_check_escrow_state' for a simpler check.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_get_arbiter_verdictA
Read-onlyIdempotent
Inspect
Get the Ring 2 arbiter verdict for a task or submission.

Returns the dual-inference verdict (PHOTINT + Arbiter) including decision,
score, tier used, evidence hash, commitment hash, and dispute status if
the submission was escalated to L2 human review.

Only available for tasks that were created with arbiter_mode != "manual"
and after Phase B verification has completed.

Args:
    params (GetArbiterVerdictInput): Validated input containing:
        - task_id (str, optional): UUID of the task
        - submission_id (str, optional): UUID of the submission
        - response_format (ResponseFormat): markdown or json
        (at least one of task_id or submission_id must be provided)

Returns:
    str: Arbiter verdict details or error message if not yet evaluated.
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering core safety aspects. The description adds valuable behavioral context beyond annotations: it specifies the dual-inference nature (PHOTINT + Arbiter), lists the returned data fields, mentions dispute status for escalated cases, and clarifies availability constraints. No contradiction with annotations exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by return details, usage constraints, and parameter explanations. Every sentence adds value without redundancy, and the bullet-like formatting in Args and Returns sections enhances readability efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (retrieving specialized verdicts with constraints), the description is complete: it covers purpose, return content, availability rules, and parameters. With annotations providing safety context and an output schema existing (though not shown), no critical gaps remain for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description's Args section compensates by explaining the three parameters (task_id, submission_id, response_format) and their optionality rules. However, it doesn't add significant semantic context beyond what's implied by the schema's structure and titles, such as UUID format details or precedence logic already in the schema's description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verb ('Get') and resource ('Ring 2 arbiter verdict for a task or submission'), distinguishing it from sibling tools like em_check_submission or em_get_task which have different retrieval scopes. It precisely identifies what is being retrieved.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use guidance: 'Only available for tasks that were created with arbiter_mode != "manual" and after Phase B verification has completed.' It also distinguishes from alternatives by specifying this is for arbiter verdicts, not general task or submission checks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_get_fee_structureA
Read-onlyIdempotent
Inspect
Get the current platform fee structure.

Returns information about:
- Fee rates by task category (6-8%)
- Minimum and maximum limits
- Treasury wallet address
- Worker vs platform distribution

Returns:
    str: Fee structure details in markdown format.
ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable context beyond annotations by specifying what information is returned (fee rates, limits, wallet address, distribution). Annotations already indicate read-only, non-destructive, and idempotent behavior, but the description usefully details the return content format (markdown) and structure categories.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured: a clear purpose statement followed by bullet points detailing return content and a final note about format. Every sentence earns its place with zero redundant information. The information is front-loaded with the most important details first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no parameters, comprehensive annotations (readOnlyHint, idempotentHint, etc.), and an output schema exists, the description provides complete context. It details what information will be returned in a helpful structured format, making the tool's behavior fully understandable to an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0 parameters and 100% schema description coverage, the baseline is 4. The description appropriately doesn't discuss parameters since none exist, and instead focuses on the return value semantics, which is the relevant information for this tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with 'Get the current platform fee structure' (verb+resource). It distinguishes from siblings by focusing on fee structure retrieval rather than task operations, payments, or escrow management. However, it doesn't explicitly differentiate from potential similar 'get' tools like em_get_payment_info or em_get_task_analytics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided about when to use this tool versus alternatives. The description doesn't mention prerequisites, timing considerations, or compare it to other fee-related tools like em_calculate_fee. Users must infer usage from the purpose alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_get_my_executionsB
Read-onlyIdempotent
Inspect

Get tasks the agent has accepted/completed.

ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide key behavioral hints: readOnlyHint=true, destructiveHint=false, idempotentHint=true, openWorldHint=false. The description adds minimal context by implying it retrieves task data, but doesn't disclose additional traits like rate limits, authentication needs, or what 'accepted/completed' entails. No contradiction with annotations exists, so the baseline is met with slight value added.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with no wasted words. It's front-loaded with the core purpose and efficiently conveys the scope. Every part of the sentence earns its place by specifying what is being retrieved and for whom.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (retrieval with filtering), rich annotations (safety and idempotency hints), and the presence of an output schema (which handles return values), the description is reasonably complete. It states the purpose and scope adequately, though it lacks usage guidelines and parameter insights, which are partially mitigated by structured data.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the schema provides no parameter descriptions. The tool description mentions no parameters, failing to compensate for this gap. However, the tool has only 1 parameter (a nested object with 4 fields), and the schema includes titles and constraints (e.g., limit range, executor_id length), providing some structure. The description adds no semantic value beyond what's inferable from the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get tasks the agent has accepted/completed.' It specifies the verb ('Get') and resource ('tasks'), and clarifies the scope is limited to tasks the agent has accepted or completed. However, it doesn't explicitly differentiate from sibling tools like 'em_get_my_tasks' or 'em_get_tasks', which likely have different scopes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'em_get_my_tasks' or 'em_get_tasks', nor does it specify prerequisites, exclusions, or appropriate contexts for usage. The agent must infer usage from the tool name and description alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_get_my_tasksA
Read-onlyIdempotent
Inspect
    Get your assigned tasks, pending applications, and recent submissions.

    Use this to see:
    - Tasks assigned to you (in progress)
    - Pending applications waiting for agent approval
    - Recent submissions and their verdict status
    - Summary of your activity

    Args:
        params (GetMyTasksInput): Validated input parameters containing:
            - executor_id (str): Your executor ID
            - status (TaskStatus): Optional filter by task status
            - include_applications (bool): Include pending applications (default: True)
            - limit (int): Max results (default: 20)
            - response_format (ResponseFormat): markdown or json

    Returns:
        str: Your tasks and applications in requested format.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it explains what types of data are returned (tasks, applications, submissions, activity summary) and mentions default values (include_applications: True, limit: 20, response_format: markdown). Annotations already cover safety (readOnlyHint: true, destructiveHint: false) and idempotency, so the description appropriately focuses on operational details without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded: the first sentence states the purpose, followed by a bulleted list of what's included, then parameter details, and return information. Every sentence earns its place, with no redundant or vague phrasing, making it highly efficient for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (1 parameter with nested object), rich annotations (readOnlyHint, idempotentHint, etc.), and the presence of an output schema (implied by 'Returns' section), the description is complete. It covers purpose, usage, parameters, and output format, leaving no gaps for the agent to infer behavior, especially since annotations handle safety and idempotency.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by explaining all parameters in the Args section, detailing executor_id, status filtering, include_applications, limit, and response_format. However, it doesn't add meaning beyond what the schema's property names and enums imply (e.g., it doesn't clarify what 'pending applications' entail or how 'recent submissions' are defined). The coverage is adequate but not insightful.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Get your assigned tasks, pending applications, and recent submissions') and distinguishes it from siblings by focusing on the worker's perspective (e.g., 'your assigned tasks', 'your executor ID'). It explicitly lists what the tool returns, making it distinct from broader task-browsing tools like em_browse_agent_tasks or em_get_tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Get your assigned tasks...') and implies it's for workers viewing their own tasks, not for agents or administrators. However, it doesn't explicitly state when NOT to use it or name specific alternatives (e.g., em_get_tasks for all tasks vs. em_get_my_tasks for worker-specific tasks), which prevents a perfect score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_get_payment_infoA
Read-onlyIdempotent
Inspect
Get payment details needed to approve a task submission (Fase 1 mode).

External agents use this to get the exact addresses and amounts they need
to sign 2 EIP-3009 authorizations: one for the worker and one for the
platform fee.

Args:
    task_id: UUID of the task
    submission_id: UUID of the submission to approve

Returns:
    JSON with worker_address, treasury_address, bounty_amount, fee_amount,
    token details, and signing parameters.
ParametersJSON Schema
NameRequiredDescriptionDefault
task_idYes
submission_idYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already provide readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true, covering safety and idempotency. The description adds valuable context about the tool's role in a multi-step workflow (approval process) and the specific signing requirements (EIP-3009 authorizations for worker and platform fee), which isn't captured in annotations. No contradiction with annotations exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by specific usage context, then parameter and return value sections. Every sentence earns its place by adding unique value: the first explains what it does, the second why it's used, and the structured sections provide essential technical details without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (workflow-specific, cryptographic signing context), the description provides complete context: it explains the purpose, usage scenario, parameters, and return values. With annotations covering behavioral traits and an output schema implied by the 'Returns' section, no critical gaps remain. The description effectively bridges the structured data with practical application knowledge.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description includes an 'Args' section that documents both parameters (task_id and submission_id) with brief explanations. However, it doesn't add significant semantic context beyond naming them (e.g., format details like UUID validation or relationships between parameters). With two parameters fully listed, it meets the baseline for adequate but not enriched documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('get payment details') and resources ('task submission'), and distinguishes it from siblings by specifying 'Fase 1 mode' and the exact use case for external agents needing to sign EIP-3009 authorizations. It goes beyond a simple read operation to explain the downstream application of the retrieved data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: 'to approve a task submission (Fase 1 mode)' and 'External agents use this to get the exact addresses and amounts they need to sign 2 EIP-3009 authorizations'. It clearly differentiates from siblings like 'em_approve_submission' by focusing on the preparatory payment information retrieval rather than the approval action itself.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_get_reputationA
Read-onlyIdempotent
Inspect
    Get on-chain reputation for an agent from the ERC-8004 Reputation Registry.

    Provide either agent_id (numeric ERC-8004 token ID) or wallet_address.

    Args:
        agent_id: ERC-8004 agent token ID (e.g. 2106)
        wallet_address: Agent's wallet address (resolved to agent_id)
        network: ERC-8004 network (default: "base")

    Returns:
        Reputation score, rating count, and network info.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
networkNobase
agent_idNo
wallet_addressNo

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already provide excellent behavioral context (readOnlyHint: true, openWorldHint: true, idempotentHint: true, destructiveHint: false). The description adds valuable context about what the tool returns ('Reputation score, rating count, and network info') and clarifies the relationship between agent_id and wallet_address parameters. This goes beyond what annotations provide without contradicting them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and concise: a clear purpose statement, parameter guidance, and return value description in just four sentences. Every sentence earns its place, with no wasted words or redundant information. The Args/Returns formatting makes it easily scannable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity, rich annotations covering safety and behavior, and the existence of an output schema, the description provides exactly what's needed. It explains the purpose, parameters, and return values without duplicating what's already in structured fields. The description is complete for a read-only query tool with good annotation coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description carries the full burden of explaining parameters. It successfully explains all three parameters: agent_id ('numeric ERC-8004 token ID'), wallet_address ('Agent's wallet address (resolved to agent_id)'), and network ('ERC-8004 network'). It also clarifies the either/or relationship between agent_id and wallet_address, adding significant value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Get on-chain reputation') and resource ('for an agent from the ERC-8004 Reputation Registry'), distinguishing it from sibling tools which focus on tasks, payments, escrow, and other agent operations. The verb 'Get' combined with the specific resource makes the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool ('Get on-chain reputation for an agent') and specifies the two alternative ways to identify the agent (agent_id or wallet_address). However, it doesn't explicitly state when NOT to use it or name specific alternative tools for related queries, keeping it from a perfect score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_get_taskB
Read-onlyIdempotent
Inspect
Get detailed information about a specific task.

Args:
    params (GetTaskInput): Validated input parameters containing:
        - task_id (str): UUID of the task
        - response_format (ResponseFormat): markdown or json

Returns:
    str: Task details in requested format.
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds minimal behavioral context beyond what annotations already provide. Annotations clearly indicate this is a read-only, non-destructive, idempotent operation with open-world semantics. The description adds only that it returns 'Task details in requested format,' which provides some output context but doesn't elaborate on what details are included, error conditions, or authentication requirements. No contradiction with annotations exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized with a clear purpose statement followed by Args and Returns sections. Every sentence serves a purpose: the first states the tool's function, the second documents parameters, and the third specifies the return. No wasted words, though the formatting could be more concise by integrating parameter descriptions into the main text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (single read operation with 2 parameters), good annotations covering safety and behavior, and the presence of an output schema (implied by 'Returns' section), the description is reasonably complete. It covers the basic purpose, parameters, and return format. The main gap is lack of sibling differentiation, but for a straightforward retrieval tool with comprehensive annotations, this is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description carries the full burden of parameter documentation. It correctly identifies both parameters (task_id and response_format) and provides basic semantics ('UUID of the task' and 'markdown or json'), but doesn't explain format differences, UUID validation rules, or default behavior. The schema itself has good descriptions, so the description adds some value but doesn't fully compensate for the coverage gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as 'Get detailed information about a specific task' with a specific verb ('Get') and resource ('task'), making it immediately understandable. However, it doesn't explicitly differentiate from sibling tools like 'em_get_tasks' (plural) or 'em_get_my_tasks', which might cause confusion about when to use this single-task retrieval versus batch retrieval tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With multiple sibling tools that also retrieve task information (em_get_tasks, em_get_my_tasks, em_browse_agent_tasks), there's no indication of when this single-task retrieval is preferred over batch operations or filtered views. The description simply states what it does without contextual usage information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_get_task_analyticsA
Read-onlyIdempotent
Inspect
    Get comprehensive analytics and metrics for your tasks.

    Provides insights on:
    - Task completion rates and performance
    - Financial metrics (bounties paid, averages)
    - Time-to-completion statistics
    - Quality metrics (disputes, resubmissions)
    - Geographic distribution
    - Top worker performance

    Args:
        params (GetTaskAnalyticsInput): Validated input parameters containing:
            - agent_id (str): Your agent ID
            - days (int): Number of days to analyze (default: 30)
            - include_worker_details (bool): Include top workers (default: True)
            - include_geographic (bool): Include location data (default: True)
            - category_filter (TaskCategory): Filter to specific category
            - response_format (ResponseFormat): markdown or json

    Returns:
        str: Analytics in requested format with actionable insights.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, idempotent, and open-world behavior. The description adds value by specifying the comprehensive nature of analytics, listing insight categories, and noting the output includes 'actionable insights.' It doesn't contradict annotations and provides useful context beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with a clear purpose statement, bulleted insights, and organized parameter documentation. Slightly verbose but each section adds value. Could be more front-loaded by moving the Args section earlier, but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (analytics with multiple parameters) and rich annotations, the description is fairly complete. It details parameters, insights, and output format. With an output schema present, it doesn't need to explain return values. Minor gaps include lack of usage scenarios and sibling differentiation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates well by detailing all parameters in the Args section with explanations and defaults. It adds meaning beyond the bare schema, clarifying each parameter's role in filtering and formatting analytics, though it doesn't explain enum values for TaskCategory or ResponseFormat.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get comprehensive analytics and metrics for your tasks' and provides a bulleted list of specific insights. It distinguishes from siblings by focusing on analytics rather than task operations like creation or submission, though it doesn't explicitly name alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives is provided. The description mentions what insights are available but doesn't specify scenarios, prerequisites, or contrast with other analytics-related tools (none exist among siblings). Usage context is implied rather than stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_get_tasksA
Read-onlyIdempotent
Inspect
Get tasks from the Execution Market system with optional filters.

Use this to monitor your published tasks or browse available tasks.

Args:
    params (GetTasksInput): Validated input parameters containing:
        - agent_id (str): Filter by agent ID (your tasks only)
        - status (TaskStatus): Filter by status (published, accepted, completed, etc.)
        - category (TaskCategory): Filter by category
        - limit (int): Max results (1-100, default 20)
        - offset (int): Pagination offset (default 0)
        - response_format (ResponseFormat): markdown or json

Returns:
    str: List of tasks in requested format.

Examples:
    - Get my published tasks: agent_id="0x...", status="published"
    - Get all completed tasks: status="completed"
    - Browse physical tasks: category="physical_presence"
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable context beyond annotations by specifying the tool's filtering capabilities, pagination behavior (limit and offset), and output format options. While annotations already declare readOnlyHint=true and destructiveHint=false, the description provides operational details about what the tool actually returns and how results are structured, though it doesn't mention rate limits or authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and efficiently organized with clear sections (purpose, usage guidance, args, returns, examples). Every sentence adds value, with no redundant information. The front-loaded purpose statement immediately communicates the tool's function, followed by progressively detailed information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple filtering parameters, pagination, format options) and the presence of annotations but no output schema, the description provides complete context. It explains what the tool does, when to use it, all parameters with semantics, return format, and practical examples. The combination of annotations and description gives the agent everything needed to correctly invoke this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema description coverage, the description comprehensively documents all 6 parameters with clear explanations of their purposes, constraints, and defaults. It provides semantic meaning for each parameter (e.g., 'Filter by agent ID (your tasks only)', 'Max results (1-100, default 20)', 'Pagination offset (default 0)'), fully compensating for the schema's lack of descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verb ('Get') and resource ('tasks from the Execution Market system'), and distinguishes it from siblings by emphasizing filtering capabilities. It explicitly mentions monitoring published tasks or browsing available tasks, which differentiates it from tools like em_get_my_tasks or em_browse_agent_tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'Use this to monitor your published tasks or browse available tasks.' It includes practical examples that demonstrate different use cases (getting my published tasks, getting all completed tasks, browsing physical tasks), giving clear context for application.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_publish_taskAInspect
    Publish a new task for human execution in the Execution Market.

    This tool creates a task that human executors can browse, accept, and complete.
    Tasks require evidence of completion which the agent can later verify.

    Args:
        params (PublishTaskInput): Validated input parameters containing:
            - agent_id (str): Your agent identifier (wallet or ERC-8004 ID)
            - title (str): Short task title (5-255 chars)
            - instructions (str): Detailed instructions (20-5000 chars)
            - category (TaskCategory): Task category
            - bounty_usd (float): Payment amount in USD (0-10000)
            - deadline_hours (int): Hours until deadline (1-720)
            - evidence_required (List[EvidenceType]): Required evidence types
            - evidence_optional (List[EvidenceType]): Optional evidence types
            - location_hint (str): Location description
            - min_reputation (int): Minimum executor reputation
            - payment_token (str): Payment token symbol (default: USDC)
            - payment_network (str): Payment network (default: base)
            - arbiter_mode (str): Verification mode for evidence approval.
                'manual' (default): you review and approve submissions yourself.
                'auto': Ring 2 ArbiterService evaluates evidence using PHOTINT
                        forensic checks + LLM semantic analysis, then auto-releases
                        funds on PASS or auto-refunds on FAIL. No agent action needed.
                'hybrid': arbiter recommends a verdict, you confirm before payment.
                Cost: 0 for tasks <$1, ~$0.001 for $1-$10, ~$0.003 for >=$10.
                Hard cap: arbiter spend never exceeds 10% of bounty.
            - gps_required (bool | None): Override GPS verification behavior.
                None (default): auto-detect — digital tasks (screenshot, json, etc.)
                                skip GPS, physical tasks require it.
                False: explicitly disable GPS check (use for screenshot tasks,
                       remote work, or any task where location is irrelevant).
                True: enforce GPS even for non-physical categories.

    Returns:
        str: Success message with task ID and details, or error message.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate this is a non-destructive, non-idempotent write operation. The description adds valuable behavioral context beyond annotations: it explains the evidence verification process, arbiter modes with cost structures, and hard caps on arbiter spend. This provides crucial operational details not captured in annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, behavior, args, returns) and uses bullet points for parameter details. While comprehensive, some sentences could be more concise (e.g., the arbiter mode explanation is quite detailed). Overall, it's appropriately sized for a complex tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (13 parameters, financial transactions, evidence verification), the description provides complete context. It covers purpose, usage, detailed parameter semantics, behavioral traits, and return values. With annotations covering safety aspects and an output schema handling return structure, the description fills all necessary gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description carries the full burden of parameter documentation. It provides comprehensive semantic explanations for all 13 parameters, including constraints, defaults, and practical implications (e.g., arbiter mode costs, geocoding behavior for location parameters). This adds significant value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verb ('Publish') and resource ('task for human execution in the Execution Market'). It distinguishes itself from siblings by focusing on task creation rather than task management, acceptance, or submission operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool ('creates a task that human executors can browse, accept, and complete') and mentions verification capabilities. However, it doesn't explicitly contrast with alternatives like 'em_batch_create_tasks' or specify when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_rate_agentAInspect
    Rate an AI agent after completing a task (worker -> agent feedback).

    Submits on-chain reputation feedback via the ERC-8004 Reputation Registry.

    Args:
        task_id: UUID of the completed task
        score: Rating from 0 (worst) to 100 (best)
        comment: Optional comment about the agent

    Returns:
        Rating result with transaction hash, or error message.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
scoreYes
commentNo
task_idYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it specifies this is for on-chain reputation feedback via ERC-8004, which implies blockchain transaction behavior. Annotations provide basic hints (readOnly=false, openWorld=true, etc.), but the description adds the specific reputation system context and return format expectations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured: purpose statement first, mechanism second, parameter documentation third, return value fourth. Every sentence earns its place with zero wasted words, and it's appropriately sized for a 3-parameter tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (on-chain transaction with reputation system), the description provides complete context: purpose, mechanism, parameters, and return values. With output schema present, it doesn't need to detail return structure but still mentions what to expect. Annotations provide additional behavioral hints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by explaining all three parameters: task_id (UUID of completed task), score (rating 0-100), and comment (optional comment about agent). It provides clear semantic meaning beyond just parameter names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Rate an AI agent'), the context ('after completing a task'), and the mechanism ('Submits on-chain reputation feedback via the ERC-8004 Reputation Registry'). It distinguishes from siblings like 'em_rate_worker' by specifying it's for agent feedback rather than worker feedback.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context ('after completing a task') and implies usage timing, but doesn't explicitly state when not to use it or name alternatives. It distinguishes from 'em_rate_worker' by context but doesn't mention other potential rating tools or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_rate_workerAInspect
    Rate a worker after reviewing their submission.

    Submits on-chain reputation feedback via the ERC-8004 Reputation Registry.
    If no score is provided, a dynamic score is computed from the submission.

    Args:
        submission_id: UUID of the submission to rate
        score: Rating from 0 (worst) to 100 (best). Optional — auto-scored if omitted.
        comment: Optional comment about the worker's performance

    Returns:
        Rating result with transaction hash, or error message.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
scoreNo
commentNo
submission_idYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond the annotations: it discloses the on-chain nature of the operation ('submits on-chain reputation feedback'), mentions the dynamic scoring behavior when score is omitted, and describes the return format. While annotations cover basic safety (readOnlyHint=false, destructiveHint=false), the description provides implementation-specific details that help the agent understand what actually happens.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and concise: purpose statement first, key behavioral details second, parameter explanations third, and return format last. Every sentence earns its place with zero redundancy, and the information is front-loaded with the most important details about what the tool does.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (on-chain operation with optional parameters and dynamic behavior), the description provides complete context. It explains the purpose, usage, parameters, and return format. With an output schema present, the description appropriately focuses on behavioral context rather than detailing return values, making it well-balanced for the agent's needs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by explaining all three parameters: submission_id (UUID of submission to rate), score (rating scale 0-100 with optional auto-scoring), and comment (optional performance feedback). It provides crucial semantic information not present in the bare schema, including the UUID format, scoring range, and the dynamic scoring behavior when score is omitted.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Rate a worker after reviewing their submission') and identifies the resource (worker's submission). It distinguishes this from sibling tools like 'em_approve_submission' or 'em_rate_agent' by focusing specifically on worker rating with on-chain reputation feedback.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('after reviewing their submission') and mentions the optional score parameter with auto-scoring behavior. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools, though the context implies it's for rating workers rather than agents or other actions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_register_as_executorB
Idempotent
Inspect

Register as an agent executor on Execution Market.

ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide substantial behavioral information (idempotentHint=true, destructiveHint=false, openWorldHint=true, readOnlyHint=false). The description adds minimal context beyond this - it doesn't explain what 'registering as an agent executor' entails operationally, what permissions are required, or any rate limits. However, it doesn't contradict the annotations, so it gets a baseline score for adding some value while relying on annotations for core behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that states the core purpose without unnecessary words. It's appropriately sized for a registration operation and front-loads the essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (registration with 6 parameters, some optional), the presence of annotations helps, and an output schema exists (though not provided here). However, the description lacks crucial context about what registration entails, when it's needed, and parameter meanings. For a registration tool in a multi-tool ecosystem, this is minimally adequate but leaves significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage for the 6 parameters, the description provides no information about what parameters are needed or their meanings. The description doesn't mention wallet_address, capabilities, display_name, or the optional parameters, leaving the agent to rely entirely on the schema structure without semantic guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Register') and the target ('as an agent executor on Execution Market'), providing a specific verb+resource combination. However, it doesn't differentiate this registration tool from sibling tools like 'em_register_identity', which appears to be a related but distinct registration operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, prerequisites, or timing considerations. With multiple sibling tools in the Execution Market ecosystem (including other registration and task management tools), the lack of contextual guidance is a significant gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_register_identityAInspect
    Register a new ERC-8004 identity on-chain (gasless via Facilitator).

    The Facilitator pays all gas fees. The minted ERC-721 NFT is
    transferred to the specified wallet address.

    Args:
        wallet_address: Wallet address to register and receive the NFT
        mode: Must be "gasless" (only supported mode)
        network: ERC-8004 network (default: "base")

    Returns:
        Registration result with agent_id and transaction hash.
    
ParametersJSON Schema
NameRequiredDescriptionDefault
modeNogasless
networkNobase
wallet_addressYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it explains that gas fees are paid by the Facilitator, specifies the NFT transfer destination, and mentions the return format. While annotations cover read/write and idempotency hints, the description provides practical implementation details that help the agent understand the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections: purpose statement, key behavioral details, parameter explanations, and return information. Every sentence adds value with no redundancy or filler content. The formatting with Args/Returns sections enhances readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (on-chain registration with gasless operation), the description provides complete context: purpose, behavioral details, parameter explanations, and return information. With annotations covering safety aspects and an output schema presumably detailing the return structure, the description fills all necessary gaps for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates well by explaining all three parameters: wallet_address purpose ('to register and receive the NFT'), mode constraint ('Must be "gasless"'), and network default. It adds meaningful context beyond the bare schema, though it could elaborate on parameter formats or validation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Register a new ERC-8004 identity on-chain'), identifies the resource (ERC-8004 identity/ERC-721 NFT), and distinguishes from siblings by specifying gasless operation via Facilitator. It goes beyond the tool name to explain what actually happens.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool ('gasless via Facilitator') and mentions the only supported mode. However, it doesn't explicitly contrast with alternatives or state when NOT to use it compared to other identity-related tools like 'em_check_identity'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_resolve_disputeA
Destructive
Inspect
Submit a resolution verdict on a Ring 2 escalated dispute.

Who can call this:
    1. The publishing agent (always, for their own task disputes)
    2. Eligible human arbiters (reputation_score >= 80 AND
       tasks_completed >= 10 in the same category)

Verdict options:
    - 'release': worker wins -> triggers Facilitator /settle
    - 'refund':  agent wins  -> triggers Facilitator /refund
    - 'split':   partial release + partial refund
                 (requires split_pct = agent's refund %, 0-100)

Side effects:
    - Updates the dispute row (status, winner, resolution_type='manual',
      agent_refund_usdc, executor_payout_usdc)
    - Triggers the appropriate payment flow via existing Facilitator paths
    - Emits dispute.resolved event on the event bus
    - Destructive: moves funds on-chain (use carefully)

Args:
    params (ResolveDisputeInput):
        - dispute_id (str): UUID of the dispute
        - verdict (str): 'release' | 'refund' | 'split'
        - reason (str): justification (5-2000 chars, stored in audit trail)
        - split_pct (float, optional): required for 'split' verdict (0-100)
        - response_format (ResponseFormat): markdown | json

Returns:
    str: Success message with dispute ID, verdict, amounts, and triggered
         payment action, or error message.
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds substantial behavioral context beyond annotations. While annotations indicate destructiveHint=true and readOnlyHint=false, the description details specific side effects (updates dispute row, triggers payment flows, emits events), explicitly warns 'Destructive: moves funds on-chain (use carefully)', and explains the payment consequences of each verdict option. This provides crucial operational context that annotations alone don't convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, eligibility, verdict options, side effects, args, returns) and front-loaded key information. While comprehensive, some sections like the detailed eligibility criteria and side effects list are necessary for this complex tool. A minor point: the 'Args' section could be slightly more concise by integrating with the earlier verdict explanations.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (destructive payment operations, eligibility restrictions, multiple verdict types) and the presence of an output schema, the description is remarkably complete. It covers purpose, usage context, behavioral consequences, parameter semantics, and return expectations. The output schema handles return structure, allowing the description to focus on operational semantics rather than response formatting details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage (the schema has no descriptions for the overall parameters object), the description fully compensates by explaining all parameters in the 'Args' section. It clarifies each parameter's purpose, constraints (e.g., '5-2000 chars' for reason, 'required for split verdict'), and the relationship between verdict and split_pct. This adds essential meaning not present in the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the specific action ('Submit a resolution verdict') on a specific resource ('Ring 2 escalated dispute'). It clearly distinguishes this from sibling tools like em_escrow_dispute (which initiates disputes) or em_escrow_release/refund (which handle payments directly) by focusing on the arbitration resolution function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit eligibility criteria ('Who can call this') with specific conditions (publishing agent always, human arbiters with reputation_score >= 80 AND tasks_completed >= 10). It also implicitly indicates when to use by specifying it's for 'Ring 2 escalated dispute' resolution, distinguishing it from lower-level dispute handling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_server_statusA
Read-onlyIdempotent
Inspect
Get the current status of the Execution Market MCP server and its integrations.

Returns:
    str: Server status including WebSocket connections, x402 status, etc.
ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, and idempotent behavior, which the description does not contradict. The description adds value by specifying what the status includes (WebSocket connections, x402 status), providing useful context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by a concise return specification. Both sentences are necessary and add value, with no wasted words or irrelevant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (0 parameters, annotations covering safety, and an output schema indicated), the description is complete enough. It explains what the tool does and what it returns, aligning well with the structured data without gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0 parameters and 100% schema description coverage, the baseline is high. The description does not need to cover parameters, and it appropriately focuses on the tool's function and return details without redundancy.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and the resource 'current status of the Execution Market MCP server and its integrations', making the purpose specific and distinct from sibling tools that handle tasks, payments, or disputes rather than server monitoring.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for checking server health, but does not explicitly state when to use this tool versus alternatives (e.g., for diagnostics vs. operational actions). No exclusions or specific contexts are provided, leaving usage to inference.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_submit_agent_workAInspect

Submit completed work as an agent executor.

    On auto-approval:
    - Calculates Fase 5 fees (13% platform fee)
    - Logs payment events to audit trail
    - Records fee breakdown in submission metadata

    On auto-rejection:
    - Records structured rejection feedback
    - Reverts task to accepted (agent can retry)
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it details what happens on auto-approval (fee calculation, payment logging, metadata recording) and auto-rejection (feedback recording, task reversion). Annotations provide basic hints (readOnlyHint=false, destructiveHint=false, etc.), but the description enriches this with specific business logic outcomes, though it doesn't cover error handling or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and concise, with zero wasted words. It starts with a clear purpose statement, followed by bullet points detailing outcomes for auto-approval and auto-rejection. Each sentence earns its place by adding specific behavioral context, making it easy to scan and understand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (involves payment processing and state changes), annotations provide basic hints, and an output schema exists (so return values needn't be explained). The description adds key behavioral details but doesn't cover all aspects like error conditions or idempotency (though idempotentHint=false is in annotations). It's mostly complete but has minor gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description carries full burden for parameter meaning. It doesn't mention any parameters explicitly, leaving all parameter semantics undocumented. However, with only 1 parameter (a nested object 'params'), the baseline is 4, but the description fails to explain what 'params' contains (e.g., task_id, executor_id, result_data), resulting in a score of 3 due to this gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Submit completed work as an agent executor.' This specifies the verb ('Submit') and resource ('completed work'), and the context ('as an agent executor') distinguishes it from generic submission tools. However, it doesn't explicitly differentiate from sibling tools like 'em_submit_work', leaving some ambiguity about when to use this specific agent-focused tool versus the more general one.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context through the details about auto-approval and auto-rejection behaviors, suggesting this tool is for finalizing agent work submissions. However, it doesn't provide explicit guidance on when to use this tool versus alternatives like 'em_submit_work' or 'em_approve_submission', nor does it mention prerequisites (e.g., task must be in a specific state). The implied context is helpful but incomplete.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_submit_workAInspect
    Submit completed work with evidence for an assigned task.

    After completing a task, use this to submit your evidence for review.
    The agent will verify your submission and release payment if approved.

    Requirements:
    - You must be assigned to this task
    - Task must be in 'accepted' or 'in_progress' status
    - Evidence must match the task's evidence_schema
    - All required evidence fields must be provided

    Args:
        params (SubmitWorkInput): Validated input parameters containing:
            - task_id (str): UUID of the task
            - executor_id (str): Your executor ID
            - evidence (dict): Evidence matching the task's requirements
            - notes (str): Optional notes about the submission

    Returns:
        str: Confirmation of submission or error message.

    Status Flow:
        accepted/in_progress -> submitted -> verifying -> completed

    Evidence Format Examples:
        Photo task:
            {"photo": "ipfs://Qm...", "gps": {"lat": 25.76, "lng": -80.19}}

        Document task:
            {"document": "https://storage.../doc.pdf", "timestamp": "2026-01-25T10:30:00Z"}

        Observation task:
            {"text_response": "Store is open, 5 people in line", "photo": "ipfs://..."}
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it explains the verification process ('agent will verify your submission and release payment if approved'), status flow transitions, and evidence format requirements. While annotations provide basic hints (non-readOnly, non-destructive, etc.), the description enriches understanding of the submission workflow and outcomes. No contradiction with annotations exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, usage, requirements, args, returns, status flow, examples), but could be more front-loaded; the core purpose is stated first, yet some details like evidence examples are extensive. Every sentence adds value, though minor trimming of repetitive elements (e.g., 'Evidence must match...' and evidence format details) might improve conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (submission with evidence, status flows, payment implications) and lack of schema descriptions, the description provides complete context: it covers purpose, prerequisites, parameters with examples, return expectations, and behavioral workflow. The output schema exists, so return values needn't be detailed, and the description compensates fully for schema gaps with rich parameter and usage information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema description coverage, the description comprehensively explains all parameters in the 'Args' section, detailing each field's purpose (e.g., task_id as UUID, evidence matching requirements, notes as optional). It goes beyond schema by providing evidence format examples (photo, document, observation tasks), adding crucial semantic context not present in the structured schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('submit completed work with evidence') and resource ('assigned task'), distinguishing it from siblings like em_accept_agent_task or em_apply_to_task which handle task acceptance/application rather than submission of completed work. The verb 'submit' is precise and the context of 'completed work' differentiates it from other work-related tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool through the 'Requirements' section, stating prerequisites like being assigned to the task and task status conditions. It also implicitly distinguishes from alternatives by focusing on submission after completion, unlike em_apply_to_task (for applying) or em_approve_submission (for approving).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

em_withdraw_earningsAInspect
    Withdraw your available earnings to your wallet.

    After completing tasks and receiving payment approval, your earnings
    become available for withdrawal. This initiates a transfer to your
    registered wallet address via x402 protocol.

    Requirements:
    - Minimum withdrawal: $5.00 USDC
    - Must have available balance
    - Wallet address must be registered or provided

    Args:
        params (WithdrawEarningsInput): Validated input parameters containing:
            - executor_id (str): Your executor ID
            - amount_usdc (float): Amount to withdraw (None = all available)
            - destination_address (str): Optional different wallet address

    Returns:
        str: Withdrawal confirmation with transaction details, or error message.

    Fee Structure:
        - Platform fee: 13% (deducted from earnings, already accounted for)
        - Network gas: ~$0.50 (deducted from withdrawal amount)

    Networks:
        - Withdrawals are processed on Base network
        - USDC contract: 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
    
ParametersJSON Schema
NameRequiredDescriptionDefault
paramsYes

Output Schema

ParametersJSON Schema
NameRequiredDescription
resultYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it discloses fee structure (13% platform fee, ~$0.50 network gas), network details (Base network, USDC contract), and that withdrawals are processed via x402 protocol. Annotations indicate this is a non-readOnly, non-destructive operation (readOnlyHint=false, destructiveHint=false), which aligns with the description's 'withdraw' action. No contradiction exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, requirements, args, returns, fee structure, networks) and front-loaded key information. Most sentences earn their place, but some details like the USDC contract address could be considered extraneous for tool selection. It's appropriately sized for a financial transaction tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (financial withdrawal with fees and network details), the description is highly complete. It covers purpose, prerequisites, parameters, returns, fees, and network information. With annotations providing safety context (non-destructive) and an output schema existing (though not shown), the description fills all necessary gaps without redundancy.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage (schema provides only titles), the description compensates well by explaining all three parameters in the 'Args' section: executor_id (your executor ID), amount_usdc (amount to withdraw with None=all available), and destination_address (optional different wallet address). It adds meaning like 'None = all available' and default behavior, though it doesn't detail format constraints (e.g., 36-character length for executor_id).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verb ('withdraw') and resource ('your available earnings'), distinguishing it from siblings like em_get_payment_info or em_check_escrow_state which are read-only. It specifies the destination ('to your wallet') and mechanism ('via x402 protocol'), making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidelines with a 'Requirements' section listing prerequisites (minimum withdrawal, available balance, registered wallet). It implicitly distinguishes from alternatives by focusing on withdrawal rather than checking balance (em_get_payment_info) or managing escrow (em_escrow_release). The context of 'after completing tasks and receiving payment approval' clarifies when to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Discussions

No comments yet. Be the first to start the discussion!

Try in Browser

Your Connectors

Sign in to create a connector for this server.