
Server Details

Korean business record validation and workflow safety gates for AI agents.

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:
Repository: hycore220/k-work-trust-public-demo
GitHub Stars: 0

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Descriptions: C

Average 2.9/5 across 26 of 26 tools scored. Lowest: 2.2/5.

Server Coherence: A
Disambiguation: 4/5

Most tools have clearly distinct purposes, especially the validation tools targeting different domains. However, a few pairs could cause confusion, such as 'validate_policy_invariants' and 'prove_agent_policy_invariants', or 'estimate_agent_risk_bound' and 'estimate_agent_tail_risk'. The descriptions help, but an agent might still pick the wrong one.

Naming Consistency: 5/5

All tool names follow a consistent verb_noun pattern using snake_case (e.g., compile_agent_plan, validate_vendor_onboarding). The naming style is uniform and predictable, making it easy for an agent to infer the action and target.

Tool Count: 2/5

With 26 tools, the server exceeds the recommended range of 3-15 for well-scoped tools. While the domain is broad, this many tools makes the server feel overloaded, and it could be split into separate servers for agent safety and Korean document validation.

Completeness: 4/5

The tool set covers a comprehensive range of agent safety operations and Korean business document validations. Minor gaps exist, such as missing Korean email normalization, but overall the surface appears complete for the stated purpose.

Available Tools

28 tools
compile_agent_plan (B)

Compile an agent plan into a bounded transition model with action graph, normalized actions, and safety issues.

Parameters (JSON Schema)
goal (optional)
actions (optional)
plan_id (optional)
invariants (optional)
environment (optional)
money_budget (optional)
risk_tolerance (optional)
data_sensitivity (optional)
max_autonomous_risk (optional)
operator_approval_id (optional)
operator_approval_present (optional)
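
The schema exposes parameter names only, so any call shape is a guess. A minimal sketch, assuming the plan is passed as a goal plus a list of action objects; every value below is illustrative, not documented behavior:

import json

# Hypothetical tools/call arguments for compile_agent_plan.
# Only the parameter names come from the listed schema; value shapes are assumptions.
arguments = {
    "plan_id": "demo-plan-001",
    "goal": "Draft and publish a weekly status report",
    "actions": [
        {"id": "a1", "type": "local_docs_update", "destructive": False},
        {"id": "a2", "type": "github_pr", "depends_on": ["a1"]},
    ],
    "invariants": ["no_production_change_without_operator_approval"],
    "environment": "staging",
    "money_budget": 0,
    "risk_tolerance": "low",
    "data_sensitivity": "internal",
    "max_autonomous_risk": 0.2,
    "operator_approval_present": False,
}
print(json.dumps({"name": "compile_agent_plan", "arguments": arguments}, indent=2))
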
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description offers minimal behavioral context. It does not indicate whether this tool performs a read-only analysis or modifies state, nor does it mention potential side effects, authorization needs, or output format beyond listing components. The name 'compile' suggests a transformation, but specifics are lacking.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that conveys the core action and outputs. However, for a tool with 11 parameters and no schema descriptions, more structured breakdown (e.g., listing parameter roles) would improve usability without excessive length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the large number of parameters (11), absence of output schema, and zero schema descriptions, the description is incomplete. It fails to explain what a 'bounded transition model' is, how actions are normalized, or what safety issues are identified. The tool's complexity demands richer documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning no parameter descriptions exist in the schema. The description does not explain any of the 11 parameters (e.g., goal, actions, plan_id, risk_tolerance, operator_approval_id). The agent has no additional guidance beyond parameter names, which is insufficient for correct invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: compiling an agent plan into a bounded transition model with action graph, normalized actions, and safety issues. It specifies both the input (agent plan) and the output components, making the purpose distinct from sibling tools that focus on validation, optimization, or normalization of other entities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide explicit guidance on when to use this tool versus alternatives like 'estimate_agent_risk_bound' or 'prove_agent_policy_invariants'. Usage is implied by the verb 'compile', but there is no context on prerequisites, when to avoid it, or how it relates to other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

compose_agent_trust_contracts (B)

Compose multiple service trust contracts into a workflow graph certificate, surfacing weak services, incompatible dependency edges, and missing first-safe-call boundaries.

Parameters (JSON Schema)
edges (optional)
plan_id (optional)
services (optional): Service contracts with id, trust score, profiles, failure modes, and first-safe-call metadata.
contracts (optional)
workflow_id (optional)
dependencies (optional): Edges with from, to, required_profile, and required_failure_modes.
min_contract_score (optional): Minimum service contract score required for ready composition. Defaults to 70.
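
Because services, dependencies, and min_contract_score do carry schema descriptions, a call sketch is easier to ground; the nested shapes below still follow only those hints and every concrete value is invented:

import json

# Hypothetical tools/call arguments for compose_agent_trust_contracts.
# "services" and "dependencies" shapes follow the schema hints; values are made up.
arguments = {
    "workflow_id": "wf-001",
    "services": [
        {"id": "payments", "trust_score": 82, "profiles": ["idempotent_write"],
         "failure_modes": ["timeout"], "first_safe_call": "health_check"},
        {"id": "notifier", "trust_score": 55, "profiles": ["best_effort"],
         "failure_modes": ["drop"], "first_safe_call": None},
    ],
    "dependencies": [
        {"from": "payments", "to": "notifier",
         "required_profile": "best_effort", "required_failure_modes": ["drop"]},
    ],
    "min_contract_score": 70,  # documented default
}
print(json.dumps({"name": "compose_agent_trust_contracts", "arguments": arguments}, indent=2))
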
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden for behavioral disclosure. It mentions 'compose' and 'surfacing' but does not indicate whether this operation is read-only, destructive, requires authentication, or has side effects. Critical behavioral traits are omitted.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is concise and front-loaded with the main action. Every word contributes to the purpose, and there is no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the tool (7 parameters, none required, no output schema, no annotations), the description is insufficient. It does not explain return values, prerequisites, or how the output relates to the input. The agent would lack critical context for proper invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 43%, so the description should compensate but does not. Most parameters (edges, plan_id, contracts, workflow_id) lack schema descriptions, and the tool description does not explain them. For the three parameters with schema descriptions (services, dependencies, min_contract_score), the description adds no new meaning beyond what is already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool composes service trust contracts into a workflow graph certificate and surfaces specific issues like weak services and incompatible dependency edges. It uses a specific verb ('compose') and resource ('service trust contracts'), distinguishing it from sibling tools that are primarily validation or normalization tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks any guidance on when to use this tool versus alternatives. No context is provided about prerequisites, scenarios, or when not to use it. Sibling tools are not mentioned, so the agent has no basis to choose this over related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

estimate_agent_risk_bound (C)

Estimate an upper risk bound for an agent plan using action risks and invariant penalties.

Parameters (JSON Schema)
goal (optional)
actions (optional)
plan_id (optional)
invariants (optional)
environment (optional)
money_budget (optional)
risk_tolerance (optional)
data_sensitivity (optional)
max_autonomous_risk (optional)
operator_approval_id (optional)
operator_approval_present (optional)
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It mentions estimation but does not disclose side effects, required prior steps (e.g., plan compilation), or whether it is read-only or mutating. The behavioral disclosure is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single clear sentence that front-loads the purpose. It is concise, but more detail could be added without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 11 parameters with no schema descriptions, no annotations, and no output schema, the description is severely incomplete. It lacks information on return format, required inputs, and usage context, making it insufficient for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description only vaguely mentions 'action risks' and 'invariant penalties' which might correspond to the 'actions' and 'invariants' parameters, but it does not explain any of the 11 parameters (e.g., goal, money_budget, risk_tolerance). No parameter meaning is added beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it estimates an upper risk bound for an agent plan, specifying inputs (action risks and invariant penalties). However, it does not differentiate from sibling tools like compile_agent_plan or validate_agent_action_firewall, which could also involve risk assessment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, no prerequisites, and no conditions under which it should or should not be invoked.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

estimate_agent_tail_risk (C)

Estimate dependency-adjusted loss distribution, VaR, and CVaR tail risk for an agent plan.

Parameters (JSON Schema)
goal (optional)
actions (optional)
plan_id (optional)
loss_unit (optional): Loss unit such as KRW, USD, or risk_points.
environment (optional)
money_budget (optional)
loss_tolerance (optional): Maximum approved CVaR loss for autonomous execution.
risk_tolerance (optional)
confidence_level (optional): VaR/CVaR confidence level, usually 0.95 or 0.99.
data_sensitivity (optional)
dependency_coupling (optional): Noisy-OR coupling from failed dependencies into downstream actions.
shared_risk_coupling (optional): Coupling for shared environment, data, secret, money, production, or flag risk.
max_loss_distribution_actions (optional): Maximum number of actions for exact Bernoulli loss convolution.
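
A minimal sketch of a call, using the documented loss_unit and confidence_level hints; the action entries, per-action loss figures, and coupling values are assumptions, not documented semantics:

import json

# Hypothetical tools/call arguments for estimate_agent_tail_risk.
# Field names mirror the schema; numeric choices and action shapes are illustrative only.
arguments = {
    "plan_id": "demo-plan-001",
    "goal": "Roll out a pricing change",
    "actions": [
        {"id": "a1", "type": "billing_change", "loss_if_failed": 500000},
        {"id": "a2", "type": "external_post", "loss_if_failed": 50000},
    ],
    "loss_unit": "KRW",              # documented example unit
    "confidence_level": 0.95,        # documented as usually 0.95 or 0.99
    "loss_tolerance": 1000000,       # maximum approved CVaR loss (assumed value)
    "dependency_coupling": 0.3,      # noisy-OR coupling strength (assumed scale)
    "shared_risk_coupling": 0.1,
    "max_loss_distribution_actions": 12,
}
print(json.dumps({"name": "estimate_agent_tail_risk", "arguments": arguments}, indent=2))
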
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, and the description does not disclose behavioral traits such as whether the tool is read-only, destructive, requires specific authentication, or has rate limits. The minimal description only states the high-level function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence that efficiently communicates the core purpose without extraneous words. It is well-structured and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the tool (13 parameters, no output schema, no annotations), the description is incomplete. It does not explain the return format, prerequisites, or how the estimates are intended to be used. The context provided is insufficient for an agent to fully understand the tool's functionality.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 46%, meaning many parameters lack descriptions. The tool's description does not add any meaning to the parameters beyond what is in the schema. It does not compensate for the missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool's purpose: estimating tail risk metrics (VaR, CVaR) for an agent plan, using a dependency-adjusted loss distribution. It is specific and sets it apart from generic risk tools, though it does not explicitly distinguish from sibling risk tools like estimate_agent_risk_bound.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. There is no mention of prerequisites, ideal use cases, or when not to use it. The description lacks any usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

find_agent_counterexample (C)

Find a short unsafe execution trace for an agent plan when supplied invariants fail.

Parameters (JSON Schema)
goal (optional)
actions (optional)
plan_id (optional)
invariants (optional)
environment (optional)
money_budget (optional)
risk_tolerance (optional)
data_sensitivity (optional)
max_autonomous_risk (optional)
operator_approval_id (optional)
operator_approval_present (optional)
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavior. It mentions finding a 'short unsafe execution trace', but does not explain required permissions, side effects, or prerequisites (e.g., operator approval, budget constraints). The description is insufficient for a tool with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, which is concise, but it lacks structure and essential details. It is not front-loaded with key information; it merely states the purpose without elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 11 parameters, no output schema, and no annotations, the description is far from complete. It does not explain the return value, usage constraints, or parameter roles, leaving the agent with significant ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description provides no information about the 11 parameters (e.g., goal, actions, invariants). The description must compensate for the schema gap but fails entirely.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states what the tool does: find a short unsafe execution trace when invariants fail. It uses a specific verb and resource, and distinguishes from siblings like 'prove_agent_policy_invariants'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when invariants fail, but provides no explicit guidance on when to use this tool versus alternatives (e.g., prove_agent_policy_invariants) or when not to use it. No exclusions or context are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

issue_agent_trust_certificate (C)

Issue an Agent Trust Compiler certificate with plan model, invariant proof, temporal model check, counterexamples, risk bound, and autonomy budget.

Parameters (JSON Schema)
goal (optional)
actions (optional)
plan_id (optional)
invariants (optional)
environment (optional)
money_budget (optional)
risk_tolerance (optional)
data_sensitivity (optional)
max_autonomous_risk (optional)
operator_approval_id (optional)
operator_approval_present (optional)
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavioral traits. It only says 'Issue a certificate' without specifying side effects, persistence, idempotency, permissions required, or whether it is destructive. The term 'issue' suggests creation but lacks confirmation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no unnecessary words, but it is just a list of terms. While concise, it lacks structure (e.g., bullet points or ordered instructions). Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 11 parameters, no output schema, no annotations, and a complex domain, the description is severely incomplete. It does not explain return values, parameter relationships, or how to combine the components. Insufficient for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description lists only a few conceptual components (plan model, invariant proof, etc.) which loosely map to some parameters but leaves many parameters (e.g., environment, money_budget, operator_approval_id) unexplained. No added meaning beyond parameter names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Issue' and the resource 'Agent Trust Compiler certificate', and lists the included components (plan model, invariant proof, counterexamples, risk bound, autonomy budget), distinguishing it from sibling tools that produce individual components. However, it could be more precise about what issuing a certificate entails (e.g., final output, storage).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs. alternatives. The description implies it is the final step after gathering components, but does not state prerequisites, conditions, or when not to use. No mention of alternatives or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

issue_open_world_trust_certificate (C)

Issue a bounded open-world trust certificate by composing proof packets, argument provenance, adversarial policy checks, assumptions, unknowns, and active monitors.

Parameters (JSON Schema)
risk (optional): Residual risk, tolerance, CVaR, and loss limits.
action (optional): The high-stakes agent action to authorize.
evidence (optional): Evidence items supporting required claims.
monitors (optional): Active monitors covering assumptions or unknowns.
unknowns (optional): Known unknowns with impact, monitorability, and mitigation.
arguments (optional): Action arguments with source_type, critical, and transform provenance.
authority (optional): Scoped authority or approval evidence.
assumptions (optional): Explicit world assumptions with checked_at, max_age_days, critical, and confidence.
current_controls (optional)
attacker_strategies (optional)
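
With 80% of parameters described, a rough call sketch is possible; the nested object shapes below merely follow the field hints (checked_at, max_age_days, impact, mitigation, and so on) and are otherwise invented:

import json

# Hypothetical tools/call arguments for issue_open_world_trust_certificate.
# Field names come from the schema; nested shapes and values are assumptions.
arguments = {
    "action": "production_deploy of pricing-service v2",
    "evidence": [
        {"source": "ci", "summary": "integration tests green", "claims": ["tests_passed"]},
    ],
    "arguments": [
        {"value": "v2 image digest", "source_type": "build_system", "critical": True},
    ],
    "assumptions": [
        {"text": "traffic stays under 2x baseline", "checked_at": "2024-01-01",
         "max_age_days": 7, "critical": True, "confidence": 0.8},
    ],
    "unknowns": [
        {"text": "partner API latency", "impact": "medium", "monitorable": True,
         "mitigation": "rollback plan"},
    ],
    "monitors": [{"name": "error_rate_alert", "covers": "traffic assumption"}],
    "authority": {"approval_id": "OPS-123", "scope": "production_deploy"},
    "risk": {"residual_risk": 0.1, "cvar_limit": 1000000},
}
print(json.dumps({"name": "issue_open_world_trust_certificate", "arguments": arguments}, indent=2))
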
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must fully disclose behavior. It only states that it 'issues' a certificate, but omits side effects, authorization needs, idempotence, or whether it is destructive. This is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the verb and resource. It could be broken into shorter sentences for clarity, but it efficiently packs relevant components.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, yet the description does not explain what the tool returns. It also fails to define key terms like 'bounded open-world trust certificate', leaving the agent without sufficient context for a 10-parameter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 80%, so the baseline is 3. The description does not add meaning beyond listing parameter names; no extra details on how parameters are used together.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'issue' and the resource 'bounded open-world trust certificate', and lists composed elements. However, it is heavily jargon-laden and does not distinguish from the sibling 'issue_agent_trust_certificate'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus its siblings, nor any context about when not to use it. The description only explains what it does, not the decision criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

model_check_agent_temporal_safety (C)

Model-check every dependency-respecting execution order within bounds and return temporal counterexamples plus minimal interventions.

Parameters (JSON Schema)
goal (optional)
actions (optional)
plan_id (optional)
max_traces (optional): Maximum dependency-respecting traces to explore before returning a partial proof.
environment (optional)
money_budget (optional)
risk_tolerance (optional)
data_sensitivity (optional)
max_model_check_actions (optional): Maximum action count for exhaustive schedule enumeration.
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, description must suffice. It mentions returning counterexamples and interventions but does not disclose if the tool has side effects, prerequisites, or resource consumption beyond 'within bounds'. The term 'minimal interventions' is undefined.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence, no fluff, but too terse given the tool's complexity. Lacks structure like bullet points or sections.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

9 parameters, no output schema, no annotations – the description fails to explain inputs, outputs, or constraints. Incomplete for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 22% schema description coverage (2 out of 9 parameters described), the tool description adds no parameter details. Key parameters like 'goal', 'actions', 'environment', etc., are unexplained, forcing reliance on parameter names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool model-checks execution orders for temporal safety and returns counterexamples and interventions. It distinguishes itself from siblings like 'find_agent_counterexample' by specifying 'dependency-respecting execution order' and 'temporal' focus, though not explicitly naming alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus siblings such as 'prove_agent_policy_invariants' or 'find_agent_counterexample'. No when-not-to-use or context provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

normalize_korean_address (A)

Normalize Korean address text and expose ambiguity that needs review or an external address API.

Parameters (JSON Schema)
address (required): Raw Korean address.
postal_code (optional): Optional postal code supplied by the caller.
use_external_address_api (optional): Whether to call an approved address normalization source when configured.
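
All three parameters are described in the schema, so a call sketch is straightforward; the address itself is a made-up example, and the return shape is not documented:

import json

# Hypothetical tools/call arguments for normalize_korean_address.
# Parameter names and meanings come from the schema; the address value is invented.
arguments = {
    "address": "서울특별시 강남구 테헤란로 123, 4층",  # raw Korean address
    "postal_code": "06234",                              # optional caller-supplied postal code
    "use_external_address_api": False,                   # skip the external normalization source
}
print(json.dumps({"name": "normalize_korean_address", "arguments": arguments},
                 ensure_ascii=False, indent=2))
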
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must cover behavior. It mentions normalization, ambiguity exposure, and optional external API call, but omits details like error handling, prerequisites, or side effects. Adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the key actions (normalize, expose ambiguity) without unnecessary words. Highly concise and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is present, yet the description fails to mention what the tool returns (e.g., normalized address string, ambiguity flags). The mention of 'expose ambiguity' is vague. For a 3-parameter tool, more detail on outputs and behavior is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for all parameters. The description adds functional context but does not significantly enhance parameter understanding beyond schema definitions. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool normalizes Korean addresses and exposes ambiguity, which is specific and distinct from sibling tools that handle company names, phone numbers, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for address normalization and ambiguity detection, but does not explicitly mention when not to use or compare with alternatives. However, sibling tools are clearly for different entities, so context is reasonably clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

normalize_korean_company_name (C)

Normalize Korean company names while preserving the original value and legal suffix signals.

Parameters (JSON Schema)
company_name (required): Raw company, store, or legal entity name.
prefer_legal_form (optional): Whether to normalize common legal suffix variants when safe.
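
A minimal call sketch using the two documented parameters; the company name is an invented example and the normalized output format is not documented:

import json

# Hypothetical tools/call arguments for normalize_korean_company_name.
# Parameter names and meanings come from the schema; the example name is invented.
arguments = {
    "company_name": "(주) 한빛상사",   # raw legal entity name with a legal-form prefix
    "prefer_legal_form": True,          # normalize common legal suffix variants when safe
}
print(json.dumps({"name": "normalize_korean_company_name", "arguments": arguments},
                 ensure_ascii=False, indent=2))
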
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It only mentions preserving original value and legal suffix signals, but does not disclose specific transformations, edge cases, or safety aspects. For a transformation tool, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise (one sentence), but lacks necessary structure. It does not front-load key information or provide sections for usage, behavior, or examples. Conciseness should not sacrifice completeness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema and no description of return values. The tool must produce some output (e.g., normalized string), but the description gives no clue. Given the presence of sibling validation tools, this omission is critical for an agent to understand the tool's behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. The tool description adds context about preserving original value and legal suffixes, but does not significantly enhance the meaning of the parameters beyond what the schema already provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it normalizes Korean company names and mentions preserving original value and legal suffix signals. It distinguishes itself from sibling tools that handle addresses, phone numbers, and validations. However, it lacks specificity on what 'normalize' entails (e.g., whitespace removal, standardization).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. While sibling names imply it's for company names only, there is no explicit when-to-use or when-not-to-use information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

normalize_korean_phone_number (C)

Normalize Korean mobile, landline, and representative business phone numbers.

Parameters (JSON Schema)
phone_number (required): Raw Korean phone number.
default_country_code (optional, default: +82): Country code to use when an international prefix is not supplied.
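
A minimal call sketch; the phone number is a made-up example and the normalized output format is not documented:

import json

# Hypothetical tools/call arguments for normalize_korean_phone_number.
# Parameter names and the default come from the schema; the number is invented.
arguments = {
    "phone_number": "02) 1234-5678",   # raw Korean landline number (assumed input format)
    "default_country_code": "+82",     # matches the documented default
}
print(json.dumps({"name": "normalize_korean_phone_number", "arguments": arguments}, indent=2))
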
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description does not disclose behavioral traits such as whether the number is validated, formatted, or modified. It does not mention side effects or prerequisites.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no wasted words. Could be slightly more informative without becoming verbose, but acceptable for the tool's simplicity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, and the description does not explain the return value. The tool's behavior is underspecified, leaving the agent to guess what 'normalize' produces.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds no additional meaning beyond what the schema already provides for the two parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool normalizes Korean phone numbers and lists the types (mobile, landline, business). It distinguishes from sibling tools that handle addresses, company names, and validations. However, it does not specify what normalization entails (e.g., format, country code handling).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs. alternatives. Sibling tools are for different entities, but the description does not explicitly state usage context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

optimize_autonomy_budget (C)

Choose which agent actions can run autonomously under risk and money budgets, and which need review.

Parameters (JSON Schema)
goal (optional)
actions (optional)
plan_id (optional)
invariants (optional)
environment (optional)
money_budget (optional)
risk_tolerance (optional)
data_sensitivity (optional)
max_autonomous_risk (optional)
operator_approval_id (optional)
operator_approval_present (optional)
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It states the tool 'chooses' actions but does not disclose side effects, authorization requirements, or output format. Parameters like operator_approval_id hint at review processes not mentioned.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence of 18 words is concise but overly terse given the tool's complexity. There is no structure, such as bullet points or sections, to aid comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, no annotations, and a minimal description leave critical gaps: what the tool returns, how it uses inputs like plan_id or invariants, and what 'autonomous' means. The tool appears complex but description is insufficient for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage and 11 parameters, the description only vaguely references 'risk and money budgets' but does not map to specific parameters like risk_tolerance, max_autonomous_risk, or explain invariants, environment, etc. The agent has no guidance on parameter meaning or usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'choose' and resource 'agent actions' with constraints 'risk and money budgets', clearly distinguishing it from siblings like estimate_agent_risk_bound which estimates risk rather than making autonomy decisions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives like compile_agent_plan or estimate_agent_risk_bound. The description implies a decision-making step but does not specify prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prove_agent_policy_invariants (C)

Check agent plan invariants and return proved invariants, failed invariants, and bounded counterexamples.

Parameters (JSON Schema)
goal (optional)
actions (optional)
plan_id (optional)
invariants (optional)
environment (optional)
money_budget (optional)
risk_tolerance (optional)
data_sensitivity (optional)
max_autonomous_risk (optional)
operator_approval_id (optional)
operator_approval_present (optional)
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must fully disclose behavior. It states return values but does not mention whether the tool is read-only, has side effects, requires authentication, or has rate limits. The name suggests verification, but the description lacks explicit safety guarantees.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no unnecessary words. However, given the complexity of the tool, it sacrifices necessary detail for brevity, which harms overall utility.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 11 parameters, no annotations, and no output schema. The description fails to provide essential context for correct invocation, such as parameter formats, constraints, or how the invariants are evaluated. This is a severe gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, and the description does not explain any of the 11 parameters. The agent cannot infer what 'goal', 'actions', 'invariants', or other fields mean or how to provide them, making the tool effectively unusable from the description alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks agent plan invariants and lists return types (proved, failed, bounded counterexamples). However, it does not differentiate from the similar sibling 'validate_policy_invariants', leaving ambiguity about when to use which.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No usage context, prerequisites, or when-not-to-use guidance is provided. The description does not help the agent decide between this and sibling tools like 'validate_policy_invariants' or 'find_agent_counterexample'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

score_agent_evidence_quality (A)

Score an evidence packet before an agent relies on it using claim coverage, source entropy, freshness, provenance, and contradiction penalties.

Parameters (JSON Schema)
claims (optional)
evidence (optional): Evidence items with source, summary, checked_at, claims, contradicts, confidence, and provenance fields.
artifacts (optional)
evidence_items (optional)
required_claims (optional): Claims the evidence packet must support, such as tests_passed or source_verified.
min_quality_score (optional): Minimum score required for ready. Defaults to 0.75.
freshness_half_life_days (optional): Freshness decay half-life in days. Defaults to 30.
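
A sketch of a call that leans on the described fields; the evidence item follows the schema hint (source, summary, checked_at, claims, contradicts, confidence, provenance), the thresholds use the documented defaults, and everything else is invented:

import json

# Hypothetical tools/call arguments for score_agent_evidence_quality.
# Field names come from the schema; concrete values are illustrative assumptions.
arguments = {
    "required_claims": ["tests_passed", "source_verified"],
    "evidence": [
        {"source": "ci", "summary": "unit and smoke tests green",
         "checked_at": "2024-01-01", "claims": ["tests_passed"],
         "contradicts": [], "confidence": 0.9, "provenance": "github_actions"},
    ],
    "min_quality_score": 0.75,        # documented default
    "freshness_half_life_days": 30,   # documented default
}
print(json.dumps({"name": "score_agent_evidence_quality", "arguments": arguments}, indent=2))
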
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses the evaluation dimensions but does not state whether the tool is read-only, requires authentication, or has side effects. The description is adequate but lacks behavioral specifics such as idempotency or state changes.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that immediately conveys the tool's core function and the factors considered. Every word adds value, and the main action is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no output schema, and a complex scoring task, the description is too minimal. It does not explain the return value format, required fields, or how the scores are computed, leaving the agent with insufficient context to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 57%, and the description does not elaborate on any parameter beyond naming the criteria. It fails to explain how parameters like 'claims' or 'evidence' relate to the scoring process, leaving gaps for an agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action ('score an evidence packet') and specifies the criteria used (claim coverage, source entropy, freshness, provenance, contradiction penalties). It effectively distinguishes from sibling tools, which focus on validation or normalization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The phrase 'before an agent relies on it' provides a clear usage context, but no explicit guidance on when not to use this tool or alternatives among the many validation siblings. The description implies usage but lacks exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_agent_action_firewall (C)

Check whether a proposed agent action is safe to run, needs human approval, or must be blocked before execution.

Parameters (JSON Schema)
risk_flags (optional): Explicit risk flags known to the calling agent.
action_type (optional): Proposed action category, such as local_docs_update, github_pr, external_post, production_deploy, billing_change, or security_change.
destructive (optional)
environment (optional): Target environment, such as local, staging, production, public_community, github, or account.
uses_secret (optional)
quota_change (optional)
tests_passed (optional): Whether relevant local tests or smoke checks have passed for code/deploy actions.
action_summary (optional): Short non-secret summary of the action the agent wants to perform.
target_resource (optional): Repository, app, account, channel, endpoint, or file area the action touches.
touches_billing (optional)
changes_security (optional)
data_sensitivity (optional): Data class involved in the action, such as public, synthetic, internal, private, sensitive, or secret.
operator_approval_id (optional)
material_production_risk (optional)
operator_approval_present (optional): Whether a matching explicit owner/operator approval record is already present.
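
A sketch of how the gate might be queried before a risky action; the enumerated values (action_type, environment, data_sensitivity) reuse the examples listed in the schema descriptions, while the undocumented boolean fields are filled with guesses:

import json

# Hypothetical tools/call arguments for validate_agent_action_firewall.
# Field names come from the schema; values are illustrative assumptions only.
arguments = {
    "action_type": "production_deploy",
    "action_summary": "Deploy pricing-service v2 to production",
    "target_resource": "pricing-service",
    "environment": "production",
    "destructive": False,
    "uses_secret": True,
    "tests_passed": True,
    "data_sensitivity": "internal",
    "touches_billing": True,
    "changes_security": False,
    "material_production_risk": True,
    "risk_flags": ["first_deploy_of_quarter"],
    "operator_approval_present": False,
}
print(json.dumps({"name": "validate_agent_action_firewall", "arguments": arguments}, indent=2))
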
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavioral traits. It implies the tool is a safety gate but does not specify what happens when an action is blocked (e.g., returns a flag vs. raises error), whether it is read-only, or what permissions are needed. The description is too vague for an agent to anticipate outcomes.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with no wasted words. It is front-loaded and easy to parse. However, the extreme brevity comes at the cost of omitted useful details, preventing a perfect score.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 15 parameters and no output schema, the description is incomplete. It does not explain the three-option output (safe, needs approval, blocked), the format of the return value, or how the tool handles errors. An agent would lack sufficient context to use this tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 53%, meaning many parameters lack descriptions. The tool's description adds no information about parameters beyond what the schema provides. For example, the description does not explain how 'risk_flags' or 'operator_approval_id' influence the result. The agent must rely solely on the schema, which is insufficient for high-quality parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to check if a proposed action is safe, needs human approval, or must be blocked. It uses a specific verb ('check') and resource ('proposed agent action'), and the tool name itself is descriptive. Sibling tools are all about normalization and validation of business documents, so this tool's purpose is distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention any prerequisites, exclusion criteria, or mention of other tools that might handle similar tasks. The sentence only states what the tool does, not when to invoke it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_agent_endpoint_fuzzing (B)

Validate fuzz/property-test evidence for an agent-facing endpoint before exposing it to autonomous callers.

Parameters (JSON Schema)
method (optional)
corpus_type (optional)
crash_count (optional)
environment (optional)
replay_safe (optional)
service_name (optional)
tests_passed (optional)
endpoint_path (optional)
failure_count (optional)
timeout_count (optional)
evidence_items (optional)
p95_latency_ms (optional)
test_cases_run (optional)
data_leak_count (optional)
status_5xx_rate (optional)
auth_bypass_count (optional)
rate_limit_observed (optional)
operator_approval_id (optional)
test_cases_generated (optional)
schema_violation_count (optional)
operator_approval_present (optional)
schema_validation_present (optional)
idempotency_violation_count (optional)
destructive_payloads_blocked (optional)
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fails to disclose side effects, permissions, or output format. It only vaguely states 'validate evidence' without behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single front-loaded sentence, concise but potentially too brief given the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 24 parameters, no schema coverage, no output schema, and no annotations, the description is severely incomplete, lacking return values, constraints, or usage steps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no value beyond the input schema; all 24 parameters are undocumented in both schema and description, missing crucial semantic guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates fuzz/property-test evidence for an agent-facing endpoint before autonomous exposure, distinguishing it from sibling validate tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear usage context ('before exposing to autonomous callers') but does not explicitly state when not to use or list alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_api_contract_driftCInspect

Validate OpenAPI/MCP contract drift, breaking changes, versioning, and client-test evidence before integration.

ParametersJSON Schema
NameRequiredDescriptionDefault
base_urlNo
environmentNo
auth_changedNo
auth_changesNo
service_nameNo
removed_pathsNo
evidence_itemsNo
changed_schemasNo
current_versionNo
openapi_presentNo
breaking_changesNo
previous_versionNo
changelog_presentNo
rate_limit_changedNo
backward_compatibleNo
client_tests_passedNo
mcp_manifest_presentNo
operator_approval_idNo
migration_guide_presentNo
output_schema_versionedNo
failure_modes_documentedNo
operator_approval_presentNo
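A hypothetical call sketch follows, again using only the parameter names from the table. All values are invented; how the server interprets fields like 'changed_schemas' or 'evidence_items' is an assumption based on their names.

# Hypothetical validate_api_contract_drift arguments (values invented for illustration).
contract_drift_check = {
    "service_name": "demo-billing-api",
    "base_url": "https://billing.example.test",      # placeholder URL
    "environment": "staging",
    "previous_version": "1.4.0",
    "current_version": "2.0.0",
    "breaking_changes": ["invoice.total renamed to invoice.amount"],
    "removed_paths": ["/v1/legacy-invoices"],
    "changed_schemas": ["Invoice"],
    "backward_compatible": False,
    "openapi_present": True,
    "mcp_manifest_present": False,
    "changelog_present": True,
    "migration_guide_present": True,
    "client_tests_passed": True,
    "auth_changed": False,
    "rate_limit_changed": False,
    "output_schema_versioned": True,
    "failure_modes_documented": True,
    "evidence_items": ["openapi diff attached", "client test suite green"],
}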
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It does not disclose whether the tool modifies state, requires authorization, has rate limits, or exhibits any other behavioral traits. The description only says 'validate' without any operational impact context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, which is brief but not sufficiently informative. It lacks structure such as bullet points or sections that would aid quick scanning for a complex tool with many parameters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 22 parameters, no output schema, and no annotations. The description does not explain return values, preconditions, or how to use the parameters effectively. For a validation tool of this complexity, the description is woefully incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 22 parameters with 0% description coverage. The description mentions concepts like drift and breaking changes but does not explain any parameter meanings or how they relate to those concepts. This leaves large semantic gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates OpenAPI/MCP contract drift, breaking changes, versioning, and client-test evidence, specifying the resource and action. It distinguishes from siblings which focus on other domains like addresses or invoices.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidelines on when to use vs alternatives. The phrase 'before integration' provides vague context, but no exclusions or comparisons to sibling tools like validate_technical_due_diligence are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_business_registration_numberAInspect

Validate a Korean business registration number format and return normalized digits. External status checks require a configured approved source.

ParametersJSON Schema
NameRequiredDescriptionDefault
company_nameNoOptional company name for consistency checks.
registration_numberYesRaw Korean business registration number, with or without hyphens.
check_external_statusNoWhether to check an approved external source when configured.
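Because this schema is fully documented, a minimal example call is easy to sketch. The values below are synthetic; the registration number is illustrative and may not pass any checksum validation the tool applies.

# Minimal validate_business_registration_number call (synthetic values).
brn_check = {
    "registration_number": "123-45-67890",   # raw input; hyphens allowed per the schema
    "company_name": "데모상사 주식회사",          # optional consistency check
    "check_external_status": False,          # only meaningful when an approved source is configured
}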
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description correctly discloses validation and normalization behavior and a dependency for external checks. However, it omits details like whether the tool is read-only, what happens on invalid input, or any side effects. It is functionally transparent but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core purpose. Every word adds value, and there is no redundancy or irrelevant detail. It is a model of conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has three parameters, no output schema, and no annotations, the description covers the basic functionality and a key dependency. It does not mention return value structure or validation criteria (e.g., checksum rules), but the parameter descriptions compensate. It is complete enough for basic use but could be richer.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema descriptions are exhaustive (100% coverage). The description adds that the tool returns normalized digits, which is not in the schema. However, it does not elaborate on parameter constraints or formats beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (validate), the resource (Korean business registration number format), and the output (return normalized digits). It effectively distinguishes the tool from siblings like normalize_korean_address or validate_invoice_readiness, which target different entities or actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use external status checks (only with a configured approved source) but does not provide explicit guidance on when to choose this tool over alternatives, nor does it list prerequisites or exclusions. It is adequate but lacks proactive usage direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_document_cross_checkCInspect

Cross-check vendor, invoice, contract, and bank document fields before an AI agent writes CRM, onboarding, invoice, or payment workflow data.

ParametersJSON Schema
NameRequiredDescriptionDefault
emailNo
addressNo
currencyNo
documentsNoDocument-derived field packets. Use synthetic, public test, or authorized business data only.
company_nameNoExpected or primary company name.
invoice_amountNo
required_fieldsNoFields that must appear in at least one supplied source.
cross_check_fieldsNoFields to compare across the expected record and documents.
bank_account_holderNo
registration_numberNoExpected or primary Korean business registration number.
representative_nameNo
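As a sketch only: the shape of the 'documents' packets is not defined anywhere in the schema, so the structure below is a guess; everything else uses the documented parameter names with synthetic values.

# Hypothetical validate_document_cross_check arguments; the 'documents' shape is assumed.
cross_check = {
    "company_name": "데모상사 주식회사",
    "registration_number": "123-45-67890",
    "bank_account_holder": "데모상사",
    "invoice_amount": 1_100_000,
    "currency": "KRW",
    "cross_check_fields": ["company_name", "registration_number", "bank_account_holder"],
    "required_fields": ["registration_number", "bank_account_holder"],
    "documents": [   # assumed packet structure; schema only says "document-derived field packets"
        {"source": "invoice", "company_name": "데모상사 주식회사", "registration_number": "123-45-67890"},
        {"source": "bank_statement", "bank_account_holder": "데모상사"},
    ],
}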
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the full burden. It only says 'cross-check' without detailing behavior such as error handling, side effects, or requirements. The description is too vague to inform an agent of important behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, which is concise, but it lacks structure and critical details. It earns its place but could be more informative without being excessively long.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high parameter count (11), no output schema, and no annotations, the description is severely incomplete. It does not explain return values, prerequisites, or what happens after cross-checking. The tool's complexity demands a richer description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 45%, and the description adds no additional meaning beyond what is already in the schema. For instance, parameters like 'email', 'address', 'currency' are not explained in the description. The description does not compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'cross-check' and the resources (vendor, invoice, contract, bank document fields). It distinguishes from sibling tools which focus on normalization or specific validations, but does not explicitly name siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies when to use the tool ('before an AI agent writes CRM, onboarding, invoice, or payment workflow data'), providing useful context. However, it lacks explicit guidance on when not to use it or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_invoice_readinessCInspect

Decide whether a Korean tax invoice profile is ready for invoice preparation.

ParametersJSON Schema
NameRequiredDescriptionDefault
emailYes
addressYes
company_nameYes
business_itemNo
business_typeNo
registration_numberYes
representative_nameYes
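None of the seven parameters are described, so the example below infers meanings from the field names alone; all values are synthetic.

# Hypothetical validate_invoice_readiness arguments (synthetic profile).
invoice_profile = {
    "company_name": "데모상사 주식회사",
    "registration_number": "123-45-67890",
    "representative_name": "홍길동",
    "address": "서울특별시 중구 세종대로 110",
    "email": "billing@example.test",
    "business_type": "도소매",      # optional
    "business_item": "전자제품",    # optional
}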
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided. The description does not disclose behavioral traits such as read-only nature, return values, or criteria. It only says 'decide whether ready' without elaboration.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, making it concise. However, it lacks structure and could be more informative without sacrificing brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no output schema, and no annotations, the description is incomplete. It does not explain return values, validation criteria, or differentiation from sibling validation tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, yet the description adds no parameter information. The description does not clarify how each parameter contributes to readiness beyond what the schema names imply.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool decides readiness for invoice preparation, but the term 'readiness' is vague. It does not differentiate from sibling tool 'validate_tax_invoice_profile'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use or not use this tool versus alternatives like validate_tax_invoice_profile. No context about prerequisites or scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_payment_or_invoice_releaseCInspect

Gate a payment, invoice, settlement, purchase order, or other money-adjacent workflow step before an AI agent releases it.

ParametersJSON Schema
NameRequiredDescriptionDefault
emailNo
addressNo
currencyNoKRW
vendor_idNo
company_nameYes
phone_numberNo
invoice_emailNo
vendor_statusNoCurrent vendor approval state, such as approved, active, needs_review, blocked, or inactive.
company_numberNoUK Companies House company number.
dart_corp_codeNo8-digit OpenDART corporation code.
invoice_amountYes
payment_purposeNo
payment_due_dateNo
purchase_order_idNo
bank_account_holderNo
check_opensanctionsNoWhen KWT_OPENSANCTIONS_API_KEY is configured, request OpenSanctions screening evidence before release.
registration_numberYes
representative_nameNo
autonomous_limit_krwNo
check_companies_houseNoWhen KWT_COMPANIES_HOUSE_API_KEY is configured, request UK Companies House evidence before release.
check_dart_disclosuresNoWhen KWT_DART_API_KEY is configured, request recent OpenDART disclosure evidence.
vendor_last_verified_atNoISO date or datetime when the vendor was last verified.
manual_review_amount_krwNo
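A hedged example of how a call might look: the threshold fields and their interplay with the gate are assumptions based on their names, and all values are synthetic.

# Hypothetical validate_payment_or_invoice_release arguments.
payment_gate = {
    "company_name": "데모상사 주식회사",
    "registration_number": "123-45-67890",
    "vendor_id": "V-0001",
    "vendor_status": "approved",
    "vendor_last_verified_at": "2024-05-01",
    "invoice_amount": 4_500_000,
    "currency": "KRW",
    "payment_purpose": "May consulting fees",
    "payment_due_date": "2024-06-30",
    "bank_account_holder": "데모상사",
    "autonomous_limit_krw": 1_000_000,        # assumed: above this, autonomous release is not allowed
    "manual_review_amount_krw": 5_000_000,    # assumed: above this, human review is required
    "check_opensanctions": False,             # per schema, needs KWT_OPENSANCTIONS_API_KEY when True
    "check_companies_house": False,
    "check_dart_disclosures": False,
}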
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fails to disclose any behavioral traits such as side effects, state changes, permissions required, rate limits, or return values. It only vaguely mentions 'gating' without saying what that means in practice.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, making it concise but not sufficiently informative. It lacks structure such as front-loading key information or clearly separating purpose from details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 23 parameters, no output schema, and no annotations, the description is highly incomplete. It does not cover what criteria are used for gating, how results are returned, or any necessary preconditions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no explanation for any of the 23 input parameters. Fewer than a third of them carry schema descriptions (mainly the external-check flags and vendor_status), and none are explained in the tool's description, leaving the agent to guess their roles.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a gating step before releasing payment/invoice/settlement, etc., using the verb 'gate' which implies validation. It distinguishes from siblings by focusing on release gate rather than normalization or specific validations, but the verb is uncommon and could be more precise.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like validate_invoice_readiness or validate_vendor_onboarding. The description does not mention context, prerequisites, or when to avoid it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_policy_invariantsCInspect

Validate policy-as-code or model-check invariants for approval, quota, secret, paid-action, and safety gates.

ParametersJSON Schema
NameRequiredDescriptionDefault
invariantsNo
environmentNo
policy_nameNo
action_scopeNo
tests_passedNo
evidence_itemsNo
counterexamplesNo
deny_by_defaultNo
audit_log_presentNo
failed_invariantsNo
no_plaintext_secretsNo
operator_approval_idNo
quota_limit_enforcedNo
rollback_plan_presentNo
approval_gate_enforcedNo
human_override_availableNo
egress_allowlist_enforcedNo
operator_approval_presentNo
model_output_schema_enforcedNo
read_only_first_call_enforcedNo
owner_approval_for_paid_actionsNo
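A purely illustrative payload follows; each boolean mirrors a gate named in the schema, and the list-valued fields are guesses about their expected content.

# Hypothetical validate_policy_invariants arguments (values invented).
policy_check = {
    "policy_name": "agent-spend-policy",
    "environment": "staging",
    "action_scope": "payments",
    "deny_by_default": True,
    "approval_gate_enforced": True,
    "owner_approval_for_paid_actions": True,
    "quota_limit_enforced": True,
    "no_plaintext_secrets": True,
    "egress_allowlist_enforced": True,
    "audit_log_present": True,
    "human_override_available": True,
    "read_only_first_call_enforced": True,
    "model_output_schema_enforced": True,
    "rollback_plan_present": True,
    "tests_passed": True,
    "failed_invariants": [],                 # assumed: names of invariants that failed checking
    "counterexamples": [],                   # assumed: traces produced by a model checker
    "invariants": ["no paid action without owner approval", "quota <= 100 calls per day"],
    "evidence_items": ["policy test suite run 2024-06-01"],
}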
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the burden. It does not disclose side effects, return values, errors, or whether the tool is read-only. The behavioral context is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that conveys the core purpose without extraneous information. It is front-loaded and every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (21 parameters, no annotations, no output schema), the description is highly incomplete. It omits details on return values, errors, prerequisites, and parameter formats, leaving the agent with insufficient context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 21 parameters with 0% description coverage. The description adds some meaning by grouping parameters into categories (approval, quota, secret, etc.), but does not explain individual parameters or their usage. It partially compensates for the lack of schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates policy-as-code or model-check invariants, specifying the domains (approval, quota, secret, etc.). It uses a specific verb and resource, but does not differentiate from sibling tools like 'validate_secret_flow' or 'validate_api_contract_drift'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. There is no mention of prerequisites, conditions, or situations where this tool is preferred over other validation tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_secret_flowCInspect

Validate secret-scan, taint-flow, redaction, and secret-manager evidence before an agent handles credentials.

ParametersJSON Schema
NameRequiredDescriptionDefault
environmentNo
target_nameNo
secret_sinksNo
logs_redactedNo
evidence_itemsNo
secret_sourcesNo
taint_findingsNo
data_sensitivityNo
encrypted_at_restNo
secret_scan_passedNo
secret_manager_usedNo
no_secrets_in_promptNo
operator_approval_idNo
taint_analysis_passedNo
public_outputs_redactedNo
operator_approval_presentNo
outbound_allowlist_presentNo
sample_prompt_contains_secretNo
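For illustration, a possible argument set using the listed parameter names; the source/sink identifiers are invented, and no secret values appear anywhere in the payload.

# Hypothetical validate_secret_flow arguments (all values invented).
secret_flow_check = {
    "target_name": "demo-agent-runner",
    "environment": "staging",
    "data_sensitivity": "internal",
    "secret_sources": ["vault://demo/api-keys"],   # assumed: identifiers, not secret values
    "secret_sinks": ["outbound-https", "logs"],
    "secret_scan_passed": True,
    "taint_analysis_passed": True,
    "taint_findings": [],
    "secret_manager_used": True,
    "encrypted_at_rest": True,
    "logs_redacted": True,
    "public_outputs_redacted": True,
    "no_secrets_in_prompt": True,
    "sample_prompt_contains_secret": False,
    "outbound_allowlist_present": True,
    "evidence_items": ["secret scan clean", "taint report attached"],
}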
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It only says 'validate' and does not disclose whether the tool returns a boolean, raises errors, modifies state, or requires specific permissions, leaving behavior opaque.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently communicates the tool's purpose. However, given the tool's complexity, it trades completeness for brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 18 parameters, no output schema, and zero annotations, the description is grossly incomplete. It fails to convey how the tool works, what it returns, or how parameters are used, leaving an agent unable to invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description provides no information about the 18 parameters. It does not explain their meanings, their interdependencies, or how they relate to the validation process, leaving the schema cryptic.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the verb 'validate' and the resources (secret-scan, taint-flow, redaction, secret-manager evidence), distinguishing it from sibling validation tools that target other domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The phrase 'before an agent handles credentials' implies a specific timing context, but it does not explicitly state when not to use the tool or mention alternative tools, leaving usage guidance vague.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_supply_chain_provenanceCInspect

Validate SLSA-style provenance, signatures, SBOMs, pins, and dependency evidence before an agent trusts an artifact.

ParametersJSON Schema
NameRequiredDescriptionDefault
slsa_levelNo
environmentNo
package_nameNo
sbom_presentNo
artifact_nameNo
source_pinnedNo
digest_matchesNo
evidence_itemsNo
repository_urlNo
artifact_signedNo
builder_trustedNo
dependency_riskNo
data_sensitivityNo
provenance_presentNo
signature_verifiedNo
provenance_verifiedNo
operator_approval_idNo
known_vulnerabilitiesNo
dependency_scan_passedNo
operator_approval_presentNo
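A sketch of a plausible call, assuming the obvious readings of the field names (e.g., that slsa_level is numeric and known_vulnerabilities is a count); all values are invented.

# Hypothetical validate_supply_chain_provenance arguments.
provenance_check = {
    "artifact_name": "demo-connector",
    "package_name": "demo-connector",
    "repository_url": "https://github.com/example/demo-connector",   # placeholder URL
    "environment": "staging",
    "slsa_level": 3,                      # assumed: numeric SLSA build level
    "provenance_present": True,
    "provenance_verified": True,
    "artifact_signed": True,
    "signature_verified": True,
    "digest_matches": True,
    "builder_trusted": True,
    "sbom_present": True,
    "source_pinned": True,
    "dependency_scan_passed": True,
    "known_vulnerabilities": 0,           # assumed: a count; could instead be a list of IDs
    "dependency_risk": "low",
    "data_sensitivity": "public",
    "evidence_items": ["signature verification output", "SBOM attached"],
}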
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden but does not disclose side effects, read/write nature, or what happens on validation failure. It only states the validation purpose without behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no waste. It is concise but lacks structure (e.g., bullet points) for a tool with many parameters. Still, it is appropriately sized for the clarity achieved.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 20 parameters, no output schema, and no annotations, the description is severely incomplete. It does not explain return values, parameter usage, or behavioral implications. A tool of this complexity requires a much richer description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 20 parameters with 0% description coverage. The description mentions some concepts (provenance, signatures, SBOMs, pins, dependency evidence) that map to a few parameters, but many parameters (e.g., environment, data_sensitivity, known_vulnerabilities) are not addressed. It provides insufficient semantic meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates SLSA-style provenance, signatures, SBOMs, pins, and dependency evidence before trusting an artifact. It specifies a concrete verb and resource, and the context ('before an agent trusts an artifact'). While it differentiates from sibling validation tools by focusing on supply chain, it could be more explicit about what 'SLSA-style' entails.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus the many other validation tools. It only mentions 'before an agent trusts an artifact,' which is too generic. No alternatives or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_tax_invoice_profileCInspect

Check whether a Korean tax invoice profile has complete and internally consistent fields.

ParametersJSON Schema
NameRequiredDescriptionDefault
emailYes
addressYes
company_nameYes
business_itemNo
business_typeNo
registration_numberYes
representative_nameYes
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the full burden. It only says it checks completeness and consistency but does not disclose what the tool returns (e.g., boolean, errors), whether it is read-only, or any side effects. For a validation tool, more detail is needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, which is concise, but it is too brief to be informative. It sacrifices completeness for brevity, resulting in a lack of necessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 7 parameters with no descriptions, no output schema, and no annotations, the description is grossly insufficient. It fails to explain validation criteria, return format, or any constraints, making it nearly useless for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, and the tool description does not explain any parameter meaning beyond the field names. Context about what each field represents and how it is validated is essential here, but none is provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks completeness and internal consistency of a Korean tax invoice profile. The verb 'check' and resource 'Korean tax invoice profile' are specific, but it doesn't specify what 'internally consistent' means in detail. It somewhat distinguishes from sibling validators by focusing on the profile as a whole.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'validate_business_registration_number' or 'validate_invoice_readiness'. There is no mention of prerequisites or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_technical_due_diligenceBInspect

Preflight a repository, API, MCP server, package, SDK, or technical service before an agent uses it.

ParametersJSON Schema
NameRequiredDescriptionDefault
asset_typeNoTechnical asset type, such as api, mcp_server, github_repo, package, sdk, service, or tool.
ci_presentNo
target_urlNoPublic target URL or API base URL.
environmentNoTarget environment, such as local, sandbox, staging, or production.
target_nameNoHuman-readable asset or service name.
build_passedNo
intended_useNoWhat the calling agent wants to do with the asset.
package_nameNoPackage identifier when the target is a library or SDK.
tests_passedNo
tests_presentNo
evidence_itemsNoShort non-secret evidence snippets gathered by the calling agent.
last_commit_atNoOptional ISO timestamp or date for last commit, release, or update.
readme_presentNo
repository_urlNoPublic or authorized repository URL.
dependency_riskNoCaller-assessed dependency or supply-chain risk: low, medium, high, critical, or unknown.
license_presentNo
openapi_presentNo
requires_secretNo
data_sensitivityNoData class expected for first use, such as public, synthetic, internal, private, sensitive, or secret.
mcp_tools_listedNo
secret_scan_passedNo
mcp_manifest_presentNo
operator_approval_idNo
known_vulnerabilitiesNo
writes_external_stateNo
auth_policy_documentedNo
dependency_scan_passedNo
handles_sensitive_dataNo
quota_policy_documentedNo
security_policy_presentNo
failure_modes_documentedNo
operator_approval_presentNo
first_safe_call_documentedNo
skill_instructions_presentNo
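To make the 34-field surface concrete, here is a hypothetical preflight of an MCP server using a subset of the parameters; values are invented except the repository URL, which is the one shown on this page.

# Hypothetical validate_technical_due_diligence arguments (subset of fields).
due_diligence = {
    "asset_type": "mcp_server",
    "target_name": "k-work-trust-public-demo",
    "target_url": "https://example.test/mcp",        # placeholder URL
    "repository_url": "https://github.com/hycore220/k-work-trust-public-demo",
    "environment": "sandbox",
    "intended_use": "validate Korean vendor records before CRM writes",
    "data_sensitivity": "synthetic",
    "mcp_manifest_present": True,
    "mcp_tools_listed": True,
    "readme_present": True,
    "license_present": True,
    "tests_present": True,
    "tests_passed": True,
    "secret_scan_passed": True,
    "dependency_scan_passed": True,
    "requires_secret": False,
    "writes_external_state": False,
    "handles_sensitive_data": False,
    "dependency_risk": "low",
    "evidence_items": ["manifest fetched", "README reviewed"],
}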
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full disclosure burden. 'Preflight' suggests a review but fails to clarify whether the tool is read-only, modifies state, or what side effects occur. No behavioral traits are disclosed beyond the vague action.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the action 'Preflight,' making it concise and easy to parse. However, it could be slightly more informative without adding bulk.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (34 parameters, no output schema, no annotations), the single-sentence description is far too brief. It does not explain expected output, success criteria, or how to use the many input fields, leaving the agent underinformed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 32%, leaving many parameters undocumented. The description does not explain any parameters, so the agent gains no additional semantic meaning beyond the sparse schema descriptions for boolean fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses the specific verb 'preflight' and lists six technical asset types (repository, API, MCP server, etc.), clearly distinguishing this tool from sibling validation tools that target different domains like business registration or invoice readiness.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The phrase 'before an agent uses it' implies a clear use case, but the description does not specify when to avoid this tool or how it compares to sibling validation tools, leaving usage guidance implicit rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_vendor_onboardingBInspect

Decide whether a Korean vendor record is ready to register, needs human review, or should be blocked before CRM, ERP, invoice, or procurement use.

ParametersJSON Schema
NameRequiredDescriptionDefault
emailNo
addressNo
company_nameYes
phone_numberNo
business_itemNo
business_typeNo
company_numberNoUK Companies House company number.
dart_corp_codeNo8-digit OpenDART corporation code.
check_opensanctionsNoWhen KWT_OPENSANCTIONS_API_KEY is configured, request OpenSanctions screening evidence for the company and representative.
registration_numberYes
representative_nameNo
check_companies_houseNoWhen KWT_COMPANIES_HOUSE_API_KEY is configured, request UK Companies House evidence.
check_dart_disclosuresNoWhen KWT_DART_API_KEY is configured, request recent OpenDART disclosure evidence.
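For illustration, a synthetic vendor record an agent might submit; undescribed fields are interpreted from their names, and the external-check flags stay False unless the corresponding API keys are configured, as the schema notes.

# Hypothetical validate_vendor_onboarding arguments (synthetic Korean vendor record).
vendor_record = {
    "company_name": "데모상사 주식회사",
    "registration_number": "123-45-67890",
    "representative_name": "홍길동",
    "email": "contact@example.test",
    "phone_number": "02-1234-5678",
    "address": "서울특별시 중구 세종대로 110",
    "business_type": "도소매",
    "business_item": "전자제품",
    "check_opensanctions": False,        # needs KWT_OPENSANCTIONS_API_KEY when True
    "check_dart_disclosures": False,     # needs KWT_DART_API_KEY when True
    "check_companies_house": False,      # assumed relevant only for UK-registered counterparties
}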
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the three decision outcomes (ready, human review, blocked) and the context (CRM, ERP, invoice, procurement). Without annotations, it carries the burden well, though it omits side effects or permission needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, concise and front-loaded with the core purpose. However, it could be slightly more structured (e.g., listing parameters briefly).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 13 parameters, most without schema descriptions, and no output schema, the description is incomplete. It does not explain how inputs influence the decision or what the output looks like, leaving significant gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Most of the 13 parameters lack schema descriptions (only the external-check flags and registry codes are documented), and the tool description provides no further help on parameter meaning or usage. It does not reference any of the input properties, leaving the agent to infer largely from names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to decide the readiness of a Korean vendor record for registration or use, with three possible outcomes (ready, human review, blocked). It distinguishes from sibling tools that normalize specific fields or validate individual aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for overall onboarding decisions, but lacks explicit guidance on when to use this tool versus specific sibling validators (e.g., validate_business_registration_number). No when-not or alternatives are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_vendor_reverificationAInspect

Check whether an existing vendor is fresh enough for continued autonomous use or must be reverified before CRM, invoice, payment, or procurement actions.

ParametersJSON Schema
NameRequiredDescriptionDefault
addressNo
vendor_idNo
postal_codeNo
company_nameYes
phone_numberNo
vendor_statusNoCurrent vendor status, such as approved, active, ready, verified, needs_review, blocked, inactive, suspended, or rejected.
company_numberNoUK Companies House company number.
high_risk_flagsNo
check_opensanctionsNoWhen KWT_OPENSANCTIONS_API_KEY is configured, request OpenSanctions screening evidence before vendor reuse.
registration_numberYes
check_companies_houseNoWhen KWT_COMPANIES_HOUSE_API_KEY is configured, request UK Companies House evidence before vendor reuse.
official_source_statusNoCaller-supplied official-source status when available, such as verified or source_unavailable.
vendor_last_verified_atNoISO date or datetime when the vendor was last verified.
official_source_requiredNo
reverification_grace_daysNo
reverification_interval_daysNo
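A hedged sketch of a reverification check follows; the interval and grace-day fields are interpreted from their names as day counts governing how stale a verification may be, and all values are synthetic.

# Hypothetical validate_vendor_reverification arguments.
reverification_check = {
    "company_name": "데모상사 주식회사",
    "registration_number": "123-45-67890",
    "vendor_id": "V-0001",
    "vendor_status": "active",
    "vendor_last_verified_at": "2023-11-15",
    "reverification_interval_days": 180,   # assumed: reverify at least every 180 days
    "reverification_grace_days": 14,       # assumed: grace period past the interval
    "official_source_required": False,
    "high_risk_flags": [],
    "check_opensanctions": False,
    "check_companies_house": False,
}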
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavior. It implies a read-only validation but does not explicitly state that the tool is non-destructive, nor does it describe what it returns.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise (one sentence) and front-loaded, but lacks structure for conveying parameter details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 16 parameters, no output schema, and no annotations, the description is insufficient. It omits the return value, prerequisites, and how to interpret results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only a minority of the parameters carry schema descriptions, and the tool description adds no meaning beyond the schema. It does not explain most optional parameters such as address, phone_number, or high_risk_flags.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks if a vendor needs reverification before specific actions (CRM, invoice, etc.), distinguishing it from sibling tools like validate_invoice_readiness or validate_vendor_onboarding.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates when to use the tool ('before CRM, invoice, payment, or procurement actions') but does not explicitly state when not to use it or name alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
