agentguard

Name: agentguard
Author: ToolOracle

by io.tooloracle

Server Details

AgentGuard — 20-tool AI safety MCP: policy preflight, risk scoring, audit logging, rate limits.

Status: Healthy
Last Tested: 2026-05-28 01:34
Transport: Streamable HTTP
URL
Repository: ToolOracle/agentguard
GitHub Stars: 0

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client

Glama

MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A3.7/5.0

Tool DescriptionsB

Average 3.8/5 across 24 of 24 tools scored. Lowest: 2.7/5.

Server CoherenceA

Disambiguation5/5

Each tool has a clearly distinct purpose, covering different aspects of security and compliance (approvals, audit, anomaly detection, policy checks, etc.). There is no overlap that would cause confusion for an agent.

Naming Consistency5/5

All 24 tools use consistent snake_case naming with a verb_noun or noun_verb pattern (e.g., approval_list, payload_safety_check). No mixing of conventions, making tool names predictable and easy to understand.

Tool Count4/5

With 24 tools, the server is slightly above the typical 3-15 range, but still well-scoped for a comprehensive security governance system. Each tool addresses a specific need, and the count is reasonable given the domain complexity.

Completeness4/5

The tool surface covers most key security workflows: pre- and post-execution checks, approvals, audit logging, anomaly detection, and risk scoring. Minor gaps exist (e.g., no policy CRUD, no tenant management), but the core functionality is solid.

Available Tools

24 tools

approval_listAInspect

List approval requests. Filter by status (pending/approved/denied/all), agent_id, or tool_name. Shows summary stats.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Max results (default: 50, max: 200)
`status`	No	Filter by status (default: pending)
`agent_id`	No	Filter by agent ID
`tool_name`	No	Filter by tool name

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description implies read-only behavior and mentions summary stats but lacks details on response format, pagination, or rate limits. Adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences convey purpose, filters, and additional feature (summary stats) with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 optional parameters and no output schema, description covers core functionality and filters. Lacks output format or pagination info, but sufficient for a listing tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. Description adds 'summary stats' but otherwise repeats filter options already in schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List approval requests' and specifies filtering options (status, agent_id, tool_name) and summary stats. This distinguishes it from siblings like approval_resolve.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use the tool (listing with filters) but does not explicitly exclude scenarios or mention alternatives, though sibling names imply unique purposes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

approval_requiredAInspect

Check if a tool call requires human approval before execution. Returns requires_approval=true/false with matched policy list. Set register_pending=true to create a trackable approval request with an approval_url for human review.

ParametersJSON Schema

Name	Required	Description
`agent_id`	No	Agent identifier
`tool_args`	No	Tool arguments
`tool_name`	Yes	Tool to check
`register_pending`	No	Create pending approval record

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses the return fields and the optional side effect of creating a pending approval request. However, it does not detail potential side effects beyond that, authorization needs, or rate limits. The description adds moderate value but misses some behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: two sentences clearly separating core function from optional behavior. Every sentence adds value with no redundancy. Well-structured for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, no output schema), the description adequately explains the return value (requires_approval, policy list, approval_url) and the optional registration. However, it does not explain the purpose of 'tool_args' or 'agent_id' beyond the schema, leaving some room for improvement.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers all parameters (100% coverage). The description adds extra meaning for 'register_pending' by explaining that it creates a trackable approval request with an approval_url. This enhances understanding beyond the schema's brief description. No further elaboration on other parameters is needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Check if a tool call requires human approval before execution.' It also mentions the return value and an optional registration feature. This distinguishes it from sibling tools like approval_list and approval_resolve.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (before executing a tool call to check approval), but does not explicitly compare with alternatives or provide 'when not to use' guidance. It lacks explicit context for which scenarios to prefer this over other approval-related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

approval_resolveBInspect

Resolve a pending approval request. Approve or deny a tool call that was flagged for human review. Updates agent state and creates audit trail. Requires authentication.

ParametersJSON Schema

Name	Required	Description
`reason`	No	Optional reason for the decision
`decision`	Yes	Approve or deny
`request_id`	Yes	The approval request ID to resolve
`approved_by`	No	Identity of the approver (name/email/agent_id)

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description must communicate behavioral traits. It discloses that the tool 'Updates agent state and creates audit trail' and requires authentication, which adds value. However, it does not specify whether the action is reversible, the nature of state changes, or potential side effects (e.g., consequences of denial). The transparency is adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, front-loaded with the primary action, and contains no redundant or extraneous information. Every sentence adds value: the first states the verb and resource, the second clarifies the decision options, and the third notes side effects. It is a model of conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (no output schema, no nested objects), the description covers the essential aspects: what it does, the type of decisions, and key side effects (state update, audit trail). It could mention post-resolution behavior (e.g., finality of decision) but overall is sufficient for a basic approval action.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% parameter coverage, so the baseline is 3. The description does not add any additional semantic clues or constraints beyond what the schema already provides. For instance, it doesn't clarify the format or purpose of 'reason' or 'approved_by' beyond their schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verbs ('Resolve', 'Approve or deny') and a clear resource ('pending approval request', 'tool call that was flagged for human review'). It effectively conveys the core action. However, it does not explicitly distinguish this tool from siblings like 'approval_list' or 'approval_required', which could lead to confusion in agent selection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks explicit guidance on when to use this tool versus alternatives. It mentions 'flagged for human review' and 'Requires authentication' but provides no when-not conditions or direct comparisons to sibling tools. The agent is left to infer usage context from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

audit_log_queryAInspect

Query the persistent audit trail. Filter by agent, tool, decision, time range. Returns signed entries with tamper-detection. Use for compliance reporting, anomaly detection, or agent behaviour analysis.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Max entries (default 50, max 500)
`since`	No	ISO timestamp start (e.g. 2026-03-27T00:00:00Z)
`until`	No	ISO timestamp end
`offset`	No	Pagination offset
`agent_id`	No	Filter by agent ID
`decision`	No	Filter: allowed\|denied\|flagged\|approved
`tool_name`	No	Filter by tool name

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses that entries are signed and have tamper-detection, hinting at immutability. It does not detail auth requirements or rate limits, but the core behavioral trait (signed, persistent) is communicated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: purpose, filters, return details and use cases. No unnecessary words. Front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no output schema, and no annotations, the description covers purpose, filters, use cases, and return nature (signed). However, it lacks details on output structure, ordering, error handling, or pagination hints beyond schema-listed parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%; all parameters are described in the schema. The description mentions filter types corresponding to schema fields but adds no new semantics. Baseline score of 3 applies, and no additional value is provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Query' and resource 'persistent audit trail', specifies filtering capabilities (agent, tool, decision, time range), and mentions return type (signed entries with tamper-detection). It effectively distinguishes from sibling tools like audit_log_write and other non-query tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases: 'Use for compliance reporting, anomaly detection, or agent behaviour analysis.' While it does not explicitly exclude other uses or name alternatives, the context is clear given no other read audit tool exists among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

audit_log_writeAInspect

Write a tool call result to the persistent, signed audit log. Call this AFTER tool execution with the outcome. Each entry is cryptographically signed for tamper-evidence. Essential for compliance, DORA, MiCA audit trail requirements.

ParametersJSON Schema

Name	Required	Description
`reason`	No	Why this decision was made
`outcome`	No	success\|error\|timeout
`agent_id`	No	Agent that made the call
`decision`	No	allowed\|denied\|flagged\|approved
`tool_args`	No	Arguments used
`tool_name`	Yes	Tool that was executed
`risk_score`	No	Risk score at time of call
`session_id`	No	Session identifier
`duration_ms`	No	Execution time in milliseconds

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description partially covers behavioral traits (cryptographic signing, compliance). However, it omits details like idempotency, failure modes, duplicate handling, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, no unnecessary words. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 9 parameters, no output schema, and complexity (nested objects, compliance context), the description lacks details on return value, failure behavior, permissions, or data retention.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline 3. Description does not add meaning beyond schema; it mentions writing results but does not explain parameters like outcome, decision, risk_score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verb 'Write' and resource 'tool call result to persistent, signed audit log', clearly distinguishing it from sibling tools like audit_log_query (querying) and emergency_kill (emergency stop).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Call this AFTER tool execution with the outcome', providing clear temporal guidance. Does not explicitly list when not to use, but context is clear enough for a specialized logging tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cross_tool_anomaly_checkAInspect

Detect anomalous tool usage patterns across an agent recent history. Checks risky tool combinations (AML-then-payment, wallet-recon-then-transfer), high call frequency, repeated denials, broad reconnaissance, elevated risk scores.

ParametersJSON Schema

Name	Required	Description	Default
`agent_id`	Yes	Agent to analyze
`sensitivity`	No	low\|medium\|high	medium
`window_seconds`	No	Lookback seconds (default 300)

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It lists what it checks (combinations, frequency, denials, reconnaissance, risk scores) but does not disclose output format, side effects, or whether it's read-only. The description adds some context but lacks comprehensive behavioral disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence listing key detection patterns, but it is slightly long and could be more structured. It is generally efficient and front-loaded with the main action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, and the description fails to explain return values, expected output format, or how results are interpreted. For a tool with 3 parameters and no output schema, this is a significant gap, leaving the agent uncertain about what the tool yields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add significant meaning beyond the schema: agent_id is 'Agent to analyze', sensitivity has default 'medium' with enum values, window_seconds default 300. No additional parameter semantics provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: detecting anomalous tool usage patterns across an agent's history. It enumerates specific patterns like AML-then-payment, high call frequency, etc., making the function distinct from sibling tools that focus on per-tool checks or individual risk scores.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for detecting cross-tool anomaly patterns but does not explicitly state when to use it vs. alternatives like threat_intel_check or risk_score. No guidance on when not to use it or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

decision_explainAInspect

Get a human-readable explanation of why a tool call was allowed or denied. Pass request_id from a previous policy_preflight for stored explanation, or provide tool_name + tool_args for fresh analysis. Explains matched policies, risk score breakdown, and recommendation.

ParametersJSON Schema

Name	Required	Description
`agent_id`	No	Agent identifier
`tool_args`	No	Arguments to analyze
`tool_name`	No	Tool to analyze (if no request_id)
`request_id`	No	request_id from previous preflight (optional)

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description bears full burden. It discloses the tool can retrieve stored or perform fresh analysis, and describes output content (matched policies, risk score, recommendation). No contradictions. Could mention side effects or auth requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose, followed by usage instructions and output content. No wasted words. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema; description specifies output includes policies, risk score, and recommendation. Sufficient for a tool of this complexity, though return format could be clarified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description adds meaning by explaining the two modes of operation for parameters (request_id vs tool_name + tool_args), going beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool's purpose: to get a human-readable explanation of a tool call decision. Uses specific verb 'Get' and resource 'explanation', distinguishing it from sibling tools that perform checks or approvals.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly provides two usage modes: using request_id for stored explanations, or tool_name + tool_args for fresh analysis. Offers clear guidance on when to use each, but lacks explicit mention of when not to use the tool or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

emergency_killAInspect

Emergency kill-switch. Immediately terminates agent session(s), revokes pending approvals, blocks rate limits, and audit-logs the emergency. Use for: compromised agents, runaway automation, suspicious behavior. kill_type: full|session_only|soft.

ParametersJSON Schema

Name	Required	Description	Default
`reason`	No	Why emergency kill was triggered
`agent_id`	No	Agent to kill
`kill_type`	No	full\|session_only\|soft	full
`session_id`	No	Specific session to kill

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses the tool's destructive and broad effects: termination, revocation, rate limit blocking, and audit logging. This is appropriate for an emergency kill tool and provides the agent with clear expectations of consequences.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: a one-line summary, a list of effects, use cases, and a parameter note. Every sentence adds value, and the key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool is a high-stakes, destructive operation with four parameters and no output schema, the description covers the core functionality, use cases, and the critical kill_type parameter. However, it could be slightly improved by clarifying the relationship between agent_id and session_id (e.g., whether session_id overrides agent_id).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and each parameter already has a description in the schema. The description only adds value for 'kill_type' by explaining the options inline, but otherwise does not elaborate on 'reason', 'agent_id', or 'session_id' beyond their schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it's an emergency kill-switch that terminates agents, revokes approvals, blocks rate limits, and audit-logs. It lists specific use cases (compromised agents, runaway automation, suspicious behavior), making its purpose distinct from sibling tools like approval_list or audit_log_query.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly provides when to use: 'compromised agents, runaway automation, suspicious behavior'. While it doesn't explicitly state when not to use or name alternative tools, the use case list is sufficiently specific to guide appropriate invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

guard_metricsCInspect

AgentGuard operational metrics. Returns decision stats, top agents/tools, risk distribution, daily activity, approval stats, and tier distribution.

ParametersJSON Schema

Name	Required	Description	Default
`days`	No	Time window in days (default: 30)

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden for behavioral disclosure. It only indicates the tool 'returns' data, implying a read operation, but does not explicitly confirm it is read-only, nor does it mention any potential side effects, auth requirements, or data freshness. The lack of clarity on safety is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, moderately long sentence, but it efficiently lists the key output categories without fluff. Slightly fragmented enumeration could be tightened, but overall it is acceptable for a straightforward retrieval tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, so the description should clarify the return format or detail the listed metrics. It merely lists names (e.g., 'decision stats') without explaining structure or example values, leaving the agent to guess the output shape. This is insufficient for a metrics endpoint.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with the 'days' parameter adequately described in the schema. The description adds a list of returned metrics but does not enhance parameter meaning (e.g., how 'days' relates to 'daily activity'). Baseline 3 is appropriate as the description does not significantly add beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns operational metrics and enumerates the specific types: decision stats, top agents/tools, risk distribution, daily activity, approval stats, and tier distribution. This distinguishes it from sibling tools which are focused on individual operations (e.g., approval_list, audit_log_query). However, it lacks a direct verb like 'retrieves' or 'gets', making it slightly less explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description does not mention typical use cases, prerequisites, or contrast with related tools. It simply lists what it returns, leaving the agent to infer context from sibling tool names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

output_safety_scanAInspect

Post-execution output scanner. Checks tool output for PII leaks (email, phone, SSN, IBAN), secret exposure, data exfiltration patterns (outbound URLs, base64), and tool poisoning (injected instructions). Verdict: clean|warn|flag|block.

ParametersJSON Schema

Name	Required	Description
`output`	Yes	Tool output text or JSON to scan
`strict`	No	Block on any high-severity finding
`agent_id`	No
`tool_name`	No	Name of the tool that produced this output

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description lists checks and verdicts but does not explicitly state that the tool is read-only or non-destructive. Without annotations, it should mention that no output is modified. The verdict categories are helpful but behavioral safety is not clarified.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence establishes purpose, followed by a bullet-like list of checks and verdicts. Every word contributes, and the purpose is front-loaded. No redundant or vague phrasing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's function, input, and verdict output. With no output schema, the verdict list provides essential return context. It lacks mention of side effects or prerequisites, but for a scanner, it is largely sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 75%, with descriptions for output, strict, and tool_name but not agent_id. The overall description adds context but does not elaborate on parameter formats or constraints beyond the schema. A baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool's role as a post-execution output scanner, enumerates specific checks (PII, secrets, exfiltration, poisoning), and defines the verdict scale. This distinctly separates it from sibling tools like secret_exposure_check, which have narrower scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage after tool execution but does not explicitly state when to use or avoid this tool vs. alternatives (e.g., when only a secret check is needed). No exclusions or prerequisites are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

payload_safety_checkAInspect

Comprehensive safety scan for injection attacks and dangerous patterns. Detects: prompt injection, jailbreak/DAN attempts, role hijacking, SQL injection (UNION/DROP/OR 1=1), XSS, Python/JS/Shell code injection, path traversal, oversized payloads, null bytes. Returns safe=true/false with finding list and block/allow decision.

ParametersJSON Schema

Name	Required	Description
`payload`	Yes	Payload to scan (string or object)
`agent_id`	No	Agent identifier
`strict_mode`	No	Block on any finding (default: false)

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It explains the return format (safe=true/false, finding list, block/allow decision) and the types of patterns detected. However, it does not mention performance or authorization needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no unnecessary words. The first sentence states the purpose, the second lists detection types and output format. Highly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description fully covers the return value (safe boolean, findings list, decision). It also lists all major detection categories and parameter details, making it complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description does not add significant meaning beyond what the schema already provides for the parameters (payload, agent_id, strict_mode). Baseline 3 due to high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's a 'Comprehensive safety scan for injection attacks and dangerous patterns' and lists specific attack types, making it distinct from sibling tools like 'output_safety_scan' or 'secret_exposure_check'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists what it detects but lacks explicit guidance on when to use or not use this tool versus alternatives. No when-to-use or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

payment_policy_checkAInspect

Validate a payment against policy rules before execution. Checks amount limits (>100k warns, >1M blocks), recipient allowlist/denylist, supported currencies/networks, AML reporting thresholds, and MiCA flags. Returns approved/rejected with full violation list and risk score.

ParametersJSON Schema

Name	Required	Description
`amount`	Yes	Payment amount
`network`	No	Payment network (ethereum, base, sepa...)
`purpose`	No	Payment purpose (required for compliance)
`agent_id`	No	Agent identifier
`currency`	No	Currency code (USD, EUR, USDC, ETH...)
`recipient`	No	Recipient address or ID

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses behavioral traits: it warns on >100k, blocks >1M, checks recipient lists, currencies, AML, and MiCA flags. It details the return of approval status, violation list, and risk score. However, it omits side effects (none expected) and error handling details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3 sentences) with no redundant information. It front-loads the core purpose and lists specific checks efficiently, making it easy for an agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 6-parameter validation tool with no output schema, the description adequately covers the checks performed and return structure. It lacks details on parameter interactions or prerequisite conditions but is sufficient for agent invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with basic descriptions. The description adds value by explaining amount thresholds (warn/block) but does not enhance other parameter meanings beyond the schema. Baseline 3 is appropriate given high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates payments against policy rules before execution, specifying the exact checks performed (amount limits, recipient lists, currencies, AML, MiCA). It distinguishes itself from sibling policy tools (e.g., spend_limit_check) by focusing on payment-specific rules.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains what the tool does but does not explicitly state when to use it over siblings like spend_limit_check or tenant_policy_check. It implies usage for payment validation but lacks explicit guidance on context or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

policy_preflightBInspect

Pre-flight security check before any tool call. Evaluates all policies, computes risk score, checks rate limits, and returns allow/deny/require_approval decision. Call this BEFORE executing any agent tool. Writes to audit log automatically.

ParametersJSON Schema

Name	Required	Description
`agent_id`	No	Unique identifier for the calling agent
`tool_args`	No	Arguments for the tool call
`tool_name`	Yes	Name of tool about to be called
`session_id`	No	Session identifier (optional)

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description must disclose behavioral traits. It mentions writing to the audit log automatically, but does not clarify other potential side effects, required permissions, or whether the tool is purely read-only. The description adds some context but lacks depth for a tool with no annotation safety hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences covering purpose, usage instruction, and a behavioral note. It is concise and front-loaded with key actions, though the list of actions in the first sentence could be more structured. Overall, efficient with minimal waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 params, nested objects, no output schema), the description is somewhat lacking. It does not explain the decision types (allow/deny/require_approval) or what the agent should do next. It also does not integrate with sibling tools, leaving the agent to piece together the overall workflow.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All four parameters have schema descriptions (100% coverage), so the baseline is 3. The description does not provide additional meaning beyond the schema; for example, it does not explain how agent_id or session_id are used in the evaluation. Thus, no value added beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that this tool performs a pre-flight security check evaluating policies, risk score, and rate limits to return a decision. However, it does not differentiate itself from sibling tools like rate_limit_check or tool_risk_score, which handle individual aspects of this check.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly instructs to call this tool BEFORE executing any agent tool, providing clear usage timing. However, it offers no guidance on when not to use it or how it relates to alternative sibling tools (e.g., using rate_limit_check for just rate limits), leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

policy_registerAInspect

View the central policy registry. Query tiers (T1-T4), tool classifications, escalation rules. Actions: summary (default), lookup (by tool_name), tiers, rules, tools (by tier_id).

ParametersJSON Schema

Name	Required	Description
`action`	No	What to query (default: summary)
`tier_id`	No	Tier ID for tools action
`tool_name`	No	Tool name for lookup action

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description implies a read-only query operation ('View'), and lists all available actions. No annotations exist, so it carries the full burden; it adequately discloses the tool's capabilities without side-effect details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences front-load the purpose and actions, with zero wasted words. Every element is necessary.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the main actions and parameter usage but does not detail the response format. For a query tool without an output schema, it is mostly complete, though a brief note on return values would improve it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explicitly maps actions to parameters (e.g., lookup uses tool_name, tools uses tier_id), adding significant value beyond the schema's enum descriptions which only list options.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is for viewing the central policy registry (verb+resource) and lists specific actions, distinguishing it from sibling tools that handle approvals or checks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear guidance on when to use each action (summary default, lookup by tool_name, tools by tier_id) but does not explicitly contrast with alternatives or state when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rate_limit_checkAInspect

Check if an agent has exceeded rate limits. Returns per-window usage (minute/hour/day) with percentage used. Limits: 200/min, 5000/hr, 50000/day per agent. Use before high-frequency tool calls or for agent health monitoring.

ParametersJSON Schema

Name	Required	Description	Default
`agent_id`	Yes	Agent ID to check

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, but the description fully discloses behavior: returns per-window usage percentages and specifies exact limits per time window. Implies read-only operation with no side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise three-sentence description. Each sentence is purposeful: states purpose, specifies return value, and gives usage guidance. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple check tool with one parameter and no output schema, the description covers purpose, return values, and rate limits. Could mention error cases but adequate for typical use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter 'agent_id', which is already described in schema. Description adds no further semantic detail beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool checks rate limits for an agent, specifying the exact action and resource. Distinguishes from sibling 'check' tools by focusing on rate limits.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use: 'before high-frequency tool calls or for agent health monitoring.' Does not explicitly mention when not to use or alternatives, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

replay_guard_checkAInspect

Detect replay attacks — identical requests sent multiple times in a time window. Uses SHA256 fingerprint of (agent_id + tool_name + args). Default window: 300 seconds (5 min). Returns is_replay=true/false with duplicate count and first/last seen timestamps.

ParametersJSON Schema

Name	Required	Description
`agent_id`	No	Agent identifier
`tool_args`	No	Tool arguments (used for fingerprint)
`tool_name`	Yes	Tool name to check
`window_seconds`	No	Replay window in seconds

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full transparency burden. It discloses the fingerprint composition, default time window, and return fields (is_replay, count, timestamps). It does not mention error cases or side effects, but the behavior is well-scoped and non-destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences efficiently convey the tool's purpose, mechanism, default, and output. No redundant or filler content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description adequately explains the input parameters (via schema) and output fields for a moderate-complexity tool lacking an output schema. It lacks detail on error handling or behavior when optional parameters are missing, but these are not critical for basic usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so each parameter is already described in the schema. The description adds minimal extra meaning: it confirms the use of agent_id, tool_name, and args in the fingerprint and notes the default window_seconds. This meets the baseline expectation but does not provide additional constraints or formatting details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool detects replay attacks, specifies the fingerprinting method (SHA256 of agent_id, tool_name, args), and distinguishes it from sibling security checks like rate_limit_check or session_validate by focusing on duplicate detection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains what the tool does but does not provide explicit guidance on when to use it versus alternative tools (e.g., rate_limit_check, secret_exposure_check). The context of the sibling tools suggests it's for replay detection, but no exclusions or comparative advice are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

scope_checkAInspect

Check if agent has required scope for a tool. Roles: admin, compliance_officer, trader, auditor, developer, readonly. Returns has_scope + missing scope + granting roles.

ParametersJSON Schema

Name	Required	Default
`role`	No	readonly
`scopes`	No
`agent_id`	No
`tool_name`	Yes
`session_id`	No

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses the return fields (has_scope, missing scope, granting roles) and the list of roles. This is sufficient for a read-only check tool, though it doesn't explicitly state the tool is non-destructive. The description adds value beyond what's in the schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, consisting of two sentences that convey key information: purpose, roles, and return values. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, no output schema, and many sibling check tools, the description is incomplete. It does not explain all parameters, provide usage examples, or clarify differences from other check tools. The agent would need additional context to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 5 parameters with 0% description coverage. The description only mentions role and implicitly scopes via roles, but does not explain tool_name, agent_id, session_id, or scopes. It adds minimal meaning beyond the schema itself.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: checking if an agent has required scope for a tool. It specifies the verb 'check' and the resource 'scope', and lists relevant roles and return fields. This distinguishes it from sibling tools like approval_required or rate_limit_check.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a brief overview but lacks explicit guidance on when to use this tool versus alternatives. It does not specify prerequisites or exclude use cases. The mention of roles and returns implies usage, but no direct when/when-not advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

secret_exposure_checkAInspect

Deep scan any text/payload for secrets, credentials, and PII. Detects: API keys (OpenAI, GitHub, AWS), tokens (Slack, Bearer), private keys (ETH, Bitcoin), credentials (passwords, secrets), and PII (emails, credit cards, SSNs). Returns findings with severity and remediation guidance.

ParametersJSON Schema

Name	Required	Description	Default
`payload`	Yes	Text or JSON string to scan
`scan_type`	No	all \| keys \| tokens \| pii	all

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It clearly states that the tool performs a scan, returns findings with severity and remediation, and does not imply destructive actions. However, it does not explicitly state it is read-only or mention any side effects, which would merit a 5.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, all front-loaded with the main action. Every sentence adds value: purpose, detection list, and output summary. No wasted words, achieving high density of useful information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the input (text/payload), the scanning behavior, and the output (findings with severity and remediation). Given no output schema, the description sufficiently explains what the agent can expect. The tool's complexity is moderate, and the description handles it fully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds general context about detection types but does not elaborate on the parameters beyond what the schema already provides (payload and scan_type with descriptions). No additional value for parameter usage is offered.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Deep scan any text/payload for secrets, credentials, and PII.' It lists specific detection types (API keys, tokens, private keys, etc.), making the function unambiguous and distinct from sibling tools like payload_safety_check which may cover broader safety issues.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains what the tool does but does not provide explicit guidance on when to use it versus alternatives (e.g., output_safety_scan, threat_intel_check). There is no mention of prerequisites, limitations, or scenarios where this tool is inappropriate, leaving the agent to infer usage from context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

session_validateCInspect

Create/validate/invalidate agent sessions with role, scopes, TTL and call budget. Actions: create|validate|invalidate|info.

ParametersJSON Schema

Name	Required	Default
`role`	No	readonly
`action`	Yes	validate
`scopes`	No
`agent_id`	No
`tenant_id`	No	default
`session_id`	No
`call_budget`	No
`ttl_seconds`	No

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description does not disclose whether actions are read-only, destructive, or require specific permissions. For a tool that can create and invalidate sessions, safety implications are omitted.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single concise sentence covering core functionality. However, the list of actions could be more clearly integrated. Minimal waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, no annotations, and 8 loosely defined parameters, the description is incomplete. Missing information on return values, error conditions, prerequisites, or behavior per action.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% as description mentions role, scopes, TTL, and call budget but does not explain any parameter in detail. For 8 parameters, the description adds minimal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool manages sessions with four distinct actions (create, validate, invalidate, info). It contrasts with sibling tools that are checks and approvals, making its purpose distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool over alternatives or how to choose among the actions. Siblings don't overlap functionally, but the description lacks explicit context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

spend_limit_checkAInspect

Check if a payment amount stays within agent spend limits. Default limits: 10,000/call, 50,000/hr, 200,000/day. Trusted agents: 100,000/call, 500,000/hr, 2,000,000/day. Returns within_limits=true/false with headroom percentage.

ParametersJSON Schema

Name	Required	Description	Default
`amount`	Yes	Amount to check
`agent_id`	No	Agent identifier
`currency`	No	Currency code
`trust_level`	No	default or trusted	default

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It explains return format (within_limits and headroom percentage) and default limits, but doesn't mention error conditions, side effects, or what happens when limits are exceeded.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences effectively convey purpose, limits, and return value with no wasted words. Information is front-loaded and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple check tool with no output schema, the description clearly explains the return values and default limits. It is sufficient for an AI agent to understand the tool's behavior among many sibling check tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions for each parameter. The description adds context by providing default and trusted limits, but doesn't elaborate on individual parameters beyond what the schema already states, which is adequate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks if a payment amount stays within agent spend limits, specifies default and trusted agent limits, and distinguishes from siblings like rate_limit_check or scope_check by focusing on spend limits.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for checking payment amounts against spend limits but lacks explicit when-not-to-use guidance or comparison with sibling tools such as rate_limit_check or payment_policy_check.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tenant_policy_checkCInspect

Multi-tenant governance. Tenants: default, fintech_eu (MiCA/DORA), defi_protocol, enterprise_read. Checks tool blocklists, max risk scores, spend limits. Actions: check|list.

ParametersJSON Schema

Name	Required	Default
`action`	No	check
`agent_id`	No
`tenant_id`	No	default
`tool_name`	No

Tool Definition Quality

C2.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description should disclose behavioral traits. It states the tool checks certain criteria but does not mention whether it is read-only, requires authorization, rate limits, or what happens upon violation. This is insufficient for a governance tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (two sentences) and front-loaded with the core concept. However, it could be restructured to separate the tenant list from the actions for improved readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and four parameters with zero schema description, the description is incomplete. It does not explain return format, result interpretation, or parameter interdependencies. A more thorough description is needed for a policy checking tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate. It explains the 'action' parameter (check|list) and the 'tenant_id' values, but does not describe 'agent_id', 'tool_name', or the meaning of 'list' vs 'check'. The default values are not explained, leaving ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates the tool is for multi-tenant governance and checks blocklists, risk scores, and spend limits. It lists tenants and actions, providing a specific verb+resource. However, it does not differentiate from sibling tools like spend_limit_check or rate_limit_check, which partially overlaps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It implies usage through the tenant list and actions, but no exclusions or comparisons to sibling check tools are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

threat_intel_checkAInspect

Check entity against threat intelligence. Auto-detects ETH addresses, IPs, domains. Checks sanctions (Tornado Cash), disposable services, behavioral analysis from audit log. Returns threat_level: none|low|medium|high|critical.

ParametersJSON Schema

Name	Required	Description	Default
`entity`	Yes	Address, IP, domain or agent_id to check
`agent_id`	No
`entity_type`	No		auto

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description does a good job disclosing behavior: it auto-detects entity type, checks sanctions, disposable services, and behavioral analysis from audit log, and returns a threat level. It does not mention side effects or safety, but being a read-only check, it is mostly transparent. However, it could explicitly state it is non-destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each informative. The first sentence states the core purpose, the second adds auto-detection details, the third lists checks and output. No fluff, all sentences are necessary and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and no annotations, the description covers the tool's purpose, inputs, and output well. It explains what checks are performed and the possible threat levels. However, it does not describe the full output structure or explain the agent_id parameter, leaving some gaps for a complex interaction.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (33%), but the description adds meaning: it explains that entity can be an ETH address, IP, or domain, and that entity_type defaults to 'auto'. This compensates for the lack of schema descriptions. The agent_id parameter is not explained, which is a minor gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks an entity against threat intelligence, auto-detects entity types (ETH addresses, IPs, domains), and performs specific checks. This differentiates it from sibling check tools like payload_safety_check or scope_check by focusing on threat intelligence.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool should be used when checking an entity for threat intelligence, but it does not explicitly state when to use it over alternatives or provide exclusion criteria. No guidance on when not to use is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tool_manifest_verifyAInspect

Supply-chain verification for MCP tools. Checks publisher identity against allowlist, scans tool descriptions for prompt injection, validates server domain and signing capability. Verdict: trusted|caution|block.

ParametersJSON Schema

Name	Required	Description
`publisher`	No	Claimed publisher name
`tool_name`	No	Tool name to check
`server_url`	No	MCP server URL to verify
`tool_description`	No	Tool description to scan for injections

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavior. It lists the checks and verdict, but lacks information on whether the tool is read-only, requires specific permissions, or has any side effects. The agent cannot assess if this operation is safe or what happens on failure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences front-loaded with the purpose, listing specific checks and the verdict output. Every sentence provides essential information without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 4 parameters and no output schema, the description is comprehensive enough for a supply-chain verification tool, stating the verdict options. Minor missing details on edge cases (e.g., unreachable server) but overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for all 4 parameters. The tool description adds minimal extra meaning beyond the schema, such as that tool_description is scanned for injections. Baseline 3 applies since schema already explains parameters adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as supply-chain verification for MCP tools, listing specific checks (publisher identity, prompt injection, server domain, signing capability) and the verdict options (trusted|caution|block). This distinguishes it from sibling tools like tool_risk_score which likely outputs a numeric score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states what the tool does but provides no explicit guidance on when to use it versus alternatives (e.g., tool_risk_score, threat_intel_check). The context of being a supply-chain verification tool is implicit, but without when-not or alternative references.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tool_risk_scoreAInspect

Compute 0-100 risk score for any tool + input combination. 0=minimal risk (read-only), 100=critical (payment/irreversible). Detects secrets, injection attempts, high-value amounts. Use before deciding whether to proceed with a tool call.

ParametersJSON Schema

Name	Required	Description
`agent_id`	No	Agent identifier (affects trust factor)
`tool_args`	No	Tool arguments to analyze
`tool_name`	Yes	Tool name to score

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool detects secrets, injection attempts, and high-value amounts, but it does not describe the output format or whether side effects occur. Since the tool is read-only (risk assessment), the absence of side-effect disclosure is acceptable, but the lack of output specification (e.g., returns only score or also metadata) is a gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, highly concise, and front-loaded with the core purpose and scale. Every word adds value: the scale definition, detection capabilities, and usage guidance are packed efficiently with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, so the description should explain the return value (e.g., the score alone or with breakdown). It mentions computing a score but doesn't specify the output structure, leaving an important gap. However, given the tool's simplicity and the presence of sibling tools that handle details, it is moderately complete for its role.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (all parameters have descriptions in the schema). The description adds context about what is detected (secrets, injection, amounts) but does not provide further meaning beyond the schema's descriptions. Baseline 3 is appropriate because the schema already documents the parameters, and the description offers limited additional value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes a 0-100 risk score for any tool+input combination, specifying the scale (0=minimal, 100=critical) and what it detects (secrets, injection, high-value amounts). This is a specific verb+resource that distinguishes it from sibling check tools like approval_required or secret_exposure_check, which focus on individual aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises to use the tool 'before deciding whether to proceed with a tool call,' which clearly states when to use it. It does not mention when not to use it or name specific alternatives, but the sibling list implies other checks are for more specific scenarios. This guidance is clear enough but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.

agentguard

Server Details

Tool Definition Quality

Available Tools

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Tool Definition Quality

Discussions

Your Connectors

Resources