dns
Server Details
DNS & email security scanner — 51 tools for SPF, DMARC, DKIM, DNSSEC, SSL, and more.
- Status: Healthy
- Transport: Streamable HTTP
- Repository: MadaBurns/bv-mcp
- GitHub Stars: 5
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 3.8/5 across 51 of 51 tools scored. Lowest: 3.2/5.
Most tools have distinct purposes focused on specific DNS, email security, or compliance checks, with clear boundaries like check_dmarc for DMARC validation versus check_spf for SPF. However, some overlap exists, such as check_dane and check_dane_https both handling DANE verification, which could cause minor confusion.
Tool names follow a highly consistent verb_noun pattern throughout, using snake_case uniformly. Examples include check_dmarc, generate_dkim_config, and simulate_attack_paths, making the set predictable and easy to parse.
With 51 tools, the count is high for a DNS security server and risks overwhelming agents with redundant options. While the domain is broad, many tools could be consolidated (e.g., the multiple check_* tools that perform similar audits), which makes the set feel heavy and loosely scoped.
The tool set provides comprehensive coverage for DNS and email security, including scanning, validation, generation, remediation, and compliance mapping. It supports end-to-end workflows (e.g., scan, analyze, generate, validate) with no obvious gaps, so agents can handle complete tasks.
Available Tools
51 tools

analyze_drift (Grade A, read-only, idempotent)
Compare current security posture against a previous baseline. Shows what improved, regressed, or changed.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to analyze drift for | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| baseline | Yes | Previous ScanScore JSON or "cached" to use last cached scan | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only, open-world, idempotent, and non-destructive behavior. The description adds context by specifying that it 'shows what improved, regressed, or changed,' which clarifies the output's comparative nature, though it doesn't detail rate limits or authentication needs.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded and concise with two sentences that efficiently convey the tool's purpose and output, with no wasted words or redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity, rich annotations, and no output schema, the description is mostly complete but could benefit from more detail on output format or behavioral constraints. It adequately covers the core functionality without being exhaustive.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are well-documented in the schema. The description does not add meaning beyond the schema, as it doesn't explain parameter interactions or usage nuances, meeting the baseline score for high coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('compare', 'shows') and resources ('current security posture', 'previous baseline'), and distinguishes it from siblings like 'compare_baseline' by focusing on drift analysis of improvements, regressions, and changes rather than general comparison.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for analyzing security drift against a baseline, but does not explicitly state when to use this tool versus alternatives like 'compare_baseline' or 'scan_domain', nor does it provide exclusions or prerequisites for usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
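The drift comparison analyze_drift describes can be sketched as a diff of two score maps, bucketing each check as improved, regressed, or unchanged. The field layout below is hypothetical; the real ScanScore JSON is not documented here.

```python
def diff_scores(baseline: dict, current: dict) -> dict:
    """Bucket each check by how its score moved between two scans."""
    drift = {"improved": [], "regressed": [], "unchanged": []}
    for check, old in baseline.items():
        new = current.get(check, old)  # missing checks count as unchanged
        if new > old:
            drift["improved"].append(check)
        elif new < old:
            drift["regressed"].append(check)
        else:
            drift["unchanged"].append(check)
    return drift
```

An agent could feed the cached baseline and a fresh scan through a function like this to answer "what regressed since last week?" in one pass.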
assess_spoofability (Grade B, read-only, idempotent)
Composite email spoofability score (0-100).
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide key behavioral hints (readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true), so the description doesn't need to repeat these. It adds value by specifying the output as a 'composite score (0-100),' which gives context on what the tool returns. However, it lacks details on rate limits, authentication needs, or how the score is computed, leaving some behavioral aspects unclear.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise—a single sentence that directly states the tool's output. It's front-loaded with the core purpose and wastes no words, making it efficient and easy to parse for an AI agent. Every part of the sentence earns its place by conveying essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (assessing spoofability) and lack of an output schema, the description is somewhat incomplete. It mentions the score range but doesn't explain what the score means, how it's derived, or what factors contribute to it. With annotations covering safety and idempotency, it's minimally adequate, but more context on the output would enhance completeness for a security assessment tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters ('domain' and 'format'). The description doesn't add any semantic details beyond the schema, such as explaining what 'composite' entails or how 'format' affects the score output. Since the schema does the heavy lifting, a baseline score of 3 is appropriate, as the description doesn't compensate with extra insights.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: to compute a 'composite email spoofability score (0-100).' It specifies the verb ('assess') and resource ('email spoofability'), making it clear what the tool does. However, it doesn't differentiate from sibling tools like 'check_dmarc' or 'check_spf,' which might also relate to email security, so it doesn't fully distinguish from alternatives.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools related to email and domain security (e.g., 'check_dmarc,' 'check_spf'), there's no indication of context, prerequisites, or exclusions. This leaves the agent without direction on appropriate usage scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
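A composite 0-100 spoofability score is typically a weighted sum of individual control failures, capped at 100. The weights and finding names below are illustrative only; the server does not publish its actual formula.

```python
# Hypothetical weights: higher score = easier to spoof this domain.
WEIGHTS = {
    "spf_missing": 30,
    "dmarc_missing": 40,
    "dmarc_p_none": 25,   # DMARC present but policy is p=none
    "dkim_missing": 30,
}

def spoofability(findings: set[str]) -> int:
    """Sum the weights of observed findings, capped at 100."""
    raw = sum(WEIGHTS.get(f, 0) for f in findings)
    return min(raw, 100)
```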
batch_scan (Grade A, read-only, idempotent)
Scan up to 10 domains at once. Returns score, grade, and finding counts per domain.
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output verbosity. Auto-detected if omitted. | |
| domains | Yes | Domains to scan (max 10 per request) | |
| force_refresh | No | Bypass cache and run fresh scans. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true. The description adds valuable behavioral context about the 10-domain limit and the specific return format (score, grade, finding counts), which goes beyond what annotations convey.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences that are perfectly front-loaded with essential information. The first sentence covers purpose and constraints, the second covers output. Zero wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich annotations (readOnly, idempotent, openWorld) and full schema coverage, the description provides good context about batch capability and return format. However, without an output schema, it could benefit from more detail about the structure of returned scores/grades/finding counts.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already fully documents all three parameters. The description doesn't add any parameter-specific details beyond what's in the schema, so it meets the baseline expectation without adding extra value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('scan'), resource ('domains'), scope ('up to 10 at once'), and output ('score, grade, and finding counts per domain'). It distinguishes from sibling 'scan_domain' by specifying batch capability.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for batch scanning of domains, but doesn't explicitly state when to use this versus 'scan_domain' or other scanning tools. It provides clear context about batch processing but lacks explicit alternatives or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
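The 10-domain cap the description states is a client-observable contract, and a caller can enforce it before issuing the request. The sketch below assumes a stand-in scan_one callable rather than the server's real endpoint.

```python
MAX_DOMAINS = 10  # limit stated in the batch_scan description

def batch_scan(domains: list[str], scan_one) -> dict:
    """Validate the batch size, then collect one result per domain."""
    if not domains:
        raise ValueError("at least one domain is required")
    if len(domains) > MAX_DOMAINS:
        raise ValueError(f"max {MAX_DOMAINS} domains per request, got {len(domains)}")
    return {d: scan_one(d) for d in domains}
```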
check_bimi (Grade A, read-only, idempotent)
Validate BIMI record and VMC evidence.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide excellent behavioral context (readOnlyHint: true, openWorldHint: true, idempotentHint: true, destructiveHint: false). The description adds value by specifying what exactly gets validated (BIMI record and VMC evidence), which isn't covered by annotations. No contradictions exist between the description and annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise at just 6 words, front-loading the essential action ('validate') and targets. Every word earns its place with zero wasted verbiage, making it immediately scannable and understandable.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich annotations (which cover safety, idempotence, and open-world behavior) and complete parameter documentation, the description provides adequate context. However, without an output schema, the description doesn't explain what validation results look like or what 'VMC evidence' entails, leaving some gaps for a validation tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already fully documents both parameters. The description doesn't add any meaningful parameter semantics beyond what's in the schema (domain to check, output format options). This meets the baseline expectation when schema coverage is complete.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb ('validate') and target resources ('BIMI record and VMC evidence'), making it immediately understandable. However, it doesn't explicitly differentiate from sibling tools like 'check_dmarc' or 'check_spf' that also perform domain validation checks, which prevents a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools performing various domain checks (e.g., 'check_dmarc', 'check_spf', 'scan_domain'), there's no indication of when BIMI validation is specifically needed or how it differs from other validation tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
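For context on what a BIMI check involves: the assertion record is a TXT record published at `<selector>._bimi.<domain>` (selector `default` unless overridden), with tag=value pairs such as `v=BIMI1`, a logo URL (`l=`), and an optional VMC evidence URL (`a=`). A minimal sketch of the name construction and record parsing:

```python
def bimi_query_name(domain: str, selector: str = "default") -> str:
    # BIMI assertion records live at <selector>._bimi.<domain>
    return f"{selector}._bimi.{domain}"

def parse_bimi(record: str) -> dict:
    """Split a 'v=BIMI1; l=...; a=...' TXT record into tag/value pairs."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if "=" in part:
            k, v = part.split("=", 1)
            tags[k.strip()] = v.strip()
    return tags
```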
check_caa (Grade A, read-only, idempotent)
Look up CAA records for a domain. Shows which Certificate Authorities are authorized to issue certificates.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true. The description adds useful context about what the tool returns ('Shows which Certificate Authorities are authorized to issue certificates'), which isn't covered by annotations. However, it doesn't mention rate limits, authentication needs, or error conditions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two concise sentences with zero wasted words. The first sentence states the core purpose, and the second explains the output. It's front-loaded and efficiently structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only query tool with good annotations and full schema coverage, the description provides adequate context. It explains what the tool does and what information it returns. However, without an output schema, it could benefit from more detail about the return format (e.g., structured data vs plain text).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear documentation for both parameters. The description doesn't add any parameter-specific details beyond what's in the schema. The baseline score of 3 is appropriate since the schema fully documents the parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Look up CAA records') and resource ('for a domain'), with explicit mention of the output ('Shows which Certificate Authorities are authorized to issue certificates'). It distinguishes itself from sibling tools like check_dnssec or check_spf by focusing specifically on CAA records.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for domain security checks but doesn't explicitly state when to use this tool versus alternatives like check_ssl or check_dmarc. It provides basic context (checking CAA records) but lacks guidance on exclusions or specific scenarios where this tool is preferred over other DNS/security checks.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
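A CAA record's rdata has three fields: flags, a tag (`issue`, `issuewild`, or `iodef`), and a quoted value. Deriving "which CAs may issue" from a record set reduces to collecting the `issue` values, as this sketch shows:

```python
def parse_caa(rdata: str) -> dict:
    """Parse CAA rdata of the form: <flags> <tag> "<value>"."""
    flags, tag, value = rdata.split(" ", 2)
    return {"flags": int(flags), "tag": tag, "value": value.strip('"')}

def authorized_cas(records: list[str]) -> set[str]:
    """CAs allowed to issue ordinary certificates for the domain."""
    return {r["value"] for r in map(parse_caa, records) if r["tag"] == "issue"}
```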
check_dane (Grade A, read-only, idempotent)
Verify DANE/TLSA certificate pinning.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description doesn't contradict these annotations. It adds context about what gets verified (certificate pinning), though it could mention more about network behavior, rate limits, or authentication needs.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's front-loaded with the core action and resource, making it easy to parse quickly. Every word earns its place in conveying essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the annotations cover safety and idempotency, and the schema fully documents parameters, the description provides adequate context for a read-only verification tool. However, without an output schema, it could benefit from mentioning what the verification result includes (e.g., success/failure, details). The complexity is moderate, and the description is complete enough for basic use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters (domain and format). The description doesn't add any additional parameter semantics beyond what the schema provides, such as explaining domain validation rules or format implications. With high schema coverage, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('verify') and resource ('DANE/TLSA certificate pinning'), distinguishing it from sibling tools like check_dane_https, check_ssl, or check_tlsrpt which focus on different aspects of domain security. The verb 'verify' is precise and the target 'DANE/TLSA certificate pinning' is a well-defined technical concept.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for DANE/TLSA verification but doesn't explicitly state when to use this tool versus alternatives like check_dane_https or check_ssl. No guidance is provided about prerequisites, dependencies, or scenarios where this tool is preferred over others in the sibling list.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_dane_https (Grade A, read-only, idempotent)
Verify DANE certificate pinning for HTTPS via TLSA records at _443._tcp.{domain}.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true, covering safety and idempotency. The description adds valuable context about what gets verified (DANE certificate pinning for HTTPS via specific TLSA records), which isn't captured in annotations. No contradictions exist between description and annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, dense sentence with zero wasted words. It front-loads the core purpose ('Verify DANE certificate pinning for HTTPS') and efficiently specifies the mechanism ('via TLSA records at _443._tcp.{domain}'). Every element earns its place by providing essential technical context.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only, idempotent diagnostic tool with full schema coverage and no output schema, the description is mostly complete. It clearly states what the tool does and how it operates. However, it could benefit from mentioning typical use cases or output characteristics (e.g., what 'verified' means in practice) to fully guide the agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters ('domain' and 'format') well-documented in the schema. The description doesn't add any parameter-specific details beyond what's in the schema, such as explaining TLSA record formats or domain validation rules. Baseline 3 is appropriate given the comprehensive schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Verify DANE certificate pinning for HTTPS') and resource ('via TLSA records at _443._tcp.{domain}'), distinguishing it from sibling tools like 'check_dane' (general DANE check) and 'check_ssl' (general SSL check). It precisely defines the scope as HTTPS-specific DANE verification.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implicitly provides usage context by specifying the protocol (HTTPS) and record type (TLSA at _443._tcp), which helps differentiate it from other DNS security tools. However, it lacks explicit guidance on when to use this tool versus alternatives like 'check_dane' or 'check_ssl', or any prerequisites for effective use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
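The `_443._tcp.{domain}` name in the description follows the TLSA naming scheme from RFC 6698: `_<port>._<protocol>.<domain>`. The same construction covers the sibling check_dane tool for SMTP (port 25):

```python
def tlsa_name(domain: str, port: int = 443, proto: str = "tcp") -> str:
    """Build the DNS name where TLSA records are published (RFC 6698)."""
    return f"_{port}._{proto}.{domain}"
```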
check_dbl (Grade A, read-only, idempotent)
Check domain reputation against DNS-based Domain Block Lists (Spamhaus DBL, URIBL, SURBL). Returns listing status with decoded return codes.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true, covering safety and idempotency. The description adds valuable context by specifying the blocklists used (Spamhaus DBL, URIBL, SURBL) and that it 'Returns listing status with decoded return codes,' which clarifies the output format beyond what annotations offer.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently conveys the tool's purpose, method, and output. Every word earns its place, with no redundant or vague phrasing, making it easy to parse and understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (domain reputation check), rich annotations (covering safety and behavior), and no output schema, the description is mostly complete. It specifies the blocklists used and output format, but could benefit from mentioning rate limits or error handling. However, it provides sufficient context for effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters ('domain' and 'format'). The description does not add any additional semantic details about parameters beyond what the schema provides, such as explaining domain format requirements or format implications. Baseline 3 is appropriate given high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Check domain reputation'), identifies the resource ('domain'), and specifies the method ('against DNS-based Domain Block Lists (Spamhaus DBL, URIBL, SURBL)'). It distinguishes itself from sibling tools like 'check_rbl' by focusing on domain-specific blocklists rather than general reputation or IP-based lists.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for domain reputation checking against specific blocklists, but does not explicitly state when to use this tool versus alternatives like 'check_rbl' (which might check IP-based lists) or 'scan_domain' (which performs broader scans). No exclusions or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_dkim (Grade A, read-only, idempotent)
Look up DKIM records for a domain. Probes common selectors and validates key strength and algorithm.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| selector | No | DKIM selector. Omit to probe common ones. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true. The description adds valuable behavioral context beyond annotations: it reveals the tool probes multiple selectors automatically and performs validation (key strength, algorithm). This provides operational insight not captured in structured fields.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences with zero waste: first states core purpose, second adds important behavioral details (probing, validation). Every word earns its place, and information is front-loaded appropriately.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only diagnostic tool with good annotations and full schema coverage, the description provides adequate context about what the tool does and how it behaves. The main gap is no output schema, so return format isn't described, but the description compensates somewhat by hinting at validation results.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all parameters. The description doesn't add any parameter-specific semantics beyond what's in the schema (e.g., it doesn't clarify which 'common selectors' are probed or what 'validates' covers). A baseline of 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('look up DKIM records'), target resource ('for a domain'), and scope ('probes common selectors and validates key strength and algorithm'). It distinguishes from siblings like 'generate_dkim_config' (creation) and 'check_dmarc' (different protocol).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context through 'probes common selectors' and 'validates key strength and algorithm', suggesting this is for diagnostic/validation purposes. However, it doesn't explicitly state when to use this vs. alternatives like 'check_dnssec' or 'validate_fix', nor does it mention prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_dmarc (A) · Read-only · Idempotent · Inspect
Look up and validate DMARC record for a domain. Shows policy enforcement, alignment mode, and reporting config.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
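A DMARC record is a semicolon-separated tag list, so the policy, alignment, and reporting fields the description mentions can be extracted with a small parser. A hedged sketch (Python; the tag defaults follow RFC 7489, but the output shape is illustrative, not this tool's actual return format):

```python
def parse_dmarc(txt: str) -> dict:
    """Extract policy, alignment, and reporting fields from a DMARC TXT record."""
    tags = dict(
        part.strip().split("=", 1)
        for part in txt.split(";")
        if "=" in part
    )
    return {
        "policy": tags.get("p", "none"),
        # Subdomain policy falls back to the main policy when 'sp' is absent.
        "subdomain_policy": tags.get("sp", tags.get("p", "none")),
        "dkim_alignment": tags.get("adkim", "r"),  # 'r' (relaxed) is the default
        "spf_alignment": tags.get("aspf", "r"),
        "aggregate_reports": tags.get("rua"),
    }
```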
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide strong hints (readOnlyHint=true, destructiveHint=false, idempotentHint=true, openWorldHint=true), so the bar is lower. The description adds useful context about what information is returned (policy enforcement, alignment mode, reporting config) and implies a DNS lookup operation, which complements the annotations well without contradicting them.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently conveys the tool's purpose and output. Every word earns its place, with no redundant information or unnecessary elaboration, making it easy for an agent to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (DNS lookup/validation), rich annotations covering safety and behavior, and 100% schema coverage, the description provides adequate context. It clearly states what the tool does and what information it returns, though it lacks output format details (no output schema exists) and explicit usage guidance compared to siblings.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents both parameters (domain and format). The description doesn't add any parameter-specific information beyond what's in the schema, but it does imply the domain parameter is required for the DMARC lookup. Baseline 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('look up and validate') and resource ('DMARC record for a domain'), and distinguishes it from siblings by focusing on DMARC-specific validation rather than other DNS checks like SPF, DKIM, or TLS. It explicitly mentions what information is returned (policy enforcement, alignment mode, reporting config).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for DMARC validation but doesn't explicitly state when to use this tool versus alternatives like check_spf, check_dkim, or generate_dmarc_record. No guidance is provided about prerequisites, timing, or exclusion criteria, leaving the agent to infer context from the tool name and description alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_dnssec (A) · Read-only · Idempotent · Inspect
Check DNSSEC status for a domain. Verifies DNSKEY/DS records and validation chain.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
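The DNSKEY/DS verification the description mentions hinges on one computation from RFC 4034: the DS digest is a hash over the owner name in DNS wire format followed by the DNSKEY RDATA. A minimal sketch (Python; fetching the records and comparing against the parent zone's DS record are omitted):

```python
import hashlib

def wire_name(name: str) -> bytes:
    """Encode a domain name in DNS wire format (length-prefixed labels)."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.lower().encode()
    return out + b"\x00"

def ds_digest(owner: str, dnskey_rdata: bytes) -> str:
    """DS digest type 2 (SHA-256) over owner name + DNSKEY RDATA (RFC 4034)."""
    return hashlib.sha256(wire_name(owner) + dnskey_rdata).hexdigest().upper()
```

Validation then amounts to checking that this digest matches the DS record published in the parent zone.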
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true, covering safety and idempotency. The description adds useful context about what gets verified (DNSKEY/DS records and validation chain), but doesn't mention rate limits, authentication needs, or output format details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences with zero wasted words. The first states the core purpose, the second adds technical detail. Perfectly front-loaded and appropriately sized for this tool.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only diagnostic tool with comprehensive annotations and full schema coverage, the description provides adequate context. The main gap is lack of output format details (no output schema exists), but the purpose and verification scope are clearly communicated.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters well-documented in the schema. The description doesn't add any parameter-specific information beyond what's already in the schema, so it meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Check DNSSEC status'), target resource ('for a domain'), and technical scope ('Verifies DNSKEY/DS records and validation chain'). It distinguishes from siblings like 'check_dnssec_chain' by focusing on status verification rather than chain analysis.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for DNSSEC verification but provides no explicit guidance on when to choose this tool over alternatives like 'check_dnssec_chain' or 'check_zone_hygiene'. It lacks any when-not-to-use statements or prerequisite context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_dnssec_chain (A) · Read-only · Idempotent · Inspect
Walk the DNSSEC chain of trust from root to target domain. Reports DS/DNSKEY records, algorithm usage, and linkage status at each zone level.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
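Walking the chain of trust means visiting each zone cut from the root down to the target. A sketch of just the traversal order (Python; the per-zone DS/DNSKEY checks themselves are omitted):

```python
def zone_chain(domain: str) -> list[str]:
    """List the zones to walk from the root down to the target domain."""
    labels = domain.rstrip(".").split(".")
    chain = ["."]  # start at the root zone
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain
```

At each adjacent pair in the chain, the parent zone's DS record must match a digest of a DNSKEY in the child zone for the linkage to hold; a break anywhere leaves everything below it unsigned.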
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds valuable context about the tool's behavior: it 'walks' the chain (implying iterative queries), reports specific record types and statuses, and operates at 'each zone level', which clarifies scope beyond what annotations convey.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that front-loads the core action and efficiently lists outputs. Every phrase adds value without redundancy, making it appropriately concise for the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity, rich annotations, and 100% schema coverage, the description is largely complete. It explains the tool's purpose and behavior well. However, without an output schema, it could benefit from more detail on return format (e.g., structured vs. textual), but the annotations and description provide sufficient context for agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear parameter descriptions in the schema. The description does not add meaning beyond the schema, as it mentions no parameters. Baseline score of 3 is appropriate since the schema adequately documents the parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Walk the DNSSEC chain of trust') and resource ('from root to target domain'), with explicit output details ('Reports DS/DNSKEY records, algorithm usage, and linkage status at each zone level'). It distinguishes from sibling tools like 'check_dnssec' by focusing on chain traversal rather than general DNSSEC validation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for DNSSEC chain analysis but does not explicitly state when to use this tool versus alternatives like 'check_dnssec' or 'validate_fix'. No exclusions or prerequisites are mentioned, leaving usage context inferred rather than clearly defined.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_fast_flux (A) · Read-only · Idempotent · Inspect
Detect fast-flux DNS behavior by performing multiple rounds of A/AAAA queries with delays. Compares IP answer sets and TTLs across rounds to identify rotating infrastructure.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| rounds | No | Number of query rounds (3-5). | 3 |
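The round-comparison logic the description outlines reduces to set arithmetic over the answers. A sketch (Python; the churn formula and the thresholds are illustrative assumptions, not the tool's actual scoring):

```python
def flux_indicators(rounds: list[set[str]], ttls: list[int]) -> dict:
    """Heuristics over several query rounds: rotating answer sets plus
    very low TTLs are the classic fast-flux signature."""
    all_ips = set().union(*rounds)
    stable = set.intersection(*rounds)  # IPs seen in every round
    return {
        "unique_ips": len(all_ips),
        "churn": 1 - len(stable) / len(all_ips) if all_ips else 0.0,
        "min_ttl": min(ttls),
        # Assumed thresholds: far more IPs than any single answer, or sub-minute TTLs.
        "suspicious": len(all_ips) > 2 * max(len(r) for r in rounds)
        or min(ttls) < 60,
    }
```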
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true, openWorldHint=true, idempotentHint=true, and destructiveHint=false, covering safety and idempotency. The description adds valuable context by explaining the method ('multiple rounds of A/AAAA queries with delays') and analysis ('compares IP answer sets and TTLs'), which is not covered by annotations, enhancing behavioral understanding without contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence and adds method details in the second, with no wasted words. It efficiently conveys essential information in two concise sentences.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (3 parameters, no output schema) and rich annotations, the description is mostly complete. It explains the detection method and analysis but could benefit from mentioning output format or result interpretation, though annotations cover safety aspects well.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear documentation for 'domain', 'format', and 'rounds'. The description does not add meaning beyond the schema, such as explaining parameter interactions or default behaviors, so it meets the baseline of 3 for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('detect fast-flux DNS behavior') and methods ('performing multiple rounds of A/AAAA queries with delays'), distinguishing it from siblings like 'check_dnssec' or 'check_mx' by focusing on behavioral analysis rather than configuration or security checks.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for detecting DNS fast-flux behavior but does not explicitly state when to use this tool versus alternatives like 'check_dnssec' or 'scan_domain'. No exclusions or specific contexts are provided, leaving usage to inference from the purpose.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_http_security (B) · Read-only · Idempotent · Inspect
Audit HTTP security headers (CSP, COOP, etc.).
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
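An audit like this boils down to checking a response's header map against a recommended set. A sketch (Python; the header list is a common security baseline, not necessarily the exact set this tool checks):

```python
EXPECTED_HEADERS = {
    "content-security-policy": "CSP",
    "strict-transport-security": "HSTS",
    "x-content-type-options": "MIME-sniffing protection",
    "x-frame-options": "clickjacking protection",
    "cross-origin-opener-policy": "COOP",
}

def audit_headers(response_headers: dict) -> dict:
    """Report which recommended security headers are present or missing.
    Header names are case-insensitive per the HTTP spec."""
    present = {k.lower() for k in response_headers}
    return {
        "present": sorted(h for h in EXPECTED_HEADERS if h in present),
        "missing": sorted(h for h in EXPECTED_HEADERS if h not in present),
    }
```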
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide strong behavioral hints: readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true. The description adds minimal context beyond this—it mentions 'audit' which aligns with read-only behavior, but doesn't disclose details like rate limits, authentication needs, or what specific headers are checked. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise—a single sentence that directly states the tool's purpose with no wasted words. It's front-loaded with the core action and includes examples (CSP, COOP) for clarity. Every part of the description earns its place efficiently.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (security auditing), rich annotations (covering safety and idempotency), and no output schema, the description is minimally adequate. It states what the tool does but lacks details on output format, error handling, or scope limitations. For a security tool among many siblings, more context would be helpful, but annotations provide critical behavioral information.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters. The description doesn't add any meaningful semantics beyond the schema—it mentions 'domain' implicitly but provides no extra details on parameter usage, constraints, or interactions. With high schema coverage, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Audit HTTP security headers (CSP, COOP, etc.)'. It specifies the action (audit) and target (HTTP security headers), with examples of specific headers. However, it doesn't explicitly differentiate this tool from sibling tools like 'check_ssl' or 'check_tlsrpt', which also involve security checks but focus on different aspects.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools performing various security checks (e.g., 'check_ssl', 'check_dmarc'), there's no indication of whether this tool is for general HTTP header auditing, how it relates to other checks, or any prerequisites. Usage is implied by the domain parameter but not explicitly stated.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_lookalikes (A) · Read-only · Idempotent · Inspect
Detect active typosquat/lookalike domains. Standalone.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
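Typosquat detection typically starts by generating candidate permutations and then checking which of them actually resolve (the 'active' part). A sketch of the generation step only (Python; the permutation rules shown are a small illustrative subset):

```python
def typosquat_candidates(domain: str) -> set[str]:
    """Generate simple lookalike permutations: character omission,
    adjacent-character swap, and a few common homoglyph substitutions."""
    name, _, tld = domain.partition(".")
    variants = set()
    for i in range(len(name)):  # character omission
        variants.add(name[:i] + name[i + 1:])
    for i in range(len(name) - 1):  # adjacent swap
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    for src, dst in [("o", "0"), ("l", "1"), ("rn", "m")]:  # homoglyphs
        if src in name:
            variants.add(name.replace(src, dst))
    variants.discard(name)
    return {f"{v}.{tld}" for v in variants if v}
```

The check for 'active' domains would then be an A/NS lookup on each candidate.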
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds minimal behavioral context beyond this, only implying it's a 'Standalone' check, which doesn't fully disclose traits like rate limits, authentication needs, or output format. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise with only two sentences, front-loaded with the core purpose and followed by a clarifying note ('Standalone'). Every word earns its place, with no redundancy or unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (2 parameters, no output schema) and rich annotations covering safety and idempotency, the description is mostly complete. It clearly states the purpose and usage context but could benefit from more detail on behavioral aspects like what 'active' means or expected output, though annotations mitigate some gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents both parameters (domain and format). The description does not add any meaning beyond what the schema provides, such as explaining what 'active' detection entails or how 'compact' vs. 'full' formats differ. A baseline of 3 is appropriate when the schema handles parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb ('Detect') and resource ('active typosquat/lookalike domains'), and distinguishes it from siblings by noting it's 'Standalone', implying it performs a focused check rather than broader analysis like 'analyze_drift' or 'scan_domain'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for usage by specifying it detects 'active' lookalike domains, which suggests it's for real-time threat assessment rather than historical analysis. However, it does not explicitly state when not to use it or name alternatives among the many sibling tools, such as 'check_shadow_domains' or 'simulate_attack_paths'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_mta_sts (B) · Read-only · Idempotent · Inspect
Validate MTA-STS SMTP encryption policy.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
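Validating MTA-STS involves fetching the policy file at `https://mta-sts.<domain>/.well-known/mta-sts.txt` and checking its fields. A sketch of the parsing step (Python; key names follow RFC 8461, while the fetch and the companion `_mta-sts` DNS TXT record check are omitted):

```python
def parse_mta_sts_policy(text: str) -> dict:
    """Parse an MTA-STS policy file: 'key: value' lines, where 'mx'
    may repeat once per allowed mail host."""
    policy = {"mx": []}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "mx":
            policy["mx"].append(value)
        elif key:
            policy[key] = value
    return policy
```

A validator would then confirm `version` is `STSv1`, that `mode` is one of enforce/testing/none, and that the live MX hosts match the `mx` patterns.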
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide key behavioral hints: readOnlyHint=true, destructiveHint=false, openWorldHint=true, idempotentHint=true. The description doesn't contradict these. It adds minimal context by specifying 'SMTP encryption policy,' but doesn't disclose additional traits like rate limits, authentication needs, or what 'validate' entails beyond what annotations cover.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence: 'Validate MTA-STS SMTP encryption policy.' It's front-loaded with the core purpose, has zero wasted words, and is appropriately sized for a simple validation tool. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (2 parameters, no output schema) and rich annotations covering safety and idempotency, the description is minimally adequate. It states what the tool does but lacks context on usage relative to siblings or behavioral details. For a read-only validation tool with good annotations, it meets basic needs but could be more informative.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters (domain and format). The description adds no parameter-specific information beyond what's in the schema. According to scoring rules, with high schema coverage, the baseline is 3 even without param details in the description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Validate MTA-STS SMTP encryption policy.' It specifies the action (validate) and the resource (MTA-STS policy), making it easy to understand. However, it doesn't explicitly differentiate from sibling tools like 'check_dmarc' or 'check_spf', which are also security validation tools for different protocols.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools for domain security checks (e.g., check_dmarc, check_spf, check_tlsrpt), there's no indication of when MTA-STS validation is appropriate or how it relates to other checks. Usage is implied by the name but not explained.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_mx (A) · Read-only · Idempotent · Inspect
Look up MX records for a domain. Shows mail servers, email provider detection, and validates configuration.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
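The 'email provider detection' the description mentions is usually a suffix match on the MX hostnames. A sketch (Python; the pattern table is an illustrative guess at how such detection works, not this server's actual list):

```python
# Hypothetical suffix-to-provider mapping.
PROVIDER_PATTERNS = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "outlook.com": "Microsoft 365",
    "pphosted.com": "Proofpoint",
    "mimecast.com": "Mimecast",
}

def detect_provider(mx_hosts: list[str]) -> str:
    """Guess the email provider from MX hostname suffixes."""
    for host in mx_hosts:
        for suffix, provider in PROVIDER_PATTERNS.items():
            if host.rstrip(".").lower().endswith(suffix):
                return provider
    return "unknown/self-hosted"
```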
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true. The description adds valuable context about what the tool returns ('shows mail servers, email provider detection, and validates configuration'), which goes beyond the annotations. No contradictions with annotations exist.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose ('Look up MX records for a domain') and adds supplementary functions without unnecessary elaboration. Every part of the sentence provides value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only tool with good annotations (readOnlyHint, idempotentHint) and full schema coverage, the description provides adequate context about what the tool does and returns. However, without an output schema, it could benefit from more detail on return format or error handling, though the 'format' parameter partially addresses this.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents both parameters. The description doesn't add any parameter-specific details beyond what's in the schema, but it implies the 'domain' parameter is used for MX lookups and the 'format' parameter affects output verbosity, which aligns with the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('look up', 'shows', 'validates') and resources ('MX records for a domain'), including additional functions like email provider detection and configuration validation. It distinguishes itself from siblings like 'check_dnssec' or 'check_ns' by focusing specifically on MX records.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for MX record analysis, email provider detection, and configuration validation, but doesn't explicitly state when to use this tool versus alternatives like 'check_mx_reputation' or 'scan_domain'. No explicit exclusions or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_mx_reputation (B) · Read-only · Idempotent · Inspect
Check MX blocklist status and reverse DNS.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
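Both checks named in the description are DNS-name constructions: reverse DNS queries the `in-addr.arpa` tree, and IP blocklist lookups prepend the reversed octets to an RBL zone. A sketch (Python; `zen.spamhaus.org` is a real public RBL, but using it as the default here is an illustrative choice):

```python
import ipaddress

def ptr_name(ip: str) -> str:
    """Reverse-DNS name for an address, e.g. 1.2.3.4 -> 4.3.2.1.in-addr.arpa."""
    return ipaddress.ip_address(ip).reverse_pointer

def rbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """IP-based blocklist query: reversed octets prepended to the RBL zone."""
    reversed_octets = ".".join(reversed(ip.split(".")))
    return f"{reversed_octets}.{zone}"
```

The tool would resolve each MX host to its addresses first, then run both lookups per address.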
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide excellent coverage (readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true). The description adds minimal behavioral context beyond annotations: mentioning 'blocklist status' and 'reverse DNS' gives some operational context, but doesn't elaborate on rate limits, authentication needs, or what specific blocklists are checked.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise (8 words) and front-loaded with the core purpose. Every word earns its place, with no redundant information or unnecessary elaboration. The structure is optimal for a tool with comprehensive annotations.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the comprehensive annotations (covering safety, idempotency, and world openness) and 100% schema coverage, the description is minimally adequate. However, with no output schema and many similar sibling tools, the description could better explain what 'MX blocklist status' entails and how results differ from tools like 'check_mx' or 'check_rbl'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters well-documented in the schema. The description adds no parameter-specific information beyond what's already in the structured schema. The baseline of 3 is appropriate since the schema carries the full parameter documentation burden.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('check' and 'reverse DNS') and identifies the resource ('MX blocklist status'). It distinguishes from some siblings like 'check_mx' by mentioning blocklist status specifically, but doesn't fully differentiate from all DNS-related tools in the extensive sibling list.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With 40+ sibling tools including many DNS/security checks (check_mx, check_rbl, check_dnssec, etc.), there's no indication of when this specific MX reputation check is appropriate versus other domain validation tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_ns (Grade A) · Read-only · Idempotent
Look up NS (nameserver) records for a domain. Shows DNS provider, delegation, and redundancy.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
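Redundancy, one of the things this tool reports, can be gauged by whether the delegated nameservers all sit under a single provider. A rough sketch (grouping by the last two labels is a naive stand-in for proper registered-domain parsing, and the function name is illustrative):

```python
def provider_diversity(ns_hosts):
    """Group nameserver hostnames by parent zone (naively, the last two
    labels) to gauge whether delegation depends on a single DNS provider."""
    providers = {}
    for host in ns_hosts:
        provider = ".".join(host.rstrip(".").split(".")[-2:])
        providers.setdefault(provider, []).append(host)
    return providers

groups = provider_diversity(["ns1.examplehost.net", "ns2.examplehost.net"])
single_provider = len(groups) == 1  # True here: no provider redundancy
```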
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true. The description adds valuable context about what information is returned (DNS provider, delegation, redundancy) that goes beyond the safety profile indicated by annotations. No contradictions exist.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. First sentence states purpose and parameters, second sentence specifies output content. Every word earns its place, and information is front-loaded appropriately.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only tool with comprehensive annotations and full schema coverage, the description provides adequate context about what information is returned. The main gap is lack of output format details (no output schema), but the description compensates by specifying the three categories of information shown.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are fully documented in the schema. The description doesn't add any parameter-specific details beyond what the schema provides (domain format, format enum values). Baseline 3 is appropriate when the schema carries the full burden.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Look up NS records'), target resource ('for a domain'), and scope ('Shows DNS provider, delegation, and redundancy'). It distinguishes from sibling tools by focusing exclusively on NS records rather than other DNS checks like MX, SPF, or DMARC.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context (DNS analysis) but doesn't explicitly state when to use this tool versus alternatives like 'check_dnssec' or 'check_zone_hygiene'. No guidance on prerequisites, exclusions, or named alternatives is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_nsec_walkability (Grade A) · Read-only · Idempotent
Assess zone walkability risk by analyzing NSEC3PARAM configuration. Detects plain NSEC zones, weak NSEC3 parameters, and opt-out flags.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
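The weaknesses this tool detects map to concrete NSEC3PARAM fields: the opt-out bit in the flags, the extra hash iteration count, and the salt. A hedged sketch of such a heuristic (function name and exact wording are illustrative; the iterations/salt guidance comes from RFC 9276, which recommends zero extra iterations and an empty salt):

```python
def nsec3_warnings(algorithm, flags, iterations, salt):
    """Flag NSEC3PARAM settings commonly considered weak,
    following RFC 9276 guidance."""
    warnings = []
    if flags & 0x01:
        warnings.append("opt-out flag set: unsigned delegations lack denial proofs")
    if iterations > 0:
        warnings.append(f"{iterations} extra hash iterations (RFC 9276 recommends 0)")
    if salt:
        warnings.append("non-empty salt adds cost without preventing zone walking")
    return warnings

print(nsec3_warnings(1, 1, 10, "ab"))  # three warnings: opt-out, iterations, salt
```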
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide excellent coverage (readOnlyHint, openWorldHint, idempotentHint, destructiveHint), so the bar is lower. The description adds valuable context about what specific security risks it detects (plain NSEC zones, weak parameters, opt-out flags), which goes beyond the annotations. It doesn't mention rate limits, authentication needs, or response format details, but with comprehensive annotations, this is acceptable.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise: a single sentence that packs maximum information density. Every word earns its place: 'Assess' (action), 'zone walkability risk' (purpose), 'by analyzing NSEC3PARAM configuration' (method), and the three specific detection types. No wasted words or redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the comprehensive annotations (covering safety, idempotence, and world assumptions) and 100% schema coverage, the description provides excellent contextual completeness. The main gap is the lack of output schema, so the agent doesn't know what format the assessment results will take. However, the description's specificity about detection types partially compensates for this.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing complete parameter documentation. The description doesn't add any parameter-specific information beyond what's in the schema. The baseline score of 3 is appropriate when the schema does all the parameter documentation work, and the description focuses on tool purpose rather than parameter details.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Assess zone walkability risk'), the resource ('NSEC3PARAM configuration'), and the specific detection capabilities ('plain NSEC zones, weak NSEC3 parameters, and opt-out flags'). It distinguishes itself from sibling tools like 'check_dnssec' or 'check_zone_hygiene' by focusing specifically on NSEC/NSEC3 walkability analysis.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context (DNS security analysis) but doesn't explicitly state when to use this tool versus alternatives like 'check_dnssec' or 'check_zone_hygiene'. There's no guidance on prerequisites, timing considerations, or specific scenarios where this tool is most appropriate versus other DNS security checks.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_rbl (Grade A) · Read-only · Idempotent
Check MX server IP reputation against 8 DNS-based Real-time Blocklists (Spamhaus ZEN, SpamCop, UCEProtect, Mailspike, Barracuda, PSBL, SORBS). Resolves MX hosts to IPs first.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
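The DNSBL lookup convention behind this tool is simple: reverse the IPv4 octets, prepend them to the blocklist zone, and check whether an A record exists (a listing is conventionally signalled by an answer in 127.0.0.0/8). A minimal sketch (the function name is illustrative):

```python
def rbl_query_name(ip, rbl_zone):
    """Build the DNSBL query name: reversed IPv4 octets prepended to the
    blocklist zone. An A-record answer means the IP is listed."""
    return ".".join(reversed(ip.split("."))) + "." + rbl_zone

# zen.spamhaus.org is one of the blocklists the description names
print(rbl_query_name("192.0.2.10", "zen.spamhaus.org"))
# 10.2.0.192.zen.spamhaus.org
```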
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds useful context about resolving MX hosts to IPs first and listing the 8 specific RBLs, which helps understand the tool's behavior beyond annotations. However, it doesn't describe output format, rate limits, or error conditions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences with zero waste. The first sentence clearly states the purpose and names the RBLs checked (seven are listed, though the count says eight), while the second adds important behavioral context about MX resolution. Every word earns its place, and it's front-loaded with the core functionality.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only tool with good annotations and 100% schema coverage, the description provides sufficient context about what the tool does and how it works (resolving MX to IPs first). However, without an output schema, the description doesn't explain what the return values look like (e.g., list of blocklists with status), leaving some uncertainty about the tool's output.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters ('domain' and 'format') well-documented in the schema. The description does not add any parameter-specific information beyond what the schema provides, such as explaining the 'full' vs 'compact' format differences. Baseline 3 is appropriate when schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Check MX server IP reputation') and target resource ('against 8 DNS-based Real-time Blocklists'), naming seven of the RBLs outright. It distinguishes from siblings like 'check_mx' (which likely checks MX records) and 'check_mx_reputation' (which might check reputation differently) by specifying it resolves MX hosts to IPs first and checks against specific blocklists.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context for checking domain reputation against RBLs, but does not explicitly state when to use this tool versus alternatives like 'check_mx_reputation' or 'check_dbl'. It provides a clear action but lacks explicit guidance on exclusions or named alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_resolver_consistency (Grade B) · Read-only · Idempotent
Check DNS consistency across 4 public resolvers.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| record_type | No | Record type. Omit for A/AAAA/MX/TXT/NS. | |
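Consistency across resolvers reduces to comparing the answer sets each resolver returns for the same question. A sketch of that comparison, with resolver answers supplied as assumed mock data rather than live queries (the function name is illustrative):

```python
def consistency_report(answers):
    """Compare per-resolver answer sets for one record type.
    `answers` maps resolver address -> set of records it returned."""
    by_answer = {}
    for resolver, records in answers.items():
        by_answer.setdefault(frozenset(records), []).append(resolver)
    return {"consistent": len(by_answer) == 1,
            "groups": sorted(by_answer.values())}

report = consistency_report({
    "1.1.1.1": {"192.0.2.1"},
    "8.8.8.8": {"192.0.2.1"},
    "9.9.9.9": {"192.0.2.1", "192.0.2.2"},  # a stale or filtered view
})
# report["consistent"] is False: two distinct answer groups exist
```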
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover key behavioral traits (read-only, open-world, idempotent, non-destructive), so the description adds minimal value. It mentions '4 public resolvers' as context, but doesn't disclose rate limits, auth needs, or specific resolver identities, which could be useful. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with zero waste. It's front-loaded with the core purpose and avoids unnecessary elaboration, making it easy to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (3 parameters, no output schema) and rich annotations, the description is adequate but minimal. It lacks details on output format, error handling, or resolver specifics, which could help the agent anticipate results. Annotations compensate somewhat, but more context would improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear parameter documentation. The description doesn't add any semantic details beyond the schema, such as explaining interactions between parameters or default behaviors. Baseline 3 is appropriate since the schema fully documents inputs.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Check DNS consistency') and scope ('across 4 public resolvers'), providing a specific verb and resource. However, it doesn't explicitly differentiate from sibling tools like 'check_dnssec' or 'check_mx', which also involve DNS checking but focus on different aspects.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. The description doesn't mention prerequisites, exclusions, or compare it to sibling tools like 'compare_domains' or 'check_dnssec', leaving the agent without context for selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_shadow_domains (Grade B) · Read-only · Idempotent
Find TLD variants with email auth gaps. Standalone.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
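Shadow-domain scanning starts from generating the TLD variants to probe for missing SPF/DMARC coverage. A sketch with an illustrative TLD sample (the tool's real variant list is not documented here):

```python
def tld_variants(domain, tlds=("com", "net", "org", "co", "io")):
    """Generate sibling registrations of the same name under other TLDs.
    The TLD tuple is a small illustrative sample, not the tool's real set."""
    base, _, tld = domain.rpartition(".")
    return [f"{base}.{t}" for t in tlds if t != tld]

print(tld_variants("example.com"))
# ['example.net', 'example.org', 'example.co', 'example.io']
```

Each variant would then be checked for SPF and DMARC records; a registered variant with neither is the "auth gap" the description refers to.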
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide strong behavioral hints (readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true). The description adds value by specifying the focus on 'email auth gaps' and 'TLD variants,' which gives context beyond annotations. However, it doesn't disclose rate limits, authentication needs, or detailed output behavior (no output schema exists).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise (two short phrases) and front-loaded with the core purpose. Every word earns its place: 'Find TLD variants with email auth gaps' states the action and scope, and 'Standalone' provides operational context without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (security analysis), rich annotations cover safety and idempotency, but no output schema exists. The description provides the core purpose but lacks details on return values, error conditions, or integration with sibling tools. It's minimally adequate but has clear gaps for a security tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents both parameters (domain and format). The description doesn't add any parameter-specific semantics beyond what's in the schema. According to rules, baseline is 3 when schema coverage is high (>80%).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Find TLD variants with email auth gaps.' It specifies the verb ('Find') and resource ('TLD variants'), and the 'Standalone' qualifier distinguishes it from batch operations. However, it doesn't explicitly differentiate from sibling tools like 'check_lookalikes' or 'check_spoofability' that might also analyze domain security gaps.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides minimal guidance: 'Standalone' implies this is a single-domain check vs. batch operations, but it doesn't specify when to use this tool versus alternatives like 'check_lookalikes' or 'assess_spoofability' from the sibling list. No explicit when/when-not instructions or prerequisite context is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_spf (Grade A) · Read-only · Idempotent
Look up and validate SPF record for a domain. Shows authorized senders, syntax issues, and trust surface.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
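Syntax issues and trust surface both fall out of tokenizing the `v=spf1` record. A simplified parser sketch (the real SPF grammar per RFC 7208 is richer; this only separates mechanisms from the terminal 'all' qualifier, and the function name is illustrative):

```python
def parse_spf(record):
    """Split a v=spf1 TXT record into its mechanisms and the 'all' term."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    mechanisms = [t for t in terms[1:] if not t.endswith("all")]
    all_term = next((t for t in terms[1:] if t.endswith("all")), None)
    return mechanisms, all_term

mechs, policy = parse_spf("v=spf1 ip4:192.0.2.0/24 include:_spf.example.net ~all")
# mechs == ['ip4:192.0.2.0/24', 'include:_spf.example.net'], policy == '~all'
```

The 'trust surface' the description mentions is roughly the set of networks and included domains those mechanisms authorize to send mail.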
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide hints (readOnly, openWorld, idempotent, non-destructive), but the description adds valuable context beyond this: it discloses that the tool 'shows authorized senders, syntax issues, and trust surface', giving insight into what information is returned. This enhances transparency about the tool's output behavior, though it could mention rate limits or authentication needs.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded and concise, consisting of a single sentence that efficiently conveys the tool's purpose and key outputs. Every word earns its place, with no redundant or unnecessary information, making it easy to understand at a glance.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (2 parameters, no output schema), the description is fairly complete: it states the purpose, what it shows, and aligns with annotations. However, it could be more comprehensive by mentioning potential errors or limitations, such as DNS lookup failures or unsupported domain formats, to fully guide an agent in all scenarios.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters ('domain' and 'format'). The description does not add significant meaning beyond the schema, as it doesn't explain parameter interactions or provide additional syntax details. The baseline score of 3 is appropriate since the schema adequately documents the parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('look up and validate') and resource ('SPF record for a domain'), and distinguishes it from siblings by focusing on SPF-specific validation rather than other DNS checks like DKIM or DMARC. It explicitly mentions what it shows: authorized senders, syntax issues, and trust surface.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by specifying it checks SPF records, which helps differentiate it from siblings like 'check_dkim' or 'check_dmarc'. However, it does not explicitly state when to use this tool versus alternatives like 'resolve_spf_chain' or 'generate_spf_record', nor does it provide exclusions or prerequisites for usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_srv (Grade B) · Read-only · Idempotent
Probe SRV records for service footprint.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
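SRV probing relies on the RFC 2782 naming convention, where each candidate service maps to a predictable owner name. A sketch (helper name and candidate list are illustrative; the tool's actual probe set is not documented here):

```python
def srv_name(service, proto, domain):
    """Build the SRV owner name per RFC 2782: _service._proto.domain."""
    return f"_{service}._{proto}.{domain}"

# A footprint probe walks a list of well-known service/protocol pairs:
candidates = [("sip", "tcp"), ("xmpp-server", "tcp"), ("autodiscover", "tcp")]
names = [srv_name(s, p, "example.com") for s, p in candidates]
# ['_sip._tcp.example.com', '_xmpp-server._tcp.example.com',
#  '_autodiscover._tcp.example.com']
```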
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide excellent coverage (readOnlyHint: true, destructiveHint: false, openWorldHint: true, idempotentHint: true), so the agent knows this is a safe, read-only, idempotent operation. The description adds minimal behavioral context beyond annotations: 'probe' implies an active network query, and 'service footprint' hints at discovering services, but doesn't explain what constitutes a service footprint or how results are structured.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with zero wasted words. 'Probe SRV records for service footprint' is perfectly front-loaded and contains only essential information. Every word earns its place, making this an excellent example of conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich annotations (covering safety, idempotence, and open-world behavior) and 100% schema coverage, the description provides adequate context for a read-only DNS probing tool. The main gap is the lack of output schema, but the description hints at what information will be returned ('service footprint'). For a tool with such comprehensive structured metadata, the description is reasonably complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters well-documented in the schema. The description adds no parameter-specific information beyond what's already in the structured schema. The baseline score of 3 is appropriate since the schema fully documents the domain parameter (with examples) and format parameter (with enum values and auto-detection behavior).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Probe SRV records for service footprint' clearly states the action (probe) and resource (SRV records), with 'service footprint' providing specific context about what information is being gathered. It distinguishes itself from siblings like check_mx or check_ns by focusing specifically on SRV records rather than other DNS record types.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools performing various DNS checks (check_mx, check_ns, check_dnssec, etc.), there's no indication of when SRV record probing is appropriate versus other DNS validation tools. The description assumes the user already knows when SRV record checking is needed.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_ssl (Grade A) · Read-only · Idempotent
Check SSL/TLS certificate for a domain. Shows issuer, expiry, protocol versions, and HTTPS configuration.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
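Expiry, one of the fields this tool shows, can be derived from the `notAfter` string in a peer certificate using Python's standard `ssl` module; the wrapper below is an illustrative helper, not this server's implementation:

```python
import ssl
import time

def days_until_expiry(not_after, now=None):
    """Days remaining before a certificate's notAfter timestamp, where
    `not_after` is the string form returned by SSLSocket.getpeercert()."""
    expires = ssl.cert_time_to_seconds(not_after)  # epoch seconds, UTC
    return (expires - (time.time() if now is None else now)) / 86400

# A fixed 'now' keeps the example deterministic:
remaining = days_until_expiry(
    "Jun 26 21:41:46 2025 GMT",
    now=ssl.cert_time_to_seconds("Jun 16 21:41:46 2025 GMT"),
)
# remaining == 10.0
```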
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds context about what information is returned (issuer, expiry, etc.), which is useful but doesn't disclose behavioral traits like rate limits, authentication needs, or error conditions. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently conveys the tool's purpose and output details without unnecessary words. It is front-loaded with the core action and resource, making it easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (2 parameters, no output schema), the description adequately covers what the tool does but lacks details on usage context, prerequisites, or output format. With annotations providing safety and idempotency info, and schema covering parameters, the description is minimally complete but could benefit from more contextual guidance.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents both parameters (domain and format). The description doesn't add any parameter-specific details beyond what the schema provides, such as clarifying domain format examples or format implications. With high schema coverage, baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('Check SSL/TLS certificate') and resources ('for a domain'), and distinguishes it from siblings by specifying what information it provides ('issuer, expiry, protocol versions, and HTTPS configuration'). It goes beyond just restating the name/title.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools focused on domain security checks (e.g., check_dnssec, check_dmarc, check_http_security), there is no indication of when this SSL/TLS certificate check is appropriate versus other security assessments.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
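To illustrate the kind of check this SSL/TLS tool performs, here is a minimal stdlib sketch of computing days-to-expiry from a `getpeercert()`-style `notAfter` string. The server's actual implementation is not shown in this listing; a real check would first fetch the peer certificate over a TLS connection.

```python
import ssl
import time

def days_until_expiry(not_after: str) -> int:
    # ssl.cert_time_to_seconds parses the 'notAfter' format returned by
    # SSLSocket.getpeercert(), e.g. 'Jun  1 12:00:00 2026 GMT'.
    expires = ssl.cert_time_to_seconds(not_after)
    return int((expires - time.time()) // 86400)

print(days_until_expiry("Jan  1 00:00:00 2050 GMT"))
```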
check_subdomailing (A) · Read-only · Idempotent
Detect SubdoMailing risk by analyzing SPF include chain for takeover-vulnerable domains.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
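The SPF include chain this tool analyzes can be walked by extracting `include:` mechanisms from the record. A minimal stdlib sketch of that extraction step; the tool's real logic (recursive resolution and takeover checks against each included domain) is not shown here:

```python
import re

def spf_includes(spf_record: str) -> list[str]:
    """Extract the domains referenced by include: mechanisms in an SPF record."""
    if not spf_record.startswith("v=spf1"):
        return []
    return re.findall(r"\binclude:(\S+)", spf_record)

record = "v=spf1 include:_spf.example.net include:mail.vendor.example -all"
print(spf_includes(record))  # → ['_spf.example.net', 'mail.vendor.example']
```

Each returned domain would then be checked for registrability or dangling delegation, which is the SubdoMailing takeover condition.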
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide read-only, non-destructive, idempotent, and open-world hints, covering safety and behavior. The description adds valuable context by specifying the analysis method (SPF include chain) and target (takeover-vulnerable domains), which helps the agent understand the tool's focus beyond generic checks. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. Every part of the sentence contributes to understanding the tool's function, making it highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that the annotations cover behavioral traits (read-only, non-destructive, etc.) and the schema fully describes parameters, the description provides adequate context for a security analysis tool. However, without an output schema, it does not detail return values (e.g., risk levels or findings), leaving a minor gap in completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters ('domain' and 'format'). The description does not add further semantic details about parameters beyond what the schema provides, such as explaining the impact of 'format' choices on output. Baseline 3 is appropriate given the schema's completeness.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb ('Detect'), resource ('SubdoMailing risk'), and method ('analyzing SPF include chain for takeover-vulnerable domains'). It distinguishes itself from siblings like 'check_spf' or 'check_dnssec' by focusing on a specific security vulnerability rather than general DNS/email checks.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context (security analysis of SPF chains for takeover risks) but does not explicitly state when to use this tool versus alternatives like 'check_spf' or 'assess_spoofability'. No exclusions or prerequisites are mentioned, leaving the agent to infer appropriate scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_svcb_https (A) · Read-only · Idempotent
Validate HTTPS/SVCB records (RFC 9460) for modern transport capability advertisement.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
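For reference, RFC 9460 HTTPS/SVCB records use the presentation form `SvcPriority TargetName SvcParams`. A minimal parser sketch for that form, assuming well-formed input; the server's actual validation is certainly richer:

```python
def parse_https_rdata(rdata: str) -> dict:
    """Parse the presentation form of an HTTPS/SVCB record (RFC 9460):
    'SvcPriority TargetName SvcParam...'."""
    fields = rdata.split()
    priority, target = int(fields[0]), fields[1]
    params = {}
    for param in fields[2:]:
        key, _, value = param.partition("=")
        params[key] = value.strip('"')
    return {"priority": priority, "target": target, "params": params}

rr = '1 . alpn="h2,h3" ipv4hint=192.0.2.1'
print(parse_https_rdata(rr))
```

A target of `.` means the record applies to the owner name itself, which is the common case for apex HTTPS records.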
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds value by specifying the standard (RFC 9460) and the specific capability being validated (modern transport capability advertisement), which provides useful context beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that packs essential information: action, resource, standard, and purpose. Every word earns its place with zero redundancy or unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only validation tool with comprehensive annotations and full schema coverage, the description provides adequate context. It specifies the standard (RFC 9460) and purpose, though it doesn't describe output format or potential limitations. The absence of an output schema means some uncertainty about return values remains.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters well-documented in the schema. The description doesn't add any parameter-specific information beyond what's already in the schema (domain format, format options with auto-detection). Baseline score of 3 is appropriate since the schema carries the full parameter documentation burden.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Validate') and target resource ('HTTPS/SVCB records (RFC 9460)'), with explicit mention of the purpose ('modern transport capability advertisement'). It distinguishes itself from sibling tools like check_ssl or check_tlsrpt by focusing specifically on SVCB/HTTPS records per RFC 9460.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context (checking modern transport capabilities) but doesn't explicitly state when to use this tool versus alternatives like check_ssl, check_dane_https, or check_tlsrpt. No guidance is provided about prerequisites, dependencies, or exclusion criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_tlsrpt (A) · Read-only · Idempotent
Validate TLS-RPT SMTP failure reporting.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
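TLS-RPT policies (RFC 8460) are published as TXT records at `_smtp._tls.<domain>`. A minimal sketch of parsing and validating that record's tag=value syntax; the DNS lookup itself is omitted:

```python
def parse_tlsrpt(txt: str):
    """Parse a TLS-RPT policy record (RFC 8460), published as a TXT record
    at _smtp._tls.<domain>. Returns None if the record is not TLS-RPT."""
    tags = {}
    for part in txt.split(";"):
        key, _, value = part.strip().partition("=")
        if key:
            tags[key] = value
    if tags.get("v") != "TLSRPTv1":
        return None
    return tags

print(parse_tlsrpt("v=TLSRPTv1; rua=mailto:tls-reports@example.com"))
```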
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true, covering safety and idempotency. The description adds value by specifying the validation focus on 'SMTP failure reporting,' which provides context about what aspect of TLS-RPT is being checked. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with zero wasted words. It's front-loaded with the core purpose and avoids unnecessary elaboration, making it easy for an agent to parse quickly while conveying essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only validation tool with good annotations and full schema coverage, the description is adequate but lacks output details (no output schema) and usage context. It covers the 'what' but not the 'why' or 'how to interpret results,' leaving gaps for an agent to infer from tool name and parameters alone.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters (domain and format). The description doesn't add any parameter-specific information beyond what the schema provides, such as explaining domain validation rules or format implications. Baseline 3 is appropriate given the comprehensive schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Validate') and the exact resource ('TLS-RPT SMTP failure reporting'), distinguishing it from sibling tools like check_dmarc, check_dkim, or check_spf which focus on different email security protocols. It precisely communicates the tool's function without ambiguity.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like check_mta_sts or other email security validation tools. It doesn't mention prerequisites, typical use cases, or scenarios where this validation is particularly relevant, leaving the agent to infer usage from context alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_txt_hygiene (B) · Read-only · Idempotent
Audit TXT records for stale entries and SaaS exposure.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
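A stale-TXT audit of this kind typically looks for SaaS ownership-verification tokens that linger after the service is gone. A sketch using an illustrative, deliberately incomplete prefix list (the server's actual detection set is not documented here):

```python
# Common SaaS ownership-verification prefixes; an illustrative subset only.
SAAS_PREFIXES = (
    "google-site-verification=",
    "MS=",
    "atlassian-domain-verification=",
    "docusign=",
)

def flag_saas_tokens(txt_records: list[str]) -> list[str]:
    """Return TXT records that look like SaaS verification tokens,
    i.e. candidates for the stale-entry review this tool performs."""
    return [r for r in txt_records if r.startswith(SAAS_PREFIXES)]

records = ["google-site-verification=abc123", "v=spf1 -all", "MS=ms98765"]
print(flag_saas_tokens(records))
```

Flagged tokens reveal which SaaS vendors a domain uses (the "SaaS exposure" in the description) and which entries may be safe to remove.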
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover key behavioral traits (read-only, open-world, idempotent, non-destructive), so the description doesn't need to repeat these. It adds context by specifying what gets audited ('stale entries and SaaS exposure'), which is useful beyond annotations. However, it doesn't describe output format, error handling, or rate limits, leaving some behavioral aspects unclear.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. Every part ('audit TXT records for stale entries and SaaS exposure') contributes directly to understanding the tool's function, making it appropriately concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that the annotations cover safety and behavioral traits and the schema fully documents parameters, the description adds value by specifying the audit focus. However, with no output schema, it doesn't explain return values or result format, which is a minor gap. Overall, it's mostly complete for a read-only audit tool in this context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear documentation for both parameters ('domain' and 'format'). The description doesn't add any parameter-specific details beyond what the schema provides, such as examples or constraints. Baseline score of 3 is appropriate since the schema handles parameter semantics adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('audit') and resources ('TXT records'), and specifies what it audits ('stale entries and SaaS exposure'). It doesn't explicitly differentiate from sibling tools like 'check_zone_hygiene' or 'check_dnssec', but the focus on TXT records is specific enough for most contexts.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'check_zone_hygiene' or 'check_dnssec', nor does it mention prerequisites or exclusions. It implies usage for auditing TXT records but lacks explicit context for tool selection among the many DNS-related siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
check_zone_hygiene (A) · Read-only · Idempotent
Audit SOA propagation and sensitive subdomains.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
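One part of an SOA propagation audit is checking that every authoritative nameserver agrees on the zone serial. A minimal sketch of that comparison; the serials would come from per-nameserver SOA queries, which are omitted here:

```python
def soa_serials_consistent(serials_by_ns: dict[str, int]) -> bool:
    """True when every authoritative nameserver reports the same SOA serial,
    i.e. the zone has fully propagated."""
    return len(set(serials_by_ns.values())) <= 1

# Matching serials across ns1/ns2 indicate a propagated zone.
print(soa_serials_consistent({"ns1": 2024010101, "ns2": 2024010101}))
```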
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide strong hints (readOnly, openWorld, idempotent, non-destructive), so the bar is lower. The description adds valuable context by specifying what gets audited ('SOA propagation and sensitive subdomains'), which goes beyond annotations. It doesn't contradict annotations, and while it could mention more about output format or rate limits, it provides useful behavioral insight.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise with a single, front-loaded sentence that wastes no words. Every part ('audit', 'SOA propagation', 'sensitive subdomains') directly contributes to understanding the tool's purpose without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (auditing DNS hygiene), rich annotations cover safety and behavior, and no output schema exists, the description is somewhat complete but lacks details on output structure or error handling. It adequately conveys the core function but could benefit from more context about results or limitations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents both parameters (domain and format). The description doesn't add any parameter-specific details beyond what's in the schema, such as explaining domain validation or format implications. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('audit') and resources ('SOA propagation and sensitive subdomains'), making it easy to understand what the tool does. However, it doesn't explicitly differentiate from sibling tools like 'check_dnssec' or 'check_subdomailing', which might have overlapping DNS-related functions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools focused on DNS and domain checks (e.g., 'check_dnssec', 'check_subdomailing'), there's no indication of specific contexts, prerequisites, or exclusions for this audit tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
compare_baseline (B) · Read-only · Idempotent
Compare domain security against a policy baseline.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to scan and compare. | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| baseline | Yes | Policy baseline requirements. | |
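A baseline comparison of this kind can be sketched as a diff of scan results against required checks. The field names below are illustrative, not the server's actual baseline schema, which this listing does not document:

```python
def compare_to_baseline(scan: dict, baseline: dict) -> list[str]:
    """Return the checks where the scan falls short of the baseline.
    Both dicts map check name -> bool (passing / required).
    Field names are illustrative assumptions."""
    return [check for check, required in baseline.items()
            if required and not scan.get(check, False)]

scan = {"dmarc": True, "dnssec": False}
baseline = {"dmarc": True, "dnssec": True, "mta_sts": True}
print(compare_to_baseline(scan, baseline))  # → ['dnssec', 'mta_sts']
```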
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds no additional behavioral context such as rate limits, authentication needs, or what 'compare' entails operationally (e.g., scanning, analysis). No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is front-loaded and wastes no space, making it easy for an agent to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (3 parameters, nested object) and lack of output schema, the description is minimal. Annotations cover safety aspects, but the description doesn't address what the comparison outputs (e.g., pass/fail, detailed report) or how to interpret results, leaving gaps for a tool that likely returns meaningful security data.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all parameters well-documented in the schema itself. The description mentions 'domain' and 'baseline' but adds no meaningful semantics beyond what the schema provides, such as explaining the expected structure of the baseline object or the impact of format choices.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('compare') and target ('domain security against a policy baseline'), which is specific and understandable. However, it doesn't explicitly differentiate from sibling tools like 'compare_domains' or 'scan_domain', which might have overlapping functionality in security assessment.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'scan_domain' or 'compare_domains'. It lacks context about prerequisites, typical use cases, or exclusions, leaving the agent to infer usage from the tool name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
compare_domains (A) · Read-only · Idempotent
Side-by-side security comparison of 2–5 domains. Shows scores, category gaps, and unique weaknesses.
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output verbosity. Auto-detected if omitted. | |
| domains | Yes | Domains to compare (2–5 domains) | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only, open-world, idempotent, and non-destructive behavior. The description adds valuable context by specifying what the comparison shows (scores, category gaps, unique weaknesses), which helps the agent understand the output format and scope beyond the annotations. No contradictions with annotations are present.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose and key outputs without unnecessary details. Every word contributes to understanding the tool's function, making it highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (comparative analysis with 2 parameters), rich annotations, and 100% schema coverage, the description is mostly complete. It lacks an output schema, but the description compensates by specifying output elements (scores, gaps, weaknesses). However, it could benefit from more detail on usage scenarios or limitations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear documentation for both parameters. The description mentions '2–5 domains', aligning with the schema's 'domains' parameter, but does not add significant meaning beyond what the schema provides. The baseline score of 3 is appropriate as the schema adequately covers parameter details.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool performs a 'side-by-side security comparison' of domains, specifying the exact resource (domains) and verb (compare). It distinguishes itself from sibling tools by focusing on comparative analysis rather than individual domain scanning or configuration generation, as seen in tools like 'scan_domain' or 'generate_dmarc_record'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning 'security comparison' and specifying 2–5 domains, but it does not explicitly state when to use this tool versus alternatives like 'compare_baseline' or 'scan_domain'. It provides basic constraints (domain count) but lacks guidance on scenarios or prerequisites for choosing this tool over others.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
cymru_asn (A) · Read-only · Idempotent
Map domain IPs to Autonomous System Numbers via Team Cymru DNS. Returns ASN, prefix, country, registry, and organization for each IP. Flags high-risk hosting ASNs.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
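Team Cymru's IP-to-ASN service answers TXT queries at `<reversed-octets>.origin.asn.cymru.com` with pipe-delimited fields. A sketch of building the query name and parsing the answer; the DNS lookup itself is omitted, and the sample answer text is illustrative:

```python
def cymru_origin_qname(ip: str) -> str:
    """Build the query name for Team Cymru's IP-to-ASN DNS service:
    a TXT lookup of <reversed-octets>.origin.asn.cymru.com."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + ".origin.asn.cymru.com"

def parse_cymru_txt(txt: str) -> dict:
    """Parse the pipe-delimited TXT answer:
    'AS | prefix | country | registry | allocated'."""
    asn, prefix, country, registry, allocated = [f.strip() for f in txt.split("|")]
    return {"asn": asn, "prefix": prefix, "country": country,
            "registry": registry, "allocated": allocated}

print(cymru_origin_qname("192.0.2.1"))  # → 1.2.0.192.origin.asn.cymru.com
print(parse_cymru_txt("15169 | 8.8.8.0/24 | US | arin | 1992-12-01"))
```

The parsed ASN is what the tool would then match against a list of high-risk hosting networks.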
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, openWorldHint=true, idempotentHint=true, and destructiveHint=false. The description adds valuable behavioral context beyond annotations by specifying the data source ('via Team Cymru DNS'), output format details, and the special feature of 'Flags high-risk hosting ASNs.' This enhances understanding of the tool's behavior without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two sentences that each earn their place. The first sentence covers purpose, method, and output. The second sentence adds the unique high-risk flagging feature. No wasted words, and information is front-loaded appropriately.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity, rich annotations (4 hints), and 100% schema coverage, the description provides good contextual completeness. It explains what the tool does, how it works, what it returns, and a special feature. The main gap is lack of output format details (no output schema exists), but the description mentions key return fields, which helps compensate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents both parameters. The description doesn't add parameter-specific semantics beyond what's in the schema (domain format example, format options). However, it implies the 'domain' parameter is used for ASN mapping, which aligns with schema documentation. Baseline 3 is appropriate when schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Map domain IPs to Autonomous System Numbers'), resource ('via Team Cymru DNS'), and output details ('Returns ASN, prefix, country, registry, and organization for each IP'). It distinguishes itself from sibling tools by focusing on ASN mapping rather than DNS security checks, domain validation, or other analysis functions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning 'high-risk hosting ASNs' and the mapping function, suggesting it's for security/threat intelligence analysis. However, it doesn't explicitly state when to use this tool versus alternatives like 'check_mx_reputation' or 'map_supply_chain', nor does it provide exclusion criteria or prerequisites for use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
discover_subdomains (A) · Read-only · Idempotent
Find subdomains of a domain using Certificate Transparency logs. Reveals shadow IT, forgotten services, and unauthorized certificate issuance.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
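Certificate Transparency discovery of this kind typically queries a CT index such as crt.sh and deduplicates the names found in each entry. A sketch of the parsing step, assuming a crt.sh-style JSON response with `name_value` fields; the HTTP fetch is omitted:

```python
import json

def subdomains_from_ct(ct_json: str, domain: str) -> set[str]:
    """Extract unique subdomains from a crt.sh-style JSON response, where
    each entry's name_value may hold several newline-separated names."""
    names = set()
    for entry in json.loads(ct_json):
        for name in entry.get("name_value", "").splitlines():
            name = name.lstrip("*.").lower()  # drop wildcard prefixes
            if name.endswith("." + domain):
                names.add(name)
    return names

sample = json.dumps([
    {"name_value": "www.example.com\napi.example.com"},
    {"name_value": "*.dev.example.com"},
])
print(sorted(subdomains_from_ct(sample, "example.com")))
```

Names surfaced this way that are absent from the organization's DNS inventory are the "shadow IT" the description refers to.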
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide excellent coverage (readOnlyHint, openWorldHint, idempotentHint, destructiveHint), but the description adds valuable context about what the tool reveals ('shadow IT, forgotten services, unauthorized certificate issuance') that goes beyond the annotations. This helps the agent understand the investigative value and typical findings, though it doesn't mention rate limits or authentication requirements.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two sentences that each earn their place. The first sentence states the core functionality, and the second sentence explains the value/use cases. No wasted words, and the most important information (what the tool does) comes first.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity, excellent annotation coverage, and 100% schema coverage, the description provides good contextual completeness. It explains the investigative value and use cases well. The main gap is the lack of output schema, so the agent doesn't know what format the results will be in, but the description compensates reasonably well given the strong structured data support.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already fully documents both parameters. The description doesn't add any parameter-specific information beyond what's in the schema, so it meets the baseline of 3. The description's focus is on tool purpose rather than parameter details.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verb ('Find') and resource ('subdomains of a domain'), and distinguishes it from siblings by specifying the method ('using Certificate Transparency logs'). It explicitly differentiates from tools like 'check_subdomailing' or 'scan_domain' by focusing on certificate-based discovery rather than general scanning.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('Reveals shadow IT, forgotten services, and unauthorized certificate issuance'), which helps identify appropriate use cases. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the many sibling tools, though the purpose differentiation implies alternatives exist.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
explain_finding · Grade B · Read-only · Idempotent
Explain a finding with impact and remediation.
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output verbosity. Auto-detected if omitted. | |
| status | Yes | Finding severity or status. | |
| details | No | Additional detail from check result. | |
| checkType | Yes | Check type (e.g., 'SPF', 'DMARC'). | |
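A hypothetical sketch of the lookup a tool like this implies: mapping a (checkType, status) pair to impact and remediation text. The table contents and helper name are illustrative assumptions, not the server's actual logic.

```python
# Hypothetical mapping from (checkType, status) to impact/remediation text.
EXPLANATIONS = {
    ("SPF", "fail"): {
        "impact": "Mail from unauthorized servers is not flagged.",
        "remediation": "Publish or correct the domain's SPF TXT record.",
    },
    ("DMARC", "missing"): {
        "impact": "Spoofed mail using the domain cannot be rejected.",
        "remediation": "Publish a _dmarc TXT record with an explicit policy (p=).",
    },
}

def explain_finding(check_type, status, details=""):
    entry = EXPLANATIONS.get((check_type, status))
    if entry is None:
        # Unknown combination: fall back to a generic explanation
        return {"impact": "Unknown finding.", "remediation": "Review manually.", "details": details}
    return {**entry, "details": details}

print(explain_finding("SPF", "fail", details="~all missing")["remediation"])
```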
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds minimal behavioral context by mentioning 'impact and remediation', but doesn't elaborate on output format, error handling, or rate limits. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that states the purpose without unnecessary words. It's front-loaded with the core action, though it could be slightly more structured by explicitly separating impact and remediation aspects for clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the annotations cover safety and idempotency, and the schema fully documents parameters, the description provides a basic overview. However, with no output schema and many sibling tools, it lacks details on return values, error cases, or integration context, making it adequate but incomplete for optimal agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for all parameters (e.g., 'checkType' as the check type, 'status' as severity). The description doesn't add meaning beyond the schema, such as explaining how parameters interact or providing examples, so it meets the baseline for high schema coverage without extra value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Explain') and the resource ('a finding'), specifying what aspects to cover ('with impact and remediation'). However, it doesn't differentiate this tool from sibling tools like 'generate_fix_plan' or 'validate_fix' which might also involve remediation guidance, leaving some ambiguity about its unique role.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools like 'analyze_drift', 'assess_spoofability', and 'generate_fix_plan' that might overlap in function, there's no indication of context, prerequisites, or exclusions to help an agent choose appropriately.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_dkim_config · Grade A · Read-only · Idempotent
Generate DKIM setup instructions and DNS record.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| provider | No | Provider (e.g., "google"). Omit for generic. | |
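A minimal sketch of the kind of DNS record such a tool emits: a DKIM public key published as a TXT record at `<selector>._domainkey.<domain>` (per RFC 6376). The selector and key below are placeholders.

```python
# Sketch: build the DKIM TXT record name and value for a domain.
# The selector and base64 key are placeholder values.
def dkim_record(domain, selector, public_key_b64):
    name = f"{selector}._domainkey.{domain}"
    value = f"v=DKIM1; k=rsa; p={public_key_b64}"
    return name, value

name, value = dkim_record("example.com", "mail", "MIIBIjANBgPLACEHOLDER")
print(name)   # mail._domainkey.example.com
print(value)  # v=DKIM1; k=rsa; p=MIIBIjANBgPLACEHOLDER
```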
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate this is a read-only, non-destructive, idempotent operation with open-world data. The description adds valuable context by specifying what gets generated ('setup instructions and DNS record'), which goes beyond the annotations. However, it doesn't mention potential rate limits, authentication requirements, or output format details, leaving some behavioral aspects uncovered.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose without any unnecessary words. It's front-loaded with the core action and resource, making it immediately understandable. Every word earns its place, achieving maximum clarity with minimal verbiage.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (3 parameters, no output schema) and rich annotations, the description is adequate but has gaps. It clearly states what the tool does, but lacks guidance on usage context and doesn't explain what the generated output looks like (instructions format, DNS record details). The annotations help, but the description could better address the tool's role in the broader sibling ecosystem.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with each parameter clearly documented in the schema itself. The description doesn't add any additional meaning or clarification about the parameters beyond what the schema provides. According to the rules, when schema coverage is high (>80%), the baseline score is 3, which applies here as the description doesn't compensate or enhance parameter understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Generate DKIM setup instructions and DNS record') with the resource (DKIM configuration). It distinguishes itself from sibling tools like 'check_dkim' (which verifies) and 'generate_dmarc_record' (which focuses on DMARC), making the purpose unambiguous and well-differentiated.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. While it's clear this generates DKIM configuration, there's no mention of prerequisites (e.g., needing domain ownership), when it's appropriate (e.g., initial setup vs. troubleshooting), or how it relates to siblings like 'generate_dmarc_record' or 'check_dkim'. Usage is implied but not explicitly stated.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_dmarc_record · Grade B · Read-only · Idempotent
Generate DMARC record with configurable policy.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| policy | No | Policy (default "reject"). | |
| rua_email | No | Report email. Default: dmarc-reports@{domain}. | |
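A sketch of the record value this tool's parameters describe, assuming standard DMARC tag syntax (RFC 7489) and mirroring the table's defaults (policy "reject", rua `dmarc-reports@{domain}`):

```python
# Sketch: build a DMARC TXT record value (published at _dmarc.<domain>)
# with the defaults listed in the parameter table above.
def dmarc_record(domain, policy="reject", rua_email=None):
    rua = rua_email or f"dmarc-reports@{domain}"  # table default
    return f"v=DMARC1; p={policy}; rua=mailto:{rua}"

print(dmarc_record("example.com"))
# v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com
```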
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true, covering safety and idempotency. The description adds minimal behavioral context beyond this, mentioning 'configurable policy' which hints at customization but doesn't detail side effects, rate limits, or authentication needs. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with zero wasted words. It's front-loaded with the core purpose and includes a key feature ('configurable policy'). Every part of the description earns its place without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (4 parameters, no output schema), annotations provide good safety coverage, but the description lacks context on output format, error handling, or integration with sibling tools. It's minimally adequate but leaves gaps in guiding the agent on what to expect from the tool's behavior and results.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all parameters well-documented in the schema (e.g., domain, format, policy, rua_email). The description adds no additional parameter semantics beyond what's in the schema, such as explaining interactions between parameters or default behaviors. Baseline 3 is appropriate given the comprehensive schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Generate DMARC record with configurable policy.' It specifies the verb ('generate'), resource ('DMARC record'), and scope ('configurable policy'). However, it doesn't explicitly differentiate from sibling tools like 'generate_dkim_config' or 'generate_spf_record' beyond the DMARC focus.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, typical use cases, or how it relates to sibling tools like 'check_dmarc' or 'generate_fix_plan'. The agent must infer usage from the tool name and context alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_fix_plan · Grade B · Read-only · Idempotent
Generate prioritized remediation plan with effort estimates.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
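A hypothetical sketch of the prioritization such a plan implies: order findings by severity, breaking ties by estimated effort. The severity weights, field names, and sample findings are assumptions for illustration.

```python
# Hypothetical prioritization: lower rank = more severe; ties broken
# by lower estimated effort first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def fix_plan(findings):
    """findings: list of {'check': str, 'severity': str, 'effort_hours': int}"""
    return sorted(findings, key=lambda f: (SEVERITY_RANK[f["severity"]], f["effort_hours"]))

plan = fix_plan([
    {"check": "DNSSEC", "severity": "medium", "effort_hours": 8},
    {"check": "DMARC", "severity": "critical", "effort_hours": 2},
    {"check": "SPF", "severity": "critical", "effort_hours": 1},
])
print([f["check"] for f in plan])  # ['SPF', 'DMARC', 'DNSSEC']
```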
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds minimal behavioral context about 'prioritized' and 'effort estimates' but doesn't explain what 'remediation' entails, what data sources it uses, or how prioritization works. It doesn't contradict annotations, but adds limited value beyond them.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence: 'Generate prioritized remediation plan with effort estimates.' It's front-loaded with the core purpose and includes key features without unnecessary words. Every word earns its place, making it highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (generating plans with prioritization and estimates), the annotations provide good safety coverage, but there's no output schema. The description doesn't explain what the remediation plan contains, how effort estimates are calculated, or what format the output takes. It's minimally adequate but leaves significant gaps about the tool's output and operational context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters ('domain' and 'format') well-documented in the schema. The description doesn't add any parameter-specific information beyond what the schema already states. According to guidelines, when schema coverage is high (>80%), the baseline score is 3 even when the description adds no parameter information.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Generate prioritized remediation plan with effort estimates.' It specifies the action (generate), the output type (prioritized remediation plan), and additional features (effort estimates). However, it doesn't differentiate this from sibling tools like 'generate_rollout_plan' or 'validate_fix' which also generate plans or validate fixes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools like 'analyze_drift', 'scan_domain', and 'validate_fix', there's no indication of whether this should be used after scanning, instead of other analysis tools, or as a final step. The description lacks any 'when' or 'when not' context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_mta_sts_policy · Grade B · Read-only · Idempotent
Generate MTA-STS record and policy file.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| mx_hosts | No | MX hosts. Omit to detect from DNS. | |
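A sketch of the two artifacts MTA-STS requires, assuming the RFC 8461 layout: a TXT record at `_mta-sts.<domain>` plus a policy file served at `https://mta-sts.<domain>/.well-known/mta-sts.txt`. The policy id and mode below are placeholder choices.

```python
# Sketch: build the MTA-STS TXT record and policy file body (RFC 8461).
# policy_id is a placeholder; real ids change whenever the policy does.
def mta_sts(domain, mx_hosts, policy_id="20240101000000"):
    record = (f"_mta-sts.{domain}", f"v=STSv1; id={policy_id}")
    policy = "version: STSv1\nmode: enforce\n"
    policy += "".join(f"mx: {mx}\n" for mx in mx_hosts)
    policy += "max_age: 86400\n"
    return record, policy

record, policy = mta_sts("example.com", ["mx1.example.com", "mx2.example.com"])
print(record[1])  # v=STSv1; id=20240101000000
print(policy)
```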
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide significant behavioral information: readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true. The description adds minimal context beyond this - it mentions generating both a record and policy file, which gives some implementation detail. However, it doesn't describe what 'generate' entails (e.g., whether it creates actual files, returns content, or just provides configuration), nor does it mention any rate limits or authentication requirements.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise at just one sentence with no wasted words. It's front-loaded with the core purpose and contains no unnecessary information. Every word earns its place in this minimal but complete statement of what the tool does.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich annotations (readOnly, idempotent, non-destructive) and complete schema coverage, the description provides adequate context for a read-only generation tool. However, with no output schema, the description doesn't explain what gets returned (e.g., the policy content, file locations, or format details). For a generation tool, knowing the output format would be helpful, though the annotations provide safety assurances.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already fully documents all three parameters. The description adds no additional parameter semantics beyond what's in the schema - it doesn't explain the relationship between the record and policy file generation, nor does it provide context about when to use different formats or MX host configurations. The baseline score of 3 is appropriate when the schema does all the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Generate MTA-STS record and policy file.' It specifies both the verb ('Generate') and the resources ('MTA-STS record and policy file'), making the purpose unambiguous. However, it doesn't explicitly differentiate from sibling tools like 'check_mta_sts' or 'generate_dmarc_record', which would be needed for a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention when this generation tool should be used instead of checking tools like 'check_mta_sts', nor does it provide any prerequisites or context for usage. The agent must infer usage from the tool name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_rollout_plan · Grade A · Read-only · Idempotent
Generate a phased DMARC enforcement timeline with exact DNS records per phase.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to generate rollout plan for | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| timeline | No | Rollout speed: aggressive, standard, conservative (default: standard) | |
| target_policy | No | Target DMARC policy (default: reject) | |
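A sketch of what a phased rollout with "exact DNS records per phase" could look like, assuming the common none, quarantine, reject progression toward the default target policy of reject. The phase durations per timeline are illustrative assumptions.

```python
# Hypothetical phase tables: (policy, weeks before advancing);
# None means the final, permanent phase.
PHASES = {
    "aggressive": [("none", 2), ("quarantine", 2), ("reject", None)],
    "standard": [("none", 4), ("quarantine", 4), ("reject", None)],
    "conservative": [("none", 8), ("quarantine", 8), ("reject", None)],
}

def rollout_plan(domain, timeline="standard"):
    plan = []
    for policy, weeks in PHASES[timeline]:
        # Each phase publishes a concrete DMARC record for the domain
        record = f"v=DMARC1; p={policy}; rua=mailto:dmarc-reports@{domain}"
        plan.append({"policy": policy, "weeks": weeks, "record": record})
    return plan

for phase in rollout_plan("example.com"):
    print(phase["policy"], phase["record"])
```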
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide key behavioral traits: readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true. The description adds context by specifying the output includes 'exact DNS records per phase,' which clarifies the tool's generative nature beyond just planning. However, it doesn't disclose additional details like rate limits, authentication needs, or error handling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose ('Generate a phased DMARC enforcement timeline') and adds specific output details ('with exact DNS records per phase'). There is no wasted wording, and it directly communicates the tool's function without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (4 parameters, no output schema) and rich annotations, the description is mostly complete. It clearly states what the tool does and its output format. However, it could benefit from mentioning the lack of side effects (implied by annotations) or example use cases to fully guide an AI agent in contextual decision-making.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all parameters well-documented in the input schema (e.g., domain, format, timeline, target_policy). The description doesn't add any parameter-specific semantics beyond what the schema provides, such as explaining interactions between parameters or default behaviors. Baseline 3 is appropriate given the comprehensive schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('generate'), the resource ('phased DMARC enforcement timeline'), and the output details ('exact DNS records per phase'). It distinguishes this tool from siblings like 'generate_dmarc_record' (which creates a single record) and 'generate_fix_plan' (which addresses issues rather than planning enforcement).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by specifying 'phased DMARC enforcement timeline,' suggesting it's for planning DMARC rollout rather than immediate implementation or analysis. However, it doesn't explicitly state when to use this tool versus alternatives like 'generate_fix_plan' or 'generate_dmarc_record,' nor does it mention prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
generate_spf_record · Grade B · Read-only · Idempotent
Generate corrected SPF record from detected providers.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| include_providers | No | Providers to include (e.g., ["google"]). | |
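A minimal sketch of assembling an SPF record from provider include mechanisms (RFC 7208 syntax). The provider-to-include mapping below is a small assumed subset, not the server's detection logic.

```python
# Sketch: map provider names to their published SPF include mechanisms
# and assemble a record ending in a softfail qualifier.
PROVIDER_INCLUDES = {
    "google": "include:_spf.google.com",
    "microsoft": "include:spf.protection.outlook.com",
}

def spf_record(providers):
    mechanisms = [PROVIDER_INCLUDES[p] for p in providers]
    return "v=spf1 " + " ".join(mechanisms) + " ~all"

print(spf_record(["google"]))  # v=spf1 include:_spf.google.com ~all
```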
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover key behavioral traits: read-only, open-world, idempotent, and non-destructive. The description adds minimal context by mentioning 'corrected' and 'detected providers,' which hints at analysis-based generation. However, it lacks details on rate limits, authentication needs, or output format, leaving some behavioral aspects unclear despite annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose without unnecessary details. Every word earns its place, making it highly concise and well-structured for quick understanding.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (3 parameters, no output schema) and rich annotations, the description is minimally adequate. It states what the tool does but lacks details on output format, error handling, or integration with sibling tools, leaving gaps in completeness for effective agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing full parameter documentation. The description doesn't add meaning beyond the schema, as it doesn't explain parameter interactions or usage examples. With high schema coverage, the baseline score of 3 is appropriate, as the description doesn't compensate but also doesn't detract.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Generate corrected SPF record from detected providers.' It specifies the verb ('generate'), resource ('SPF record'), and source ('detected providers'), making the function unambiguous. However, it doesn't explicitly differentiate from sibling tools like 'check_spf' or 'resolve_spf_chain', which prevents a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'check_spf' (for analysis) or 'resolve_spf_chain' (for diagnostics), nor does it specify prerequisites such as needing prior domain scanning. Usage is implied but not explicitly stated.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_benchmark · Grade B · Read-only · Idempotent
Get score benchmarks: percentiles, mean, top failures.
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output verbosity. Auto-detected if omitted. | |
| profile | No | Profile to benchmark (default "mail_enabled"). | |
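A sketch of the statistics the description names (percentiles and mean), assuming per-domain scores on a 0-100 scale; the sample scores are invented for illustration.

```python
# Sketch: compute quartiles and the mean over a cohort of scores.
from statistics import mean, quantiles

def benchmark(scores):
    p25, p50, p75 = quantiles(scores, n=4)  # quartile cut points
    return {"mean": mean(scores), "p25": p25, "p50": p50, "p75": p75}

print(benchmark([40, 55, 60, 70, 85, 90]))
```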
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover key behavioral traits (read-only, open-world, idempotent, non-destructive), so the description's burden is lower. It adds value by specifying output types ('percentiles, mean, top failures'), but does not disclose additional context such as rate limits, authentication needs, or data sources. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose ('Get score benchmarks') and lists key outputs without unnecessary words. Every part earns its place by clarifying what the tool returns, making it appropriately sized and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (2 parameters, no output schema) and rich annotations, the description is minimally complete. It states what the tool does but lacks details on output format, error handling, or integration with sibling tools. With annotations covering safety and behavior, it's adequate but has clear gaps in contextual guidance.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters ('format' and 'profile') well-documented in the schema. The description does not add any meaning beyond the schema, such as explaining the significance of 'top failures' or how 'profile' affects benchmarks, so it meets the baseline for high schema coverage without extra value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with 'Get score benchmarks' followed by specific outputs ('percentiles, mean, top failures'), which is a specific verb+resource combination. However, it does not explicitly distinguish this tool from sibling tools like 'compare_baseline' or 'get_provider_insights', which might involve similar benchmarking concepts, so it misses full sibling differentiation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With sibling tools like 'compare_baseline' and 'get_provider_insights' that might overlap in benchmarking or analysis, there is no explicit mention of context, exclusions, or preferred scenarios for this tool, leaving usage unclear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_provider_insights (B) · Read-only · Idempotent
Get provider cohort benchmarks and common issues.
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output verbosity. Auto-detected if omitted. | |
| profile | No | Profile (default "mail_enabled"). | |
| provider | Yes | Provider (e.g., "google workspace"). | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide strong behavioral hints (readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true), covering safety and idempotency. The description adds minimal context about what 'insights' include (benchmarks and common issues), but doesn't elaborate on data sources, freshness, or limitations. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's front-loaded with the core functionality and wastes no space on repetition or fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich annotations (covering read-only, idempotent, non-destructive behavior) and complete schema documentation, the description provides adequate context for a read-only query tool. However, without an output schema, it doesn't detail the structure or format of returned insights, leaving a gap in understanding what 'benchmarks and common issues' entail.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all parameters well-documented in the schema itself (e.g., 'provider' as the provider name, 'format' for output verbosity, 'profile' for provider type). The description adds no additional parameter semantics beyond what's in the schema, so it meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose as retrieving 'provider cohort benchmarks and common issues,' which specifies both the action ('get') and the resource ('provider insights'). It distinguishes itself from most sibling tools focused on domain scanning or configuration generation, though it doesn't explicitly differentiate from 'get_benchmark', which might have overlapping functionality.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, appropriate contexts, or compare it to sibling tools like 'get_benchmark' or 'analyze_drift' that might serve related purposes. The agent must infer usage from the tool name and parameters alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
map_compliance (A) · Read-only · Idempotent
Map scan findings to compliance frameworks: NIST 800-177, PCI DSS 4.0, SOC 2, CIS Controls. Shows pass/fail/partial status per control.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations cover read-only, open-world, idempotent, and non-destructive traits. The description adds valuable context about output behavior ('shows pass/fail/partial status per control') and mentions specific compliance frameworks, which helps the agent understand what to expect beyond the basic safety profile.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently communicates the core functionality and output format. Every word serves a purpose, with no redundant information or unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity, comprehensive annotations, and full parameter documentation, the description provides sufficient context for an agent to understand its purpose and basic behavior. The lack of output schema is partially compensated by the description mentioning the output format ('pass/fail/partial status'), though more detail on return structure would be helpful.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents both parameters. The description doesn't add any parameter-specific details beyond what's in the schema, so it meets the baseline expectation without providing extra semantic value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('map scan findings to compliance frameworks') and resources ('NIST 800-177, PCI DSS 4.0, SOC 2, CIS Controls'), and distinguishes it from siblings by focusing on compliance mapping rather than scanning, analysis, or configuration generation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when compliance mapping is needed, but provides no explicit guidance on when to use this tool versus alternatives like 'get_benchmark' or 'assess_spoofability'. It lacks clear exclusions or prerequisites for effective tool selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
map_supply_chain (A) · Read-only · Idempotent
Map third-party service dependencies from DNS records. Correlates SPF, NS, TXT verifications, SRV services, and CAA to show who can send as you, control your DNS, and what services are integrated.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide important behavioral hints (readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true). The description adds valuable context by explaining what the tool actually reveals: 'who can send as you, control your DNS, and what services are integrated.' This goes beyond the annotations to describe the tool's investigative nature and security implications.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise and front-loaded. The first sentence establishes the core purpose, and the second sentence elaborates on the specific correlations and outcomes. Every word earns its place with no redundancy or unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (analyzing multiple DNS record types for security insights) and the absence of an output schema, the description provides good context about what the tool reveals. However, it doesn't specify the format or structure of the returned dependency mapping, which would be helpful since there's no output schema. The annotations provide good behavioral coverage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters clearly documented in the schema. The description doesn't add any parameter-specific information beyond what's already in the schema. The baseline score of 3 is appropriate since the schema does the heavy lifting for parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: mapping third-party service dependencies from DNS records. It specifies the exact DNS record types analyzed (SPF, NS, TXT, SRV, CAA) and the outcomes (showing who can send as you, control your DNS, and integrated services). This distinguishes it from sibling tools like check_spf or check_ns that focus on individual record types rather than comprehensive correlation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by stating it 'correlates' multiple DNS record types to show service dependencies, suggesting it should be used for comprehensive supply chain analysis rather than individual checks. However, it doesn't explicitly state when to use this tool versus alternatives like scan_domain or map_compliance, nor does it provide exclusion criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
rdap_lookup (A) · Read-only · Idempotent
Fetch domain registration data via RDAP (modern WHOIS replacement). Returns registrar, creation/expiration dates, EPP status, registrant info, and domain age.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering safety and idempotency. The description adds valuable context by specifying the return data (registrar, dates, status, registrant info, domain age) and noting RDAP as a modern replacement, which helps the agent understand the tool's scope and output format.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the purpose and key return values without redundancy. Every phrase adds value, such as clarifying RDAP's role and enumerating output fields, making it appropriately sized for the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich annotations (read-only, idempotent, non-destructive) and 100% schema coverage, the description is largely complete. It specifies the return data, which compensates for the lack of an output schema. However, it could mention rate limits or authentication needs for full transparency.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear parameter documentation. The description does not add meaning beyond the schema, as it mentions no parameter details. However, it implies the 'domain' parameter is central by listing return data types, aligning with the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Fetch domain registration data via RDAP') and resource ('domain registration data'), distinguishing it from sibling tools focused on DNS, security, or configuration checks. It explicitly positions RDAP as a 'modern WHOIS replacement' to clarify its domain.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for retrieving domain registration metadata but provides no explicit guidance on when to use this tool versus alternatives like WHOIS or sibling tools (e.g., check_dnssec for DNS security). It lacks context on prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
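The fields rdap_lookup returns (registrar, creation/expiration dates, EPP status, domain age) correspond to the standard RDAP JSON shape defined in RFC 9083. A minimal sketch of extracting them, using an inline sample response instead of a live query; the sample values and the `summarize` helper are illustrative assumptions, not the server's implementation. A real client would fetch the JSON from an RDAP bootstrap service and follow the redirect to the registry's endpoint.

```python
# Parse the parts of an RDAP domain response that rdap_lookup surfaces.
# The sample mimics the RFC 9083 JSON shape; values are illustrative.
from datetime import datetime, timezone

SAMPLE = {
    "ldhName": "example.com",
    "status": ["client transfer prohibited"],
    "events": [
        {"eventAction": "registration", "eventDate": "1995-08-14T04:00:00Z"},
        {"eventAction": "expiration", "eventDate": "2026-08-13T04:00:00Z"},
    ],
    "entities": [
        {"roles": ["registrar"], "handle": "376"},
    ],
}

def summarize(rdap):
    # Index events by action, and find the entity carrying the registrar role.
    events = {e["eventAction"]: e["eventDate"] for e in rdap.get("events", [])}
    registrar = next(
        (e.get("handle") for e in rdap.get("entities", [])
         if "registrar" in e.get("roles", [])),
        None,
    )
    created = datetime.fromisoformat(events["registration"].replace("Z", "+00:00"))
    return {
        "domain": rdap["ldhName"],
        "registrar_handle": registrar,
        "created": events.get("registration"),
        "expires": events.get("expiration"),
        "epp_status": rdap.get("status", []),
        "age_days": (datetime.now(timezone.utc) - created).days,
    }

summary = summarize(SAMPLE)
```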
resolve_spf_chain (A) · Read-only · Idempotent
Trace the full SPF include chain for a domain. Recursively resolves all includes, shows lookup count, tree depth, and flags circular includes or exceeding the 10-lookup limit.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, destructiveHint=false, openWorldHint=true, and idempotentHint=true, covering safety and idempotency. The description adds valuable behavioral context beyond annotations: it discloses that the tool performs recursive resolution, counts lookups, tracks tree depth, and flags circular includes or exceeding the 10-lookup limit. This enhances transparency about the tool's operational behavior without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently conveys the tool's purpose, key features (recursive resolution, lookup count, tree depth, circular include detection, lookup limit), and scope. It is front-loaded with the main action and avoids unnecessary words, making every part of the sentence earn its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (involving recursive SPF chain resolution) and the absence of an output schema, the description does a good job of outlining what the tool does and key behavioral aspects. However, it does not specify the output format or structure (e.g., whether it returns a tree, list, or summary), which could be helpful for an AI agent. Annotations cover safety and idempotency, but the description could be more complete regarding output details.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters (domain and format). The description does not add significant meaning beyond the schema, as it focuses on the tool's functionality rather than parameter details. The baseline score of 3 is appropriate since the schema adequately documents parameters, and the description does not compensate or elaborate further.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Trace the full SPF include chain for a domain') and distinguishes it from sibling tools like 'check_spf' by specifying recursive resolution of includes, lookup counting, tree depth analysis, and detection of circular includes or lookup limit violations. It provides a verb+resource+scope combination that is precise and differentiated.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context (e.g., for SPF analysis or troubleshooting) by mentioning features like circular include detection and lookup limits, but it does not explicitly state when to use this tool versus alternatives like 'check_spf' or other sibling tools. It provides clear functional scope but lacks explicit when/when-not guidance or named alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
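The behavior resolve_spf_chain describes (recursive include resolution, lookup counting against the RFC 7208 limit of 10, and circular-include detection) can be sketched as follows. The record map, domain names, and `resolve_chain` helper are illustrative assumptions; a real resolver would issue live TXT queries rather than read a dictionary.

```python
# Sketch of the traversal resolve_spf_chain describes: follow include:
# mechanisms recursively, count DNS lookups against the RFC 7208 limit,
# track tree depth, and flag circular includes.
RECORDS = {
    "example.com": "v=spf1 include:_spf.mailer.test include:_spf.crm.test ~all",
    "_spf.mailer.test": "v=spf1 ip4:192.0.2.0/24 include:_spf.deep.test -all",
    "_spf.crm.test": "v=spf1 ip4:198.51.100.0/24 -all",
    "_spf.deep.test": "v=spf1 ip4:203.0.113.0/24 -all",
    "loop.test": "v=spf1 include:loop.test -all",
}

LOOKUP_LIMIT = 10  # RFC 7208 section 4.6.4

def resolve_chain(domain, path=None, depth=0, state=None):
    """Walk the include tree, accumulating lookups, max depth, and issues."""
    path = set() if path is None else path
    state = state or {"lookups": 0, "max_depth": 0, "issues": []}
    if domain in path:  # already on the current branch: a cycle
        state["issues"].append(f"circular include: {domain}")
        return state
    path.add(domain)
    state["max_depth"] = max(state["max_depth"], depth)
    for term in RECORDS.get(domain, "").split():
        if term.startswith("include:"):
            state["lookups"] += 1  # each include costs one DNS lookup
            if state["lookups"] > LOOKUP_LIMIT:
                state["issues"].append("exceeds 10-lookup limit (permerror)")
                break
            resolve_chain(term.split(":", 1)[1], path, depth + 1, state)
    path.discard(domain)  # leaving this branch
    return state

result = resolve_chain("example.com")  # 3 lookups, depth 2, no issues
```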
scan_domain (A) · Read-only · Idempotent
Look up any domain to get a full DNS and email security audit. Use this whenever a user mentions a domain name, asks to check/scan/lookup/analyze a domain, or wants to know about a domain's security posture. Returns score, grade, maturity stage, and prioritized findings. Start here for any domain-related question.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| profile | No | Scoring profile. Default "auto" detects. | |
| force_refresh | No | Bypass cache and run a fresh scan. Useful after DNS changes. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable context beyond annotations: it specifies the return format ('score, grade, maturity stage, and prioritized findings') and implies a comprehensive audit scope. Annotations cover safety (readOnlyHint=true, destructiveHint=false) and idempotency, so the bar is lower, but the description still enhances understanding of the tool's behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose, followed by usage guidelines and output details, all in four efficient sentences with zero wasted words. Each sentence earns its place by adding distinct value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (4 parameters, no output schema), the description provides strong purpose, usage, and output context. It compensates for the lack of output schema by detailing return values. However, it could briefly mention the optional parameters' roles to enhance completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all parameters. The description does not add any parameter-specific details beyond what the schema provides, such as explaining the 'profile' enums or 'force_refresh' implications. This meets the baseline of 3 for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('look up', 'get a full DNS and email security audit') and resource ('domain'), and distinguishes it from siblings by positioning it as the starting point for any domain-related question, unlike the more specialized sibling tools like check_dmarc or check_spf.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
It provides explicit guidance on when to use this tool ('whenever a user mentions a domain name, asks to check/scan/lookup/analyze a domain, or wants to know about a domain's security posture') and positions it as the primary entry point ('Start here for any domain-related question'), effectively distinguishing it from the many specialized sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
simulate_attack_paths (A) · Read-only · Idempotent
Analyze current DNS posture and enumerate specific attack paths an adversary could exploit, with severity, feasibility, steps, and mitigations.
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to check (e.g., example.com) | |
| format | No | Output verbosity. Auto-detected if omitted. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, openWorldHint=true, idempotentHint=true, and destructiveHint=false, indicating safe, repeatable operations. The description adds value by specifying the analysis scope ('current DNS posture') and output details (severity, feasibility, steps, mitigations), though it does not mention rate limits or authentication needs.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently conveys the tool's purpose and outputs without unnecessary details. It is front-loaded with key actions and leaves no waste.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of attack path simulation, the description is complete for a read-only tool with good annotations. It specifies the analysis scope and output details, though no output schema exists to clarify return values. Slightly more detail on behavioral aspects like performance could improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear documentation for 'domain' and 'format' parameters. The description does not add meaning beyond the schema, as it does not explain parameter usage or constraints. Baseline 3 is appropriate given high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('analyze', 'enumerate') and resources ('current DNS posture', 'attack paths'), including detailed outputs like severity, feasibility, steps, and mitigations. It distinguishes from siblings by focusing on attack path simulation rather than specific checks like DNS records or security scans.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for analyzing DNS posture and enumerating attack paths, but does not explicitly state when to use this tool versus alternatives like 'scan_domain' or 'check_dnssec'. It provides context but lacks specific exclusions or named alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
validate_fix (A) · Read-only · Idempotent
Re-check a specific control after applying a fix. Confirms whether the finding is resolved.
| Name | Required | Description | Default |
|---|---|---|---|
| check | Yes | Check name to re-run (e.g., "dmarc", "spf") | |
| domain | Yes | Domain to validate the fix for | |
| format | No | Output verbosity. Auto-detected if omitted. | |
| expected | No | Expected DNS record value to verify against | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable context beyond annotations by specifying that this tool is for 're-checking' after fixes and 'confirms whether the finding is resolved.' While annotations already indicate this is a read-only, idempotent, non-destructive operation, the description clarifies the specific workflow context (post-fix validation) and purpose (confirmation of resolution), which helps the agent understand when this tool is appropriate.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two sentences that each serve distinct purposes: the first establishes the action and context, the second states the outcome. There's no wasted language, and the most important information (re-checking after fixes) is front-loaded. Every word contributes to understanding the tool's purpose and usage.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only validation tool with comprehensive annotations and schema coverage, the description provides sufficient context about when and why to use it. While there's no output schema, the description indicates what the tool confirms ('whether the finding is resolved'), which gives the agent reasonable expectations about the return value. The description could be slightly more complete by mentioning what format the confirmation takes, but it's adequate given the tool's straightforward nature.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already provides complete documentation for all parameters. The description doesn't add any additional parameter semantics beyond what's in the schema, but it does imply the relationship between parameters (domain and check being validated together after a fix). This meets the baseline expectation when schema coverage is comprehensive.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Re-check', 'Confirms') and resource ('a specific control', 'the finding'), distinguishing it from sibling tools like 'scan_domain' or 'check_dmarc' which perform initial assessments rather than post-fix validation. It explicitly mentions the context of 'after applying a fix', which sets it apart from other verification tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: 'after applying a fix' to confirm resolution. This clearly distinguishes it from initial assessment tools like 'scan_domain' or specific check tools like 'check_dmarc', which would be used before fixes are applied. The context of re-checking implies this tool should be used as a follow-up to remediation actions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
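The optional `expected` parameter implies a comparison between the observed DNS record and the value the fix should have produced. A minimal sketch of that comparison, assuming the common case of TXT records, which resolvers may return as multiple adjacent quoted strings with variable whitespace; both helper functions are illustrative, not the server's implementation.

```python
# Normalize TXT record values before comparing: join adjacent quoted
# strings and collapse whitespace, so cosmetic differences don't cause
# a false "fix not applied" result.
def normalize_txt(value):
    joined = "".join(part.strip('"') for part in value.split('" "'))
    return " ".join(joined.strip('"').split())

def fix_applied(observed, expected):
    return normalize_txt(observed) == normalize_txt(expected)

# A TXT answer split into two quoted strings still matches the intended value.
observed = '"v=spf1 include:_spf.mailer.test" " -all"'
expected = "v=spf1 include:_spf.mailer.test -all"
```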
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.