Glama

Server Details

Open-source AI accounting skills verified by licensed accountants (tax, VAT, payroll).

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:
Repository: openaccountants/openaccountants
GitHub Stars: 209
Server Listing: OpenAccountants

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.4/5 across 36 of 36 tools scored. Lowest: 3.5/5.

Server Coherence: A
Disambiguation: 4/5

Most tools have clear, distinct purposes, but a few pairs like submit_fact_verification vs. submit_verification and search_rules vs. search_skills could cause confusion. Overall, well-disambiguated.

Naming Consistency: 4/5

Naming follows a consistent snake_case pattern with action verbs (add, get, list, submit, etc.). Some deviations like start, start_help, and plan_cross_border are tolerable and internally consistent.

Tool Count: 3/5

36 tools is on the high side, but the domain is broad (workflow, skill, verification, cross-border). Some consolidation possible among verification tools, but overall reasonable for the scope.

Completeness: 3/5

Covers many important operations (search, create, verify, feedback). However, it lacks an update_skill tool, and the workflow lifecycle is incomplete (there is no tool to delete a workflow). These are notable but not severe gaps.

Available Tools

36 tools
add_workflow_node: Add a workflow node (step). Grade: A
Idempotent

Add a guided step (node) to a workflow. Creates a DRAFT node. Optionally wire it to the skill that performs its computation (skill_slug). Verified accountants (in-jurisdiction) + admins.

Parameters (JSON Schema):
title (required)
summary (optional)
guidance (optional): How the agent should run this step
position (optional): Order; omit to append
skill_slug (optional): Skill this node runs (makes it 'implemented')
key_outputs (optional)
key_questions (optional)
workflow_slug (required)

Output Schema (JSON Schema):
message (required)
node_id (optional)
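For illustration, the sketch below shows how an MCP client might package an add_workflow_node call as a JSON-RPC 2.0 `tools/call` request (the method name comes from the MCP specification). It checks the arguments against the required and optional fields listed above before sending. The transport step is omitted, and the workflow and skill slugs in the usage are hypothetical; a real client should still rely on the server's own schema validation rather than this local copy of it.

```python
import json

# Input fields as listed in the add_workflow_node schema above.
REQUIRED = {"title", "workflow_slug"}
OPTIONAL = {"summary", "guidance", "position", "skill_slug",
            "key_outputs", "key_questions"}


def build_tools_call(request_id: int, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for add_workflow_node."""
    supplied = set(arguments)
    missing = REQUIRED - supplied
    if missing:
        raise ValueError(f"missing required arguments: {sorted(missing)}")
    unknown = supplied - REQUIRED - OPTIONAL
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "add_workflow_node", "arguments": arguments},
    }


# Hypothetical values: 'mt-vat' and 'mt-vat-return' are illustrative slugs.
request = build_tools_call(1, {
    "workflow_slug": "mt-vat",
    "title": "Collect sales invoices",
    "guidance": "Ask for every invoice issued in the period.",
    "skill_slug": "mt-vat-return",  # omit to leave the node unwired
})
print(json.dumps(request, indent=2))
```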
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a non-destructive, idempotent write operation. The description adds behavioral context: it creates a DRAFT node and restricts usage to specific roles. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four short sentences, and the last one is a fragment ('Verified accountants ...'). It is reasonably concise but could be better structured for readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers core purpose and who can use it, but lacks explanation of the 'wiring' concept, what happens if skill_slug is omitted, or that the node is added to a specific workflow (workflow_slug). Adequate but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 38% schema description coverage, the description adds little beyond what the schema already provides. It mentions 'skill_slug' but does not explain other parameters or their semantics in a way that compensates for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
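The coverage percentages quoted in these assessments appear to be the share of input parameters that carry a schema description. The snippet below is an assumed reconstruction of that metric, not Glama's published scoring code. Applied to add_workflow_node it counts 3 of 8 described parameters, which rounds to the 38% cited above; the same rule would give 25% for add_workflow_skill (1 of 4) and 83% for create_workflow (5 of 6).

```python
# Schema descriptions for add_workflow_node's inputs, copied from above;
# None marks a parameter whose schema entry has no description.
ADD_WORKFLOW_NODE_PARAMS = {
    "title": None,
    "summary": None,
    "guidance": "How the agent should run this step",
    "position": "Order; omit to append",
    "skill_slug": "Skill this node runs (makes it 'implemented')",
    "key_outputs": None,
    "key_questions": None,
    "workflow_slug": None,
}


def description_coverage(params: dict) -> float:
    """Fraction of parameters that have a non-empty schema description."""
    if not params:
        return 1.0  # assumption: a tool with no parameters counts as covered
    described = sum(1 for text in params.values() if text)
    return described / len(params)


print(f"{description_coverage(ADD_WORKFLOW_NODE_PARAMS):.0%}")  # 38%
```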

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool adds a guided step (node) to a workflow, creates a DRAFT node, and optionally wires it to a skill. This distinguishes it from siblings like 'update_workflow_node' or 'archive_workflow_node'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions authorized users ('Verified accountants (in-jurisdiction) + admins') but does not explicitly state when to use this tool vs alternatives like 'add_workflow_skill'. Usage context is implied but not explicitly guided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

add_workflow_skill: Add a skill to a workflow. Grade: A
Idempotent

Attach a published skill (same jurisdiction) to a workflow's ordered skill set. Recomputes the workflow tier. Verified accountants (in-jurisdiction) + admins.

Parameters (JSON Schema):
role (optional): intake | content | assembly | foundation | reference
skill_slug (required)
step_order (optional)
workflow_slug (required)

Output Schema (JSON Schema):
tier (optional)
added (optional)
message (required)
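The add_workflow_skill description implies a few rules: the role must be one of the listed values, the skill must be published and share the workflow's jurisdiction, and a repeated call must not add a duplicate (the tool is marked idempotent). The in-memory model below sketches those rules under stated assumptions: step_order is treated as a 0-based insert position, an omitted step_order appends, and the tier recomputation is left out because its rules are not documented here. The skill slugs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Role values as listed in the add_workflow_skill schema.
ROLES = {"intake", "content", "assembly", "foundation", "reference"}


@dataclass
class Workflow:
    jurisdiction: str
    # Ordered (skill_slug, role) pairs.
    skills: List[Tuple[str, Optional[str]]] = field(default_factory=list)


def add_skill(wf: Workflow, skill_slug: str, skill_jurisdiction: str,
              skill_published: bool = True, role: Optional[str] = None,
              step_order: Optional[int] = None) -> bool:
    """Attach a skill; return False if it was already attached."""
    if role is not None and role not in ROLES:
        raise ValueError(f"invalid role: {role!r}")
    if not skill_published:
        raise ValueError("only published skills can be attached")
    if skill_jurisdiction != wf.jurisdiction:
        raise ValueError("skill must share the workflow's jurisdiction")
    if any(slug == skill_slug for slug, _ in wf.skills):
        return False  # idempotent: a repeat call changes nothing
    entry = (skill_slug, role)
    if step_order is None:
        wf.skills.append(entry)  # assumption: omitted step_order appends
    else:
        wf.skills.insert(step_order, entry)  # assumption: 0-based position
    return True


wf = Workflow(jurisdiction="MT")
add_skill(wf, "mt-vat-return", "MT", role="content")
add_skill(wf, "mt-vat-intake", "MT", role="intake", step_order=0)
print(wf.skills)
```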
Behavior: 4/5

Goes beyond the annotations: it notes that the workflow tier is recomputed and names the authorized users. The annotations already mark the tool as non-destructive and idempotent, and the description complements them without contradiction.

Conciseness: 5/5

Three short sentences, front-loaded with the core action and free of fluff. It efficiently conveys the key constraints.

Completeness: 3/5

Provides essential context (jurisdiction, user roles, tier recalculation) but lacks parameter explanations despite low schema coverage. Has output schema, so return value omission is acceptable.

Parameters: 2/5

Schema coverage is low (25%) with only 'role' described in schema. Description does not explain any parameters, failing to compensate for the lack of schema documentation.

Purpose: 5/5

Clearly states the action: attaching a published skill to a workflow's ordered skill set, including constraints like same jurisdiction. Distinguishes from siblings like remove_workflow_skill.

Usage Guidelines: 4/5

Implies usage context (attaching a skill) and prerequisites (published skill, same jurisdiction, user role). Lacks explicit when-not or alternatives like add_workflow_node.

archive_workflow_node: Archive a workflow node. Grade: A
Idempotent

Soft-delete a node. Refused if it would leave a published workflow with no published nodes. Verified accountants (in-jurisdiction) + admins.

Parameters (JSON Schema):
node_id (required)

Output Schema (JSON Schema):
message (required)
archived (optional)
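The refusal rule for archive_workflow_node can be expressed as a guard that runs before the soft delete. The sketch below assumes node statuses named draft, published, and archived, and a hypothetical ArchiveRefused exception; the server's actual status names and error shape are not shown on this page.

```python
from dataclasses import dataclass
from typing import List


class ArchiveRefused(Exception):
    """Hypothetical error for the refusal case described above."""


@dataclass
class Node:
    node_id: str
    status: str  # assumed values: "draft", "published", "archived"


def archive_node(workflow_published: bool, nodes: List[Node],
                 node_id: str) -> str:
    target = next((n for n in nodes if n.node_id == node_id), None)
    if target is None:
        raise KeyError(node_id)
    if target.status == "archived":
        return "already archived"  # idempotent: nothing changes
    still_published = [n for n in nodes
                       if n.status == "published" and n is not target]
    if workflow_published and not still_published:
        raise ArchiveRefused("a published workflow needs a published node")
    target.status = "archived"  # soft delete: the record is kept
    return "archived"


nodes = [Node("n1", "published"), Node("n2", "draft")]
print(archive_node(True, nodes, "n2"))  # archived
```

Archiving the draft succeeds because n1 stays published; archiving n1 next would be refused while the workflow is published.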
Behavior: 4/5

The description adds value beyond annotations: it explains the soft-delete behavior, refusal condition, and permission requirements (verified accountants + admins). Annotations only provide idempotentHint and destructiveHint, which the description complements.

Conciseness: 5/5

The description is concise: three sentences covering action, condition, and authorization. No wasted words.

Completeness: 3/5

The tool has an output schema, so return values need not be described. However, the description never documents node_id or explains how to obtain it, which leaves it incomplete even for a one-parameter tool.

Parameters: 2/5

The schema has 0% description coverage for its only parameter (node_id). The description does not mention the parameter or provide any guidance on its format or source, leaving a significant gap.

Purpose: 5/5

The description clearly states the tool performs a 'soft-delete' on a node, which is a specific verb and resource. It distinguishes itself from sibling tools like add_workflow_node or publish_workflow_node by mentioning deletion context and adding conditions.

Usage Guidelines: 4/5

The description explicitly mentions when the tool is refused (if it would leave a published workflow with no published nodes) and who can use it (verified accountants + admins). However, it does not directly compare to alternatives like remove_workflow_skill.

compare_jurisdictions: Compare tax across jurisdictions. Grade: A
Read-only, Idempotent

Quick SIDE-BY-SIDE loader. Loads the income-tax skills for 2–5 jurisdictions as independent blocks so the agent can produce a static comparison (effective rates at a given income level, headline differences, entity-choice implications). It does NOT sequence events or bridge treaties. Use for 'should I incorporate in X or Y?', 'compare tax in MT vs IE', or any standalone side-by-side. NOTE: if the person's facts actually INTERACT across borders (a US person abroad, a residence change, a foreign trust/company, an expatriation), use plan_cross_border instead — that tool returns a sequenced plan and the treaty bridge this one deliberately leaves out. The two are siblings: this one for static compares, plan_cross_border for live cross-border planning.

Parameters (JSON Schema):
income (optional): Optional: income figure with currency, e.g. 'EUR 80000' or 'USD 250000'.
entity_type (optional): Optional taxpayer/entity type for the comparison context. One of the listed values.
jurisdictions (required): Array of 2-5 ISO codes (e.g. ['MT', 'IE'] or ['US-CA', 'US-TX', 'US-FL']).

Output Schema (JSON Schema):
income (optional)
entity_type (optional)
jurisdictions (optional)
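The split between compare_jurisdictions and plan_cross_border amounts to a routing decision an agent makes before calling either tool. The sketch below encodes it, assuming the agent has already judged whether the person's facts interact across borders (the facts_interact flag is hypothetical). It also checks the 2-5 jurisdiction limit and the 'EUR 80000' income format from the schema. The regular expressions are rough approximations of ISO-style codes, and plan_cross_border's arguments are left out because they are not listed here.

```python
import re
from typing import List, Optional, Tuple

# Rough checks: 'MT', or an ISO 3166-2 style subdivision such as 'US-CA'.
CODE_RE = re.compile(r"[A-Z]{2}(-[A-Z0-9]{1,3})?")
# Currency code, a space, then an amount, e.g. 'EUR 80000'.
INCOME_RE = re.compile(r"[A-Z]{3} \d+(\.\d+)?")


def route(jurisdictions: List[str], facts_interact: bool,
          income: Optional[str] = None) -> Tuple[str, Optional[dict]]:
    """Pick the tool and, for compare_jurisdictions, build its arguments."""
    if facts_interact:
        # Live cross-border planning; its arguments are not listed here.
        return "plan_cross_border", None
    if not 2 <= len(jurisdictions) <= 5:
        raise ValueError("compare_jurisdictions takes 2-5 jurisdictions")
    bad = [code for code in jurisdictions if not CODE_RE.fullmatch(code)]
    if bad:
        raise ValueError(f"not ISO-style codes: {bad}")
    args = {"jurisdictions": jurisdictions}
    if income is not None:
        if not INCOME_RE.fullmatch(income):
            raise ValueError("income should look like 'EUR 80000'")
        args["income"] = income
    return "compare_jurisdictions", args


print(route(["MT", "IE"], facts_interact=False, income="EUR 80000"))
print(route(["US", "PT"], facts_interact=True))
```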
Behavior: 4/5

Annotations already declare readOnlyHint, idempotentHint, openWorldHint, destructiveHint=false. The description adds that the tool does NOT sequence events or bridge treaties, which is valuable context beyond annotations.

Conciseness: 4/5

The description is moderately long but well structured, and every sentence adds value, though it is slightly verbose for a simple tool.

Completeness: 5/5

Given 3 parameters with full schema coverage, output schema presence, rich annotations, and sibling tools, the description covers purpose, usage, limitations, and alternatives comprehensively.

Parameters: 3/5

Schema coverage is 100% with parameter descriptions. The description echoes schema constraints (2–5 jurisdictions, income with currency) but adds no deeper meaning beyond what the schema provides.

Purpose: 5/5

The description clearly states the tool loads income-tax skills for 2–5 jurisdictions as independent blocks for static comparison, distinguishing it from the sibling tool plan_cross_border that handles cross-border interactions.

Usage Guidelines: 5/5

Explicitly provides when-to-use examples ('should I incorporate in X or Y?') and when-not-to-use conditions (facts interact across borders), naming the alternative tool plan_cross_border.

create_workflow: Create a workflow from skills. Grade: A
Idempotent

Design a NEW guided workflow for a jurisdiction + type, optionally seeding it with the skills it's based on. Creates a DRAFT (not public until publish_workflow). Create-or-adopt: if a workflow for that (jurisdiction, workflow_type) already exists it is returned for editing instead of duplicated. Verified accountants only, in their approved jurisdictions.

Parameters (JSON Schema):
title (required): Human title, e.g. 'Prepare a Malta VAT return'
triggers (optional): Phrases that route users here
description (optional)
skill_slugs (optional): Skills to attach, in order (first=intake, last=assembly)
jurisdiction (required): Jurisdiction code, e.g. 'MT'
workflow_type (required): self-employed | vat | payroll | corporate | cross-border | capital-gains | crypto

Output Schema (JSON Schema):
slug (optional)
created (optional)
message (required)
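Create-or-adopt is what makes create_workflow safe to retry: the (jurisdiction, workflow_type) pair acts as a natural key. The registry below is a minimal in-memory sketch of that behavior. The slug rule (lowercased jurisdiction plus workflow type, giving 'mt-vat', which matches the example slug used by get_workflow) is an assumption, as is leaving an adopted workflow's title unchanged; the second return value mirrors the created output field.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# workflow_type values as listed in the create_workflow schema.
WORKFLOW_TYPES = {"self-employed", "vat", "payroll", "corporate",
                  "cross-border", "capital-gains", "crypto"}


@dataclass
class WorkflowDraft:
    slug: str
    title: str
    skill_slugs: List[str] = field(default_factory=list)
    published: bool = False  # new workflows start as drafts


class WorkflowRegistry:
    def __init__(self) -> None:
        # Natural key: (jurisdiction, workflow_type) -> workflow.
        self._by_key: Dict[Tuple[str, str], WorkflowDraft] = {}

    def create_workflow(self, title: str, jurisdiction: str,
                        workflow_type: str,
                        skill_slugs: Optional[List[str]] = None
                        ) -> Tuple[WorkflowDraft, bool]:
        """Return (workflow, created); adopt an existing one if present."""
        if workflow_type not in WORKFLOW_TYPES:
            raise ValueError(f"unknown workflow_type: {workflow_type!r}")
        key = (jurisdiction, workflow_type)
        if key in self._by_key:
            return self._by_key[key], False  # adopt, do not duplicate
        slug = f"{jurisdiction.lower()}-{workflow_type}"  # assumed slug rule
        workflow = WorkflowDraft(slug, title, list(skill_slugs or []))
        self._by_key[key] = workflow
        return workflow, True


registry = WorkflowRegistry()
first, created = registry.create_workflow("Prepare a Malta VAT return", "MT", "vat")
again, created_again = registry.create_workflow("Malta VAT", "MT", "vat")
print(first.slug, created, again is first, created_again)  # mt-vat True True False
```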
Behavior: 5/5

Annotations already indicate idempotent and non-destructive. Description adds crucial context: returns existing workflow on duplicate (explaining idempotency), creates draft (non-public), and restricts to verified accountants. No contradiction with annotations.

Conciseness: 5/5

Four short sentences with no fluff. The first states the purpose; the rest cover draft status, create-or-adopt behavior, and access restrictions. Information is front-loaded and every sentence is necessary.

Completeness: 5/5

Given the complexity (6 params, output schema exists), description covers all crucial aspects: draft status, idempotent behavior, user restrictions. Output format is not needed due to output schema. Complete guidance for correct tool invocation.

Parameters: 4/5

Schema coverage is high (83%), so the baseline is 3. The description adds value beyond the schema by explaining the create-or-adopt logic and the draft concept, while the skill_slugs ordering ('first=intake, last=assembly') is documented in the schema itself. One parameter (description) has no schema description, and the tool description does not cover it either.

Purpose: 5/5

Description clearly states the verb 'Design' and resource 'new guided workflow' with specific scope (jurisdiction + type), distinguishing it from siblings like publish_workflow and update_workflow.

Usage Guidelines: 5/5

Explicitly mentions draft status, create-or-adopt behavior, and user prerequisites (verified accountants, approved jurisdictions). Provides clear guidance on when to use and what to expect.

get_rates: Get current-year indexed tax rates. Grade: A
Read-only, Idempotent

Returns the machine-readable annual rates for a given jurisdiction + tax year. Covers federal brackets, Social Security wage base, retirement plan limits (401(k), IRA, HSA), FEIE cap, gift/estate exemptions, 1099-K thresholds, mileage rates, supplemental wage rates, capital gains brackets, CTC. Currently US federal for tax years 2025 and 2026. Use this when the user asks specific dollar amounts that change yearly (e.g. '2025 401(k) limit', 'this year's Social Security wage base').

Parameters (JSON Schema):
tax_year (required): Tax year (e.g. 2025, 2026).
jurisdiction (required): Jurisdiction code. Currently only 'US' (US federal) is supported.

Output Schema (JSON Schema):
rates (optional)
tax_year (optional)
source_url (optional)
next_action (optional)
jurisdiction (optional)
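Because get_rates currently covers only US federal rates for tax years 2025 and 2026, a client could reject unsupported requests before calling the tool. The coverage table below is copied from the description at the time of this listing and will go stale as coverage grows, so treat it as a convenience check rather than a source of truth; the server's own error handling remains authoritative. No rate values are reproduced here.

```python
from typing import Dict, Set

# Coverage as stated in the get_rates description at the time of listing.
SUPPORTED: Dict[str, Set[int]] = {"US": {2025, 2026}}


def get_rates_arguments(jurisdiction: str, tax_year: int) -> dict:
    """Validate get_rates inputs locally and return the arguments dict."""
    years = SUPPORTED.get(jurisdiction)
    if years is None:
        raise ValueError(f"unsupported jurisdiction: {jurisdiction!r}")
    if tax_year not in years:
        raise ValueError(
            f"unsupported tax year {tax_year}; expected one of {sorted(years)}")
    return {"jurisdiction": jurisdiction, "tax_year": tax_year}


print(get_rates_arguments("US", 2025))  # {'jurisdiction': 'US', 'tax_year': 2025}
```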
Behavior: 4/5

Annotations already declare readOnlyHint, openWorldHint, idempotentHint, destructiveHint. The description adds context about supported jurisdictions, years, and specific rate categories, enhancing transparency.

Conciseness: 5/5

Four sentences: purpose, contents, current coverage, and usage guidance. No wasted words, and the key action is front-loaded. Efficient and well structured.

Completeness: 4/5

Given the presence of output schema and rich annotations, the description covers intended use, supported values, and examples. Could mention error handling or rate format, but overall sufficient.

Parameters: 4/5

Schema coverage is 100%, so baseline 3. The description adds semantic value by specifying that jurisdiction is currently only 'US' and providing examples of tax year usage, going beyond schema definitions.

Purpose: 4/5

The description clearly states the verb 'returns' and resource 'machine-readable annual rates for a given jurisdiction + tax year', listing specific rate types. It distinguishes from siblings indirectly but lacks explicit differentiation.

Usage Guidelines: 4/5

The description gives explicit when-to-use guidance: 'Use this when the user asks specific dollar amounts that change yearly' with examples. No explicit when-not-to-use, but clear enough.

get_skill: Get a tax skill. Grade: A
Read-only, Idempotent

Fetch a published skill by slug, including its current-version markdown, quality tier, named verifier (where accountant-verified), and a provenance/attribution footer.

Parameters (JSON Schema):
slug (required): Skill slug, e.g. 'us-schedule-c-and-se-computation'

Output Schema (JSON Schema):
skill (required): The skill record (slug, name, jurisdiction, tier, etc.)
key_facts (optional): Optional at-a-glance facts (rates/thresholds/deadlines/verifier/advisory) — present only where the skill carries a structured key_facts block; omitted otherwise.
guardrails (optional)
provenance (required)
next_action (optional)
verification (optional): Verification summary
section_index (optional): Every section of the skill with {index, heading, level, chars, priority, included}. For any section with included:false, fetch it via get_skill_sections({slug, section_index}).
current_version (optional): Current version. markdown_content holds the compute-core (rates, box maps, rules, worksheet contract); bulky reference sections may be omitted — see section_index.
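The section_index field tells the agent which sections were left out of markdown_content. The helper below turns it into follow-up get_skill_sections arguments, either one call per section (as the field description suggests) or a single batched call using the section_indices parameter that get_skill_sections also accepts. The sample section entries are invented for illustration; only the field names and the example slug come from the schemas on this page.

```python
from typing import List

# Invented sample of a get_skill section_index; field names follow the
# output schema above, values are illustrative only.
SECTION_INDEX = [
    {"index": 0, "heading": "Rates", "level": 2, "chars": 1800,
     "priority": 1, "included": True},
    {"index": 1, "heading": "Worked examples", "level": 2, "chars": 9400,
     "priority": 3, "included": False},
    {"index": 2, "heading": "Edge cases", "level": 2, "chars": 7200,
     "priority": 3, "included": False},
]


def follow_up_calls(slug: str, section_index: List[dict],
                    batch: bool = True) -> List[dict]:
    """Build get_skill_sections arguments for sections not inlined."""
    missing = [s["index"] for s in section_index if not s["included"]]
    if not missing:
        return []
    if batch:
        # One call using get_skill_sections' plural section_indices input.
        return [{"slug": slug, "section_indices": missing}]
    # One call per section, as the section_index description suggests.
    return [{"slug": slug, "section_index": i} for i in missing]


slug = "us-schedule-c-and-se-computation"
print(follow_up_calls(slug, SECTION_INDEX))
print(follow_up_calls(slug, SECTION_INDEX, batch=False))
```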
Behavior: 4/5

Annotations already provide readOnlyHint=true, destructiveHint=false, idempotentHint=true. Description adds behavioral context beyond annotations: it fetches only 'published' skills and specifies exactly what fields are returned (markdown, quality tier, verifier, footer). This enriches the agent's understanding of the tool's output.

Conciseness: 5/5

Single, well-structured sentence that front-loads the core action and resource. Every word is informative; no redundancy.

Completeness: 4/5

For a simple 1-parameter tool with an output schema, the description covers the essential behavioral details. It could briefly note that this fetches a single skill (vs 'list_skills') but overall is sufficiently complete.

Parameters: 3/5

Schema coverage is 100% for the single 'slug' parameter; the description restates 'by slug' but adds no further semantics beyond the schema's example. Baseline 3 is appropriate.

Purpose: 5/5

Description uses specific verb 'Fetch' and resource 'a published skill by slug', clearly distinguishing from siblings like 'list_skills' (which returns multiple skills) and 'search_skills' (which searches by criteria). Also enumerates returned fields (markdown, quality tier, verifier, footer).

Usage Guidelines: 3/5

Implies usage when needing a single skill by slug, but no explicit when-to-use or when-not-to-use compared to siblings. Context signals and sibling names provide indirect guidance, but description could be clearer.

get_skill_sections: Get a skill's sections. Grade: A
Read-only, Idempotent

Fetch the parsed sections of a skill's current version. Each section has a heading and its markdown content. Use this to pull a specific section that get_skill listed in section_index as not inlined (e.g. a supplier-pattern library) — pass section_index to fetch just that one. Omit it to get every section.

Parameters (JSON Schema):
slug (required): Skill slug
section_index (optional): Optional. Return only this section (matches the `index` from get_skill's `section_index`).
section_indices (optional): Optional. Return only these sections.

Output Schema (JSON Schema):
slug (required)
version (optional)
sections (required)
key_facts (optional): Optional at-a-glance facts — present only where the skill carries a structured key_facts block; omitted otherwise.
guardrails (optional)
next_action (optional)
Behavior: 3/5

Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=true, providing strong behavioral transparency. The description adds that output includes heading and markdown content, which is consistent but not a significant addition beyond annotations.

Conciseness: 5/5

Four sentences, front-loaded with purpose, followed by structure and usage. Every sentence is necessary and concise, with no wasted words.

Completeness: 4/5

Covers purpose, usage, and the most important parameter. Minor omission: does not explicitly mention section_indices parameter, but schema handles it. Output schema exists, so return values are documented. Overall sufficient for correct tool selection.

Parameters: 3/5

Input schema has 100% description coverage, so baseline is 3. Description adds useful context for section_index (linking to get_skill's section_index) but does not mention section_indices. Overall, it adds marginal value beyond the schema.

Purpose: 5/5

The description clearly states that it fetches the parsed sections of a skill's current version, each with a heading and markdown content. It distinguishes itself from the sibling get_skill by retrieving the actual section content rather than only the section_index listing.

Usage Guidelines: 5/5

Explicit guidance on when to use section_index (for a specific non-inlined section from get_skill's section_index) and when to omit it (to get all sections). Clearly differentiates use cases.

get_workflow: Get a guided workflow. Grade: A
Read-only, Idempotent

Fetch a workflow by slug with its ordered nodes (guided steps) and the skills it's built from. Public callers see published content only; verified accountants/admins also see draft nodes for workflows in their jurisdiction (use this to review before publishing).

Parameters (JSON Schema):
slug (required): Workflow slug, e.g. 'mt-vat'

Output Schema (JSON Schema):
nodes (optional)
skills (optional)
can_edit (optional)
workflow (optional)
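The visibility rule in the get_workflow description can be read as a filter over node status and caller role. The sketch below assumes role labels public, accountant, and admin, assumes admins see drafts in every jurisdiction (the description does not say whether the jurisdiction limit applies to them), and hides archived nodes. All of these are assumptions; the server enforces the real rule.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Caller:
    role: str  # assumed labels: "public", "accountant", "admin"
    jurisdictions: Set[str] = field(default_factory=set)


def visible_nodes(caller: Caller, workflow_jurisdiction: str,
                  nodes: List[dict]) -> List[dict]:
    """Filter nodes per one reading of the get_workflow description."""
    sees_drafts = caller.role == "admin" or (
        caller.role == "accountant"
        and workflow_jurisdiction in caller.jurisdictions)
    return [n for n in nodes
            if n["status"] == "published"
            or (sees_drafts and n["status"] == "draft")]  # archived: hidden


nodes = [{"id": "n1", "status": "published"},
         {"id": "n2", "status": "draft"},
         {"id": "n3", "status": "archived"}]
for caller in (Caller("public"), Caller("accountant", {"MT"}),
               Caller("accountant", {"IE"})):
    print(caller.role, sorted(caller.jurisdictions),
          [n["id"] for n in visible_nodes(caller, "MT", nodes)])
```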
Behavior: 5/5

Annotations already mark the tool as readOnly, idempotent, and non-destructive. The description adds critical behavioral context: role-based access control (public vs. accountants/admins) and filtering by jurisdiction for drafts. This goes well beyond annotations, fully disclosing when different data is returned.

Conciseness: 5/5

Two sentences, front-loaded with the core action ('Fetch a workflow by slug...'), immediately following with key details (ordered nodes, skills). The second sentence adds role-specific behavior efficiently. No wasted words; every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of workflows (containing nodes and skills) and role-based filtering, the description covers what is returned, for whom, and the review use case. An output schema exists, so return format is not needed. This is a complete description for the tool's purpose.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (slug described). The description adds an example ("e.g. 'mt-vat'") and clarifies that the slug identifies the workflow, which adds value beyond the raw schema. A single, simple parameter needs no further semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb, 'Fetch', and names the resource ('workflow by slug'). This clearly distinguishes it from siblings like list_workflows (which returns many workflows) and inspect_workflow (which returns a visual map and structural lint). It also specifies the returned contents (ordered nodes, skills) and the role-based visibility, which makes the purpose unmistakable.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains that public callers see only published content and verified accountants/admins see draft nodes for review before publishing, providing clear context for different user roles. However, it does not explicitly state when to use alternatives like inspect_workflow or list_workflows, which would improve guidance further.

inspect_workflow: Inspect a workflow (visual + health)
Grade A | Read-only | Idempotent

Return a VISUAL map (Mermaid flowchart) of a workflow's nodes wired to their skills, colour-coded by health (green = published + wired, yellow = draft, red = unwired or flagged), plus a structural lint: unwired nodes, tier mismatch (tier-1 workflow on unverified skills), wired skills that are unpublished or wrong-jurisdiction, trigger collisions, and empty published workflows. Use before publishing or licensing a workflow. RENDER the mermaid field as a diagram for the user, then summarise the findings.
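
To make the colour rules concrete, here is a hedged sketch of how a client might reproduce the health colouring and the unwired-node check. The node shape, the colour precedence (red before yellow) and the fill colours are all assumptions. The server's actual mermaid output may look different.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowNode:
    node_id: str
    status: str  # hypothetical values: "published" or "draft"
    skills: list = field(default_factory=list)  # slugs of the skills this node is wired to
    flagged: bool = False


def health(node: WorkflowNode) -> str:
    # Assumed precedence: red (unwired or flagged) overrides yellow (draft).
    if not node.skills or node.flagged:
        return "red"
    if node.status == "draft":
        return "yellow"
    return "green"


def unwired(nodes: list) -> list:
    """One of the lint checks the description lists: nodes with no skill attached."""
    return [n.node_id for n in nodes if not n.skills]


def to_mermaid(nodes: list) -> str:
    lines = ["flowchart LR"]
    for n in nodes:
        lines.append(f"    {n.node_id}[{n.node_id}]:::{health(n)}")
        for slug in n.skills:
            # Hyphens become underscores in the Mermaid id; the slug stays visible as the label.
            lines.append(f"    {n.node_id} --> {slug.replace('-', '_')}[{slug}]")
    lines += [
        "    classDef green fill:#c8e6c9",
        "    classDef yellow fill:#fff9c4",
        "    classDef red fill:#ffcdd2",
    ]
    return "\n".join(lines)
```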

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | Workflow slug, e.g. 'mt-self-employed' |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
mermaid | No |
summary | No |
findings | No |
workflow | No |
next_action | No |
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint, and destructiveHint. Description adds valuable behavioral details: color-coding by health, lint checks (unwired nodes, tier mismatch, etc.), and instruction to render the mermaid field. No contradiction.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with three sentences covering all key points. There are no wasted words, though the first sentence is long and could be split for readability.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (visual map + lint) and existence of an output schema, the description adequately explains what is returned (mermaid diagram, findings). It covers the main behavioral aspects without needing to detail every return field.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (slug with description). The description adds a brief example ("e.g. 'mt-self-employed'") but no semantic meaning beyond the schema, so the baseline of 3 is appropriate.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns a visual map and a structural lint. It uses the verb 'Return' and names the resource ("workflow's nodes wired to their skills") along with health indicators. It distinguishes itself from siblings like get_workflow by emphasizing visualization and linting.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use before publishing or licensing a workflow.' Provides clear context for when to use, but does not explicitly mention when not to use or name specific alternatives among siblings.

list_jurisdictions: List all jurisdictions covered
Grade A | Read-only | Idempotent

Returns every jurisdiction with published skills — countries (ISO 2), US states (US-XX), Canadian provinces — with skill counts, accountant-verified counts, and named lead verifier. Use when the user asks 'which countries does OpenAccountants cover?' or 'what's available for [country]?' Avoids paginating through list_skills to compute this.
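
The work this tool saves can be sketched as a client-side aggregation over pages of list_skills results. This is an illustrative sketch only. The dict keys used for each skill are hypothetical, and the lead-verifier field is omitted.

```python
from collections import defaultdict


def summarise_jurisdictions(skills: list) -> dict:
    """Count skills and accountant-verified skills per jurisdiction code.

    Each skill is assumed to be a dict with 'jurisdiction' and 'verified' keys;
    this is not the documented list_skills item shape.
    """
    counts = defaultdict(lambda: {"skills": 0, "verified": 0})
    for skill in skills:
        entry = counts[skill["jurisdiction"]]
        entry["skills"] += 1
        entry["verified"] += int(bool(skill["verified"]))
    return dict(sorted(counts.items()))


def code_kind(code: str) -> str:
    # Countries use ISO 2 codes ("MT"); US states use "US-XX" ("US-CA").
    if code.startswith("US-"):
        return "us-state"
    if len(code) == 2 and code.isalpha():
        return "country"
    return "other"  # the Canadian province code format is not stated, so it is not guessed here
```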

Parameters (JSON Schema)

No parameters

Output Schema

Parameters (JSON Schema)
Name | Required | Description
next_action | No |
total_skills | No |
jurisdictions | No |
total_jurisdictions | No |
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable context about the returned data fields and their pre-computed nature, complementing the annotations (readOnlyHint, idempotentHint, openWorldHint). It could mention latency or caching, but it is otherwise sufficient.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences (purpose, usage examples, and a note on avoiding list_skills pagination), front-loaded with key information and free of redundant words.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no parameters and an output schema, the description fully covers what the tool does, what it returns, and when to use it. Complete for its complexity.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters in schema, baseline 4. Description implicitly explains that no input is needed since it returns all jurisdictions. No further param detail required.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb 'Returns' with specific resource 'jurisdictions' and detailed output fields (skill counts, verified counts, lead verifier). Explicitly distinguishes from sibling tool 'list_skills' by noting it avoids pagination.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use examples ('which countries does OpenAccountants cover?', 'what's available for [country]?') and names the alternative it replaces (paginating through list_skills).

list_my_verifications: Your verification history
Grade A | Read-only | Idempotent

AUTHENTICATED (approved accountants). Your OWN recent verification submissions, newest first — each receipt records which skill(s) you touched, whether the change applied, the fact count, the jurisdiction, and the date. Read-only; not scoped to one jurisdiction (you see all your own work).
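
The limit semantics and the newest-first ordering can be sketched as follows. The documentation does not say whether the server clamps or rejects an out-of-range limit, so the clamping here is an assumption. The receipt shape is hypothetical.

```python
DEFAULT_LIMIT, MAX_LIMIT = 20, 100  # from the limit parameter's description


def effective_limit(limit=None) -> int:
    # Assumption: out-of-range values are clamped rather than rejected.
    if limit is None:
        return DEFAULT_LIMIT
    return max(1, min(limit, MAX_LIMIT))


def recent_receipts(receipts: list, limit=None) -> list:
    """Newest-first slice of receipts; each receipt is a hypothetical dict with an ISO 'date' string."""
    ordered = sorted(receipts, key=lambda r: r["date"], reverse=True)
    return ordered[: effective_limit(limit)]
```

ISO 8601 date strings sort chronologically as plain strings, so the sketch needs no date parsing.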

Parameters (JSON Schema)
Name | Required | Description | Default
limit | No | Max receipts to return (default 20, max 100). |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
count | No |
message | Yes |
next_action | No |
verifications | No |
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and destructiveHint=false. The description adds behavioral context beyond annotations, such as requiring authentication and detailing the content of receipts, without contradiction.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (an authentication note plus two sentences) and front-loaded with key information. Every sentence adds value without redundancy.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description adequately explains the return value content (skills, change applied, fact count, jurisdiction, date). The tool is simple with one optional parameter, and all relevant context is covered.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with the 'limit' parameter already described in the schema. The description adds no extra semantic meaning for the parameter beyond what the schema provides, so baseline score of 3 is appropriate.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Your OWN recent verification submissions, newest first' and lists what each receipt records, clearly defining the tool's scope and distinguishing it from siblings like 'list_verification_targets' or 'submit_verification'.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates authentication requirement ('AUTHENTICATED (approved accountants)') and scope ('not scoped to one jurisdiction'), but does not explicitly state when not to use or mention alternatives; however, context makes usage clear.

list_rule_facets: List queryable rule facets (no args)
Grade A | Read-only | Idempotent

Returns the metadata you can filter on with search_rules — the live jurisdictions, the domains, roles, block types (rule kinds), statuses, tax years, and a sample of topics — plus the defaults. Call this before search_rules to learn the valid filter values rather than guessing.
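
The "check before you guess" pattern that this description recommends can be sketched as a validation step. An agent would run it on its planned search_rules filters. The facet shape and the example values are hypothetical.

```python
def invalid_filters(filters: dict, facets: dict) -> dict:
    """Return the filters whose key or value is not advertised by list_rule_facets.

    `facets` stands in for the tool's `filters` output, assumed here to map each
    facet name to a list of allowed values.
    """
    bad = {}
    for key, value in filters.items():
        allowed = facets.get(key)
        if allowed is None or value not in allowed:
            bad[key] = value
    return bad
```

An empty result means every planned filter uses a value the server advertised. Anything returned should be corrected before search_rules is called.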

Parameters (JSON Schema)

No parameters
Output Schema

Parameters (JSON Schema)
Name | Required | Description
filters | No |
defaults | No |
next_action | No |
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and destructiveHint. The description adds value by detailing which facets are returned (live jurisdictions, domains, etc.) and by mentioning the defaults, which gives context beyond the annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose, no wasted words. Efficient and clear.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (no params, has output schema), description is complete. It explains what is returned and why to use it, and output schema covers return values.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters (0 params), so baseline is 4. Description does not need to add parameter info, and schema coverage is 100%.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it returns metadata for filtering on search_rules, listing specific facets (jurisdictions, domains, roles, etc.). It distinguishes itself from sibling tools by explicitly tying to search_rules.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Call this before search_rules to learn the valid filter values rather than guessing.' This gives clear when-to-use guidance. It names no when-not-to-use cases or alternatives, but the context is sufficient.

list_skills: List tax & accounting skills
Grade A | Read-only | Idempotent

List published OpenAccountants skills with their quality tier and verification status. Optionally filter by jurisdiction (e.g. 'US', 'MT', 'DE', 'GB'), domain (the accounting area, e.g. 'vat-gst', 'payroll', 'income-tax'), or role ('foundation' | 'compute' | 'orchestrator' | 'reference'). Results are paginated (default 100, max 200 per call) — unfiltered browsing of the full ~1,100-skill catalogue requires paging via offset/next_offset, so jurisdiction/domain filters are strongly recommended.
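
Paging through the catalogue with offset and next_offset can be sketched as a generator. The call_tool function is a hypothetical stand-in for an MCP client. The loop relies only on what the output schema states: next_offset is present when more results remain.

```python
from typing import Callable, Iterator


def iter_skills(call_tool: Callable[[str, dict], dict], **filters) -> Iterator[dict]:
    """Yield every skill matching `filters`, following next_offset until it is absent.

    `call_tool(name, arguments)` is hypothetical; it is assumed to return the
    tool's structured output as a dict.
    """
    offset = 0
    while True:
        page = call_tool("list_skills", {**filters, "limit": 200, "offset": offset})
        yield from page["skills"]
        next_offset = page.get("next_offset")
        if next_offset is None:  # absent when no more results remain
            return
        offset = next_offset
```

Passing a filter such as jurisdiction='MT' keeps the loop to a few pages. Without filters, the whole ~1,100-skill catalogue is walked at 200 skills per call.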

Parameters (JSON Schema)
Name | Required | Description | Default
role | No | Filter by pipeline role: foundation, compute, orchestrator, reference. |
limit | No | Max skills to return (default 100, max 200). |
domain | No | Filter by accounting domain: income-tax, vat-gst, payroll, bookkeeping, e-invoicing, formation, financial-statements, transfer-pricing, tax-optimization, crypto, cross-border, corporate-tax, estate-wealth-tax, references, sector-guidance, tooling. |
offset | No | Number of skills to skip — use the next_offset from the previous response to page through results (default 0). |
category | No | (Legacy) display label; prefer domain/role. |
jurisdiction | No | Filter by jurisdiction code, e.g. 'US', 'MT', 'DE' |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
limit | No | Page size applied to this response.
total | No | Total skills matching the filter (across all pages).
offset | No | Offset applied to this response.
skills | Yes | Matching published skills.
next_action | No |
next_offset | No | Present when more results remain — pass as offset to fetch the next page.
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already signal safety, and the description adds pagination details, total catalogue size, and filter suggestions, providing useful behavioral context beyond the annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with the main purpose and free of fluff. Every sentence adds essential information.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, filters, pagination, and recommendations sufficiently for an agent to use the tool correctly. Output schema exists, so return value details are not needed.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds value by noting 'legacy' for category and emphasizing filter recommendations, aiding parameter understanding.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool lists published skills with their quality tier and verification status, and it mentions the optional filters. It contrasts well with sibling tools like search_skills and get_skill.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides guidance on pagination and strongly recommends jurisdiction/domain filters for efficient browsing. Does not explicitly compare to alternatives but gives context for when to use filters.

list_verification_targets: List facts that need your verification
Grade A | Read-only | Idempotent

AUTHENTICATED (approved accountants). Find what needs your review. Call with NO slug for a TRIAGE summary of skills in YOUR approved jurisdiction(s) — each with counts of verified vs. unverified facts, most-unverified first — so you can pick one. Call with a slug to get that skill's facts in CHUNKS (default 40) — each fact has a stable row number (use it to reference facts in chat) and a key (pass to submit_fact_verification). The response includes a sections breakdown; for a large skill, review one section at a time by passing topic, and page with offset. This is the in-chat replacement for exporting a verification workbook. Always scoped to your approved jurisdictions.
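
The two calling modes can be sketched as an argument builder. The slug 'example-skill' and the topic 'Rates' used in the usage note are placeholders. Rejecting options that belong to the other mode is a client-side choice, not documented server behaviour.

```python
def verification_target_args(slug=None, *, topic=None, offset=0, limit=None,
                             unverified_only=False, jurisdiction=None, query=None) -> dict:
    """Build list_verification_targets arguments for triage (no slug) or single-skill mode."""
    args = {}
    if unverified_only:
        args["status"] = "unverified"  # filters skills (triage) or facts (single-skill)
    if limit is not None:
        args["limit"] = min(limit, 200)  # both modes document a maximum of 200
    if slug is None:
        # Triage mode: topic and offset are documented as single-skill options only.
        if topic is not None or offset:
            raise ValueError("topic and offset require a slug")
        if jurisdiction:
            args["jurisdiction"] = jurisdiction
        if query:
            args["query"] = query
        return args
    # Single-skill mode: jurisdiction and query are documented as triage options only.
    if jurisdiction or query:
        raise ValueError("jurisdiction and query apply to triage mode only")
    args["slug"] = slug
    if topic:
        args["topic"] = topic
    if offset:
        args["offset"] = offset
    return args
```

A typical session starts with verification_target_args(unverified_only=True) for triage. It then moves to verification_target_args('example-skill', topic='Rates', offset=40) to read the second chunk of one section.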

Parameters (JSON Schema)
Name | Required | Description | Default
slug | No | Return one skill's facts (omit for the jurisdiction triage summary). |
limit | No | Triage: max skills (default 50, max 200). Single-skill: facts per chunk (default 40, max 200). |
query | No | Triage: optional free-text filter on skill name/slug. |
topic | No | Single-skill: review only one section (from the `sections` breakdown) at a time. |
offset | No | Single-skill: page offset into the (optionally topic-filtered) facts. |
status | No | Pass 'unverified' to list only skills (triage) or facts (single-skill) that are still unverified. |
jurisdiction | No | Limit the triage to one of your approved jurisdictions (e.g. 'MT'). Defaults to all of them. |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
facts | No | Single-skill mode: a chunk of the skill's verifiable facts.
skill | No | Present in single-skill mode: { slug, name, jurisdiction }.
skills | No | Triage mode: per-skill summaries, most-unverified first.
message | No |
truncated | No |
next_action | No |
total_skills | No |
jurisdictions | No |
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint and mark the tool as non-destructive. The description adds further behavioral details: chunking (default 40), pagination via offset, row numbers, keys, the sections breakdown, and the authentication requirement ('AUTHENTICATED'). This goes beyond the annotations.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly long but well-structured: starts with auth note, purpose, then two modes with details. Every sentence adds value, though it could be slightly more concise. For a complex tool with 7 parameters and two modes, the length is justified.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description still explains the return structure (chunks, row, key, sections) and pagination. It covers scope, auth, and filtering. It is contextually complete for the complexity of the tool.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant meaning beyond the schema: it explains dual use of 'limit', triage vs single-skill modes, topic and offset usage, and the response structure. This adds substantial value for parameter understanding.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool lists verification targets (facts needing review) in two distinct modes: a triage summary without a slug, and detailed facts with a slug. It distinguishes itself from sibling tools like 'list_my_verifications' by specifying that it serves approved accountants and is an in-chat replacement for exporting a verification workbook.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly explains when to call without slug (for triage) and with slug (for specific skill facts), and details filtering options. It does not explicitly exclude alternatives but provides clear context for use, scoring a 4.

list_verifiers: List named accountant verifiers
Grade A | Read-only | Idempotent

Returns named licensed accountants who have signed off on OpenAccountants jurisdictions. Use ONLY when the user explicitly asks to see the verifier network or 'who verified this skill'. Do NOT use this to check whether a jurisdiction is covered before calling request_accountant_review — just call request_accountant_review directly, it routes to the right person regardless.

Parameters (JSON Schema)
Name | Required | Description | Default
jurisdiction | No | Optional ISO code filter — only return verifiers for this jurisdiction. |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
total | No |
verifiers | No |
next_action | No |
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, openWorldHint=true, idempotentHint=true, and destructiveHint=false, indicating safe, read-only behavior. The description adds context by specifying that it returns 'named licensed accountants', but does not introduce any additional behavioral traits beyond what annotations cover.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three sentences that front-load the purpose and provide clear usage guidelines. No superfluous information is present.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one optional parameter, robust annotations, and an output schema), the description is complete. It covers the tool's purpose, usage context, and restrictions without needing additional elaboration.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the single parameter 'jurisdiction' already described in the schema as 'Optional ISO code filter — only return verifiers for this jurisdiction.' The tool description does not add extra meaning beyond the schema's description, so baseline score of 3 is appropriate.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Returns', the resource 'named licensed accountants', and the context 'who have signed off on OpenAccountants jurisdictions'. It distinguishes itself from sibling tools by providing specific usage instructions, such as not using it to check jurisdiction coverage.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use conditions ("when the user explicitly asks to see the verifier network or 'who verified this skill'") and when-not-to-use conditions ("Do NOT use this to check whether a jurisdiction is covered before calling request_accountant_review"). It also gives an alternative action: "just call request_accountant_review directly".

list_workflows: List guided tax workflows
Grade A | Read-only | Idempotent

List published OpenAccountants workflows — guided, multi-step procedures (e.g. 'Prepare a Malta VAT return') built from skills. Optionally filter by jurisdiction (e.g. 'MT'), workflow_type ('self-employed' | 'vat' | 'payroll' | 'corporate' | 'cross-border' | 'capital-gains' | 'crypto'), or a free-text query. Paginated.
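
Because workflow_type is a closed set, a client can validate it before calling the tool. Here is a hedged sketch. The minimum limit of 1 is an assumption; the default of 50 and the maximum of 100 come from the limit parameter's description.

```python
WORKFLOW_TYPES = frozenset({
    "self-employed", "vat", "payroll", "corporate",
    "cross-border", "capital-gains", "crypto",
})  # the seven types named in the description


def list_workflows_args(jurisdiction=None, workflow_type=None, query=None,
                        limit=50, offset=0) -> dict:
    """Validate and assemble list_workflows arguments before calling the tool."""
    if workflow_type is not None and workflow_type not in WORKFLOW_TYPES:
        raise ValueError(f"unknown workflow_type: {workflow_type!r}")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    args = {"limit": limit, "offset": offset}
    for key, value in (("jurisdiction", jurisdiction),
                       ("workflow_type", workflow_type),
                       ("query", query)):
        if value is not None:
            args[key] = value
    return args
```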

Parameters (JSON Schema)
Name | Required | Description | Default
limit | No | Max results (default 50, max 100) |
query | No | Free-text match on title/description |
offset | No | Pagination offset |
jurisdiction | No | Jurisdiction code, e.g. 'MT', 'US-CA' |
workflow_type | No | One of the 7 workflow types |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
limit | No |
total | No |
offset | No |
workflows | Yes |
next_offset | No |
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. Description adds pagination and filtering behavior, which is useful but not critical beyond annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: the first sets the purpose, the second details the filters, and a one-word third notes pagination. Concise and well structured, with no redundancy.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Coverage of pagination, filtering options, and purpose is sufficient for a list tool, especially with output schema available.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds value beyond the schema by giving example jurisdiction codes and listing the workflow_type options explicitly. The schema documents every parameter but describes workflow_type only as 'One of the 7 workflow types'.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool lists published workflows and gives a specific example ('Prepare a Malta VAT return'). It distinguishes itself from sibling tools like get_workflow and list_skills.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explanation of optional filters (jurisdiction, workflow_type, query) provides clear context for when to use with filtering, but lacks explicit guidance on when not to use or alternatives among siblings.

plan_cross_border: Plan a cross-border / multi-country tax situation
Grade A | Read-only | Idempotent

THE cross-border tool. Use this — not compare_jurisdictions — whenever a person's facts touch more than one country: a US citizen living abroad, a dual resident, someone changing residence, a non-dom, an expatriating citizen, or an owner of a foreign trust/company. Unlike compare_jurisdictions (which loads each country as an independent block and disclaims treaty/PE interaction), this returns a SEQUENCED plan: it builds the residency/citizenship/domicile map, identifies the country skills AND the international topic skills (FEIE/FTC, FBAR/FATCA, CFC/GILTI, foreign trusts, exit tax) the facts engage, fixes the ORDER of events (order changes the tax — e.g. sever residency before vs. after a sale), names the verifier per country, states the treaty bridge for double-tax relief, and mandates a request_accountant_review hand-off to the lead country's accountant. Always load cross-border-tax-router + cross-border-tax-workflow-base first (returned in load_first). Output is research-grade (tier 2) until a licensed human signs off.
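
The output schema's plausibility_warnings field flags residency facts that look impossible. Its two examples are more than two concurrent residences and an origin equal to the destination. A client-side pre-check for those two examples might look like this. The separate origin and destination arguments are hypothetical, since the tool itself takes a single tax_residences list. The server's real checks may also be broader.

```python
def plausibility_warnings(current_residences: list, origin=None, destination=None) -> list:
    """Flag the two implausible residency patterns the output schema gives as examples."""
    warnings = []
    concurrent = {code.upper() for code in current_residences}
    if len(concurrent) > 2:
        warnings.append(f"{len(concurrent)} concurrent tax residences: confirm with the user")
    if origin and destination and origin.upper() == destination.upper():
        warnings.append(f"origin and destination are both {origin.upper()}: confirm the move")
    return warnings
```

As the schema requires, any warning returned here should be confirmed with the user before any computation.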

Parameters (JSON Schema)
Name | Required | Description | Default
event | No | The asset/income/event in question and ideally WHEN — e.g. 'sell an Australian discretionary trust in 2026', 'renounce US citizenship', 'exercise founder options before moving'. |
assets | No | Optional: foreign entities/assets owned — companies, trusts, partnerships, pensions, foreign funds. Drives the anti-deferral and reporting skills. |
domicile | No | Optional: domicile, if a remittance-basis country (UK, Malta, Ireland, Cyprus) is involved. |
citizenship | No | Citizenship(s) held — ISO codes, slugs, or names (e.g. ['US', 'malta']). Drives citizenship-based taxation (US) and exit-tax tests. |
event_timing | No | When is the key event expected to occur? e.g. 'sale expected to complete in 6 weeks', 'planning to move in 3 months', 'still conceptual'. Drives urgency assessment and scenario feasibility. |
tax_residences | No | Country(ies) of tax residence now, and any country being moved to/from. First entry is treated as the primary residence. |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
needs | No |
status | No |
guardrails | No |
load_first | No |
next_action | No |
residency_map | No |
country_blocks | No |
engagement_scope | No | Per-country advisor scope derived from the facts and skills available.
lead_jurisdiction | No |
scenario_guidance | No | 2-4 event-ordering scenarios derived from the facts, each with a sequence, consequence summary per country, and urgency rating.
sequence_guidance | No |
treaty_bridge_note | No |
plausibility_warnings | No | Present when the residency facts look impossible (e.g. >2 concurrent residences, or origin==destination). The agent MUST confirm these with the user before computing.
uncovered_jurisdictions | No |
international_topic_skills | No |
us_citizenship_taxation_engaged | No |
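The two examples given for plausibility_warnings (more than two concurrent residences, or origin equal to destination) could also be pre-checked on the client before calling the tool. The sketch below is an approximation, not the server's logic: the separate `origin`/`destination` inputs are hypothetical, and the normalization only case-folds, so 'US' and 'united-states' are not treated as equal.

```python
from typing import List, Optional


def _norm(code: str) -> str:
    # Naive normalisation: strip and case-fold only; no ISO/slug/name mapping.
    return code.strip().casefold()


def plausibility_warnings(concurrent_residences: List[str],
                          origin: Optional[str] = None,
                          destination: Optional[str] = None) -> List[str]:
    """Approximate the two examples given for plausibility_warnings."""
    warnings = []
    distinct = {_norm(c) for c in concurrent_residences}
    if len(distinct) > 2:
        warnings.append(f"{len(distinct)} concurrent tax residences claimed")
    if origin and destination and _norm(origin) == _norm(destination):
        warnings.append("origin and destination are the same country")
    return warnings
```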

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds valuable context: output is research-grade (tier 2) until human sign-off, it returns a sequenced plan, and it explains why ordering matters. There is no contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single dense paragraph that front-loads the essential purpose and usage. Every sentence adds value, from the explicit differentiation to the output structure and dependencies. No wasted words despite the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers what the tool does, when to use it, what it returns (including dependencies and output structure), and the need for human review. Given the tool's complexity and the presence of an output schema, this is complete and leaves no major gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good parameter descriptions. The tool description does not add new meaning beyond the schema but provides context on how parameters like citizenship and tax_residences are used in the broader tool logic. Baseline 3 is appropriate as the schema already does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'THE cross-border tool' and lists specific scenarios (US citizen abroad, dual resident, changing residence, etc.), clearly distinguishing it from compare_jurisdictions. The verb 'plan' combined with resource 'cross-border tax situation' is specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit when-to-use guidance: 'Use this — not compare_jurisdictions — whenever a person's facts touch more than one country' with concrete examples. It also instructs to load specific dependencies first and mandates a request_accountant_review hand-off, providing clear invocation guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

preview_fact_verification: Preview a fact verification (no write) (grade A)
Read-only, Idempotent

AUTHENTICATED (approved accountants). Dry-run of submit_fact_verification — IDENTICAL arguments, but writes NOTHING. Returns whether the change would apply and any reviewer warnings, so you can see the automated reviewer's take (e.g. 'that citation looks like a placeholder') before committing. Then call submit_fact_verification to apply.
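A minimal preview-then-commit sketch, assuming a hypothetical `call_tool` transport and treating each row as opaque (the shape of submit_fact_verification.rows is not shown here). The point it illustrates is that one argument object is built once and sent unchanged to both tools, and that reviewer warnings are surfaced before anything is written.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport, not a real MCP client API.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def preview_then_submit(call_tool: ToolCaller, slug: str, rows: List[Any],
                        accept_warnings: Callable[[List[str]], bool]) -> Dict[str, Any]:
    # Build the arguments once so both calls receive IDENTICAL input.
    args = {"slug": slug, "rows": rows}

    preview = call_tool("preview_fact_verification", args)  # writes nothing
    if not preview.get("would_apply"):
        return {"submitted": False, "reason": preview.get("message", "")}

    warnings = preview.get("warnings") or []
    # Let the user (or agent policy) decide on reviewer warnings before committing.
    if warnings and not accept_warnings(warnings):
        return {"submitted": False, "reason": "warnings not accepted", "warnings": warnings}

    return {"submitted": True, "result": call_tool("submit_fact_verification", args)}
```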

Parameters (JSON Schema)
Name | Required | Description | Default
rows | Yes | Same shape as submit_fact_verification.rows. |
slug | Yes | Skill slug. |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
skills | No |
message | Yes |
warnings | No |
next_action | No |
would_apply | No |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint, and destructiveHint. The description adds critical context: writes nothing, returns change applicability and warnings, and requires authentication. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences that front-load authentication and purpose. Each one is necessary and informative, with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description appropriately summarizes return values (whether change applies and warnings). It covers authentication, behavior, and connection to sibling tool, leaving no gaps for proper usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds value by stating that the arguments are IDENTICAL to those of submit_fact_verification (the schema likewise notes that rows has the same shape as submit_fact_verification.rows), which helps the agent supply correct parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it is a dry-run of submit_fact_verification that writes nothing and returns whether the change would apply, plus any warnings. It explicitly distinguishes itself from its sibling submit_fact_verification by noting that it is a preview.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: the tool requires authentication (approved accountants), serves as a dry-run before submit_fact_verification, and names the sibling tool to call to apply the change. It also gives a concrete reason to use it: seeing the automated reviewer's warnings (e.g. 'that citation looks like a placeholder') before committing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

publish_workflow: Publish a workflow (grade A)
Idempotent

Make a workflow live on the site. Preconditions: at least one published node, and every wired skill is published and in the same jurisdiction. Restricted to the workflow's creator or an admin.
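The preconditions above can be mirrored in a client-side check before calling publish_workflow, so an agent can report what is missing instead of waiting for a refusal. The data classes below are hypothetical shapes, not the server's model, and "same jurisdiction" is assumed to mean the workflow's own jurisdiction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WiredSkill:          # hypothetical shape, not the server's data model
    slug: str
    published: bool
    jurisdiction: str


@dataclass
class WorkflowDraft:       # hypothetical shape
    jurisdiction: str
    node_published: List[bool] = field(default_factory=list)
    skills: List[WiredSkill] = field(default_factory=list)


def publish_blockers(wf: WorkflowDraft) -> List[str]:
    """Return reasons publish_workflow would likely refuse; empty list means OK."""
    blockers = []
    if not any(wf.node_published):
        blockers.append("no published node")
    for s in wf.skills:
        if not s.published:
            blockers.append(f"skill '{s.slug}' is not published")
        if s.jurisdiction != wf.jurisdiction:
            blockers.append(f"skill '{s.slug}' is in {s.jurisdiction}, not {wf.jurisdiction}")
    return blockers
```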

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
message | Yes |
published | No |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations, such as preconditions and access restrictions. Annotations already indicate idempotent and non-destructive behavior; the description does not contradict them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded with the primary action. Every sentence adds value, and there is no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the main action, preconditions, and access control, but never explains its only parameter, slug. The output schema removes the need to describe return values, but the missing parameter context leaves the description incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must explain the 'slug' parameter but fails to provide any meaning or example. The agent cannot determine what value to supply for this required parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Make a workflow live on the site,' a specific verb-resource combination. That purpose sets it apart from sibling tools such as 'publish_workflow_node', which publishes a single node, and 'create_workflow', which creates a workflow without publishing it.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists preconditions and access restrictions, guiding when to use the tool. However, it does not explicitly mention alternative tools or state when not to use it, slightly reducing clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

publish_workflow_node: Publish a workflow node (grade A)
Idempotent

Promote a draft node to published (visible on the site). Restricted to the node's creator or an admin.

Parameters (JSON Schema)
Name | Required | Description | Default
node_id | Yes | |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
message | Yes |
published | No |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds value beyond annotations by specifying the access control (creator/admin only) and the state change (draft to published). Annotations already indicate mutation (readOnlyHint=false) and idempotency (idempotentHint=true), so the description complements them well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences that convey the purpose and the key access constraint efficiently. It is concise and front-loaded; its structure could be marginally tighter, but it remains clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no nested objects) and the presence of an output schema, the description adequately covers the core behavior and access restrictions. It does not need extensive additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only one parameter (node_id) and 0% schema description coverage, the description does not elaborate on the parameter's format or meaning. However, the parameter is simple and self-explanatory from the schema, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses the specific verb 'Promote' and clearly identifies the change as moving a 'draft node to published', making the action unambiguous. This distinguishes the tool from siblings such as 'publish_workflow', which targets entire workflows.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states when to use this tool (promote a draft node) and includes an access restriction ('Restricted to the node's creator or an admin'), providing clear context. However, it does not explicitly mention when not to use or name alternatives, so it's slightly less than perfect.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remove_workflow_skill: Remove a skill from a workflow (grade A)
Idempotent

Detach a skill from a workflow's skill set and recompute tier. Verified accountants (in-jurisdiction) + admins.

Parameters (JSON Schema)
Name | Required | Description | Default
skill_slug | Yes | |
workflow_slug | Yes | |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
tier | No |
message | Yes |
removed | No |

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide idempotentHint=true and destructiveHint=false, indicating safe, non-destructive behavior. The description adds that the operation recomputes the tier and requires authorization, which is useful. However, it does not disclose error cases or side effects beyond these.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (two sentences), front-loads the core action, and includes the necessary authorization info. It could be slightly more structured, but it is concise enough.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (two required string parameters, no enums) and the existence of an output schema, the description is partially complete. It covers the action and authorization but lacks parameter semantics and error-handling details, leaving gaps for safe usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, yet the description provides no information about the parameters (skill_slug, workflow_slug). The description only explains the overall operation, not the meaning or format of the inputs. This is insufficient for an agent to know what values to supply.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Detach a skill from a workflow's skill set and recompute tier.' It specifies the resource (skill from workflow) and the verb, distinguishing it from siblings like add_workflow_skill.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage context by specifying authorized users: 'Verified accountants (in-jurisdiction) + admins.' This is a clear prerequisite. It implies when to use (when removing a skill) compared to the sibling add_workflow_skill, but does not explicitly list alternatives or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

request_accountant_review: Request a licensed accountant to review (grade A)

THE handoff tool. Call this for ANY jurisdiction whenever (a) the user wants their working paper reviewed before filing, (b) the situation needs professional sign-off, (c) it involves cross-border or high-stakes advice, (d) the user asks to speak to an accountant, or (e) real money is at stake. BEFORE calling this tool: ask the user for their email address (contact_email) and name (contact_name) — the accountant needs these to follow up if the user does not book via Calendly. Do NOT proceed without at least contact_email. Do NOT call list_verifiers first. The network handles coverage. CRITICAL: always pass the full working_paper so the reviewer sees the computation before the call.
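A pre-flight helper could enforce the rules this description and its schema spell out: refuse to build the request without contact_email, restrict urgency to the three documented values, and trim working_paper to the 512 KB UTF-8 cap. This is a sketch, not the server's validation: 512 KB is assumed to mean 512 * 1024 bytes, and the '@' test is only a crude sanity check, not real email validation.

```python
from typing import Any, Dict, Optional

MAX_WORKING_PAPER_BYTES = 512 * 1024   # assumption: "512 KB" means 512 * 1024 bytes
URGENCY_VALUES = {"urgent", "standard", "planning"}


def build_review_request(scenario: str, jurisdiction: str, contact_email: str,
                         working_paper: Optional[str] = None,
                         contact_name: Optional[str] = None,
                         urgency: Optional[str] = None) -> Dict[str, Any]:
    # The description says: do NOT proceed without contact_email.
    if not contact_email or "@" not in contact_email:
        raise ValueError("ask the user for contact_email before calling request_accountant_review")
    if urgency is not None and urgency not in URGENCY_VALUES:
        raise ValueError(f"urgency must be one of {sorted(URGENCY_VALUES)}")

    args: Dict[str, Any] = {"scenario": scenario, "jurisdiction": jurisdiction,
                            "contact_email": contact_email}
    if contact_name:
        args["contact_name"] = contact_name
    if urgency:
        args["urgency"] = urgency
    if working_paper:
        data = working_paper.encode("utf-8")
        if len(data) > MAX_WORKING_PAPER_BYTES:
            # Cut at the byte cap; errors="ignore" drops a split trailing character.
            working_paper = data[:MAX_WORKING_PAPER_BYTES].decode("utf-8", errors="ignore")
        args["working_paper"] = working_paper
        args["working_paper_format"] = "markdown"
    return args
```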

Parameters (JSON Schema)
Name | Required | Description | Default
urgency | No | urgent = filing in <2 weeks; standard = current filing season; planning = future-year strategy. |
scenario | Yes | Brief description of the situation the user needs reviewed — e.g. '2025 sole-trader Schedule C with home office + crypto disposals + new dependant', 'considering S-corp election', 'multi-state RSU vest'. The verifier reads this before the call. |
tax_year | No | Tax year the review concerns, if relevant. |
worksheet | No | Optional structured worksheet (JSON object) conforming to WORKSHEET_CONTRACT.md: { jurisdiction, tax_type, period, currency, lines:[{net_box,vat_box,net,rate,vat}], boxes:[{box,label,amount,sources:[{label,amount}]}], result:{type,amount} }. Provide for VAT returns where box-level reconciliation can be foot-checked. Not required for income tax or other working papers — the prose working_paper alone is sufficient. |
source_url | No | Optional public URL where the working paper can also be viewed (e.g. a Google Doc the user authored, a Notion page). |
contact_name | No | User's name — ask for this alongside the email. The accountant uses it to address the user before the call. |
jurisdiction | Yes | ISO code or slug for the user's tax jurisdiction (e.g. 'ZA', 'US-CA', 'malta'). Required. |
contact_email | No | User's email address — REQUIRED. Ask the user for this before calling the tool. The accountant needs it to follow up if the user does not book via Calendly. Do not submit without it. |
working_paper | No | The full working paper — classified transactions, computation, draft return lines, issue map — as plain markdown. ALWAYS pass this when you have produced any structured tax output. Without it, the accountant walks into the call blind. With it, they can review before the call and the user gets a better outcome. Capped at 512 KB UTF-8; trim if needed. No worksheet JSON required — the prose working paper alone is sufficient to create the consultation request. |
working_paper_format | No | Format hint for the working paper. Default 'markdown'. |
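Box-level reconciliation of a worksheet that follows the contract summarized in the worksheet parameter can be foot-checked locally before it is attached. This is a sketch under assumptions: rate is taken as a fraction (0.18 rather than 18), a tolerance of 0.01 absorbs rounding, and boxes without sources are skipped because there is nothing to reconcile against.

```python
from typing import Any, Dict, List


def foot_check(worksheet: Dict[str, Any], tolerance: float = 0.01) -> List[str]:
    """Return reconciliation issues for a contract-shaped worksheet (empty = foots)."""
    issues = []
    for box in worksheet.get("boxes", []):
        sources = box.get("sources", [])
        if not sources:
            continue  # nothing to reconcile against
        total = sum(s["amount"] for s in sources)
        if abs(total - box["amount"]) > tolerance:
            issues.append(f"box {box['box']}: sources sum to {total:.2f}, box shows {box['amount']:.2f}")
    for i, line in enumerate(worksheet.get("lines", [])):
        # Assumption: rate is a fraction (0.18), not a percentage (18).
        expected_vat = line["net"] * line["rate"]
        if abs(expected_vat - line["vat"]) > tolerance:
            issues.append(f"line {i}: {line['net']} x {line['rate']} != {line['vat']}")
    return issues
```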

Output Schema

Parameters (JSON Schema)
Name | Required | Description
message | Yes |
captured | Yes | Whether the request landed server-side.
accountant | No |
request_id | No |
calendly_url | Yes |
capture_error | No |
worksheet_attached | No |
no_verifier_assigned | No |
working_paper_attached | No |
worksheet_recon_status | No |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the tool initiates a human review process, requiring critical inputs (contact info, working_paper) for accountant follow-up. Adds behavioral context beyond annotations: the accountant reads the working paper before a call, and the tool should not be called without proper preparation. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured, with a front-loaded purpose, a clear lettered list of trigger conditions (a) to (e), and explicit warnings. Slightly verbose in listing every scenario, but each sentence adds value; it could be marginally more concise without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers prerequisites, parameter usage, and important warnings. Lacks explicit expectations about post-call outcomes (e.g., confirmation, timing). With an existing output schema, this gap is partially mitigated, but a brief note on what happens after calling would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

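Despite 100% schema coverage, the description enriches parameter meaning: it makes contact_email (and ideally contact_name) a prerequisite to collect from the user before calling, and insists that the full working_paper always be passed so the reviewer sees the computation before the call. The schema itself covers the optional worksheet (mainly for VAT returns) and how the verifier uses scenario.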

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies it as a handoff tool for accountant review, listing five specific trigger scenarios (working paper review, professional sign-off, cross-border/high-stakes, user requests accountant, real money at stake). It distinguishes itself by warning against calling list_verifiers first, showing purpose differentiation from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to call (five clear criteria), what prerequisites to fulfill (ask for contact_email and contact_name), and what not to do (do not call list_verifiers). Also warns not to proceed without contact_email, providing unambiguous usage boundaries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

retract_skill: Retract a skill you created (grade A)
Idempotent

AUTHENTICATED. Unpublish a skill YOU created (e.g. a submit_skill draft you want to take back). Pass the slug. It's hidden from the directory immediately; its facts are kept and an admin can re-publish. You can only retract a skill you contributed (admins can retract any). Requires sign-in and completed onboarding.

Parameters (JSON Schema)
Name | Required | Description | Default
slug | Yes | Slug of the skill to retract (unpublish). |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
ok | Yes |
slug | No |
message | Yes |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as idempotent; the description adds the authentication requirement, the retention of facts, the ability of an admin to re-publish, and role restrictions. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A few short sentences packing essential information: the action comes first, then effects and constraints. No wasted words; front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-parameter tool with output schema and annotations, the description covers behavior, constraints, and effects thoroughly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter 'slug' with schema coverage at 100%. Description mentions 'Pass the `slug`' but adds no semantic depth beyond schema. Baseline 3 due to full schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action 'Unpublish' and the resource 'skill YOU created'. It distinguishes itself from the sibling submit_skill by describing a 'draft you want to take back'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (retract a skill you created) and who can use (you or admin). Provides context on effects and restrictions, but lacks explicit when-not-to-use statements.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_rules: Search rules across jurisdictions (grade A)
Read-only, Idempotent

Query individual tax RULES/FACTS (rates, thresholds, rules, definitions, tables) ACROSS jurisdictions and metadata, and get back a bundled markdown rule set the user can save and run locally. Unlike get_skill (one whole skill), this assembles a cross-cutting SET — e.g. 'VAT rates in MT, IE and DE', 'all income-tax thresholds for 2025', or 'rules mentioning reverse charge'. By default returns ALL matching rules, each tagged with its verification status; pass status:'verified' for accountant-/research-verified only. Call list_rule_facets first to see the queryable values.

Parameters (JSON Schema)
Name | Required | Description | Default
text | No | Free-text search over each rule's label, value, and citation. |
limit | No | Max rules to return (default 200, max 500). |
roles | No | Skill roles: foundation, compute, orchestrator, reference. |
topic | No | Filter by a fact topic. |
offset | No | Pagination offset — pass the previous response's next_offset. |
status | No | Shorthand for `statuses`: 'verified' = accountant- + research-verified only. Default 'all'. |
domains | No | Accounting domains, e.g. ['vat-gst','income-tax']. See list_rule_facets. |
statuses | No | Verification statuses to include. Default = all (each rule is tagged). |
tax_year | No | Limit to a tax year, e.g. 2025. |
block_types | No | Rule kinds to include. Default = all of these (framing prose + workflow steps are excluded). |
jurisdictions | No | ISO codes to include, e.g. ['MT','US-CA']. Omit for all jurisdictions. |

Output Schema

Parameters (JSON Schema)
Name | Required | Description
total | No | Total matching rules (across all pages).
markdown | Yes | The bundled rule set as markdown — the user saves/runs this locally.
truncated | No |
fact_count | Yes | Rules returned on this page.
next_action | No |
next_offset | No |
jurisdictions | No |
accountant_verified_count | No |
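Because limit caps a page at 500 rules and each response carries next_offset, a full result set can be collected by paging. A sketch assuming a hypothetical `call_tool` transport and that next_offset is absent (or None) on the last page; the guard against a non-advancing offset is a defensive choice, not documented server behavior.

```python
from typing import Any, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]  # hypothetical transport


def fetch_all_rules(call_tool: ToolCaller, filters: Dict[str, Any],
                    page_size: int = 500) -> str:
    """Page through search_rules via next_offset and join the markdown bundles."""
    if not 1 <= page_size <= 500:
        raise ValueError("limit must be between 1 and 500")
    chunks: List[str] = []
    args = dict(filters, limit=page_size)
    while True:
        page = call_tool("search_rules", args)
        chunks.append(page["markdown"])
        next_offset = page.get("next_offset")
        # Assumption: next_offset is absent/None on the last page.
        # Also stop if the offset fails to advance, to avoid looping forever.
        if next_offset is None or next_offset <= args.get("offset", 0):
            break
        args = dict(args, offset=next_offset)
    return "\n\n".join(chunks)
```

For example, `fetch_all_rules(call_tool, {"jurisdictions": ["MT", "IE", "DE"], "domains": ["vat-gst"], "status": "verified"})` would gather the verified VAT rules for those three countries, provided those facet values match what list_rule_facets reports.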

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint, idempotentHint, destructiveHint) are all consistent with a read-only, safe query tool. The description adds useful behavioral context: returns all matching rules by default, each tagged with verification status, and can filter by status. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with the core purpose, examples, and key differentiators. Every sentence adds value without redundancy. Well-structured and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 11 parameters, a rich schema, and an output schema present, the description is thorough. It covers purpose, usage guidance, default behavior, and hints for complementary tools. The existence of an output schema means return values need not be explained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds value beyond the schema by explaining that all matching rules are returned by default, each tagged with its verification status; that status:'verified' narrows results to accountant- and research-verified rules; and that list_rule_facets should be called first to see the queryable values. This extra context raises the score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool queries tax rules/facts across jurisdictions and returns a bundled markdown rule set. It differentiates from get_skill (one whole skill) with concrete examples like 'VAT rates in MT, IE and DE', making the purpose specific and distinguishable from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance is given: suggests calling list_rule_facets first to see queryable values, explains that by default all matching rules are returned, and describes the status parameter shorthand ('verified' = accountant- + research-verified). This helps the agent decide when to use this vs get_skill.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_skillsSearch skills by keywordA
Read-onlyIdempotent

Full-text search across all published tax and accounting skills. Find, lookup, query, or discover skills by keyword, tax concept, deduction type, form number, or regulation (e.g. 'home office deduction', 'crypto capital gains', 'reverse charge', 'Schedule C', '60-day reporting'). Optionally limit to one jurisdiction. Use this when you don't know the exact skill slug.

ParametersJSON Schema
NameRequiredDescriptionDefault
queryYesSearch term, e.g. 'home office deduction', 'crypto capital gains', 'reverse charge'
domainNoOptional accounting domain to limit the search (e.g. 'vat-gst', 'payroll', 'income-tax', 'crypto').
jurisdictionNoOptional ISO 2-letter country code to limit the search

Output Schema

ParametersJSON Schema
NameRequiredDescription
totalYes
resultsYes
next_actionNo
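For reference, a call to this tool travels as a standard MCP `tools/call` JSON-RPC request. The Python sketch below builds that request; the `search_skills_request` helper and the choice to omit unset filters rather than send them as null are assumptions, while the tool name and argument names come from the schema above.

```python
import json
from typing import Optional


def search_skills_request(request_id: int, query: str,
                          domain: Optional[str] = None,
                          jurisdiction: Optional[str] = None) -> dict:
    """Build an MCP tools/call request for search_skills.

    Unset optional filters are left out of the arguments instead of sent as null.
    """
    arguments = {"query": query}
    if domain is not None:
        arguments["domain"] = domain
    if jurisdiction is not None:
        arguments["jurisdiction"] = jurisdiction  # ISO 2-letter country code
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_skills", "arguments": arguments},
    }


req = search_skills_request(1, "reverse charge", domain="vat-gst", jurisdiction="MT")
print(json.dumps(req["params"]))
# {"name": "search_skills", "arguments": {"query": "reverse charge", "domain": "vat-gst", "jurisdiction": "MT"}}
```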
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, so the description's main contribution is clarifying that the search covers published skills and can optionally be limited to one jurisdiction (the domain filter is documented only in the schema). It does not reveal rate limits or pagination, but it is adequate for a search tool.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: four short sentences that front-load the core purpose, then give search examples, the optional jurisdiction filter, and usage guidance. No word is wasted.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description does not need to explain return values. It covers search behavior, filtering options, and usage scenarios thoroughly for a tool of this complexity.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the description adds only marginal value by providing search examples and clarifying optional parameters. It does not introduce new constraints or format details beyond the schema.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs full-text search across published tax and accounting skills, with specific examples of search terms. It distinguishes itself from sibling tools like get_skill (exact slug lookup) and list_skills (listing all skills) by focusing on keyword-based search.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this when you don't know the exact skill slug', which directly tells the agent when to use this tool instead of get_skill.

startStart a tax workflowA
Read-onlyIdempotent

Front door for any tax / accounting question once you know what the user wants. intent is REQUIRED (e.g. 'taxes', 'VAT return', 'set up a company', 'find deductions', 'classify transactions', 'payroll'); pass a jurisdiction too (ISO 2-letter, e.g. 'MT', 'GB', 'US-CA'). If you don't yet have an intent, call start_help first. Returns either a clarification request (if jurisdiction is missing) or a ready-to-execute plan with the list of skills to load. Call this FIRST (after start_help if needed) whenever the user asks for tax help.

ParametersJSON Schema
NameRequiredDescriptionDefault
intentYesUser intent — REQUIRED. Free text, e.g. 'taxes', 'VAT return', 'set up a company'.
acting_asYesREQUIRED. Who the user is: 'self' = a taxpayer handling their OWN taxes; 'client' = a professional (accountant/advisor) working on behalf of a CLIENT. Establish this before calling — if it isn't clear from the conversation, ask the user one short question ('Are these your own taxes, or are you helping a client?'). Never guess.
jurisdictionNoISO 2-letter code or US state code (e.g. 'MT', 'GB', 'US-CA').

Output Schema

ParametersJSON Schema
NameRequiredDescription
needsNo
intentNo
statusNo
guardrailsNo
next_actionNo
expectationsNo
jurisdictionNo
skills_to_loadNo
available_intentsNo
available_jurisdictionsNo
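The argument rules for start (intent and acting_as required, acting_as limited to 'self' or 'client', jurisdiction optional) can be enforced on the client before the call. This is a minimal Python sketch; `start_arguments` is a hypothetical helper, and raising an error where the schema says to ask the user is a design choice of the sketch, not server behavior.

```python
from typing import Dict, Optional

# Allowed values from the acting_as schema description.
ACTING_AS_VALUES = {"self", "client"}


def start_arguments(intent: str, acting_as: str,
                    jurisdiction: Optional[str] = None) -> Dict[str, str]:
    """Validate and assemble arguments for the start tool.

    intent and acting_as are required; jurisdiction is optional, but without it
    start returns a clarification request instead of a plan.
    """
    if not intent or not intent.strip():
        raise ValueError("intent is required; call start_help first if it is unknown")
    if acting_as not in ACTING_AS_VALUES:
        # The schema says never to guess this value: ask the user instead.
        raise ValueError("acting_as must be 'self' or 'client'; ask the user")
    args = {"intent": intent.strip(), "acting_as": acting_as}
    if jurisdiction:
        args["jurisdiction"] = jurisdiction
    return args


print(start_arguments("VAT return", "client", "MT"))
# {'intent': 'VAT return', 'acting_as': 'client', 'jurisdiction': 'MT'}
```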
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, destructiveHint) already indicate safe, read-only behavior. The description adds value by detailing what the tool returns (clarification request or ready-to-execute plan) and how it handles missing jurisdiction. This enriches understanding beyond the annotations.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: it opens with the purpose, explains intent and jurisdiction with examples, describes the two possible results, and gives ordering guidance. Every sentence serves a purpose, and it is easy to scan.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's role as an entry point, the description covers the essentials: the required intent (acting_as is explained in its schema description), the optional jurisdiction, the two return types, and the prerequisite call to start_help. It also states the fallback for a missing jurisdiction, namely a clarification request. This is fully adequate for an AI agent to use the tool correctly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds practical guidance: it marks intent as REQUIRED and gives examples, and it specifies the jurisdiction format. The self-versus-client guidance for acting_as, including the advice to ask the user when it is unclear, comes from the schema description and reinforces this.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is the 'front door for any tax / accounting question' and specifies that intent is required. It distinguishes itself from the sibling 'start_help' by noting that if no intent is known, one should call that first. The verb 'start' paired with 'tax workflow' is specific and unambiguous.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use this tool ('once you know what the user wants') and when not to ('If you don't yet have an intent, call start_help first'). The acting_as schema description adds matching guidance, telling the agent to ask the user when it is unclear. Together these give clear decision criteria.

start_helpGet tax-workflow scoping guidance (no args)A
Read-onlyIdempotent

No-argument front door — call this FIRST whenever a user asks 'how can you help me?', 'what can you do?', 'where do I start?', or otherwise opens vaguely (do NOT answer such questions by listing your tools or calling list_jurisdictions). For a signed-in approved accountant it returns a personalized orientation briefing (their standing + what their jurisdiction needs + one next action). For everyone else it returns the two scoping questions plus the available intents and jurisdictions. Once you have an intent, call start(intent, jurisdiction).

ParametersJSON Schema
NameRequiredDescriptionDefault

No parameters

Output Schema

ParametersJSON Schema
NameRequiredDescription
needsNo
statusNo
next_actionNo
available_intentsNo
available_jurisdictionsNo
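The division of labor between start_help and start amounts to a small routing decision, sketched below in Python. The function `next_call` and the pseudo-action 'ask_user' are hypothetical. Note that start_help's description says to call start(intent, jurisdiction), while start's schema also requires acting_as, so the sketch asks for acting_as before routing to start.

```python
from typing import Optional, Tuple

ACTING_AS_QUESTION = "Are these your own taxes, or are you helping a client?"


def next_call(intent: Optional[str] = None,
              acting_as: Optional[str] = None,
              jurisdiction: Optional[str] = None) -> Tuple[str, dict]:
    """Decide which front door to use next.

    Returns (tool_name, arguments); 'ask_user' is a hypothetical pseudo-action
    meaning "ask the user this question before calling any tool".
    """
    if not intent:
        # Vague openers such as "what can you do?" go to the no-argument tool.
        return "start_help", {}
    if acting_as not in ("self", "client"):
        # start requires acting_as, and its schema says never to guess it.
        return "ask_user", {"question": ACTING_AS_QUESTION}
    args = {"intent": intent, "acting_as": acting_as}
    if jurisdiction:
        args["jurisdiction"] = jurisdiction
    return "start", args


print(next_call())                            # ('start_help', {})
print(next_call("payroll", "self", "GB")[0])  # start
```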
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint as true, and destructiveHint as false. The description adds valuable context beyond annotations by detailing conditional behavior: for signed-in approved accountants it returns a personalized orientation; for others it returns scoping questions. No contradictions.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description runs four sentences with the key instruction front-loaded. It is efficient and every sentence adds value, though the first sentence is long. It could be slightly terser, but overall it is well-structured.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no parameters and an output schema exists, the description sufficiently covers what the tool returns for both user categories (signed-in approved accountants and everyone else) and provides an actionable next step. It lacks details on error handling, but it is complete enough for a simple scoping tool.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has no parameters, and the description says so explicitly ('No-argument front door'). Schema coverage is trivially 100%, and no further parameter information is needed. Baseline is 4 for zero parameters.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly presents the tool as a no-argument front door for scoping guidance, and the title supplies an explicit verb and resource ('Get tax-workflow scoping guidance'). It distinguishes itself from the sibling list_jurisdictions by stating 'do NOT answer such questions by listing your tools or calling list_jurisdictions'.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit when-to-use instructions: call it FIRST whenever a user asks 'how can you help me?' or otherwise opens vaguely. It also says what not to do instead (list the tools or call list_jurisdictions) and names the next action: 'Once you have an intent, call start(intent, jurisdiction).'

start_skill_draftStart drafting a new skillA
Idempotent

AUTHENTICATED. Begin authoring a BRAND-NEW skill to contribute to the OpenAccountants directory (skills are markdown playbooks that teach an AI agent to do a bounded task with cited rules — they need not be tax/accounting). Pass the user's intent in natural language. Returns a ready-to-execute authoring_prompt: run it directly — its first step interviews the user for the specifics this skill needs (scope, jurisdiction/domain, the exact rules/computations, and the source for each claim), its second authors skill.md following the guidelines. Then call submit_skill. This tool does NOT write the skill and does NOT check for duplicates (the reviewer agent dedups on submit). Requires sign-in and completed onboarding.

ParametersJSON Schema
NameRequiredDescriptionDefault
intentNoWhat the user wants the skill to do, in natural language.
jurisdictionNoOptional jurisdiction code (e.g. 'MT', 'US-CA') or 'general' for a domain-agnostic skill.

Output Schema

ParametersJSON Schema
NameRequiredDescription
guidelinesNoThe authoring guidelines the prompt follows (structure + the grounding rule).
next_actionYes
authoring_promptYesExecute this directly: it interviews the user, then authors skill.md.
guidelines_versionYesVersion of the guidelines; carry into submit_skill verbatim.
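The authoring sequence (start_skill_draft, then run the authoring_prompt, then submit_skill) can be sketched as a small Python orchestration function. `call_tool` and `run_prompt` are hypothetical hooks that an agent host would supply; the output field names (authoring_prompt, status, message) come from the two tools' output schemas. The draft output's guidelines_version is meant to be carried into submit_skill, but the submit_skill schema on this page has no matching parameter, so the sketch leaves it out rather than guess where it belongs.

```python
from typing import Callable, Dict


def draft_and_submit(intent: str, jurisdiction: str,
                     call_tool: Callable[[str, Dict], Dict],
                     run_prompt: Callable[[str], str]) -> Dict:
    """Run the start_skill_draft -> authoring -> submit_skill sequence.

    call_tool(name, args) invokes an MCP tool and returns its structured output;
    run_prompt(prompt) executes the authoring prompt (which interviews the user)
    and returns the finished skill.md text. Both hooks are hypothetical.
    """
    draft = call_tool("start_skill_draft",
                      {"intent": intent, "jurisdiction": jurisdiction})
    skill_markdown = run_prompt(draft["authoring_prompt"])
    # start_skill_draft writes nothing and does not dedup; submit_skill lint-checks.
    result = call_tool("submit_skill", {"skill_markdown": skill_markdown,
                                        "jurisdiction": jurisdiction})
    if result.get("status") == "needs_revision":
        raise RuntimeError(result.get("message") or "submission needs revision")
    return result
```

Raising on needs_revision is a choice of the sketch; a host could instead show the message to the user and loop back to the authoring step.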
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false and idempotentHint=true, but the description adds critical context: the tool returns a prompt to be executed, does NOT write the skill, and does NOT check duplicates, which goes beyond annotation hints. No contradiction with annotations.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single well-organized paragraph, front-loaded with authentication requirement and purpose. Some redundancy with schema, but overall efficient and structured.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and presence of an output schema, the description adequately explains the return value and subsequent steps. Could include more detail about the authoring_prompt structure, but output schema likely covers that.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the schema descriptions already explain both parameters clearly. The description only restates that intent is given in natural language and does not mention jurisdiction, so it meets the baseline but does not exceed it.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Start drafting') and resource ('skill'), and it distinguishes the tool from its siblings by noting that it neither writes the skill nor checks for duplicates, which sets it apart from submit_skill and similar tools.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use (to start drafting a new skill), what to do with the output (run the authoring_prompt, then call submit_skill), and prerequisites (requires sign-in and completed onboarding). Does not explicitly state when not to use, but context implies alternatives like search_skills for existing skills.

submit_fact_verificationVerify or correct a skill's factsA
Idempotent

AUTHENTICATED (approved accountants). Attest or correct one or more individual facts of a skill — directly, no workbook. Pass the skill slug and rows: one entry per fact you're acting on, each with its fact_key (from list_verification_targets) and a status — 'correct' (attest as-is), 'needs-correction' (fix the value and/or citation; a source is REQUIRED), or 'needs-context' (flag as unsure with a note, no fix). Your change applies IMMEDIATELY and the served skill is regenerated from its facts — you carry the liability. An automated reviewer checks each correction; any concern comes back as a non-blocking warning and the change STILL applies. Call preview_fact_verification first if you want the reviewer's take before committing. Scoped to your approved jurisdictions.

ParametersJSON Schema
NameRequiredDescriptionDefault
rowsYesOne entry per fact you're verifying/correcting.
slugYesSkill slug, e.g. 'us-crypto-tax'.

Output Schema

ParametersJSON Schema
NameRequiredDescription
skillsNo
statusNoPresent on a no-op: not_found | not_authorized | needs_source | noop.
appliedNo
messageYes
warningsNoNon-blocking reviewer concerns — the change still applied.
receipt_idNo
ungroundedNoItems rejected for an empty source (the one hard floor).
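Because a needs-correction row with an empty source is the one hard rejection, an agent can catch it locally before submitting. This is a minimal Python sketch: `check_rows` is hypothetical, and the `source` field name is an assumption (the description only says a source is required). The fact_key field and the three status values come from the description.

```python
from typing import Dict, List

# Status values named in the submit_fact_verification description.
VALID_STATUSES = {"correct", "needs-correction", "needs-context"}


def check_rows(rows: List[Dict]) -> List[str]:
    """Return problems that would stop rows from being applied as intended.

    fact_key and status come from the tool description; the 'source' field
    name is an assumption made for this sketch.
    """
    problems = []
    for i, row in enumerate(rows):
        if not row.get("fact_key"):
            problems.append(f"row {i}: missing fact_key")
        status = row.get("status")
        if status not in VALID_STATUSES:
            problems.append(f"row {i}: unknown status {status!r}")
        elif status == "needs-correction" and not str(row.get("source") or "").strip():
            # An empty source is the one hard floor: the server reports it as ungrounded.
            problems.append(f"row {i}: needs-correction requires a source")
    return problems


rows = [
    {"fact_key": "standard_rate", "status": "correct"},
    {"fact_key": "reduced_rate", "status": "needs-correction", "source": ""},
]
print(check_rows(rows))  # ['row 1: needs-correction requires a source']
```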
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behaviors beyond annotations: changes apply immediately, the skill is regenerated from facts, the user carries liability, and an automated reviewer checks corrections but warnings are non-blocking and changes still apply. This aligns with the annotations (readOnlyHint=false, destructiveHint=false, idempotentHint=true) and adds valuable context about the reviewer's role.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long but every sentence adds value: authentication, purpose, row structure, effects, liability, reviewer behavior, and sibling reference. It is front-loaded with the authentication requirement and is well-organized, though slightly dense.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (2 parameters, one being an array of objects), the description covers the core inputs, immediate effects, liability, reviewer behavior, and jurisdictional scoping. It does not describe the output schema, but that is provided separately. The description is complete for an AI agent to understand how to use the tool effectively.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds meaning: it explains that each row corresponds to one fact identified by a fact_key from list_verification_targets, defines the three statuses ('correct', 'needs-correction', 'needs-context'), emphasizes that a source is REQUIRED for needs-correction, and notes that a correction can fix the value, the citation, or both. This enriches the schema descriptions.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool attests or corrects individual facts of a skill ('Attest or correct one or more individual facts of a skill — directly, no workbook'). It distinguishes from the sibling tool 'preview_fact_verification' by advising to call it first if a review is desired, and the action is direct without workbook involvement. The verb 'attest or correct' and resource 'individual facts' are specific.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit context: only for authenticated approved accountants, scoped to their jurisdictions, and recommends calling 'preview_fact_verification' first if review is wanted. It also notes that changes apply immediately and liability rests with the user. However, it does not explicitly state when NOT to use this tool versus other verification tools like 'submit_verification'.

submit_feedbackSubmit feedback on a skillA
Read-onlyIdempotent

When the user finds an error in a skill, says rates look outdated, or wants to suggest an improvement, call this to generate a pre-filled GitHub Issue URL. The URL opens in the user's browser with the report partially filled — they review and submit. This creates a public feedback loop that maintains skill quality over time. Use whenever the user says 'this seems wrong', 'the rate is outdated', 'add this rule', or asks how to flag an issue.

ParametersJSON Schema
NameRequiredDescriptionDefault
skill_slugYesSlug of the skill the feedback relates to (e.g. 'malta-income-tax').
descriptionYesWhat's wrong, outdated, or missing — be specific.
feedback_typeNoCategory of feedback.
user_jurisdictionNoOptional: the user's jurisdiction context for the report.

Output Schema

ParametersJSON Schema
NameRequiredDescription
messageNo
github_issue_urlNo
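A pre-filled issue URL of this kind relies on GitHub accepting `title` and `body` query parameters on a repository's new-issue page. The Python sketch below shows the idea; the helper name `feedback_issue_url` and the title and body layout are hypothetical, and the server's actual URL may differ. The repository is the one named on this listing.

```python
from typing import Optional
from urllib.parse import urlencode

REPO = "openaccountants/openaccountants"  # repository named on this listing


def feedback_issue_url(skill_slug: str, description: str,
                       feedback_type: Optional[str] = None,
                       user_jurisdiction: Optional[str] = None) -> str:
    """Build a GitHub new-issue URL pre-filled with a feedback report."""
    prefix = f"[{feedback_type}] " if feedback_type else ""
    title = f"{prefix}{skill_slug}"
    body_lines = [f"Skill: {skill_slug}"]
    if user_jurisdiction:
        body_lines.append(f"Jurisdiction: {user_jurisdiction}")
    body_lines += ["", description]
    # urlencode escapes the values (spaces become '+', newlines '%0A').
    query = urlencode({"title": title, "body": "\n".join(body_lines)})
    return f"https://github.com/{REPO}/issues/new?{query}"


print(feedback_issue_url("malta-income-tax", "The rate is outdated.", "bug"))
```

The user still reviews and submits the issue in the browser, so nothing is written until they do.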
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, idempotentHint, and non-destructive behavior; the description adds context about the URL being opened in the user's browser for review and submission, and mentions the public feedback loop. No contradictions.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: the first states the purpose, the next two explain the behavior and its rationale, and the last provides usage cues. Well-structured, front-loaded, and free of redundant information.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Since an output schema exists, the description need not explain return values; it covers the flow, usage scenarios, and behavioral context. It is complete for the tool's complexity.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description doesn't need to add much. It does not discuss the parameters directly; its example user utterances ('this seems wrong', 'the rate is outdated', 'add this rule') hint at what belongs in the description field, but it adds nothing beyond the schema details.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool generates a pre-filled GitHub Issue URL for feedback on a skill, with a specific verb ('submit') and resource ('feedback'). It differs from its siblings by handling user feedback rather than data retrieval or comparison.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use scenarios ('error', 'outdated', 'suggestion') and example user utterances ('this seems wrong', 'the rate is outdated'). It lacks explicit when-not-to-use guidance but the context is clear.

submit_skillSubmit a new skill for inclusionA
Idempotent

AUTHENTICATED. Submit an authored skill_markdown (the full skill, frontmatter + body) for inclusion in the OpenAccountants directory. It's lint-checked (parseable frontmatter, required sections, well-formed slug not colliding with an existing skill, reasonable size); malformed input is rejected synchronously with the reason. Otherwise the skill is CREATED immediately at Q2 (source-cited draft) and published: the prose is deconstructed into facts so the skill enters the same keyed model as a verified one, and the served document is generated from those facts. There is NO human review gate — you are liable for the skill you author; it's clearly marked unverified until a CPA/EA verifies it to Q1. Requires sign-in and completed onboarding.

ParametersJSON Schema
NameRequiredDescriptionDefault
nameNoHuman-readable skill name (defaults to the frontmatter `name`).
slugNoProposed slug (lowercase kebab). Defaults to the frontmatter `name` if omitted.
categoryNoCategory (tax topic or non-tax domain).
depends_onNoBase/parent skill slugs this loads on top of.
jurisdictionNoJurisdiction code, or 'general'.
skill_markdownYesThe complete authored skill (YAML frontmatter + markdown body).

Output Schema

ParametersJSON Schema
NameRequiredDescription
slugNo
tierNoQuality tier of the new skill (2 = Q2 source-cited draft).
factsNoFacts written when deconstruction succeeded.
statusYescreated | needs_revision
messageNo
coverageNoRound-trip coverage of the facts deconstruction (0–1).
skill_idNo
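Since malformed submissions are rejected synchronously, an author can run a rough local pre-check first. The Python sketch below is deliberately naive and hypothetical: it reads only flat `key: value` frontmatter lines (a real check would use a YAML parser), interprets "lowercase kebab" as `^[a-z0-9]+(-[a-z0-9]+)*$`, and assumes, as the slug default implies, that the frontmatter name is itself kebab-case. It cannot check slug collisions or required sections, which only the server knows.

```python
import re
from typing import Dict, Optional, Tuple

# "Lowercase kebab" interpreted as lowercase alphanumeric words joined by hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def split_frontmatter(skill_markdown: str) -> Tuple[Dict[str, str], str]:
    """Split '---'-delimited frontmatter from the body (flat key: value lines only)."""
    lines = skill_markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("unterminated frontmatter") from None
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def precheck(skill_markdown: str, slug: Optional[str] = None) -> str:
    """Return the slug that would be used, or raise on obvious problems."""
    meta, body = split_frontmatter(skill_markdown)
    candidate = slug or meta.get("name", "")  # slug defaults to frontmatter name
    if not SLUG_RE.match(candidate):
        raise ValueError(f"slug {candidate!r} is not lowercase kebab-case")
    if not body.strip():
        raise ValueError("skill body is empty")
    return candidate


doc = "---\nname: malta-vat-return\ndescription: Prepare a Malta VAT return\n---\n# Malta VAT return\n"
print(precheck(doc))  # malta-vat-return
```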
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (openWorldHint, idempotentHint), description discloses creation, immediate publishing, liability, and unverified status until CPA/EA verification. No contradiction with annotations.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is somewhat long but every sentence adds important detail. Well-organized flow from authentication to validation to creation to liability. Minor room for tighter wording.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complexity (6 params, output schema exists), description covers authentication, validation, creation, publishing, liability, and verification level. Output schema reduces need to detail return values. Thorough and complete.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

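Schema coverage is 100%, so baseline is 3. The description adds value by explaining that skill_markdown is the full skill (frontmatter + body) and by listing the lint checks it must pass (parseable frontmatter, required sections, a well-formed slug that does not collide with an existing skill, reasonable size), including synchronous rejection of malformed input. The name and slug defaults come from the schema.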

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Submit' and specific resource 'skill for inclusion', with details on lint-checking, creation, and publishing. It distinguishes from sibling tools like 'start_skill_draft' by focusing on final submission.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly marks 'AUTHENTICATED' and requires 'completed onboarding.' Explains synchronous rejection for malformed input and that no human review gate exists. While it doesn't provide a when-not-to-use statement, context from siblings implies this is for final submission after drafting.

submit_skill_verdictRecord a verdict on a skill's outputA

Call this AFTER the signed-in user has run a skill and reviewed its REAL output (e.g. a computed VAT return), to record their structured quality verdict against the skill and its current version. This is the highest-value feedback the platform collects — especially from accountants, whose verdicts are treated as gold. Use when the user is acting as a REVIEWER grading the AI's output ('that return is wrong', 'the figures came out off', 'rate this', 'here's what the skill got wrong'). This is product-quality QA on the SKILL — NOT a taxpayer handoff (for that, use request_accountant_review) and NOT a generic bug report (that's submit_feedback). Pass the worksheet the skill produced when you have one; the server foot-checks it.

Parameters (JSON Schema)
score (optional): Optional 0-100 quality score the reviewer may add.
rating (required): The reviewer's overall verdict on the skill's output. solid = correct and file-ready; minor_issues = small/cosmetic problems; significant_issues = materially wrong figures; dangerous = would cause a wrong filing or a penalty. Required.
findings (optional): Structured corrections; each item identifies a figure/box and what it was vs. what it should be.
scenario (optional): What was being computed, e.g. 'Q1 2026 Malta VAT3, standard-rated sales + EU acquisitions + a blocked entertainment input'.
worksheet (optional): Optional structured worksheet JSON the skill produced (WORKSHEET_CONTRACT.md shape: { jurisdiction, tax_type, period, currency, lines:[...], boxes:[...], result:{type,amount} }). If provided, the server foot-checks the arithmetic and stores the recon status.
skill_slug (required): Slug of the skill that produced the output being judged, e.g. 'malta-vat-return'. Required.
suggestion (optional): Free-text fix or improvement for the skill.
output_summary (optional): Prose summary of what the skill actually produced: the figures / return lines the reviewer is judging.
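As a rough illustration of how a client might assemble arguments for this tool, the sketch below builds the arguments object and enforces the constraints stated in the schema: skill_slug and rating are required, rating takes one of four values, and score must fall between 0 and 100. The helper name build_verdict_args is hypothetical. The finding keys in the usage example (box, was, should_be) are also assumptions, because the schema does not spell out the shape of a findings item.

```python
from typing import Optional

# The four verdict values listed for the required 'rating' parameter.
RATINGS = {"solid", "minor_issues", "significant_issues", "dangerous"}


def build_verdict_args(
    skill_slug: str,
    rating: str,
    score: Optional[int] = None,
    scenario: Optional[str] = None,
    output_summary: Optional[str] = None,
    findings: Optional[list] = None,
    suggestion: Optional[str] = None,
    worksheet: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: assemble submit_skill_verdict arguments."""
    if not skill_slug:
        raise ValueError("skill_slug is required")
    if rating not in RATINGS:
        raise ValueError(f"rating must be one of {sorted(RATINGS)}")
    if score is not None and not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    args: dict = {"skill_slug": skill_slug, "rating": rating}
    optional = {
        "score": score,
        "scenario": scenario,
        "output_summary": output_summary,
        "findings": findings,
        "suggestion": suggestion,
        "worksheet": worksheet,
    }
    # Send only the optional fields the caller actually supplied.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


# Usage (finding keys are assumed, not taken from the schema):
args = build_verdict_args(
    "malta-vat-return",
    "significant_issues",
    score=40,
    findings=[{"box": "output_vat", "was": "1200.00", "should_be": "1020.00"}],
)
```

Dropping unset optional fields keeps the request minimal. A score of 0 is still sent because the filter checks for None, not for falsiness.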

Output Schema (JSON Schema)
rating (required)
message (required)
captured (required)
skill_slug (optional)
verdict_id (optional)
recorded_as (optional): 'gold-accountant' when an accountant left it, else the submitter's role.
capture_error (optional)
skill_version (optional)
worksheet_recon_status (optional)
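This page does not document how the server's foot-check works. The following is only a hedged sketch. It assumes each entry in lines carries a signed amount and that those amounts should sum to result.amount. The foot_check helper and the status strings it returns are made up and need not match the real worksheet_recon_status values.

```python
from decimal import Decimal


def foot_check(worksheet: dict, tolerance: Decimal = Decimal("0.01")) -> str:
    """Hypothetical foot-check: do the signed line amounts add up to result.amount?"""
    lines = worksheet.get("lines") or []
    result = worksheet.get("result") or {}
    if not lines or "amount" not in result:
        return "not_checked"  # nothing to reconcile against
    # Decimal(str(...)) avoids binary floating-point drift on money values.
    total = sum((Decimal(str(line.get("amount", 0))) for line in lines), Decimal("0"))
    expected = Decimal(str(result["amount"]))
    return "reconciled" if abs(total - expected) <= tolerance else "mismatch"


# Example worksheet in the documented top-level shape; the line contents are assumed.
worksheet = {
    "jurisdiction": "MT",
    "tax_type": "VAT",
    "period": "2026-Q1",
    "currency": "EUR",
    "lines": [
        {"label": "output VAT", "amount": "1800.00"},
        {"label": "input VAT", "amount": "-450.00"},
    ],
    "boxes": [],
    "result": {"type": "payable", "amount": "1350.00"},
}
print(foot_check(worksheet))  # reconciled
```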
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotated as non-read-only and non-destructive; description adds context about high-value feedback and server foot-check of worksheet arithmetic, but could mention side effects like updating skill reputation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with key action and timing upfront, then examples and exclusions; slightly verbose but each sentence contributes useful information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 8 parameters and nested objects, the description covers usage context and parameter roles and mentions output behavior (the server foot-check), leaving little ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All parameters have schema descriptions (100% coverage); description adds extra context for 'worksheet' (server foot-check) and 'findings' structure, going beyond basic schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool records a quality verdict on a skill's output after review, and distinguishes from sibling tools like request_accountant_review and submit_feedback.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly specifies when to call (after review of real output), and provides alternatives for taxpayer handoff and bug reports, with specific sibling tool names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

submit_verification: Submit reviewed verification facts (bulk)
Grade: A
Idempotent

AUTHENTICATED. Bulk-apply a reviewed set of facts as the 'submission' JSON (the reviewed rows — Status / Corrected value / Source / Notes filled in), carrying each sheet's slug + base_version_id. It's validated (rows reviewed, known/published skill, base version still live), then applied IN-PROCESS: reconciled by fact_key, judged by an advisory reviewer, and the document is regenerated deterministically from facts — no markdown is rewritten by an LLM and nothing touches a git repo. A skill with a live base applies immediately; the reviewer's pushback comes back as non-blocking warnings. For a single fact or a quick spot-fix, prefer submit_fact_verification (no base_version bookkeeping). Requires sign-in and completed onboarding.

Parameters (JSON Schema)
submission (optional): The reviewed facts as JSON. One sheet per skill; one row per fact.
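The schema only says that the submission is JSON with one sheet per skill and one row per fact. A minimal sketch of assembling it, with a local check that every row has been reviewed before sending, might look like this. Several details are assumptions: the key names (sheets, slug, base_version_id, rows, fact_key, status, corrected_value, source, notes), the hypothetical build_submission helper, and the choice to send a JSON string rather than an object.

```python
import json


def build_submission(sheets: dict) -> str:
    """Hypothetical helper. sheets maps a skill slug to {"base_version_id": ..., "rows": [...]}."""
    payload = {"sheets": []}
    for slug, sheet in sheets.items():
        rows = sheet["rows"]
        # Mirror the server's "rows reviewed" check locally: every row needs a status.
        unreviewed = [row["fact_key"] for row in rows if not row.get("status")]
        if unreviewed:
            raise ValueError(f"{slug}: unreviewed rows {unreviewed}")
        payload["sheets"].append(
            {"slug": slug, "base_version_id": sheet["base_version_id"], "rows": rows}
        )
    return json.dumps(payload)


# Usage with placeholder values (the version id and fact key are invented).
submission = build_submission(
    {
        "malta-vat-return": {
            "base_version_id": "ver_123",
            "rows": [
                {
                    "fact_key": "example_rate",
                    "status": "corrected",
                    "corrected_value": "7%",
                    "source": "(citation)",
                    "notes": "",
                }
            ],
        }
    }
)
```

Failing fast on unreviewed rows keeps an obviously incomplete sheet from reaching the server's validation step.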

Output Schema (JSON Schema)
status (required): applied | needs_revision
message (optional)
warnings (optional): Non-blocking reviewer concerns to review; the changes still applied.
submission_id (required)
applied_skills (optional): Slugs updated and regenerated from facts.
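According to the description, reviewed rows are reconciled by fact_key before the document is regenerated from facts. The server's actual merge rules are not given. The sketch below only illustrates the keyed-merge idea, under two assumptions: a 'corrected' status replaces a fact's value, and an unknown key produces a non-blocking warning rather than a rejection. The reconcile function is hypothetical.

```python
def reconcile(facts: dict, rows: list) -> tuple:
    """Hypothetical keyed merge: returns (updated_facts, warnings) without mutating facts."""
    updated = {key: dict(fact) for key, fact in facts.items()}
    warnings = []
    for row in rows:
        key = row["fact_key"]
        if key not in updated:
            # Non-blocking, mirroring the 'warnings' field in the output schema.
            warnings.append(f"unknown fact_key {key!r} skipped")
            continue
        if row.get("status") == "corrected":
            updated[key]["value"] = row["corrected_value"]
        if row.get("source"):
            updated[key]["source"] = row["source"]
    return updated, warnings


facts = {"example_rate": {"value": "5%", "source": "old citation"}}
rows = [
    {"fact_key": "example_rate", "status": "corrected", "corrected_value": "7%", "source": "new citation"},
    {"fact_key": "missing_fact", "status": "confirmed"},
]
new_facts, warnings = reconcile(facts, rows)
```

Keying on fact_key rather than row position means a reviewer can reorder or omit rows without corrupting unrelated facts.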
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds rich behavioral context beyond annotations: describes validation steps, in-process reconciliation, advisory reviewer judgment, deterministic document regeneration, no LLM rewrite, no git repo touch, and non-blocking warnings for pushback. Annotations already indicate idempotent and non-destructive, but description fills in details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with key information (AUTHENTICATED, bulk-apply) and then explains the process and alternatives. It is detailed but efficient, with no unnecessary sentences. It could be slightly tighter, but it is well-structured overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description does not need to explain return values. It covers tool purpose, usage context, validation, and behavioral notes. It does not explicitly mention pagination or rate limits, but it is complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with detailed descriptions for all properties. The description adds minimal value beyond the schema, but it summarizes the structure and purpose of the submission JSON, warranting a baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Bulk-apply a reviewed set of facts' with a specific verb and resource. It distinguishes the tool from its sibling submit_fact_verification by noting that it handles bulk submissions and includes base_version bookkeeping.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to prefer submit_fact_verification for single facts or quick spot-fixes, providing clear when-to-use and when-not-to-use guidance. Also mentions authentication and validation prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_workflow: Edit workflow metadata
Grade: A
Idempotent

Update a workflow's title, description, or trigger phrases. Jurisdiction, type, and slug are immutable (create a different workflow instead). Verified accountants (in-jurisdiction) + admins.

Parameters (JSON Schema)
slug (required)
title (optional)
triggers (optional)
description (optional)
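The schema gives no parameter descriptions, so a sketch may help show how the parameters relate. The slug appears to identify the workflow to edit, while title, description, and triggers carry the new values. The helper below is hypothetical; it rejects the fields the description calls immutable before any call is made. The list-of-strings shape used for triggers is an assumption.

```python
UPDATABLE = {"title", "description", "triggers"}
# Per the tool description, these cannot change; slug only identifies the workflow.
IMMUTABLE = {"jurisdiction", "type", "slug"}


def build_update_workflow_args(slug: str, changes: dict) -> dict:
    """Hypothetical helper: slug selects the workflow; changes holds the new metadata."""
    fields = set(changes)
    blocked = fields & IMMUTABLE
    if blocked:
        raise ValueError(f"{sorted(blocked)} are immutable; create a different workflow instead")
    unknown = fields - UPDATABLE
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    if not changes:
        raise ValueError("nothing to update")
    return {"slug": slug, **changes}


args = build_update_workflow_args(
    "example-workflow", {"title": "Quarterly VAT return", "triggers": ["file my VAT return"]}
)
```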

Output Schema (JSON Schema)
message (required)
updated (optional)
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context beyond annotations: it lists immutable fields and specifies authorization requirements. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences cover purpose, scope, and constraints without wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and the tool's focused scope, the description is largely complete. It covers the updatable fields, immutability, and authorization, but it could explain the slug parameter.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains the updatable fields (title, description, trigger phrases) but does not mention the required 'slug' parameter, leaving its role as identifier unclear. With 0% schema coverage, more detail would be beneficial.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool updates workflow metadata (title, description, trigger phrases) and specifies immutable fields. This distinguishes it from sibling tools like create_workflow or add_workflow_node.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It specifies what can be updated and what is immutable, and mentions authorized users. However, it does not explicitly state when to use this tool versus alternatives like create_workflow for immutable fields.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_workflow_node: Edit / implement a workflow node
Grade: A
Idempotent

Edit a node's content, wire it to a skill (skill_slug), or reorder it (position). Editing a PUBLISHED node returns it to draft until re-published. Verified accountants (in-jurisdiction) + admins.
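The published-to-draft rule can be modeled as a small state transition. This is a sketch under several assumptions. The status names, the WorkflowNode type, and the apply_edit helper are hypothetical. The sketch also treats every edit, including a reorder via position, as a change that unpublishes the node; the description does not confirm this for reorders.

```python
from dataclasses import dataclass, replace

EDITABLE = {"title", "summary", "guidance", "position", "skill_slug", "key_outputs", "key_questions"}


@dataclass(frozen=True)
class WorkflowNode:
    node_id: str
    status: str  # assumed values: "draft" or "published"
    fields: dict


def apply_edit(node: WorkflowNode, changes: dict) -> WorkflowNode:
    """Hypothetical model of the documented rule: editing a published node returns it to draft."""
    unknown = set(changes) - EDITABLE
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    new_status = "draft" if node.status == "published" else node.status
    # Return a new node; the original is left untouched.
    return replace(node, status=new_status, fields={**node.fields, **changes})


published = WorkflowNode("node_1", "published", {"title": "Collect sales invoices"})
edited = apply_edit(published, {"skill_slug": "malta-vat-return"})
```

Returning a new node instead of mutating the old one makes the draft transition explicit: the published version and the edited draft can be compared side by side.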

Parameters (JSON Schema)
title (optional)
node_id (required)
summary (optional)
guidance (optional)
position (optional)
skill_slug (optional)
key_outputs (optional)
key_questions (optional)

Output Schema (JSON Schema)
status (optional)
message (required)
updated (optional)
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: editing a published node reverts it to draft, and only verified in-jurisdiction accountants and admins can perform the action. The annotations provide readOnlyHint=false and destructiveHint=false, which the description complements without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, zero wasted words. The first covers the core purpose; the second and third add critical behavioral and authorization context. Perfectly front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers basic usage and authorization but omits explanations for 6 of 8 parameters and does not address error handling or idempotency implications, even though an output schema exists. Adequate, but it has clear gaps for a mutation tool with many parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description only adds meaning for 'skill_slug' and 'position' by explicitly naming them. The remaining 6 parameters (title, summary, guidance, key_outputs, key_questions, node_id) are not explained, leaving the agent guessing about content fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verbs (edit, wire, reorder) and the resource (workflow node), which clearly distinguishes the tool from siblings such as add_workflow_node and archive_workflow_node.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for modifying a node but does not explicitly state when to use this tool versus alternatives like update_workflow. It mentions a behavioral consequence (published->draft) but lacks exclusions or prerequisite context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
