GoldenMatch
Server Details
Find duplicate records in 30 seconds. Zero-config entity resolution, 97.2% F1 out of the box.
- Status: Healthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
- Repository: benzsevern/goldenmatch
- GitHub Stars: 30
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 3.2/5 across 30 of 30 tools scored. Lowest: 2.1/5.
Most tools have distinct purposes within the entity resolution domain, but some overlap exists. For example, 'agent_explain_pair' and 'explain_match' both provide explanations for record pairs, which could cause confusion. However, the descriptions clarify that 'agent_explain_pair' offers natural language explanations while 'explain_match' includes per-field score breakdowns, helping to differentiate them.
Tool names generally follow a consistent verb_noun or noun_verb pattern, such as 'list_clusters', 'get_stats', and 'run_transforms'. There are minor deviations like 'agent_approve_reject' (verb_verb_noun) and 'pprl_auto_config' (acronym-based), but overall the naming is readable and predictable across the set.
With 30 tools, the count feels heavy for an entity resolution server, as many tools cover overlapping or highly specific functions. While the domain is complex, the high number may overwhelm agents and lead to misselection. A more streamlined set of 15-20 tools could better balance comprehensiveness and usability.
The tool set provides comprehensive coverage of the entity resolution lifecycle, including data profiling, configuration, matching, clustering, explanation, quality fixes, and export. It supports both standard and privacy-preserving workflows, with tools for creating and testing custom domains, ensuring no significant gaps for agent operations.
Available Tools
30 tools

agent_approve_reject (C)
Approve or reject a review queue pair
| Name | Required | Description | Default |
|---|---|---|---|
| id_a | Yes | ||
| id_b | Yes | ||
| reason | No | ||
| decision | Yes | ||
| job_name | Yes | ||
| decided_by | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full disclosure burden. While 'approve or reject' implies a state mutation, it fails to specify side effects (what happens to the pair post-decision?), reversibility, or authentication requirements. It doesn't clarify if this triggers automatic merges or deduplication actions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise at seven words. While efficiently structured with the action front-loaded, it borders on underspecification given the 6-parameter complexity and lack of schema descriptions. One additional sentence for parameter guidance would improve utility without sacrificing clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Inadequate for a 6-parameter workflow tool with 0% schema coverage and no output schema. Given the deduplication domain context (evident from siblings like agent_deduplicate, merge_record), the description lacks workflow integration details, return value description, or error scenarios. Critical gaps remain for safe agent invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring the description to compensate. It only implicitly hints that id_a/id_b represent the 'pair'. It completely omits critical semantics: valid values for 'decision' (strings 'approve'/'reject'? boolean?), the 'job_name' context, or that 'reason' is optional text for audit trails. No syntax or format guidance provided.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States the specific action (approve/reject) and target resource (review queue pair), which distinguishes it from siblings like agent_review_queue (likely for retrieval) or shatter_cluster (for cluster management). However, it doesn't explain what constitutes a 'review queue pair' or the business context of the decision.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides no guidance on when to use this tool versus siblings like agent_review_queue, agent_compare_strategies, or agent_explain_pair. No mention of workflow prerequisites (e.g., whether the pair must first be retrieved via agent_review_queue) or when rejection vs approval is appropriate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
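The gaps flagged above could largely be closed inside the definition itself. A minimal sketch of a fuller `agent_approve_reject` definition as a Python dict in MCP inputSchema style; the field semantics (e.g. the valid values for `decision`, the audit behavior) are assumptions for illustration, not documented behavior of this server:

```python
# Hypothetical, fuller definition for agent_approve_reject. Field semantics
# (enum values, audit behavior) are assumptions, not documented facts.
tool = {
    "name": "agent_approve_reject",
    "description": (
        "Approve or reject a borderline pair from the review queue of a "
        "matching job. Approving marks the pair as a confirmed match; "
        "rejecting marks it as a non-match. The decision is recorded for "
        "audit and is not automatically reversible."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "job_name": {"type": "string",
                         "description": "Matching job whose review queue contains the pair."},
            "id_a": {"type": "string",
                     "description": "Record ID of the first member of the pair."},
            "id_b": {"type": "string",
                     "description": "Record ID of the second member of the pair."},
            "decision": {"type": "string", "enum": ["approve", "reject"],
                         "description": "Whether to confirm the pair as a match."},
            "decided_by": {"type": "string",
                           "description": "Human or agent identifier making the decision."},
            "reason": {"type": "string",
                       "description": "Optional free-text rationale, stored for audit."},
        },
        "required": ["job_name", "id_a", "id_b", "decision", "decided_by"],
    },
}

# Every parameter now carries a description, closing the 0% coverage gap.
props = tool["inputSchema"]["properties"]
coverage = sum(1 for p in props.values() if p.get("description"))
print(f"{coverage}/{len(props)} parameters documented")
```

With an `enum` on `decision`, an agent no longer has to guess between strings and booleans.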
agent_compare_strategies (C)
Compare ER strategies on your data
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | ||
| ground_truth | No |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full disclosure burden but fails to specify whether this is read-only or destructive, what the comparison entails (accuracy vs. speed metrics), or what format results are returned in.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
While brief (six words), it is underspecified rather than efficiently concise. The single sentence fails to earn its place by providing insufficient actionable information for a tool with complex behavior.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given zero annotations, zero schema descriptions, no output schema, and a complex domain (strategy evaluation), the description is inadequate. It lacks critical context about the comparison methodology and expected inputs/outputs.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage, and the description does not compensate. 'On your data' vaguely alludes to file_path but provides no details on acceptable formats, while ground_truth is completely unmentioned despite being a significant parameter for strategy comparison.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the core action ('Compare') and domain ('ER strategies'), but 'ER' is jargon that assumes familiarity with Entity Resolution. It vaguely references 'your data' without clarifying scope or what distinguishes this from siblings like test_domain or auto_configure.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus alternatives. The description does not indicate prerequisites (e.g., whether a domain must exist first) or when agent_compare_strategies is preferable to test_domain or analyze_data.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
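For context on what the optional `ground_truth` parameter likely enables: comparing ER strategies against labeled truth typically means scoring each strategy's predicted duplicate pairs by precision, recall, and F1. A sketch with invented strategy names and pair sets (not GoldenMatch's actual output format):

```python
# Illustrative strategy comparison against ground truth. The strategies,
# record IDs, and pair sets are invented; only the F1 arithmetic is standard.
truth = {("r1", "r2"), ("r3", "r4")}
predictions = {
    "exact": {("r1", "r2")},
    "fuzzy": {("r1", "r2"), ("r3", "r4"), ("r5", "r6")},
}

def f1(pred: set, truth: set) -> float:
    """F1 over predicted vs. true duplicate pairs."""
    tp = len(pred & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

scores = {name: round(f1(p, truth), 3) for name, p in predictions.items()}
print(scores)  # the "exact" strategy trades recall for precision
```

A report in this shape would let an agent pick a strategy without understanding the matcher internals.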
agent_deduplicate (C)
Run full ER pipeline with confidence gating and reasoning
| Name | Required | Description | Default |
|---|---|---|---|
| config | No | ||
| file_path | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Without annotations, the description carries the full burden of behavioral disclosure. While 'confidence gating' and 'reasoning' hint at threshold-based filtering and explainability, the description omits whether the tool creates/updates records, produces side effects, requires specific prerequisites, or how it handles low-confidence matches.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely terse (nine words) with no wasted language, but this brevity is inappropriate given the tool's complexity, lack of annotations, and opaque schema. The density of jargon ('ER pipeline', 'confidence gating') without elaboration reduces clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex pipeline tool with nested configuration objects, no output schema, and zero schema documentation, a nine-word description is grossly insufficient. It fails to explain return values, pipeline stages, or integration with the sibling tools in the deduplication ecosystem.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage and a nested 'config' object containing unknown fields, the description completely fails to compensate. It does not mention the 'file_path' requirement or explain what configuration parameters are expected in the nested object, leaving critical inputs undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the core mechanism ('Run full ER pipeline') and specific features ('confidence gating and reasoning'), but fails to distinguish this tool from siblings like 'find_duplicates', 'match_record', or 'agent_match_sources'. It assumes familiarity with the 'ER' acronym without clarifying the scope of the pipeline.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to invoke this versus alternative tools. Given the extensive sibling list includes multiple deduplication and matching utilities, the description lacks critical context about when this 'full pipeline' approach is preferred over granular operations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
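Since the review flags 'confidence gating' as unexplained jargon: in ER pipelines the term usually means routing candidate pairs by match confidence into auto-accept, human-review, and auto-reject buckets. A sketch with illustrative thresholds (not GoldenMatch's actual values):

```python
# What "confidence gating" typically means in an ER pipeline: thresholded
# routing of candidate pairs. The thresholds below are illustrative only.
AUTO_MATCH, REVIEW = 0.92, 0.70

def gate(confidence: float) -> str:
    """Route a candidate pair based on its match confidence score."""
    if confidence >= AUTO_MATCH:
        return "match"
    if confidence >= REVIEW:
        return "review"  # the bucket a review-queue tool would serve
    return "non-match"

print([gate(c) for c in (0.97, 0.81, 0.42)])
```

One sentence of this kind in the description would spare agents the guesswork.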
agent_explain_cluster (C)
Explain why records are in the same cluster
| Name | Required | Description | Default |
|---|---|---|---|
| cluster_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While 'explain' implies a read-only operation, the description fails to disclose the explanation format, data sources considered, or any side effects such as logging or audit trails.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of a single, efficient sentence with no wasted words. However, given the lack of schema documentation and annotations, the extreme brevity arguably underserves the tool's documentation needs.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given zero schema description coverage and no annotations, the description is insufficiently complete. It fails to document the required cluster_id parameter, output format, or behavioral constraints necessary for correct invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. While the text mentions 'cluster' (semantically related to the cluster_id parameter), it does not explicitly document the parameter, its valid values, or where to obtain a cluster ID, providing minimal compensation for the schema gap.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('explain') and resource ('records in the same cluster'), making the core function clear. However, it does not explicitly differentiate from the sibling tool 'agent_explain_pair' (which explains pairs vs. clusters) within the description text itself.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'agent_explain_pair' or 'get_cluster', nor does it mention prerequisites or conditions where this tool should not be used.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_explain_pair (C)
Natural language explanation for a record pair
| Name | Required | Description | Default |
|---|---|---|---|
| exact | No | ||
| fuzzy | No | ||
| record_a | Yes | ||
| record_b | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure but only indicates the output is 'natural language'. It fails to disclose whether the operation is read-only, deterministic, expensive/computationally heavy, or whether the explanation generation relies on external LLM calls versus rule-based logic.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no redundant words. However, given the tool's complexity (nested objects, four parameters, no output schema), it is arguably too concise to be functionally helpful, though technically well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with four parameters (including nested objects), zero schema documentation, no annotations, and no output schema, the description is inadequate. It lacks necessary context about the explanation's scope, the structure of input records, or what constitutes a valid 'pair' in this domain.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must compensate for all four parameters. While 'record pair' implicitly maps to 'record_a' and 'record_b', it completely omits the 'exact' (array) and 'fuzzy' (object) parameters, leaving critical matching criteria undocumented. It does not explain parameter formats or relationships.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the tool generates a 'Natural language explanation for a record pair', identifying the general action and target resource. However, it fails to specify what aspect of the pair is being explained (e.g., similarity, differences, match reasoning) and does not distinguish from siblings like 'explain_match' or 'agent_compare_strategies' which may also operate on record pairs.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives such as 'agent_explain_cluster' (for groups) or 'explain_match'. It lacks prerequisites, conditions for use, or explicit exclusions that would help an agent select the correct tool from the extensive sibling list.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_match_sources (C)
Match two files with intelligent strategy selection
| Name | Required | Description | Default |
|---|---|---|---|
| config | No | ||
| file_a | Yes | ||
| file_b | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'intelligent strategy selection' suggesting automatic algorithm selection, but fails to disclose critical safety information: whether this creates persistent entities/domains, modifies input files, requires specific permissions, or what format the output takes (clusters? links? scores?).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single efficient sentence with no redundant words. However, given the complete absence of schema documentation and annotations, this brevity results in under-specification rather than effective conciseness—the description is too short for the information required.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex file-matching operation with 3 parameters (including a nested config object), 0% schema coverage, no annotations, and no output schema, the description is grossly incomplete. It fails to explain the matching methodology, the role of configuration, return value structure, or side effects (e.g., whether it creates clusters automatically).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, so the description must compensate. It implicitly identifies 'file_a' and 'file_b' as files to be matched, but provides no information about the 'config' object—its purpose, required fields, or structure. Given the nested object complexity, this omission leaves a critical parameter undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description provides a clear verb ('Match') and identifies the resources ('two files'), plus adds specific context about 'intelligent strategy selection' that distinguishes it from simple record matching. However, it doesn't explicitly clarify how this differs from the sibling tool 'match_record' (file-level vs. record-level matching).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'match_record', 'agent_compare_strategies', or 'pprl_link'. It omits prerequisites (e.g., whether files must be pre-registered as domains) and gives no indication of when this tool is inappropriate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_review_queue (C)
Get borderline pairs awaiting approval
| Name | Required | Description | Default |
|---|---|---|---|
| job_name | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure but offers minimal information. While 'Get' implies a read-only operation, the description fails to specify return format (array of pairs vs. cursor), pagination behavior, or whether the call blocks if the queue is empty.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently worded at only five words and is front-loaded with the action and object. However, given the lack of schema documentation and output schema, this brevity results in underspecification rather than effective conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of an output schema and annotations, the description should explain what the tool returns (e.g., structure of borderline pairs, confidence scores) and document the required parameter. It provides neither, leaving critical gaps for an agent attempting to invoke the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage for the required 'job_name' parameter, and the description text fails to compensate by explaining what job_name represents, its expected format, or how to obtain valid values. It adds no semantic meaning beyond the property name itself.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Get') and resource ('borderline pairs awaiting approval'), clearly distinguishing it from sibling tools like 'agent_approve_reject' (which acts on pairs) by focusing on retrieval of the queue itself. However, it assumes domain knowledge about what constitutes 'borderline' in this entity resolution context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no explicit guidance on when to use this tool versus alternatives like 'list_clusters' or 'find_duplicates', nor does it mention prerequisites such as requiring an active job or matching process to exist before calling.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
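Several findings above note that no annotations were provided and that the description must carry the full disclosure burden. MCP tool annotations (`readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`) can shoulder part of it. A sketch for a retrieval tool like `agent_review_queue`; the specific values are assumptions about this server's behavior, not documented facts:

```python
# Hypothetical MCP-style annotations for agent_review_queue. The values are
# assumptions about the server's behavior, not documented facts.
annotations = {
    "readOnlyHint": True,      # "Get" implies retrieval only; nothing is mutated
    "destructiveHint": False,  # no records are merged or deleted
    "idempotentHint": True,    # repeated calls return the same queue snapshot
    "openWorldHint": False,    # operates only on the server's own job state
}

# A client can use such hints to decide whether a call needs confirmation.
needs_confirmation = (not annotations["readOnlyHint"]
                      and annotations["destructiveHint"])
print(needs_confirmation)
```

Even with hints present, the description should still spell out consequences in prose; hints and text reinforce each other.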
analyze_data (C)
Profile data, detect domain, recommend ER strategy
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description fails to disclose critical behavioral traits. It does not indicate whether the tool modifies the source file, what format the recommendations take, or whether it requires specific permissions to access the file_path.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a terse seven-word fragment with no structural waste, but it is under-specified to the point of being a tagline rather than a functional description. Appropriate density for the length, but the length itself is inadequate.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with no output schema and no annotations, the description provides insufficient context. Given the complexity of the sibling tools (entity resolution, clustering, matching), the agent needs more information about what 'domain' and 'ER strategy' mean in this specific context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage for the required 'file_path' parameter. The description completely fails to compensate, providing no information about expected file formats, path conventions, or whether the file should be local/remote.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description lists three specific actions (profile data, detect domain, recommend ER strategy) using concrete verbs. It implicitly distinguishes itself from the sibling 'profile_data' by indicating additional capabilities beyond simple profiling.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus siblings like 'profile_data' or 'suggest_config'. No mention of prerequisites, file format requirements, or workflow positioning within the ER pipeline.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
auto_configure (C)
Generate optimal matching config from data analysis
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | ||
| constraints | No |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full disclosure burden. It mentions 'data analysis' as the source but fails to explain what the analysis entails, whether the config is persisted, returned as an object, or what 'optimal' criteria are used. Missing critical behavioral context for a configuration generation tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
A single sentence of seven words is efficiently structured without redundancy, but given the 0% schema coverage and lack of annotations, this brevity constitutes under-specification rather than effective conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Inadequate for the complexity: nested object parameter, no output schema, no annotations, and a domain-rich environment (record linkage/deduplication). The description omits return value structure, config format, and error conditions necessary for agent invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% for both parameters. The description implies 'file_path' contains data for analysis but provides no details on format requirements. The 'constraints' object (a nested parameter) is entirely undocumented—no explanation of its structure, required fields, or how it limits the configuration generation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the core action (Generate... config) and domain (matching), but 'optimal' is vague and it fails to differentiate from siblings like 'suggest_config' or 'pprl_auto_config'. The phrase 'from data analysis' hints at the methodology but doesn't clarify the specific matching domain context evident from sibling tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this versus alternatives like 'suggest_config' or manual configuration. No prerequisites or exclusions are mentioned despite the complex nested 'constraints' parameter.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
create_domain (Grade A)
Create a custom domain extraction rulebook. Define patterns for a specific data domain (medical devices, automotive parts, real estate, etc.).
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Domain name (e.g. 'medical_devices', 'automotive_parts') | |
| scope | No | Save locally (.goldenmatch/domains/) or globally (~/.goldenmatch/domains/). Default: local. | local |
| signals | Yes | Column name keywords that trigger this domain (e.g. ['ndc', 'fda', 'implant']) | |
| stop_words | No | Words to strip during name normalization | |
| brand_patterns | No | Brand/manufacturer names to extract (e.g. ['Medtronic', 'Abbott']) | |
| attribute_patterns | No | Named regex patterns for domain attributes (e.g. {'size': '\\b(\\d+mm)\\b'}) | |
| identifier_patterns | No | Named regex patterns for domain identifiers (e.g. {'ndc': '\\b(\\d{5}-\\d{4}-\\d{2})\\b'}) | |
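A hypothetical invocation payload mirroring the parameter table above (the exact wire format depends on the MCP client), with a quick sanity check of the identifier regex:

```python
import re

# Hypothetical create_domain arguments; keys mirror the parameter table.
args = {
    "name": "medical_devices",
    "scope": "local",  # default; saved under .goldenmatch/domains/
    "signals": ["ndc", "fda", "implant"],
    "brand_patterns": ["Medtronic", "Abbott"],
    "attribute_patterns": {"size": r"\b(\d+mm)\b"},
    "identifier_patterns": {"ndc": r"\b(\d{5}-\d{4}-\d{2})\b"},
}

# Sanity-check the identifier regex against a sample NDC-style code
# before registering the rulebook.
hit = re.search(args["identifier_patterns"]["ndc"], "NDC 12345-6789-01 stent")
print(hit.group(1))  # 12345-6789-01
```

Testing patterns locally like this catches escaping mistakes before they are baked into a rulebook.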
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden but discloses minimal behavioral traits. Fails to mention persistence mechanism (files created in .goldenmatch/domains/), idempotency behavior (what happens if name exists), or side effects. Schema reveals file paths, but description doesn't explain the creation's impact.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. First sentence states core action; second provides pattern definition context with helpful examples. Appropriately front-loaded with the creation verb.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for the 7-parameter complexity with good schema coverage, but lacks system context (this is for entity resolution/matching per siblings) and behavioral details expected when no annotations or output schema exist. Should clarify relationship to matching pipeline.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing a baseline of 3. The description adds value by providing concrete domain examples ('medical_devices', 'automotive_parts') that help agents understand the semantic intent of the 'name' and 'signals' parameters, elevating it above pure schema repetition.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description opens with specific verb 'Create' and resource 'custom domain extraction rulebook'. Examples (medical devices, automotive parts) clarify the abstract 'domain' concept and distinguish this from sibling tools list_domains (listing) and test_domain (testing).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides implied usage context through domain examples, indicating when to use (for specific verticals like medical devices). However, lacks explicit when-to-use guidance or differentiation from test_domain (which validates rules) and no mention of prerequisites or alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
explain_match (Grade B)
Explain why two records match or don't match. Shows per-field score breakdown.
| Name | Required | Description | Default |
|---|---|---|---|
| record_a | Yes | First record fields | |
| record_b | Yes | Second record fields | |
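A hypothetical call payload, plus a naive sketch of the kind of per-field breakdown the description promises (the server's actual scoring is undocumented; exact string equality here is only a stand-in for real fuzzy similarity):

```python
record_a = {"name": "Jon Smith", "zip": "10001"}
record_b = {"name": "John Smith", "zip": "10001"}

# Hypothetical explain_match arguments.
args = {"record_a": record_a, "record_b": record_b}

# Stand-in per-field breakdown: 1.0 on exact match, else 0.0.
# This only illustrates the shape of a per-field score report.
breakdown = {
    field: 1.0 if record_a.get(field) == record_b.get(field) else 0.0
    for field in set(record_a) | set(record_b)
}
print(breakdown)  # {'name': 0.0, 'zip': 1.0} (key order may vary)
```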
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description must carry the full burden. It discloses that the tool provides a 'per-field score breakdown,' which is valuable behavioral context about the output format. However, it lacks disclosure on side effects, idempotency, or authorization requirements, though the verb 'Explain' implies a read-only operation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of two efficient sentences with zero waste. The primary purpose is front-loaded in the first sentence, while the second sentence adds specific value about the output format. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the moderate complexity (2 parameters with 100% schema coverage, nested objects, no output schema), the description is reasonably complete. It compensates for the missing output schema by describing the expected analysis format ('per-field score breakdown'), though it could benefit from explicit safety declarations given the lack of annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage ('First record fields', 'Second record fields'), establishing a baseline of 3. The description mentions 'two records' and 'per-field' breakdown, which aligns with the schema but does not add significant semantic detail about expected record structure or required fields beyond what the schema provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool explains why two records match or don't match, using specific verbs ('Explain', 'Shows') and identifying the resource (records, per-field scores). However, it does not explicitly differentiate from similar sibling tools like 'agent_explain_pair'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives such as 'match_record' (which likely performs matching without explanation) or 'agent_explain_pair'. There are no stated prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
export_results (Grade C)
Export matching results to a file (CSV or JSON).
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output format (default csv) | csv |
| output_path | Yes | File path to save results | |
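A hypothetical invocation payload, followed by a guess at what an exported CSV row might look like — the tool publishes no output schema, so the column set below is an assumption, not documented behavior:

```python
import csv
import io

# Hypothetical export_results arguments; 'format' defaults to csv.
args = {"output_path": "matches.csv", "format": "csv"}

# Illustrative row layout for an exported match result (assumed columns).
rows = [{"cluster_id": "c1", "record_id": "r42", "score": 0.93}]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["cluster_id", "record_id", "score"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue().strip())
```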
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description fails to disclose critical behavioral traits such as file overwrite behavior, idempotency, whether the operation blocks until completion, or required permissions for writing to the specified path.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no wasted words. However, given the absence of annotations and output schema, the extreme brevity leaves important behavioral questions unanswered, suggesting it is slightly under-specified rather than optimally concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a two-parameter tool with complete schema documentation, the description is minimally adequate. However, it should clarify what constitutes 'matching results' (e.g., from match_record or find_duplicates) and address file system interactions given the lack of annotations and output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing the baseline. The description parenthetically mentions 'CSV or JSON' which reinforces the format enum, but adds no additional semantic context about the output_path requirements or format implications beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action (export), resource (matching results), destination (file), and supported formats (CSV or JSON). However, it assumes familiarity with what 'matching results' refers to without defining the context from sibling matching tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus alternatives, nor prerequisites such as requiring prior matching operations to generate results. The description lacks 'when-not-to-use' or sequencing information.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
find_duplicates (Grade B)
Find duplicate matches for a record. Provide field values to search against the loaded dataset.
| Name | Required | Description | Default |
|---|---|---|---|
| top_k | No | Max results to return (default 5) | 5 |
| record | Yes | Record fields to match (e.g. {"name": "John Smith", "zip": "10001"}) | |
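A hypothetical payload plus a toy in-memory ranking that illustrates what "find duplicate matches" implies — the real server presumably uses fuzzy similarity against the loaded dataset; the field-overlap score below is a deliberately crude stand-in:

```python
# Hypothetical find_duplicates arguments; 'record' mirrors the table's
# example and 'top_k' caps the result count.
args = {
    "record": {"name": "John Smith", "zip": "10001"},
    "top_k": 5,
}

dataset = [
    {"id": 1, "name": "John Smith", "zip": "10001"},
    {"id": 2, "name": "Jon Smith", "zip": "10001"},
    {"id": 3, "name": "Alice Jones", "zip": "94105"},
]

def overlap(candidate, query):
    # Toy score: count of fields with identical values.
    return sum(candidate.get(k) == v for k, v in query.items())

ranked = sorted(dataset, key=lambda r: overlap(r, args["record"]), reverse=True)
top = ranked[: args["top_k"]]
print([r["id"] for r in top])  # [1, 2, 3] — best matches first
```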
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the 'loaded dataset' prerequisite but fails to describe the matching algorithm (fuzzy vs. exact), the return format (since no output schema exists), or whether results include similarity scores. This leaves critical behavioral traits undocumented.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of exactly two efficient sentences with no redundant words. The first sentence establishes purpose immediately, and the second provides the essential context about the dataset requirement. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of both annotations and an output schema, the description is minimally adequate. It correctly identifies the dataset prerequisite but omits critical context such as return value structure, read-only nature, and differentiation from the numerous sibling deduplication tools available on this server.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already fully documents both 'record' and 'top_k' parameters. The description adds no additional semantic value regarding parameter formats, validation rules, or example values beyond what the schema provides, warranting the baseline score.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the core action (find duplicate matches) and target (a record). However, it fails to distinguish this tool from similar siblings like 'agent_deduplicate' or 'match_record', which could confuse an agent trying to select between them.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides minimal usage guidance beyond the tautological instruction to 'provide field values.' It offers no criteria for when to use this tool versus alternatives like 'agent_deduplicate' or 'match_record', nor does it mention prerequisites for the 'loaded dataset' it references.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
fix_quality (Grade A)
Run GoldenCheck scan and apply fixes to a CSV file. Returns the fixed data summary and a manifest of all fixes applied. Requires goldencheck: pip install goldenmatch[quality]
| Name | Required | Description | Default |
|---|---|---|---|
| domain | No | Optional domain hint (healthcare, finance, ecommerce) | |
| fix_mode | No | Fix aggressiveness: safe (conservative) or moderate (balanced). Default: safe | safe |
| file_path | Yes | Path to the CSV file to fix | |
| output_path | No | Optional path to save the fixed CSV. If omitted, returns summary only. | |
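A hypothetical summary-only invocation. Per the parameter table, leaving out 'output_path' should return a summary without writing a fixed CSV to disk (the file path and domain hint below are illustrative):

```python
# Hypothetical fix_quality arguments.
args = {
    "file_path": "patients.csv",   # illustrative input path
    "fix_mode": "safe",            # conservative fixes (the default)
    "domain": "healthcare",        # optional domain hint
}
# No 'output_path' key: per the table, the call returns a summary only.
print(sorted(args))  # ['domain', 'file_path', 'fix_mode']
```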
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses the return values (fixed data summary and manifest) and external dependency requirement. However, it does not explicitly clarify whether the original file is modified in-place or if the operation is strictly read-only without output_path.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, each earning its place: action statement, return value disclosure, and dependency requirement. No redundancy and appropriately front-loaded with the core operation.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 4-parameter tool with simple types and no output schema, the description adequately covers what the tool returns and its external requirements. It could be improved by clarifying file handling behavior (in-place vs. copy) and response format details.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents all parameters including fix_mode options and output_path behavior. The description adds minimal semantic context beyond the schema, though 'GoldenCheck scan' provides domain context for the domain parameter hints.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action (Run GoldenCheck scan and apply fixes) and resource (CSV file). It distinguishes itself from sibling 'scan_quality' by explicitly mentioning 'apply fixes' rather than just scanning.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides critical prerequisite information (requires goldencheck package installation), which is essential for execution. However, it lacks explicit guidance on when to use this versus the sibling 'scan_quality' tool or when to choose 'safe' versus 'moderate' fix modes.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_cluster (Grade A)
Get details of a specific cluster: all member records and their field values.
| Name | Required | Description | Default |
|---|---|---|---|
| cluster_id | Yes | Cluster ID to look up | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full disclosure burden. It partially compensates by describing the return data scope ('all member records and their field values'), which hints at the response structure. However, it omits safety traits (read-only status), rate limits, or authentication requirements that annotations would typically cover.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with zero waste. The first clause states the action ('Get details of a specific cluster') and the colon-delimited second clause immediately specifies the return payload, placing critical information at the front.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (single required parameter, no nested objects) and lack of output schema, the description adequately compensates by characterizing the returned data ('all member records and their field values'). It could be improved by explicitly stating this is a read-only operation given the absence of safety annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the cluster_id parameter fully documented as 'Cluster ID to look up'. The description mentions 'specific cluster', which aligns with the parameter but adds no additional syntax guidance, format details, or validation rules beyond the schema. A baseline of 3 is appropriate for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'Get' with resource 'cluster' and clarifies the scope ('all member records and their field values'). It implicitly distinguishes from list_clusters via 'specific cluster' and from agent_explain_cluster by returning raw data rather than explanations, though it doesn't explicitly name siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage through 'specific cluster' (suggesting single-entity lookup vs. bulk listing) and 'all member records' (suggesting deep inspection). However, it lacks explicit when-to-use guidance or named alternatives like 'use list_clusters to browse available clusters first'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_golden_record (Grade C)
Get the merged golden (canonical) record for a cluster.
| Name | Required | Description | Default |
|---|---|---|---|
| cluster_id | Yes | Cluster ID | |
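The tool publishes no output schema and never defines "golden record", so here is a toy sketch of what the term conventionally means in entity resolution: one canonical record consolidated from a cluster's members. The majority-vote survivorship rule below is an assumption; the server's actual merge logic is not documented:

```python
from collections import Counter

members = [
    {"name": "John Smith", "zip": "10001", "phone": None},
    {"name": "Jon Smith", "zip": "10001", "phone": "555-0100"},
]

def golden(records):
    # Toy survivorship: per field, keep the most common non-null value.
    merged = {}
    for field in {k for r in records for k in r}:
        values = [r[field] for r in records if r.get(field) is not None]
        merged[field] = Counter(values).most_common(1)[0][0] if values else None
    return merged

print(golden(members))
```

Real systems apply configurable survivorship rules (most recent, most complete, source priority) rather than a bare majority vote.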
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, placing the full burden of behavioral disclosure on the description. While 'merged' and 'canonical' hint at entity resolution behavior, the description fails to clarify what constitutes a golden record, whether the operation is read-only, or what occurs when an invalid cluster_id is provided.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of a single, efficient sentence that front-loads the verb and object. Every word serves a purpose, though the extreme brevity comes at the cost of explanatory depth given the absence of annotations and output schema.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter retrieval tool, the description adequately identifies the return concept (a merged canonical record). However, without an output schema, it misses the opportunity to describe the record's structure or explain the domain-specific 'golden record' concept for agents unfamiliar with entity resolution terminology.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage for its single parameter (cluster_id). The description mentions 'for a cluster,' which correlates to the parameter, but adds no semantic value regarding valid ranges or constraints. With the schema fully self-documenting, the description meets the baseline expectation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get') and the specific resource ('merged golden (canonical) record'), aligning with the cluster_id parameter. However, it does not explicitly differentiate from the sibling tool 'get_cluster,' leaving ambiguity about whether this retrieves the cluster container versus the consolidated entity within it.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'get_cluster' or 'agent_explain_cluster.' It omits prerequisites (such as whether the cluster must exist or be deduplicated first) and offers no exclusions or error conditions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_stats (Grade B)
Get dataset statistics: record count, cluster count, match rate, cluster sizes.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While it lists output metrics, it fails to mention performance characteristics (expensive calculation vs. cached), side effects, or whether the statistics are real-time or approximate.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficiently structured sentence that front-loads the action and precisely enumerates return values without redundant prose or filler text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of an output schema, the description adequately compensates by listing the specific statistics returned. For a zero-parameter tool, this is sufficient, though mentioning the scope (current dataset vs. all data) would improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains zero parameters, establishing a baseline score of 4. The description appropriately requires no additional parameter explanation since the tool operates on the implicit current dataset context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Get') and resource ('dataset statistics') and enumerates exact metrics returned (record count, cluster count, match rate, cluster sizes). However, it does not explicitly differentiate from analytical siblings like analyze_data or profile_data that might overlap in functionality.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description states what the tool retrieves but provides no guidance on when to use this tool versus alternatives like analyze_data or profile_data. There are no stated prerequisites, exclusions, or conditions for invocation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_clusters (Grade B)
List duplicate clusters found in the dataset. Returns cluster IDs, sizes, and member counts.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max clusters to return (default 20) | 20 |
| min_size | No | Minimum cluster size to include (default 2) | 2 |
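A hypothetical payload plus a client-side toy showing what the 'min_size' and 'limit' parameters imply (the server's actual return structure is undocumented; the cluster rows below are invented for illustration):

```python
# Hypothetical list_clusters arguments.
args = {"limit": 20, "min_size": 2}

clusters = [
    {"cluster_id": "c1", "size": 4},
    {"cluster_id": "c2", "size": 1},  # singleton; filtered out at min_size=2
    {"cluster_id": "c3", "size": 2},
]

# Filter out clusters below min_size, then cap the result count.
visible = [c for c in clusters if c["size"] >= args["min_size"]][: args["limit"]]
print([c["cluster_id"] for c in visible])  # ['c1', 'c3']
```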
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Since no annotations are provided, the description must carry full behavioral burden. It discloses return values (cluster IDs, sizes, counts) but omits read-only status, pagination behavior (whether 'limit' implies cursor/page tokens), and performance characteristics (dataset size implications).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero redundancy. Front-loaded with the core action ('List duplicate clusters') followed immediately by return value specification. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter listing tool with no output schema, the description provides minimal return value hints but lacks structural details (array vs object format) and safety confirmation. Adequate but not robust given the absence of output schema and annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema adequately documents 'limit' and 'min_size'. The description adds no parameter context, meeting the baseline expectation for high-coverage schemas.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists 'duplicate clusters' with specific return values (IDs, sizes, member counts). However, it does not differentiate from sibling tool 'get_cluster' (singular retrieval) or clarify when to use listing versus individual retrieval.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus alternatives like 'get_cluster' or 'agent_deduplicate'. Missing prerequisites (e.g., dataset must be analyzed first) and workflow context (listing before detailed examination).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_domains (Grade A)
List available domain extraction rulebooks (built-in + user-defined).
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It adds valuable scope context ('built-in + user-defined') indicating the result set contents, but fails to disclose safety characteristics (read-only status), pagination behavior, or return format structure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence of seven words. It is front-loaded with the action verb and contains zero redundancy or filler, with every word earning its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (zero parameters) and lack of output schema, the description adequately covers the core functionality by specifying what is listed. However, it could be improved by briefly describing the return structure since no output schema is available.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains zero parameters. Per the evaluation rules, a zero-parameter schema establishes a baseline score of 4, as there are no parameter semantics to elaborate upon.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description provides a specific verb ('List'), clear resource ('domain extraction rulebooks'), and scope qualification ('built-in + user-defined'). This clearly distinguishes it from siblings like create_domain and test_domain through the action verb alone.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no explicit guidance on when to use this tool versus alternatives such as create_domain or test_domain. It lacks 'when to use' or 'when not to use' clauses, forcing the agent to infer applicability solely from the verb.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
match_record (Grade A)
Match a single record against the loaded dataset in real-time. Paste a record's fields and instantly see if it matches any existing record. Uses the configured matchkeys, scorers, and thresholds. Example: {"name": "John Smith", "email": "john@test.com", "zip": "10001"}
| Name | Required | Description | Default |
|---|---|---|---|
| top_k | No | Max matches to return (default 5) | |
| record | Yes | Record fields to match against the dataset | |
| threshold | No | Minimum score to consider a match (default: use config threshold) | |
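The description's inline JSON example can be combined with the parameter table into a complete tool call. A minimal illustrative sketch (the `tools/call` envelope follows the standard MCP request shape; the argument names come from the table above, and the record fields reuse the example from the description):

```python
import json

# Illustrative MCP tools/call payload for match_record. The argument
# names (record, top_k, threshold) come from the parameter table; the
# record fields follow the example in the tool description.
payload = {
    "method": "tools/call",
    "params": {
        "name": "match_record",
        "arguments": {
            "record": {"name": "John Smith", "email": "john@test.com", "zip": "10001"},
            "top_k": 5,  # optional; server default is 5
            # "threshold" omitted: the configured threshold is used
        },
    },
}
print(json.dumps(payload, indent=2))
```

Omitting `threshold` exercises the documented fallback to the configured threshold, which is exactly the kind of default behavior the assessment below says the description should make explicit.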
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses configuration dependencies ('Uses the configured matchkeys, scorers, and thresholds') but fails to clarify if the operation is read-only, what happens when no match is found, or the return value structure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three focused sentences plus a concrete example. Front-loaded with purpose, followed by usage instruction and configuration context. No redundant words; every element earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a 3-parameter tool, but given the lack of output schema and annotations, the description should explain what constitutes a successful match return (records, scores, or boolean) and confirm the read-only nature of the operation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage, the description adds significant value through the concrete JSON example for the 'record' parameter, illustrating expected field structure. This goes beyond the schema's generic 'Record fields to match' description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action (match a single record against the loaded dataset) and scope (real-time). It effectively distinguishes from siblings like 'find_duplicates' (batch) and 'explain_match' (analysis) by emphasizing 'single record' and 'real-time' matching.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The phrase 'Paste a record's fields and instantly see' implies interactive usage for single-record validation, but it lacks explicit when-to-use guidance or named alternatives (e.g., when to use 'find_duplicates' vs this tool).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
pprl_auto_config (Grade B)
Analyze the loaded dataset and recommend optimal PPRL (privacy-preserving record linkage) configuration. Returns recommended fields, bloom filter parameters, threshold, and explanation.
| Name | Required | Description | Default |
|---|---|---|---|
| use_llm | No | Use LLM for enhanced recommendations (requires API key) | |
| security_level | No | Security level (default: high) | high |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It compensates for the missing output schema by documenting return values (fields, bloom filter parameters, threshold, explanation) and clarifies the 'analyze' scope works on the 'loaded dataset'. However, it lacks details on execution time, deterministic behavior, or side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two well-structured sentences: first stating the action and domain, second detailing the return structure. No redundant words or tautologies; information density is high.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequately compensates for the lack of output schema by describing return values. However, given the presence of similar configuration siblings ('suggest_pprl', 'auto_configure'), the description is incomplete without differentiation guidance or explicit prerequisites beyond the passing reference to 'loaded dataset'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing full documentation for 'use_llm' and 'security_level'. The description adds no parameter-specific guidance, meeting the baseline expectation when the schema is self-documenting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the tool analyzes the loaded dataset and recommends PPRL configurations, specifying exact outputs (bloom filter parameters, threshold, fields). However, it fails to differentiate from similar siblings like 'suggest_pprl' or 'auto_configure'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides no guidance on when to use this versus alternatives like 'suggest_pprl' or 'auto_configure'. While it implicitly references a prerequisite ('loaded dataset'), it does not explicitly state that a dataset must be loaded first or when this automated approach is preferred over manual configuration.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
pprl_link (Grade B)
Run privacy-preserving record linkage between two parties' data. Computes bloom filters, matches records without sharing raw data. Specify fields, threshold, and security level.
| Name | Required | Description | Default |
|---|---|---|---|
| fields | Yes | Field names to match on (e.g. ['first_name', 'last_name', 'zip_code']) | |
| file_a | Yes | Path to party A's CSV file | |
| file_b | Yes | Path to party B's CSV file | |
| threshold | No | Match threshold (default: auto-detected) | |
| security_level | No | | high |
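Assembling the required parameters into a call shows how the schema's own example values fit together. An illustrative sketch (the file paths are placeholders; `fields` reuses the example list from the schema description):

```python
import json

# Illustrative arguments for pprl_link, built from the parameter table.
# File paths are placeholders; 'fields' reuses the example from the
# schema description. Optional parameters are omitted to exercise the
# documented defaults.
arguments = {
    "file_a": "party_a.csv",  # path to party A's CSV (placeholder)
    "file_b": "party_b.csv",  # path to party B's CSV (placeholder)
    "fields": ["first_name", "last_name", "zip_code"],
    # "threshold" omitted: auto-detected per the schema default
    # "security_level" omitted: defaults to "high"
}
print(json.dumps({"name": "pprl_link", "arguments": arguments}, indent=2))
```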
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Discloses privacy mechanism ('bloom filters') and safety property ('without sharing raw data'), but omits operational details like output format, side effects (file writes?), performance characteristics, or error behavior expected for a cryptographic tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three efficient sentences with clear structure: purpose, technical method, parameters. No redundant text. Minor opportunity to front-load the 'two parties' distinction earlier, but overall well-paced.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 80% schema coverage and moderate complexity, description covers basic operation but gaps remain. Missing output specification (critical given no output_schema), side effect disclosure, and resource requirements. Adequate but incomplete for safe agent invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 80%, establishing baseline 3. Description lists parameters ('Specify fields, threshold, and security level') but adds minimal semantic value beyond schema descriptions. Does not explain threshold ranges, security_level implications, or file path requirements.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific action ('Run privacy-preserving record linkage') and resource ('two parties' data'). Distinguishes from siblings like find_duplicates and match_record by specifying 'privacy-preserving' technique and 'two parties' context, though it doesn't explicitly contrast with pprl_auto_config or suggest_pprl.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies usage context via 'two parties' data' (suggesting external data linkage vs internal deduplication), but lacks explicit when-to-use guidance, prerequisites (e.g., file formats), or contrasts with sibling tools like pprl_auto_config.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
profile_data (Grade B)
Get data quality profile: column types, null rates, unique counts, sample values.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses the output structure by listing what the profile includes, but fails to mention operational traits like read-only safety, performance characteristics, or whether it samples the full dataset.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, front-loaded sentence that immediately states the action and resource, followed by a colon-delimited list clarifying the specific data returned. There is no wasted text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of an output schema, the description partially compensates by listing return values. However, given the high density of sibling analysis tools and lack of annotations, the description is incomplete without usage context or safety disclosures.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains zero parameters. Per the scoring rubric, 0 params establishes a baseline of 4. The description appropriately adds no parameter-related text since none exist.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Get[s] data quality profile' and enumerates specific metrics returned (column types, null rates, unique counts, sample values). However, it does not explicitly differentiate this from siblings like 'get_stats' or 'analyze_data'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus the numerous sibling analysis tools (e.g., 'analyze_data', 'get_stats', 'test_domain'). There are no stated prerequisites, exclusions, or alternative recommendations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_transforms (Grade A)
Run GoldenFlow data transforms on a CSV file. Normalizes phone numbers (E.164), dates (ISO), categorical spelling, and Unicode issues. Returns a manifest of transforms applied. Requires goldenflow: pip install goldenmatch[transform]
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Path to the CSV file to transform | |
| output_path | No | Optional path to save the transformed CSV. If omitted, returns summary only. | |
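The `output_path` schema note ("If omitted, returns summary only") implies two distinct call shapes. An illustrative sketch of both, with placeholder file paths:

```python
import json

# Illustrative run_transforms calls. Per the schema, omitting output_path
# returns a summary only; supplying it also writes the transformed CSV.
# Both file paths are placeholders.
summary_only = {"file_path": "customers.csv"}
with_output = {"file_path": "customers.csv", "output_path": "customers_clean.csv"}

for args in (summary_only, with_output):
    print(json.dumps({"name": "run_transforms", "arguments": args}))
```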
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Discloses critical operational requirements (pip install goldenmatch[transform]) and return value (manifest of transforms applied). Clarifies that output is a summary/manifest rather than just the transformed file. Does not mention side effects (whether original file is modified) or error handling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste: operation definition, specific transformations, return value, and prerequisites. Front-loaded with main purpose. Every sentence provides unique information not redundant with schema or other sentences.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Appropriately complete for a 2-parameter tool with simple schema. Prerequisites and return value are documented despite lack of output schema. Minor gap: doesn't clarify whether the tool creates temporary files, how it handles non-CSV formats, or whether the manifest is still returned when output_path is provided.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear descriptions for both file_path and output_path. Description reinforces the CSV file target and clarifies that 'summary only' mentioned in schema means 'manifest of transforms applied'. Baseline 3 is appropriate since schema does heavy lifting, though description adds useful context about the return format.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Specific verb (Run) + resource (GoldenFlow data transforms) + target (CSV file). Lists exact normalization operations (E.164 phone numbers, ISO dates, categorical spelling, Unicode) that clearly distinguish it from sibling matching/deduplication tools (agent_match_sources, find_duplicates, etc.).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit when-to-use or when-not-to-use guidance comparing it to similar siblings like fix_quality or analyze_data. However, the specific transformation types listed (phone/date normalization) provide implicit context for when this tool is appropriate vs. general quality scanning.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
scan_quality (Grade A)
Run GoldenCheck data quality scan on a CSV file. Returns issues found (encoding errors, Unicode problems, format violations) without applying fixes. Requires goldencheck: pip install goldenmatch[quality]
| Name | Required | Description | Default |
|---|---|---|---|
| domain | No | Optional domain hint (healthcare, finance, ecommerce) | |
| file_path | Yes | Path to the CSV file to scan | |
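A call combining the required path with the optional domain hint can be sketched as follows (the file path is a placeholder; the domain values are the hints listed in the schema description):

```python
import json

# Illustrative scan_quality call. 'domain' is optional and drawn from
# the hints named in the schema (healthcare, finance, ecommerce); the
# file path is a placeholder.
arguments = {
    "file_path": "patients.csv",
    "domain": "healthcare",
}
print(json.dumps({"name": "scan_quality", "arguments": arguments}))
```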
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, description carries full burden. It discloses specific issue types detected (encoding errors, Unicode problems, format violations), clarifies non-destructive read-only behavior ('without applying fixes'), and warns about external dependency requirement. Could explicitly confirm it does not modify the input file.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences efficiently structured: action/purpose first, return behavior second, prerequisites third. Every sentence earns its place with zero redundancy. Appropriate length for tool complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter tool without output schema, description adequately covers functionality, return value concept (issues found), and operational constraints. Missing only explicit read-only confirmation and output structure details, though these are partially implied by the text.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing baseline 3. Description references 'CSV file' (mapping to file_path) but does not add semantic context beyond what the schema already provides for either parameter. No additional format guidance or domain parameter elaboration needed due to comprehensive schema documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description states specific action ('Run GoldenCheck data quality scan'), target resource ('CSV file'), and distinguishes from sibling tool fix_quality by explicitly noting it returns issues 'without applying fixes'. Clear verb-resource combination with scope differentiation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implicitly guides usage by contrasting with fix operations ('without applying fixes') and states prerequisite ('Requires goldencheck'). However, it does not explicitly state when to choose this over analyze_data or profile_data siblings, or when to invoke fix_quality afterward.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
shatter_cluster (Grade A)
Break an entire cluster into individual records. All members become singletons. Use when a cluster is completely wrong.
| Name | Required | Description | Default |
|---|---|---|---|
| cluster_id | Yes | Cluster ID to shatter | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and successfully explains the state transformation (cluster becomes singletons) and implied destructiveness via 'Break' and 'shatter'. Minor gap: does not explicitly address reversibility, permissions, or side effects beyond the state change.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences efficiently structured: action statement, behavioral outcome, and usage guideline. Zero redundancy—every sentence earns its place. Front-loaded with the core action ('Break').
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter destructive tool without output schema, the description adequately covers the operation mechanics and usage context. Minor deduction for lacking explicit safety warnings (irreversibility) or error conditions, but sufficient for agent invocation decisions.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with cluster_id fully documented as 'Cluster ID to shatter'. The description does not add parameter-specific semantics (e.g., valid ID formats, where to obtain IDs), but the schema is self-sufficient. Baseline score appropriate for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Break', 'shatter') and clearly identifies the resource (cluster) and outcome (individual records/singletons). It effectively distinguishes this from siblings like get_cluster (retrieval) or unmerge_record (single record operation) by specifying 'entire cluster'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear when-to-use guidance ('Use when a cluster is completely wrong'), helping agents distinguish this nuclear option from partial correction tools. Lacks explicit naming of alternatives (e.g., unmerge_record for individual fixes), but the 'completely wrong' criterion provides strong contextual guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
suggest_config (Grade A)
Analyze bad merges and suggest config changes. Provide examples of incorrect merges (pairs that should NOT have matched) and GoldenMatch will identify which fields/thresholds to tighten. Example: [{"record_a": {...}, "record_b": {...}, "reason": "different people"}]
| Name | Required | Description | Default |
|---|---|---|---|
| bad_merges | Yes | List of bad merge examples with record_a, record_b, and optional reason | |
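The description's abbreviated example (`[{"record_a": {...}, ...}]`) can be expanded into a concrete payload. An illustrative sketch with invented record fields, showing both the with-reason and without-reason forms the schema permits:

```python
import json

# Illustrative bad_merges payload for suggest_config, expanding the
# abbreviated example in the description. Record fields are invented
# for illustration; 'reason' is optional per the schema.
bad_merges = [
    {
        "record_a": {"name": "John Smith", "zip": "10001"},
        "record_b": {"name": "John Smith Jr", "zip": "94105"},
        "reason": "different people",
    },
    {
        # 'reason' omitted: the schema marks it optional
        "record_a": {"name": "Ana Lopez", "email": "ana@a.com"},
        "record_b": {"name": "Ana Lopes", "email": "ana@b.com"},
    },
]
print(json.dumps({"name": "suggest_config", "arguments": {"bad_merges": bad_merges}}))
```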
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden but only discloses the core analysis mechanism ('identify which fields/thresholds to tighten'). It omits critical behavioral details: whether suggestions are immediately applied or returned for review, if the operation is idempotent, or any rate limiting concerns.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Excellent structure with three efficient sentences: purpose statement, usage instructions, and concrete example. Every sentence earns its place with no redundancy or extraneous information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for the input side given 100% schema coverage and the helpful example. Minor gap in not describing what the suggestions look like (output format), though no output schema exists to supplement this. It could also better situate the tool within the broader configuration workflow.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
While the schema has 100% coverage (baseline 3), the description adds valuable context through the concrete JSON example showing the expected structure with 'record_a', 'record_b', and 'reason' fields, clarifying the expected input format beyond the schema's abstract description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Analyze[s] bad merges and suggest[s] config changes' with specific verbs and resources. It effectively distinguishes from siblings like auto_configure or find_duplicates by specifying this requires manual examples of incorrect merges to drive the configuration suggestions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear positive guidance on when to use ('Provide examples of incorrect merges... and GoldenMatch will identify which fields/thresholds to tighten'). However, lacks explicit contrast with alternatives like auto_configure or guidance on when NOT to use this versus automatic configuration options.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
suggest_pprl (Grade C)
Check if data needs privacy-preserving matching
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While 'Check' implies a read-only assessment, the description fails to specify what the tool returns (boolean, report, or recommendation), whether it has side effects, or what 'needs' means in this context (sensitivity detection, column analysis, etc.).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no redundant words. It is appropriately front-loaded with the action verb, though extreme brevity contributes to information gaps elsewhere.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of output schema, annotations, and schema parameter descriptions, the description is insufficient. It omits critical details about the return value format, the nature of the PPRL recommendation, and what constitutes 'data' in this context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring the description to compensate. It partially succeeds by implying the file_path should point to 'data' rather than configuration, but fails to specify expected file formats, content structure, or whether the path is local or remote.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the tool checks whether data requires privacy-preserving matching, providing a specific verb and resource. However, it does not clarify how this differs from sibling tools like `pprl_auto_config` or `analyze_data`, leaving ambiguity about when to select this specific tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like `suggest_config`, `analyze_data`, or `pprl_auto_config`. It lacks explicit prerequisites, exclusion criteria, or workflow context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
test_domain
Test a domain extraction rulebook against sample records. Shows what features would be extracted from the loaded data.
| Name | Required | Description | Default |
|---|---|---|---|
| domain_name | Yes | Name of the domain rulebook to test | |
| sample_size | No | Number of records to test | 10 |
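The parameter table above maps directly onto an MCP `tools/call` request. The sketch below is a hypothetical payload (the rulebook name and argument values are illustrative, not from the server):

```python
import json

# Hypothetical tools/call payload for test_domain, assuming the standard
# MCP JSON-RPC request shape and the two parameters documented above.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "test_domain",
        "arguments": {
            "domain_name": "person_records",  # required: rulebook to test
            "sample_size": 5,                 # optional; server default is 10
        },
    },
}

print(json.dumps(payload, indent=2))
```

Omitting `sample_size` entirely would exercise the server-side default of 10.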
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It adds valuable context that data must be 'loaded' first (prerequisite) and implies read-only behavior through 'Shows' and 'would be extracted'. However, it fails to explicitly confirm safety (no state modification), performance limits, or idempotency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. Front-loaded with the core action ('Test...'), followed by the specific output ('Shows what features...'). Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter tool without output schema, the description adequately explains the return value (shows extracted features) and the prerequisite (loaded data). It could be improved by explicitly stating this is a safe, non-destructive operation, but covers the essential behavioral contract.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description adds semantic context by mapping 'domain_name' to 'domain extraction rulebook' and 'sample_size' to sample records for testing, reinforcing the feature extraction domain without repeating schema technicalities.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'Test' with clear resource 'domain extraction rulebook' and scope 'against sample records'. It effectively distinguishes this tool from siblings like create_domain or list_domains by emphasizing the validation/testing nature of the operation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage through words like 'Test' and 'Shows what features would be extracted', suggesting this is a preview/validation tool rather than production extraction. However, it lacks explicit guidance on when to use this versus alternatives like analyze_data or profile_data, and doesn't state prerequisites clearly.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
unmerge_record
Remove a record from its cluster. The record becomes a singleton. Remaining cluster members are re-clustered using stored pair scores. Use this to fix bad merges.
| Name | Required | Description | Default |
|---|---|---|---|
| record_id | Yes | Row ID of the record to unmerge | |
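The semantics described above (the target record becomes a singleton, remaining members are re-grouped from stored pair scores) can be sketched as follows. The union-find approach and the 0.9 threshold are assumptions for illustration, not GoldenMatch's actual algorithm:

```python
# Sketch of unmerge semantics: extract one record as a singleton, then
# re-cluster the remaining members by linking stored pairs above a threshold.
def unmerge(cluster, record_id, pair_scores, threshold=0.9):
    rest = [r for r in cluster if r != record_id]
    parent = {r: r for r in rest}  # union-find over remaining members

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for (a, b), score in pair_scores.items():
        if a in parent and b in parent and score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for r in rest:
        groups.setdefault(find(r), []).append(r)
    return [record_id], list(groups.values())

singleton, reclustered = unmerge(
    ["r1", "r2", "r3", "r4"], "r2",
    {("r1", "r3"): 0.95, ("r3", "r4"): 0.4},
)
# r2 is now a singleton; r1/r3 stay linked (0.95), r4 splits off (0.4)
```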
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and successfully discloses key behavioral traits: the record becomes a singleton, and remaining members are re-clustered using 'stored pair scores' (revealing algorithmic dependency). It lacks explicit safety warnings or destructiveness declarations, but the functional behavior is well-documented.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste: sentence 1 states the action, sentences 2-3 describe precise side effects (singleton creation and re-clustering), and sentence 4 provides usage context. Information is front-loaded with the core verb, and every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter mutation tool without output schema, the description adequately covers the operation mechanics, side effects, and use case. It explains what happens to both the extracted record and the remaining cluster members. Minor gap: does not indicate what the tool returns (e.g., new cluster ID, success status).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage for the single parameter (record_id). The description implicitly reinforces that the record must be in a cluster to be unmerged, but does not add syntax details, validation rules, or semantic constraints beyond what the schema already provides. Baseline 3 is appropriate for complete schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Remove') and resource ('record from its cluster'), clearly stating the core operation. It distinguishes itself from siblings like shatter_cluster by specifying that only the target record becomes a singleton while remaining members are re-clustered, and contextualizes the purpose as 'fix bad merges' vs. general cluster management.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use guidance ('Use this to fix bad merges'), giving clear context for appropriate invocation. However, it does not explicitly name alternatives (e.g., 'use shatter_cluster to break the entire cluster instead') or provide explicit when-not-to-use exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama cannot successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.