GoldenMatch
Server Details
Find duplicate records in 30 seconds. Zero-config entity resolution, 97.2% F1 out of the box.
- Status: Healthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
- Repository: benzsevern/goldenmatch
- GitHub Stars: 30
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 3.2/5 across 30 of 30 tools scored. Lowest: 2.1/5.
Most tools have distinct purposes within the entity resolution domain, but some overlap exists. For example, 'agent_explain_pair' and 'explain_match' both provide explanations for record pairs, which could cause confusion. However, the descriptions clarify that 'agent_explain_pair' offers natural language explanations while 'explain_match' includes per-field score breakdowns, helping to differentiate them.
Tool names generally follow a consistent verb_noun or noun_verb pattern, such as 'list_clusters', 'get_stats', and 'run_transforms'. There are minor deviations like 'agent_approve_reject' (verb_verb_noun) and 'pprl_auto_config' (acronym-based), but overall the naming is readable and predictable across the set.
With 30 tools, the count feels heavy for an entity resolution server, as many tools cover overlapping or highly specific functions. While the domain is complex, the high number may overwhelm agents and lead to misselection. A more streamlined set of 15-20 tools could better balance comprehensiveness and usability.
The tool set provides comprehensive coverage of the entity resolution lifecycle, including data profiling, configuration, matching, clustering, explanation, quality fixes, and export. It supports both standard and privacy-preserving workflows, with tools for creating and testing custom domains, ensuring no significant gaps for agent operations.
Available Tools
30 tools

agent_approve_reject (C)
Approve or reject a review queue pair
| Name | Required | Description | Default |
|---|---|---|---|
| id_a | Yes | ||
| id_b | Yes | ||
| reason | No | ||
| decision | Yes | ||
| job_name | Yes | ||
| decided_by | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full disclosure burden. While 'approve or reject' implies a state mutation, it fails to specify side effects (what happens to the pair post-decision?), reversibility, or authentication requirements. It doesn't clarify if this triggers automatic merges or deduplication actions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise at seven words. While efficiently structured with the action front-loaded, it borders on underspecification given the 6-parameter complexity and lack of schema descriptions. One additional sentence for parameter guidance would improve utility without sacrificing clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Inadequate for a 6-parameter workflow tool with 0% schema coverage and no output schema. Given the deduplication domain context (evident from siblings like agent_deduplicate, merge_record), the description lacks workflow integration details, return value description, or error scenarios. Critical gaps remain for safe agent invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring the description to compensate. It only implicitly hints that id_a/id_b represent the 'pair'. It completely omits critical semantics: valid values for 'decision' (strings 'approve'/'reject'? boolean?), the 'job_name' context, or that 'reason' is optional text for audit trails. No syntax or format guidance provided.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States the specific action (approve/reject) and target resource (review queue pair), which distinguishes it from siblings like agent_review_queue (likely for retrieval) or shatter_cluster (for cluster management). However, it doesn't explain what constitutes a 'review queue pair' or the business context of the decision.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides no guidance on when to use this tool versus siblings like agent_review_queue, agent_compare_strategies, or agent_explain_pair. No mention of workflow prerequisites (e.g., whether the pair must first be retrieved via agent_review_queue) or when rejection vs approval is appropriate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
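The gaps flagged above could largely be closed inside the definition itself. A minimal sketch of a fuller `agent_approve_reject` definition as a Python dict in MCP inputSchema style; the field semantics (e.g. the valid values for `decision`, the audit behavior) are assumptions for illustration, not documented behavior of this server:

```python
# Hypothetical, fuller definition for agent_approve_reject. Field semantics
# (enum values, audit behavior) are assumptions, not documented facts.
tool = {
    "name": "agent_approve_reject",
    "description": (
        "Approve or reject a borderline pair from the review queue of a "
        "matching job. Approving marks the pair as a confirmed match; "
        "rejecting marks it as a non-match. The decision is recorded for "
        "audit and is not automatically reversible."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "job_name": {"type": "string",
                         "description": "Matching job whose review queue contains the pair."},
            "id_a": {"type": "string",
                     "description": "Record ID of the first member of the pair."},
            "id_b": {"type": "string",
                     "description": "Record ID of the second member of the pair."},
            "decision": {"type": "string", "enum": ["approve", "reject"],
                         "description": "Whether to confirm the pair as a match."},
            "decided_by": {"type": "string",
                           "description": "Human or agent identifier making the decision."},
            "reason": {"type": "string",
                       "description": "Optional free-text rationale, stored for audit."},
        },
        "required": ["job_name", "id_a", "id_b", "decision", "decided_by"],
    },
}

# Every parameter now carries a description, closing the 0% coverage gap.
props = tool["inputSchema"]["properties"]
coverage = sum(1 for p in props.values() if p.get("description"))
print(f"{coverage}/{len(props)} parameters documented")
```

With an `enum` on `decision`, an agent no longer has to guess between strings and booleans.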
agent_compare_strategies (C)
Compare ER strategies on your data
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | ||
| ground_truth | No |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full disclosure burden but fails to specify whether this is read-only or destructive, what the comparison entails (accuracy vs. speed metrics), or what format results are returned in.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
While brief (six words), it is underspecified rather than efficiently concise. The single sentence fails to earn its place by providing insufficient actionable information for a tool with complex behavior.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given zero annotations, zero schema descriptions, no output schema, and a complex domain (strategy evaluation), the description is inadequate. It lacks critical context about the comparison methodology and expected inputs/outputs.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage, and the description does not compensate. 'On your data' vaguely alludes to file_path but provides no details on acceptable formats, while ground_truth is completely unmentioned despite being a significant parameter for strategy comparison.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the core action ('Compare') and domain ('ER strategies'), but 'ER' is jargon that assumes familiarity with Entity Resolution. It vaguely references 'your data' without clarifying scope or what distinguishes this from siblings like test_domain or auto_configure.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus alternatives. The description does not indicate prerequisites (e.g., whether a domain must exist first) or when agent_compare_strategies is preferable to test_domain or analyze_data.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
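For context on what the optional `ground_truth` parameter likely enables: comparing ER strategies against labeled truth typically means scoring each strategy's predicted duplicate pairs by precision, recall, and F1. A sketch with invented strategy names and pair sets (not GoldenMatch's actual output format):

```python
# Illustrative strategy comparison against ground truth. The strategies,
# record IDs, and pair sets are invented; only the F1 arithmetic is standard.
truth = {("r1", "r2"), ("r3", "r4")}
predictions = {
    "exact": {("r1", "r2")},
    "fuzzy": {("r1", "r2"), ("r3", "r4"), ("r5", "r6")},
}

def f1(pred: set, truth: set) -> float:
    """F1 over predicted vs. true duplicate pairs."""
    tp = len(pred & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

scores = {name: round(f1(p, truth), 3) for name, p in predictions.items()}
print(scores)  # the "exact" strategy trades recall for precision
```

A report in this shape would let an agent pick a strategy without understanding the matcher internals.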
agent_deduplicate (C)
Run full ER pipeline with confidence gating and reasoning
| Name | Required | Description | Default |
|---|---|---|---|
| config | No | ||
| file_path | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Without annotations, the description carries the full burden of behavioral disclosure. While 'confidence gating' and 'reasoning' hint at threshold-based filtering and explainability, the description omits whether the tool creates/updates records, produces side effects, requires specific prerequisites, or how it handles low-confidence matches.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely terse (nine words) with no wasted language, but this brevity is inappropriate given the tool's complexity, lack of annotations, and opaque schema. The density of jargon ('ER pipeline', 'confidence gating') without elaboration reduces clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex pipeline tool with nested configuration objects, no output schema, and zero schema documentation, a nine-word description is grossly insufficient. It fails to explain return values, pipeline stages, or integration with the sibling tools in the deduplication ecosystem.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage and a nested 'config' object containing unknown fields, the description completely fails to compensate. It does not mention the 'file_path' requirement or explain what configuration parameters are expected in the nested object, leaving critical inputs undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the core mechanism ('Run full ER pipeline') and specific features ('confidence gating and reasoning'), but fails to distinguish this tool from siblings like 'find_duplicates', 'match_record', or 'agent_match_sources'. It assumes familiarity with the 'ER' acronym without clarifying the scope of the pipeline.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to invoke this versus alternative tools. Given the extensive sibling list includes multiple deduplication and matching utilities, the description lacks critical context about when this 'full pipeline' approach is preferred over granular operations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
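Since the review flags 'confidence gating' as unexplained jargon: in ER pipelines the term usually means routing candidate pairs by match confidence into auto-accept, human-review, and auto-reject buckets. A sketch with illustrative thresholds (not GoldenMatch's actual values):

```python
# What "confidence gating" typically means in an ER pipeline: thresholded
# routing of candidate pairs. The thresholds below are illustrative only.
AUTO_MATCH, REVIEW = 0.92, 0.70

def gate(confidence: float) -> str:
    """Route a candidate pair based on its match confidence score."""
    if confidence >= AUTO_MATCH:
        return "match"
    if confidence >= REVIEW:
        return "review"  # the bucket a review-queue tool would serve
    return "non-match"

print([gate(c) for c in (0.97, 0.81, 0.42)])
```

One sentence of this kind in the description would spare agents the guesswork.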
agent_explain_cluster (C)
Explain why records are in the same cluster
| Name | Required | Description | Default |
|---|---|---|---|
| cluster_id | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While 'explain' implies a read-only operation, the description fails to disclose the explanation format, data sources considered, or any side effects such as logging or audit trails.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of a single, efficient sentence with no wasted words. However, given the lack of schema documentation and annotations, the extreme brevity arguably underserves the tool's documentation needs.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given zero schema description coverage and no annotations, the description is insufficiently complete. It fails to document the required cluster_id parameter, output format, or behavioral constraints necessary for correct invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. While the text mentions 'cluster' (semantically related to the cluster_id parameter), it does not explicitly document the parameter, its valid values, or where to obtain a cluster ID, providing minimal compensation for the schema gap.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('explain') and resource ('records in the same cluster'), making the core function clear. However, it does not explicitly differentiate from the sibling tool 'agent_explain_pair' (which explains pairs vs. clusters) within the description text itself.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'agent_explain_pair' or 'get_cluster', nor does it mention prerequisites or conditions where this tool should not be used.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_explain_pair (C)
Natural language explanation for a record pair
| Name | Required | Description | Default |
|---|---|---|---|
| exact | No | ||
| fuzzy | No | ||
| record_a | Yes | ||
| record_b | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure but only indicates the output is 'natural language'. It fails to disclose whether the operation is read-only, deterministic, expensive/computationally heavy, or whether the explanation generation relies on external LLM calls versus rule-based logic.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no redundant words. However, given the tool's complexity (nested objects, four parameters, no output schema), it is arguably too concise to be functionally helpful, though technically well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with four parameters (including nested objects), zero schema documentation, no annotations, and no output schema, the description is inadequate. It lacks necessary context about the explanation's scope, the structure of input records, or what constitutes a valid 'pair' in this domain.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must compensate for all four parameters. While 'record pair' implicitly maps to 'record_a' and 'record_b', it completely omits the 'exact' (array) and 'fuzzy' (object) parameters, leaving critical matching criteria undocumented. It does not explain parameter formats or relationships.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the tool generates a 'Natural language explanation for a record pair', identifying the general action and target resource. However, it fails to specify what aspect of the pair is being explained (e.g., similarity, differences, match reasoning) and does not distinguish from siblings like 'explain_match' or 'agent_compare_strategies' which may also operate on record pairs.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives such as 'agent_explain_cluster' (for groups) or 'explain_match'. It lacks prerequisites, conditions for use, or explicit exclusions that would help an agent select the correct tool from the extensive sibling list.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_match_sources (C)
Match two files with intelligent strategy selection
| Name | Required | Description | Default |
|---|---|---|---|
| config | No | ||
| file_a | Yes | ||
| file_b | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'intelligent strategy selection' suggesting automatic algorithm selection, but fails to disclose critical safety information: whether this creates persistent entities/domains, modifies input files, requires specific permissions, or what format the output takes (clusters? links? scores?).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single efficient sentence with no redundant words. However, given the complete absence of schema documentation and annotations, this brevity results in under-specification rather than effective conciseness—the description is too short for the information required.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex file-matching operation with 3 parameters (including a nested config object), 0% schema coverage, no annotations, and no output schema, the description is grossly incomplete. It fails to explain the matching methodology, the role of configuration, return value structure, or side effects (e.g., whether it creates clusters automatically).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, so the description must compensate. It implicitly identifies 'file_a' and 'file_b' as files to be matched, but provides no information about the 'config' object—its purpose, required fields, or structure. Given the nested object complexity, this omission leaves a critical parameter undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description provides a clear verb ('Match') and identifies the resources ('two files'), plus adds specific context about 'intelligent strategy selection' that distinguishes it from simple record matching. However, it doesn't explicitly clarify how this differs from the sibling tool 'match_record' (file-level vs. record-level matching).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'match_record', 'agent_compare_strategies', or 'pprl_link'. It omits prerequisites (e.g., whether files must be pre-registered as domains) and gives no indication of when this tool is inappropriate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_review_queue (C)
Get borderline pairs awaiting approval
| Name | Required | Description | Default |
|---|---|---|---|
| job_name | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure but offers minimal information. While 'Get' implies a read-only operation, the description fails to specify return format (array of pairs vs. cursor), pagination behavior, or whether the call blocks if the queue is empty.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently worded at only five words and is front-loaded with the action and object. However, given the lack of schema documentation and output schema, this brevity results in underspecification rather than effective conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of an output schema and annotations, the description should explain what the tool returns (e.g., structure of borderline pairs, confidence scores) and document the required parameter. It provides neither, leaving critical gaps for an agent attempting to invoke the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage for the required 'job_name' parameter, and the description text fails to compensate by explaining what job_name represents, its expected format, or how to obtain valid values. It adds no semantic meaning beyond the property name itself.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Get') and resource ('borderline pairs awaiting approval'), clearly distinguishing it from sibling tools like 'agent_approve_reject' (which acts on pairs) by focusing on retrieval of the queue itself. However, it assumes domain knowledge about what constitutes 'borderline' in this entity resolution context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no explicit guidance on when to use this tool versus alternatives like 'list_clusters' or 'find_duplicates', nor does it mention prerequisites such as requiring an active job or matching process to exist before calling.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
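Several findings above note that no annotations were provided and that the description must carry the full disclosure burden. MCP tool annotations (`readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`) can shoulder part of it. A sketch for a retrieval tool like `agent_review_queue`; the specific values are assumptions about this server's behavior, not documented facts:

```python
# Hypothetical MCP-style annotations for agent_review_queue. The values are
# assumptions about the server's behavior, not documented facts.
annotations = {
    "readOnlyHint": True,      # "Get" implies retrieval only; nothing is mutated
    "destructiveHint": False,  # no records are merged or deleted
    "idempotentHint": True,    # repeated calls return the same queue snapshot
    "openWorldHint": False,    # operates only on the server's own job state
}

# A client can use such hints to decide whether a call needs confirmation.
needs_confirmation = (not annotations["readOnlyHint"]
                      and annotations["destructiveHint"])
print(needs_confirmation)
```

Even with hints present, the description should still spell out consequences in prose; hints and text reinforce each other.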
analyze_data (C)
Profile data, detect domain, recommend ER strategy
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description fails to disclose critical behavioral traits. It does not indicate whether the tool modifies the source file, what format the recommendations take, or whether it requires specific permissions to access the file_path.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a terse seven-word fragment with no structural waste, but it is under-specified to the point of being a tagline rather than a functional description. Appropriate density for the length, but the length itself is inadequate.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with no output schema and no annotations, the description provides insufficient context. Given the complexity of the sibling tools (entity resolution, clustering, matching), the agent needs more information about what 'domain' and 'ER strategy' mean in this specific context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage for the required 'file_path' parameter. The description completely fails to compensate, providing no information about expected file formats, path conventions, or whether the file should be local/remote.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description lists three specific actions (profile data, detect domain, recommend ER strategy) using concrete verbs. It implicitly distinguishes itself from the sibling 'profile_data' by indicating additional capabilities beyond simple profiling.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus siblings like 'profile_data' or 'suggest_config'. No mention of prerequisites, file format requirements, or workflow positioning within the ER pipeline.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
auto_configure (C)
Generate optimal matching config from data analysis
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | ||
| constraints | No |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full disclosure burden. It mentions 'data analysis' as the source but fails to explain what the analysis entails, whether the config is persisted, returned as an object, or what 'optimal' criteria are used. Missing critical behavioral context for a configuration generation tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
A single sentence of seven words is efficiently structured without redundancy, but given the 0% schema coverage and lack of annotations, this brevity constitutes under-specification rather than effective conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Inadequate for the complexity: nested object parameter, no output schema, no annotations, and a domain-rich environment (record linkage/deduplication). The description omits return value structure, config format, and error conditions necessary for agent invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% for both parameters. The description implies 'file_path' contains data for analysis but provides no details on format requirements. The 'constraints' object (a nested parameter) is entirely undocumented—no explanation of its structure, required fields, or how it limits the configuration generation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the core action (Generate... config) and domain (matching), but 'optimal' is vague and it fails to differentiate from siblings like 'suggest_config' or 'pprl_auto_config'. The phrase 'from data analysis' hints at the methodology but doesn't clarify the specific matching domain context evident from sibling tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this versus alternatives like 'suggest_config' or manual configuration. No prerequisites or exclusions are mentioned despite the complex nested 'constraints' parameter.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
create_domain (Grade A)
Create a custom domain extraction rulebook. Define patterns for a specific data domain (medical devices, automotive parts, real estate, etc.).
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Domain name (e.g. 'medical_devices', 'automotive_parts') | |
| scope | No | Save locally (.goldenmatch/domains/) or globally (~/.goldenmatch/domains/). Default: local. | local |
| signals | Yes | Column name keywords that trigger this domain (e.g. ['ndc', 'fda', 'implant']) | |
| stop_words | No | Words to strip during name normalization | |
| brand_patterns | No | Brand/manufacturer names to extract (e.g. ['Medtronic', 'Abbott']) | |
| attribute_patterns | No | Named regex patterns for domain attributes (e.g. {'size': '\\b(\\d+mm)\\b'}) | |
| identifier_patterns | No | Named regex patterns for domain identifiers (e.g. {'ndc': '\\b(\\d{5}-\\d{4}-\\d{2})\\b'}) | |
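A hypothetical invocation payload mirroring the parameter table above (the exact wire format depends on the MCP client), with a quick sanity check of the identifier regex:

```python
import re

# Hypothetical create_domain arguments; keys mirror the parameter table.
args = {
    "name": "medical_devices",
    "scope": "local",  # default; saved under .goldenmatch/domains/
    "signals": ["ndc", "fda", "implant"],
    "brand_patterns": ["Medtronic", "Abbott"],
    "attribute_patterns": {"size": r"\b(\d+mm)\b"},
    "identifier_patterns": {"ndc": r"\b(\d{5}-\d{4}-\d{2})\b"},
}

# Sanity-check the identifier regex against a sample NDC-style code
# before registering the rulebook.
hit = re.search(args["identifier_patterns"]["ndc"], "NDC 12345-6789-01 stent")
print(hit.group(1))  # 12345-6789-01
```

Testing patterns locally like this catches escaping mistakes before they are baked into a rulebook.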
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden but discloses minimal behavioral traits. Fails to mention persistence mechanism (files created in .goldenmatch/domains/), idempotency behavior (what happens if name exists), or side effects. Schema reveals file paths, but description doesn't explain the creation's impact.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. First sentence states core action; second provides pattern definition context with helpful examples. Appropriately front-loaded with the creation verb.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for the 7-parameter complexity with good schema coverage, but lacks system context (this is for entity resolution/matching per siblings) and behavioral details expected when no annotations or output schema exist. Should clarify relationship to matching pipeline.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing a baseline of 3. The description adds value by providing concrete domain examples ('medical_devices', 'automotive_parts') that help agents understand the semantic intent of the 'name' and 'signals' parameters, elevating it above pure schema repetition.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description opens with specific verb 'Create' and resource 'custom domain extraction rulebook'. Examples (medical devices, automotive parts) clarify the abstract 'domain' concept and distinguish this from sibling tools list_domains (listing) and test_domain (testing).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides implied usage context through domain examples, indicating when to use (for specific verticals like medical devices). However, lacks explicit when-to-use guidance or differentiation from test_domain (which validates rules) and no mention of prerequisites or alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
explain_match (Grade B)
Explain why two records match or don't match. Shows per-field score breakdown.
| Name | Required | Description | Default |
|---|---|---|---|
| record_a | Yes | First record fields | |
| record_b | Yes | Second record fields | |
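A hypothetical call payload, plus a naive sketch of the kind of per-field breakdown the description promises (the server's actual scoring is undocumented; exact string equality here is only a stand-in for real fuzzy similarity):

```python
record_a = {"name": "Jon Smith", "zip": "10001"}
record_b = {"name": "John Smith", "zip": "10001"}

# Hypothetical explain_match arguments.
args = {"record_a": record_a, "record_b": record_b}

# Stand-in per-field breakdown: 1.0 on exact match, else 0.0.
# This only illustrates the shape of a per-field score report.
breakdown = {
    field: 1.0 if record_a.get(field) == record_b.get(field) else 0.0
    for field in set(record_a) | set(record_b)
}
print(breakdown)  # {'name': 0.0, 'zip': 1.0} (key order may vary)
```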
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description must carry the full burden. It discloses that the tool provides a 'per-field score breakdown,' which is valuable behavioral context about the output format. However, it lacks disclosure on side effects, idempotency, or authorization requirements, though the verb 'Explain' implies a read-only operation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of two efficient sentences with zero waste. The primary purpose is front-loaded in the first sentence, while the second sentence adds specific value about the output format. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the moderate complexity (2 parameters with 100% schema coverage, nested objects, no output schema), the description is reasonably complete. It compensates for the missing output schema by describing the expected analysis format ('per-field score breakdown'), though it could benefit from explicit safety declarations given the lack of annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage ('First record fields', 'Second record fields'), establishing a baseline of 3. The description mentions 'two records' and 'per-field' breakdown, which aligns with the schema but does not add significant semantic detail about expected record structure or required fields beyond what the schema provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool explains why two records match or don't match, using specific verbs ('Explain', 'Shows') and identifying the resource (records, per-field scores). However, it does not explicitly differentiate from similar sibling tools like 'agent_explain_pair'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives such as 'match_record' (which likely performs matching without explanation) or 'agent_explain_pair'. There are no stated prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
export_results (Grade C)
Export matching results to a file (CSV or JSON).
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output format (default csv) | csv |
| output_path | Yes | File path to save results | |
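A hypothetical invocation payload, followed by a guess at what an exported CSV row might look like — the tool publishes no output schema, so the column set below is an assumption, not documented behavior:

```python
import csv
import io

# Hypothetical export_results arguments; 'format' defaults to csv.
args = {"output_path": "matches.csv", "format": "csv"}

# Illustrative row layout for an exported match result (assumed columns).
rows = [{"cluster_id": "c1", "record_id": "r42", "score": 0.93}]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["cluster_id", "record_id", "score"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue().strip())
```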
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description fails to disclose critical behavioral traits such as file overwrite behavior, idempotency, whether the operation blocks until completion, or required permissions for writing to the specified path.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no wasted words. However, given the absence of annotations and output schema, the extreme brevity leaves important behavioral questions unanswered, suggesting it is slightly under-specified rather than optimally concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a two-parameter tool with complete schema documentation, the description is minimally adequate. However, it should clarify what constitutes 'matching results' (e.g., from match_record or find_duplicates) and address file system interactions given the lack of annotations and output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing the baseline. The description parenthetically mentions 'CSV or JSON' which reinforces the format enum, but adds no additional semantic context about the output_path requirements or format implications beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action (export), resource (matching results), destination (file), and supported formats (CSV or JSON). However, it assumes familiarity with what 'matching results' refers to without defining the context from sibling matching tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus alternatives, nor prerequisites such as requiring prior matching operations to generate results. The description lacks 'when-not-to-use' or sequencing information.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
find_duplicates (Grade B)
Find duplicate matches for a record. Provide field values to search against the loaded dataset.
| Name | Required | Description | Default |
|---|---|---|---|
| top_k | No | Max results to return (default 5) | 5 |
| record | Yes | Record fields to match (e.g. {"name": "John Smith", "zip": "10001"}) | |
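A hypothetical payload plus a toy in-memory ranking that illustrates what "find duplicate matches" implies — the real server presumably uses fuzzy similarity against the loaded dataset; the field-overlap score below is a deliberately crude stand-in:

```python
# Hypothetical find_duplicates arguments; 'record' mirrors the table's
# example and 'top_k' caps the result count.
args = {
    "record": {"name": "John Smith", "zip": "10001"},
    "top_k": 5,
}

dataset = [
    {"id": 1, "name": "John Smith", "zip": "10001"},
    {"id": 2, "name": "Jon Smith", "zip": "10001"},
    {"id": 3, "name": "Alice Jones", "zip": "94105"},
]

def overlap(candidate, query):
    # Toy score: count of fields with identical values.
    return sum(candidate.get(k) == v for k, v in query.items())

ranked = sorted(dataset, key=lambda r: overlap(r, args["record"]), reverse=True)
top = ranked[: args["top_k"]]
print([r["id"] for r in top])  # [1, 2, 3] — best matches first
```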
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the 'loaded dataset' prerequisite but fails to describe the matching algorithm (fuzzy vs. exact), the return format (since no output schema exists), or whether results include similarity scores. This leaves critical behavioral traits undocumented.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of exactly two efficient sentences with no redundant words. The first sentence establishes purpose immediately, and the second provides the essential context about the dataset requirement. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of both annotations and an output schema, the description is minimally adequate. It correctly identifies the dataset prerequisite but omits critical context such as return value structure, read-only nature, and differentiation from the numerous sibling deduplication tools available on this server.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already fully documents both 'record' and 'top_k' parameters. The description adds no additional semantic value regarding parameter formats, validation rules, or example values beyond what the schema provides, warranting the baseline score.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the core action (find duplicate matches) and target (a record). However, it fails to distinguish this tool from similar siblings like 'agent_deduplicate' or 'match_record', which could confuse an agent trying to select between them.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides minimal usage guidance beyond the tautological instruction to 'provide field values.' It offers no criteria for when to use this tool versus alternatives like 'agent_deduplicate' or 'match_record', nor does it mention prerequisites for the 'loaded dataset' it references.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
fix_quality (Grade A)
Run GoldenCheck scan and apply fixes to a CSV file. Returns the fixed data summary and a manifest of all fixes applied. Requires goldencheck: pip install goldenmatch[quality]
| Name | Required | Description | Default |
|---|---|---|---|
| domain | No | Optional domain hint (healthcare, finance, ecommerce) | |
| fix_mode | No | Fix aggressiveness: safe (conservative) or moderate (balanced). Default: safe | safe |
| file_path | Yes | Path to the CSV file to fix | |
| output_path | No | Optional path to save the fixed CSV. If omitted, returns summary only. | |
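A hypothetical summary-only invocation. Per the parameter table, leaving out 'output_path' should return a summary without writing a fixed CSV to disk (the file path and domain hint below are illustrative):

```python
# Hypothetical fix_quality arguments.
args = {
    "file_path": "patients.csv",   # illustrative input path
    "fix_mode": "safe",            # conservative fixes (the default)
    "domain": "healthcare",        # optional domain hint
}
# No 'output_path' key: per the table, the call returns a summary only.
print(sorted(args))  # ['domain', 'file_path', 'fix_mode']
```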
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses the return values (fixed data summary and manifest) and external dependency requirement. However, it does not explicitly clarify whether the original file is modified in-place or if the operation is strictly read-only without output_path.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, each earning its place: action statement, return value disclosure, and dependency requirement. No redundancy and appropriately front-loaded with the core operation.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 4-parameter tool with simple types and no output schema, the description adequately covers what the tool returns and its external requirements. It could be improved by clarifying file handling behavior (in-place vs. copy) and response format details.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents all parameters including fix_mode options and output_path behavior. The description adds minimal semantic context beyond the schema, though 'GoldenCheck scan' provides domain context for the domain parameter hints.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action (Run GoldenCheck scan and apply fixes) and resource (CSV file). It distinguishes itself from sibling 'scan_quality' by explicitly mentioning 'apply fixes' rather than just scanning.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides critical prerequisite information (requires goldencheck package installation), which is essential for execution. However, it lacks explicit guidance on when to use this versus the sibling 'scan_quality' tool or when to choose 'safe' versus 'moderate' fix modes.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_cluster (Grade A)
Get details of a specific cluster: all member records and their field values.
| Name | Required | Description | Default |
|---|---|---|---|
| cluster_id | Yes | Cluster ID to look up | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full disclosure burden. It partially compensates by describing the return data scope ('all member records and their field values'), which hints at the response structure. However, it omits safety traits (read-only status), rate limits, or authentication requirements that annotations would typically cover.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with zero waste. The first clause states the action ('Get details of a specific cluster') and the colon-delimited second clause immediately specifies the return payload, placing critical information at the front.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (single required parameter, no nested objects) and lack of output schema, the description adequately compensates by characterizing the returned data ('all member records and their field values'). It could be improved by explicitly stating this is a read-only operation given the absence of safety annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the cluster_id parameter fully documented as 'Cluster ID to look up'. The description mentions 'specific cluster', which aligns with the parameter but adds no additional syntax guidance, format details, or validation rules beyond the schema. A baseline of 3 is appropriate for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'Get' with resource 'cluster' and clarifies the scope ('all member records and their field values'). It implicitly distinguishes from list_clusters via 'specific cluster' and from agent_explain_cluster by returning raw data rather than explanations, though it doesn't explicitly name siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage through 'specific cluster' (suggesting single-entity lookup vs. bulk listing) and 'all member records' (suggesting deep inspection). However, it lacks explicit when-to-use guidance or named alternatives like 'use list_clusters to browse available clusters first'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_golden_record (Grade C)
Get the merged golden (canonical) record for a cluster.
| Name | Required | Description | Default |
|---|---|---|---|
| cluster_id | Yes | Cluster ID | |
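The tool publishes no output schema and never defines "golden record", so here is a toy sketch of what the term conventionally means in entity resolution: one canonical record consolidated from a cluster's members. The majority-vote survivorship rule below is an assumption; the server's actual merge logic is not documented:

```python
from collections import Counter

members = [
    {"name": "John Smith", "zip": "10001", "phone": None},
    {"name": "Jon Smith", "zip": "10001", "phone": "555-0100"},
]

def golden(records):
    # Toy survivorship: per field, keep the most common non-null value.
    merged = {}
    for field in {k for r in records for k in r}:
        values = [r[field] for r in records if r.get(field) is not None]
        merged[field] = Counter(values).most_common(1)[0][0] if values else None
    return merged

print(golden(members))
```

Real systems apply configurable survivorship rules (most recent, most complete, source priority) rather than a bare majority vote.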
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, placing the full burden of behavioral disclosure on the description. While 'merged' and 'canonical' hint at entity resolution behavior, the description fails to clarify what constitutes a golden record, whether the operation is read-only, or what occurs when an invalid cluster_id is provided.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of a single, efficient sentence that front-loads the verb and object. Every word serves a purpose, though the extreme brevity comes at the cost of explanatory depth given the absence of annotations and output schema.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter retrieval tool, the description adequately identifies the return concept (a merged canonical record). However, without an output schema, it misses the opportunity to describe the record's structure or explain the domain-specific 'golden record' concept for agents unfamiliar with entity resolution terminology.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage for its single parameter (cluster_id). The description mentions 'for a cluster,' which correlates to the parameter, but adds no semantic value regarding valid ranges or constraints. With the schema fully self-documenting, the description meets the baseline expectation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get') and the specific resource ('merged golden (canonical) record'), aligning with the cluster_id parameter. However, it does not explicitly differentiate from the sibling tool 'get_cluster,' leaving ambiguity about whether this retrieves the cluster container versus the consolidated entity within it.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'get_cluster' or 'agent_explain_cluster.' It omits prerequisites (such as whether the cluster must exist or be deduplicated first) and offers no exclusions or error conditions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_stats (Grade B)
Get dataset statistics: record count, cluster count, match rate, cluster sizes.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While it lists output metrics, it fails to mention performance characteristics (expensive calculation vs. cached), side effects, or whether the statistics are real-time or approximate.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficiently structured sentence that front-loads the action and precisely enumerates return values without redundant prose or filler text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of an output schema, the description adequately compensates by listing the specific statistics returned. For a zero-parameter tool, this is sufficient, though mentioning the scope (current dataset vs. all data) would improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains zero parameters, establishing a baseline score of 4. The description appropriately requires no additional parameter explanation since the tool operates on the implicit current dataset context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Get') and resource ('dataset statistics') and enumerates exact metrics returned (record count, cluster count, match rate, cluster sizes). However, it does not explicitly differentiate from analytical siblings like analyze_data or profile_data that might overlap in functionality.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description states what the tool retrieves but provides no guidance on when to use this tool versus alternatives like analyze_data or profile_data. There are no stated prerequisites, exclusions, or conditions for invocation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_clusters (Grade B)
List duplicate clusters found in the dataset. Returns cluster IDs, sizes, and member counts.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max clusters to return (default 20) | 20 |
| min_size | No | Minimum cluster size to include (default 2) | 2 |
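A hypothetical payload plus a client-side toy showing what the 'min_size' and 'limit' parameters imply (the server's actual return structure is undocumented; the cluster rows below are invented for illustration):

```python
# Hypothetical list_clusters arguments.
args = {"limit": 20, "min_size": 2}

clusters = [
    {"cluster_id": "c1", "size": 4},
    {"cluster_id": "c2", "size": 1},  # singleton; filtered out at min_size=2
    {"cluster_id": "c3", "size": 2},
]

# Filter out clusters below min_size, then cap the result count.
visible = [c for c in clusters if c["size"] >= args["min_size"]][: args["limit"]]
print([c["cluster_id"] for c in visible])  # ['c1', 'c3']
```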
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Since no annotations are provided, the description must carry full behavioral burden. It discloses return values (cluster IDs, sizes, counts) but omits read-only status, pagination behavior (whether 'limit' implies cursor/page tokens), and performance characteristics (dataset size implications).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero redundancy. Front-loaded with the core action ('List duplicate clusters') followed immediately by return value specification. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter listing tool with no output schema, the description provides minimal return value hints but lacks structural details (array vs object format) and safety confirmation. Adequate but not robust given the absence of output schema and annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema adequately documents 'limit' and 'min_size'. The description adds no parameter context, meeting the baseline expectation for high-coverage schemas.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists 'duplicate clusters' with specific return values (IDs, sizes, member counts). However, it does not differentiate from sibling tool 'get_cluster' (singular retrieval) or clarify when to use listing versus individual retrieval.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus alternatives like 'get_cluster' or 'agent_deduplicate'. Missing prerequisites (e.g., dataset must be analyzed first) and workflow context (listing before detailed examination).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_domains (Grade A)
List available domain extraction rulebooks (built-in + user-defined).
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It adds valuable scope context ('built-in + user-defined') indicating the result set contents, but fails to disclose safety characteristics (read-only status), pagination behavior, or return format structure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence of seven words. It is front-loaded with the action verb and contains zero redundancy or filler, with every word earning its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (zero parameters) and lack of output schema, the description adequately covers the core functionality by specifying what is listed. However, it could be improved by briefly describing the return structure since no output schema is available.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains zero parameters. Per the evaluation rules, a zero-parameter schema establishes a baseline score of 4, as there are no parameter semantics to elaborate upon.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description provides a specific verb ('List'), clear resource ('domain extraction rulebooks'), and scope qualification ('built-in + user-defined'). This clearly distinguishes it from siblings like create_domain and test_domain through the action verb alone.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no explicit guidance on when to use this tool versus alternatives such as create_domain or test_domain. It lacks 'when to use' or 'when not to use' clauses, forcing the agent to infer applicability solely from the verb.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
match_record (Grade A)
Match a single record against the loaded dataset in real-time. Paste a record's fields and instantly see if it matches any existing record. Uses the configured matchkeys, scorers, and thresholds. Example: {"name": "John Smith", "email": "john@test.com", "zip": "10001"}
| Name | Required | Description | Default |
|---|---|---|---|
| top_k | No | Max matches to return (default 5) | |
| record | Yes | Record fields to match against the dataset | |
| threshold | No | Minimum score to consider a match (default: use config threshold) | |
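The description's inline JSON example can be combined with the parameter table into a complete tool call. A minimal illustrative sketch (the `tools/call` envelope follows the standard MCP request shape; the argument names come from the table above, and the record fields reuse the example from the description):

```python
import json

# Illustrative MCP tools/call payload for match_record. The argument
# names (record, top_k, threshold) come from the parameter table; the
# record fields follow the example in the tool description.
payload = {
    "method": "tools/call",
    "params": {
        "name": "match_record",
        "arguments": {
            "record": {"name": "John Smith", "email": "john@test.com", "zip": "10001"},
            "top_k": 5,  # optional; server default is 5
            # "threshold" omitted: the configured threshold is used
        },
    },
}
print(json.dumps(payload, indent=2))
```

Omitting `threshold` exercises the documented fallback to the configured threshold, which is exactly the kind of default behavior the assessment below says the description should make explicit.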
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses configuration dependencies ('Uses the configured matchkeys, scorers, and thresholds') but fails to clarify if the operation is read-only, what happens when no match is found, or the return value structure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three focused sentences plus a concrete example. Front-loaded with purpose, followed by usage instruction and configuration context. No redundant words; every element earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a 3-parameter tool, but given the lack of output schema and annotations, the description should explain what constitutes a successful match return (records, scores, or boolean) and confirm the read-only nature of the operation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage, the description adds significant value through the concrete JSON example for the 'record' parameter, illustrating expected field structure. This goes beyond the schema's generic 'Record fields to match' description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action (match a single record against the loaded dataset) and scope (real-time). It effectively distinguishes from siblings like 'find_duplicates' (batch) and 'explain_match' (analysis) by emphasizing 'single record' and 'real-time' matching.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The phrase 'Paste a record's fields and instantly see' implies interactive usage for single-record validation, but it lacks explicit when-to-use guidance or named alternatives (e.g., when to use 'find_duplicates' vs this tool).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
pprl_auto_config (Grade B)
Analyze the loaded dataset and recommend optimal PPRL (privacy-preserving record linkage) configuration. Returns recommended fields, bloom filter parameters, threshold, and explanation.
| Name | Required | Description | Default |
|---|---|---|---|
| use_llm | No | Use LLM for enhanced recommendations (requires API key) | |
| security_level | No | Security level (default: high) | high |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It compensates for the missing output schema by documenting return values (fields, bloom filter parameters, threshold, explanation) and clarifies the 'analyze' scope works on the 'loaded dataset'. However, it lacks details on execution time, deterministic behavior, or side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two well-structured sentences: first stating the action and domain, second detailing the return structure. No redundant words or tautologies; information density is high.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequately compensates for the lack of output schema by describing return values. However, given the presence of similar configuration siblings ('suggest_pprl', 'auto_configure'), the description is incomplete without differentiation guidance or explicit prerequisites beyond the passing reference to 'loaded dataset'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing full documentation for 'use_llm' and 'security_level'. The description adds no parameter-specific guidance, meeting the baseline expectation when the schema is self-documenting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the tool analyzes the loaded dataset and recommends PPRL configurations, specifying exact outputs (bloom filter parameters, threshold, fields). However, it fails to differentiate from similar siblings like 'suggest_pprl' or 'auto_configure'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides no guidance on when to use this versus alternatives like 'suggest_pprl' or 'auto_configure'. While it implicitly references a prerequisite ('loaded dataset'), it does not explicitly state that a dataset must be loaded first or when this automated approach is preferred over manual configuration.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
pprl_link (Grade B)
Run privacy-preserving record linkage between two parties' data. Computes bloom filters, matches records without sharing raw data. Specify fields, threshold, and security level.
| Name | Required | Description | Default |
|---|---|---|---|
| fields | Yes | Field names to match on (e.g. ['first_name', 'last_name', 'zip_code']) | |
| file_a | Yes | Path to party A's CSV file | |
| file_b | Yes | Path to party B's CSV file | |
| threshold | No | Match threshold (default: auto-detected) | |
| security_level | No | | high |
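Assembling the required parameters into a call shows how the schema's own example values fit together. An illustrative sketch (the file paths are placeholders; `fields` reuses the example list from the schema description):

```python
import json

# Illustrative arguments for pprl_link, built from the parameter table.
# File paths are placeholders; 'fields' reuses the example from the
# schema description. Optional parameters are omitted to exercise the
# documented defaults.
arguments = {
    "file_a": "party_a.csv",  # path to party A's CSV (placeholder)
    "file_b": "party_b.csv",  # path to party B's CSV (placeholder)
    "fields": ["first_name", "last_name", "zip_code"],
    # "threshold" omitted: auto-detected per the schema default
    # "security_level" omitted: defaults to "high"
}
print(json.dumps({"name": "pprl_link", "arguments": arguments}, indent=2))
```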
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Discloses privacy mechanism ('bloom filters') and safety property ('without sharing raw data'), but omits operational details like output format, side effects (file writes?), performance characteristics, or error behavior expected for a cryptographic tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three efficient sentences with clear structure: purpose, technical method, parameters. No redundant text. Minor opportunity to front-load the 'two parties' distinction earlier, but overall well-paced.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 80% schema coverage and moderate complexity, description covers basic operation but gaps remain. Missing output specification (critical given no output_schema), side effect disclosure, and resource requirements. Adequate but incomplete for safe agent invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 80%, establishing baseline 3. Description lists parameters ('Specify fields, threshold, and security level') but adds minimal semantic value beyond schema descriptions. Does not explain threshold ranges, security_level implications, or file path requirements.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific action ('Run privacy-preserving record linkage') and resource ('two parties' data'). Distinguishes from siblings like find_duplicates and match_record by specifying 'privacy-preserving' technique and 'two parties' context, though it doesn't explicitly contrast with pprl_auto_config or suggest_pprl.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies usage context via 'two parties' data' (suggesting external data linkage vs internal deduplication), but lacks explicit when-to-use guidance, prerequisites (e.g., file formats), or contrasts with sibling tools like pprl_auto_config.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
profile_data (Grade B)
Get data quality profile: column types, null rates, unique counts, sample values.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses the output structure by listing what the profile includes, but fails to mention operational traits like read-only safety, performance characteristics, or whether it samples the full dataset.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, front-loaded sentence that immediately states the action and resource, followed by a colon-delimited list clarifying the specific data returned. There is no wasted text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of an output schema, the description partially compensates by listing return values. However, given the high density of sibling analysis tools and lack of annotations, the description is incomplete without usage context or safety disclosures.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains zero parameters. Per the scoring rubric, 0 params establishes a baseline of 4. The description appropriately adds no parameter-related text since none exist.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Get[s] data quality profile' and enumerates specific metrics returned (column types, null rates, unique counts, sample values). However, it does not explicitly differentiate this from siblings like 'get_stats' or 'analyze_data'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus the numerous sibling analysis tools (e.g., 'analyze_data', 'get_stats', 'test_domain'). There are no stated prerequisites, exclusions, or alternative recommendations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_transforms (Grade A)
Run GoldenFlow data transforms on a CSV file. Normalizes phone numbers (E.164), dates (ISO), categorical spelling, and Unicode issues. Returns a manifest of transforms applied. Requires goldenflow: pip install goldenmatch[transform]
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Path to the CSV file to transform | |
| output_path | No | Optional path to save the transformed CSV. If omitted, returns summary only. | |
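The `output_path` schema note ("If omitted, returns summary only") implies two distinct call shapes. An illustrative sketch of both, with placeholder file paths:

```python
import json

# Illustrative run_transforms calls. Per the schema, omitting output_path
# returns a summary only; supplying it also writes the transformed CSV.
# Both file paths are placeholders.
summary_only = {"file_path": "customers.csv"}
with_output = {"file_path": "customers.csv", "output_path": "customers_clean.csv"}

for args in (summary_only, with_output):
    print(json.dumps({"name": "run_transforms", "arguments": args}))
```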
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Discloses critical operational requirements (pip install goldenmatch[transform]) and return value (manifest of transforms applied). Clarifies that output is a summary/manifest rather than just the transformed file. Does not mention side effects (whether original file is modified) or error handling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste: operation definition, specific transformations, return value, and prerequisites. Front-loaded with main purpose. Every sentence provides unique information not redundant with schema or other sentences.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Appropriately complete for a 2-parameter tool with simple schema. Prerequisites and return value are documented despite lack of output schema. Minor gap: doesn't clarify whether the tool creates temporary files, how it handles non-CSV formats, or whether the manifest is still returned when output_path is provided.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with clear descriptions for both file_path and output_path. Description reinforces the CSV file target and clarifies that 'summary only' mentioned in schema means 'manifest of transforms applied'. Baseline 3 is appropriate since schema does heavy lifting, though description adds useful context about the return format.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Specific verb (Run) + resource (GoldenFlow data transforms) + target (CSV file). Lists exact normalization operations (E.164 phone numbers, ISO dates, categorical spelling, Unicode) that clearly distinguish it from sibling matching/deduplication tools (agent_match_sources, find_duplicates, etc.).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit when-to-use or when-not-to-use guidance comparing it to similar siblings like fix_quality or analyze_data. However, the specific transformation types listed (phone/date normalization) provide implicit context for when this tool is appropriate vs. general quality scanning.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
scan_quality (Grade A)
Run GoldenCheck data quality scan on a CSV file. Returns issues found (encoding errors, Unicode problems, format violations) without applying fixes. Requires goldencheck: pip install goldenmatch[quality]
| Name | Required | Description | Default |
|---|---|---|---|
| domain | No | Optional domain hint (healthcare, finance, ecommerce) | |
| file_path | Yes | Path to the CSV file to scan | |
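A call combining the required path with the optional domain hint can be sketched as follows (the file path is a placeholder; the domain values are the hints listed in the schema description):

```python
import json

# Illustrative scan_quality call. 'domain' is optional and drawn from
# the hints named in the schema (healthcare, finance, ecommerce); the
# file path is a placeholder.
arguments = {
    "file_path": "patients.csv",
    "domain": "healthcare",
}
print(json.dumps({"name": "scan_quality", "arguments": arguments}))
```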
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, description carries full burden. It discloses specific issue types detected (encoding errors, Unicode problems, format violations), clarifies non-destructive read-only behavior ('without applying fixes'), and warns about external dependency requirement. Could explicitly confirm it does not modify the input file.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences efficiently structured: action/purpose first, return behavior second, prerequisites third. Every sentence earns its place with zero redundancy. Appropriate length for tool complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter tool without output schema, description adequately covers functionality, return value concept (issues found), and operational constraints. Missing only explicit read-only confirmation and output structure details, though these are partially implied by the text.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing baseline 3. Description references 'CSV file' (mapping to file_path) but does not add semantic context beyond what the schema already provides for either parameter. No additional format guidance or domain parameter elaboration needed due to comprehensive schema documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description states specific action ('Run GoldenCheck data quality scan'), target resource ('CSV file'), and distinguishes from sibling tool fix_quality by explicitly noting it returns issues 'without applying fixes'. Clear verb-resource combination with scope differentiation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implicitly guides usage by contrasting with fix operations ('without applying fixes') and states prerequisite ('Requires goldencheck'). However, it does not explicitly state when to choose this over analyze_data or profile_data siblings, or when to invoke fix_quality afterward.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
shatter_cluster (Grade A)
Break an entire cluster into individual records. All members become singletons. Use when a cluster is completely wrong.
| Name | Required | Description | Default |
|---|---|---|---|
| cluster_id | Yes | Cluster ID to shatter | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and successfully explains the state transformation (cluster becomes singletons) and implied destructiveness via 'Break' and 'shatter'. Minor gap: does not explicitly address reversibility, permissions, or side effects beyond the state change.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences efficiently structured: action statement, behavioral outcome, and usage guideline. Zero redundancy—every sentence earns its place. Front-loaded with the core action ('Break').
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter destructive tool without output schema, the description adequately covers the operation mechanics and usage context. Minor deduction for lacking explicit safety warnings (irreversibility) or error conditions, but sufficient for agent invocation decisions.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with cluster_id fully documented as 'Cluster ID to shatter'. The description does not add parameter-specific semantics (e.g., valid ID formats, where to obtain IDs), but the schema is self-sufficient. Baseline score appropriate for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Break', 'shatter') and clearly identifies the resource (cluster) and outcome (individual records/singletons). It effectively distinguishes this from siblings like get_cluster (retrieval) or unmerge_record (single record operation) by specifying 'entire cluster'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear when-to-use guidance ('Use when a cluster is completely wrong'), helping agents distinguish this nuclear option from partial correction tools. Lacks explicit naming of alternatives (e.g., unmerge_record for individual fixes), but the 'completely wrong' criterion provides strong contextual guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
suggest_config (Grade A)
Analyze bad merges and suggest config changes. Provide examples of incorrect merges (pairs that should NOT have matched) and GoldenMatch will identify which fields/thresholds to tighten. Example: [{"record_a": {...}, "record_b": {...}, "reason": "different people"}]
| Name | Required | Description | Default |
|---|---|---|---|
| bad_merges | Yes | List of bad merge examples with record_a, record_b, and optional reason | |
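The description's abbreviated example (`[{"record_a": {...}, ...}]`) can be expanded into a concrete payload. An illustrative sketch with invented record fields, showing both the with-reason and without-reason forms the schema permits:

```python
import json

# Illustrative bad_merges payload for suggest_config, expanding the
# abbreviated example in the description. Record fields are invented
# for illustration; 'reason' is optional per the schema.
bad_merges = [
    {
        "record_a": {"name": "John Smith", "zip": "10001"},
        "record_b": {"name": "John Smith Jr", "zip": "94105"},
        "reason": "different people",
    },
    {
        # 'reason' omitted: the schema marks it optional
        "record_a": {"name": "Ana Lopez", "email": "ana@a.com"},
        "record_b": {"name": "Ana Lopes", "email": "ana@b.com"},
    },
]
print(json.dumps({"name": "suggest_config", "arguments": {"bad_merges": bad_merges}}))
```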
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden but only discloses the core analysis mechanism ('identify which fields/thresholds to tighten'). It omits critical behavioral details: whether suggestions are immediately applied or returned for review, if the operation is idempotent, or any rate limiting concerns.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Excellent structure with three efficient sentences: purpose statement, usage instructions, and concrete example. Every sentence earns its place with no redundancy or extraneous information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for the input side given 100% schema coverage and the helpful example. Minor gap in not describing what the suggestions look like (output format), though no output schema exists to supplement this. It could also better situate the tool within the broader configuration workflow.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
While the schema has 100% coverage (baseline 3), the description adds valuable context through the concrete JSON example showing the expected structure with 'record_a', 'record_b', and 'reason' fields, clarifying the expected input format beyond the schema's abstract description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Analyze[s] bad merges and suggest[s] config changes' with specific verbs and resources. It effectively distinguishes from siblings like auto_configure or find_duplicates by specifying this requires manual examples of incorrect merges to drive the configuration suggestions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear positive guidance on when to use ('Provide examples of incorrect merges... and GoldenMatch will identify which fields/thresholds to tighten'). However, lacks explicit contrast with alternatives like auto_configure or guidance on when NOT to use this versus automatic configuration options.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
suggest_pprl (Grade C)
Check if data needs privacy-preserving matching
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While 'Check' implies a read-only assessment, the description fails to specify what the tool returns (boolean, report, or recommendation), whether it has side effects, or what 'needs' means in this context (sensitivity detection, column analysis, etc.).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no redundant words. It is appropriately front-loaded with the action verb, though extreme brevity contributes to information gaps elsewhere.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of output schema, annotations, and schema parameter descriptions, the description is insufficient. It omits critical details about the return value format, the nature of the PPRL recommendation, and what constitutes 'data' in this context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring the description to compensate. It partially succeeds by implying the file_path should point to 'data' rather than configuration, but fails to specify expected file formats, content structure, or whether the path is local or remote.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the tool checks whether data requires privacy-preserving matching, providing a specific verb and resource. However, it does not clarify how this differs from sibling tools like `pprl_auto_config` or `analyze_data`, leaving ambiguity about when to select this specific tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like `suggest_config`, `analyze_data`, or `pprl_auto_config`. It lacks explicit prerequisites, exclusion criteria, or workflow context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
test_domain
Test a domain extraction rulebook against sample records. Shows what features would be extracted from the loaded data.
| Name | Required | Description | Default |
|---|---|---|---|
| domain_name | Yes | Name of the domain rulebook to test | |
| sample_size | No | Number of records to test | 10 |
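The parameter table above maps directly onto an MCP `tools/call` request. The sketch below is a hypothetical payload (the rulebook name and argument values are illustrative, not from the server):

```python
import json

# Hypothetical tools/call payload for test_domain, assuming the standard
# MCP JSON-RPC request shape and the two parameters documented above.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "test_domain",
        "arguments": {
            "domain_name": "person_records",  # required: rulebook to test
            "sample_size": 5,                 # optional; server default is 10
        },
    },
}

print(json.dumps(payload, indent=2))
```

Omitting `sample_size` entirely would exercise the server-side default of 10.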
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It adds valuable context that data must be 'loaded' first (prerequisite) and implies read-only behavior through 'Shows' and 'would be extracted'. However, it fails to explicitly confirm safety (no state modification), performance limits, or idempotency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. Front-loaded with the core action ('Test...'), followed by the specific output ('Shows what features...'). Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 2-parameter tool without output schema, the description adequately explains the return value (shows extracted features) and the prerequisite (loaded data). It could be improved by explicitly stating this is a safe, non-destructive operation, but covers the essential behavioral contract.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description adds semantic context by mapping 'domain_name' to 'domain extraction rulebook' and 'sample_size' to sample records for testing, reinforcing the feature extraction domain without repeating schema technicalities.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'Test' with clear resource 'domain extraction rulebook' and scope 'against sample records'. It effectively distinguishes this tool from siblings like create_domain or list_domains by emphasizing the validation/testing nature of the operation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage through words like 'Test' and 'Shows what features would be extracted', suggesting this is a preview/validation tool rather than production extraction. However, it lacks explicit guidance on when to use this versus alternatives like analyze_data or profile_data, and doesn't state prerequisites clearly.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
unmerge_record
Remove a record from its cluster. The record becomes a singleton. Remaining cluster members are re-clustered using stored pair scores. Use this to fix bad merges.
| Name | Required | Description | Default |
|---|---|---|---|
| record_id | Yes | Row ID of the record to unmerge | |
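The semantics described above (the target record becomes a singleton, remaining members are re-grouped from stored pair scores) can be sketched as follows. The union-find approach and the 0.9 threshold are assumptions for illustration, not GoldenMatch's actual algorithm:

```python
# Sketch of unmerge semantics: extract one record as a singleton, then
# re-cluster the remaining members by linking stored pairs above a threshold.
def unmerge(cluster, record_id, pair_scores, threshold=0.9):
    rest = [r for r in cluster if r != record_id]
    parent = {r: r for r in rest}  # union-find over remaining members

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for (a, b), score in pair_scores.items():
        if a in parent and b in parent and score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for r in rest:
        groups.setdefault(find(r), []).append(r)
    return [record_id], list(groups.values())

singleton, reclustered = unmerge(
    ["r1", "r2", "r3", "r4"], "r2",
    {("r1", "r3"): 0.95, ("r3", "r4"): 0.4},
)
# r2 is now a singleton; r1/r3 stay linked (0.95), r4 splits off (0.4)
```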
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and successfully discloses key behavioral traits: the record becomes a singleton, and remaining members are re-clustered using 'stored pair scores' (revealing algorithmic dependency). It lacks explicit safety warnings or destructiveness declarations, but the functional behavior is well-documented.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste: sentence 1 states the action, sentences 2-3 describe precise side effects (singleton creation and re-clustering), and sentence 4 provides usage context. Information is front-loaded with the core verb, and every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter mutation tool without output schema, the description adequately covers the operation mechanics, side effects, and use case. It explains what happens to both the extracted record and the remaining cluster members. Minor gap: does not indicate what the tool returns (e.g., new cluster ID, success status).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage for the single parameter (record_id). The description implicitly reinforces that the record must be in a cluster to be unmerged, but does not add syntax details, validation rules, or semantic constraints beyond what the schema already provides. Baseline 3 is appropriate for complete schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Remove') and resource ('record from its cluster'), clearly stating the core operation. It distinguishes itself from siblings like shatter_cluster by specifying that only the target record becomes a singleton while remaining members are re-clustered, and contextualizes the purpose as 'fix bad merges' vs. general cluster management.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use guidance ('Use this to fix bad merges'), giving clear context for appropriate invocation. However, it does not explicitly name alternatives (e.g., 'use shatter_cluster to break the entire cluster instead') or provide explicit when-not-to-use exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama cannot successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.