ThinkNEO Control Plane
Server Details
Enterprise AI governance: spend, guardrails, policy, budgets, compliance, and provider health.
- Status
- Healthy
- Last Tested
- Transport
- Streamable HTTP
- URL
- Repository
- thinkneo-ai/mcp-server
- GitHub Stars
- 1
- Server Listing
- thinkneo-control-plane
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.2/5 across 72 of 72 tools scored. Lowest: 3.2/5.
Most tools have clearly distinct purposes with detailed descriptions. However, there is some potential confusion among multiple 'check' and 'evaluate' tools, although the descriptions help differentiate them.
All tools use the 'thinkneo_' prefix and snake_case, but the verb-noun pattern is not entirely consistent (e.g., 'thinkneo_a2a_audit' vs 'thinkneo_scan_secrets'). Still, the naming is predictable and readable.
72 tools is very high for a single MCP server, even for a comprehensive control plane. This can overwhelm agents and reduce clarity. A more focused set would improve coherence.
The server covers a wide range of governance and observability operations, but there are missing lifecycle actions (e.g., no delete for policies or cache entries). Core workflows are present, but some gaps exist.
Available Tools
72 toolsthinkneo_a2a_auditARead-onlyIdempotentInspect
Full audit trail of agent-to-agent interactions. Shows every delegation, data request, escalation, and approval between agents with timestamps, costs, and outcomes. Supports filtering by agent, action, outcome, and time window. Essential for compliance, debugging, and understanding multi-agent behavior.
| Name | Required | Description | Default |
|---|---|---|---|
| hours | No | Lookback window in hours | |
| limit | No | Maximum number of records | |
| action | No | Filter by action type | |
| outcome | No | Filter by outcome: 'success', 'error', 'timeout', 'rejected' | |
| to_agent | No | Filter by target agent | |
| workspace | No | Workspace identifier | default |
| from_agent | No | Filter by source agent |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint as true, so the agent knows this is a safe read operation. The description adds context on what is returned (every delegation, data request, etc.) and mentions filtering capabilities, which is useful beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise, using three sentences to convey purpose, content, and use cases. Front-loaded with key terms like 'audit trail' and 'agent-to-agent interactions'. No wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (7 parameters, all optional) and comprehensive schema descriptions, plus annotations indicating read-only/idempotent, the description adds sufficient context. It explains the tool's value (compliance, debugging) and filtering options. The output schema exists but is not needed to explain return values. A small gap: it doesn't specify that the output is a list of objects, but that is likely inferred from the schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% and each parameter has a clear description. The tool description reinforces filtering by agent, action, outcome, and time window, but does not add significantly new semantics beyond the schema's parameter descriptions. Yet, it provides a high-level overview that aids parameter selection, justifying a higher score.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it provides a 'full audit trail' of agent-to-agent interactions, specifying the types of interactions (delegation, data request, escalation, approval) and the data included (timestamps, costs, outcomes). This clearly differentiates it from siblings like thinkneo_a2a_log or thinkneo_a2a_flow_map, which are likely more focused on raw logs or flow visualization.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implicitly clarifies usage for compliance, debugging, and understanding multi-agent behavior. However, it does not explicitly state when not to use it or compare to sibling tools like thinkneo_a2a_log or thinkneo_a2a_flow_map, which may be more appropriate for raw logs or flow diagrams. It also doesn't mention prerequisites like needing a workspace configuration.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_a2a_flow_mapARead-onlyIdempotentInspect
Map agent-to-agent communication flows. Shows which agents talk to each other, call frequency, cost, error rates, and latency. Identifies hot paths, bottlenecks, and anomalous patterns like agent loops or unexpected delegations.
| Name | Required | Description | Default |
|---|---|---|---|
| hours | No | Lookback window in hours | |
| workspace | No | Workspace identifier | default |
| agent_name | No | Filter flows involving this agent (as source or target) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, so the tool is known to be safe. The description adds valuable behavioral context: it identifies hot paths, bottlenecks, and anomalous patterns like agent loops or unexpected delegations. This goes beyond the annotations, providing specific insights an agent can expect.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, no wasted words. The first sentence clearly states the action and primary outputs; the second adds analytical value. Highly efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (3 parameters, output schema present, annotations cover safety), the description is complete. It explains the tool's purpose, metrics, and analytical outputs without needing to cover return format (handled by output schema). No gaps identified.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all three parameters (hours, workspace, agent_name). The description adds value by implying the output contains flow metrics and patterns, but does not elaborate on parameter syntax or defaults beyond schema. With full schema coverage, a baseline of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool maps agent-to-agent communication flows, specifying exact metrics (call frequency, cost, error rates, latency) and analytical outputs (hot paths, bottlenecks, anomalous patterns). It clearly differentiates from sibling tools like thinkneo_a2a_audit or thinkneo_a2a_log by focusing on flow mapping rather than auditing or logging.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for understanding agent communication patterns but does not explicitly state when to use this tool versus siblings or when not to use it. No alternatives or exclusions are mentioned, leaving the agent to infer context from the tool's purpose alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_a2a_logAInspect
Log an agent-to-agent interaction. Tracks which agent called which, what action was performed, the cost, outcome, and latency. Also enforces A2A policies if configured — returns policy_blocked if the interaction is not allowed.
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Action performed: 'delegate_task', 'request_data', 'escalate', 'approve', 'notify' | |
| outcome | No | Result: 'success', 'error', 'timeout', 'rejected', 'pending' | success |
| cost_usd | No | Cost of this interaction in USD | |
| metadata | No | JSON string with additional context | |
| to_agent | Yes | Name of the target agent, e.g. 'billing-agent', 'knowledge-retriever' | |
| workspace | No | Workspace identifier | default |
| from_agent | Yes | Name of the calling agent, e.g. 'orchestrator', 'support-bot' | |
| latency_ms | No | Latency in milliseconds | |
| error_message | No | Error message if outcome is 'error' | |
| task_description | No | What was delegated or requested |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses the behavioral trait that policy enforcement may block the interaction, which goes beyond the readOnlyHint=false and idempotentHint=false annotations. It adds value by explaining potential side effects (policy blocking) without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is only two sentences with essential information, no fluff, and front-loaded with the core purpose. Every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool logs an interaction with policy enforcement, the description covers the main purpose and side effects. The output schema is not detailed here, but it exists, so the description doesn't need to explain return values. A small gap: it doesn't mention that the tool writes data (non-read-only), but that is inferred from annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the baseline is 3. The description adds no further parameter meaning beyond the schema; it only summarizes the types of tracked data. No additional context is provided for the 10 parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'log' and resource 'agent-to-agent interaction', and specifies the tracked fields (who called whom, action, cost, outcome, latency). It distinguishes itself from siblings like thinkneo_a2a_audit and thinkneo_a2a_flow_map by focusing on logging a single interaction, not auditing history or visualizing flows.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly mentions that it enforces A2A policies and can return policy_blocked if the interaction is not allowed, which gives clear context for when to use this tool. However, it does not explicitly state when not to use it or provide alternatives among siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_a2a_set_policyAIdempotentInspect
Define a policy for agent-to-agent communication. Controls which agents can talk to each other, what actions are allowed, cost limits per call, and rate limits. Example: 'support-bot can delegate_task to billing-agent, max $0.10/call, max 100 calls/hour'.
| Name | Required | Description | Default |
|---|---|---|---|
| enabled | No | Enable or disable this policy | |
| to_agent | Yes | Target agent name (or '*' for any agent) | |
| workspace | No | Workspace identifier | default |
| from_agent | Yes | Source agent name (or '*' for any agent) | |
| allowed_actions | No | Comma-separated actions: 'delegate_task,request_data,escalate' or '*' for all | * |
| require_approval | No | Require human approval before execution | |
| max_calls_per_hour | No | Maximum calls per hour for this pair | |
| max_cost_per_call_usd | No | Maximum cost per interaction in USD |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate idempotentHint=true and readOnlyHint=false, so the description should add behavioral context. The description implies the policy is enforced but does not explain side effects (e.g., overwriting existing policies, need for propagation time) or authorization requirements. With annotations partially covering, the description adds example but lacks depth on behavioral traits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences plus an example, front-loading the core action. Every sentence adds value; the example is particularly helpful. Minor room for improvement: could be slightly more structured with bullet points, but it's efficient for the complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With 8 parameters, 100% schema coverage, an output schema (not shown but referenced), and annotations providing behavioral hints, the description provides a concise overview and example. It covers the main purpose and usage context, though could elaborate on return values (since output schema exists, it's less critical). Complete enough for an admin tool with good structured metadata.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so the parameter descriptions are complete. The description adds an example showing how parameters combine, but does not add meaning beyond the schema. Baseline 3 is appropriate since the schema already carries the semantic load.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Define') and resource ('policy for agent-to-agent communication'), clearly stating it controls which agents can talk, allowed actions, cost limits, and rate limits. An example illustrates typical usage, and it distinguishes itself from sibling tools like thinkneo_check_policy or thinkneo_audit by being the policy creation tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains what the tool controls (agent pairs, actions, cost, rate), and the sibling context shows alternative tools for reading or checking policies. However, it does not explicitly state when to use this versus thinkneo_check_policy or thinkneo_write_memory, or when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_agent_roiARead-onlyIdempotentInspect
Calculate ROI per AI agent. Shows value generated vs AI cost consumed, with daily trend, success rate, and comparison to pre-AI baseline. Answers: 'Is this agent generating or consuming value?' and 'What's the ROI trend?'
| Name | Required | Description | Default |
|---|---|---|---|
| days | No | Number of days to analyze | |
| workspace | No | Workspace identifier | default |
| agent_name | No | Specific agent to analyze. If omitted, returns all agents. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint, so the safety profile is clear. The description adds value by explaining what the tool computes (value vs. cost, trends, baseline comparison) and the key questions it answers, providing behavioral context beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences with no wasted words. It begins with a clear action and resource, lists key outputs, and ends with concrete user-facing questions. Every sentence serves a purpose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only analytics tool with a known output schema and full parameter coverage, the description is sufficiently complete. It explains the tool's value, what it shows, and the questions it answers. No gaps are apparent given the context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already explains all three parameters. The description does not add additional meaning beyond what the schema provides, meeting the baseline expectation of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Calculate', 'Shows') and clearly identifies the resource ('ROI per AI agent'). It distinguishes the tool by mentioning key outputs (value vs. cost, daily trend, success rate, baseline comparison) and directly answers two typical user questions, making its purpose unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies when to use this tool (when needing ROI analysis for AI agents) but offers no explicit guidance on when not to use it or how it differs from siblings. Without exclusion criteria or alternative recommendations, the agent must infer suitability from the tool's purpose alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_audit_exportARead-onlyIdempotentInspect
Export audit events in SIEM-compatible formats: JSON, CEF (ArcSight/QRadar), LEEF (IBM QRadar native), syslog (RFC 5424), or CSV. Supports optional HMAC SHA-256 signing for integrity verification. Filter by period, event type, and workspace. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max events to export (1-10000) | |
| format | No | Output format: cef, csv, json, leef, syslog | json |
| period | No | Export period: 1d, 7d, 30d, 90d | 7d |
| hmac_key | No | HMAC key for signing (required if sign_hmac=true) | |
| sign_hmac | No | Sign export with HMAC SHA-256 for integrity | |
| workspace | No | Filter by workspace | |
| event_types | No | Comma-separated event types to filter (e.g., 'tool_call,task_sent') |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint and idempotentHint. The description adds beyond that by specifying authentication requirement, HMAC signing, and format options, which are useful behavioral details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with key purpose and formats, no waste. Every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (7 parameters, signing, output schema exists), the description covers core aspects well. Minor omission: no mention of synchronous/asynchronous behavior, but overall sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so baseline is 3. The description summarizes parameters (period, event type, workspace) but adds no new information beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states 'Export audit events in SIEM-compatible formats' which is a specific verb and resource. However, it does not distinguish itself from sibling tools like thinkneo_audit_export_status, preventing a score of 5.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool vs alternatives such as thinkneo_audit_export_status. The description only states what it does, not when to choose it over others.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_audit_export_statusARead-onlyIdempotentInspect
View configured audit export integrations and their last export results. Shows integration type, endpoint, enabled status, and recent export history. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
No parameters | |||
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint and idempotentHint. The description adds behavioral context by stating it shows integration type, endpoint, enabled status, and recent export history, plus explicitly requiring authentication. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is three short sentences with no fluff. The first sentence states the core purpose, the second lists what is shown, and the third adds necessary authentication context. Information is front-loaded and every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no parameters, existing output schema, and annotations providing behavioral hints, the description covers all necessary aspects: purpose, content detail, and authentication requirement. It is complete for an agent to correctly select and invoke this tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
There are zero parameters, and schema description coverage is 100%. According to guidelines, when no parameters exist, the baseline score is 4. The description does not need to add parameter information.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it is a view tool for configured audit export integrations and their last export results, listing specific details like integration type, endpoint, and enabled status. It distinguishes from related sibling tools like thinkneo_set_audit_export and thinkneo_audit_export by emphasizing read-only status checking.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly says 'View configured audit export integrations' and notes authentication is required, which provides clear context for when to use it. However, it does not explicitly mention when not to use it or compare to alternatives, but the read-only nature and sibling tool names offer sufficient implicit guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_benchmark_compareARead-onlyIdempotentInspect
Compare providers side-by-side for a specific task type. Shows quality scores, verification rates, and rankings based on real outcomes. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| providers | No | Optional list of providers to compare (e.g., ['anthropic', 'openai']). Leave empty for all. | |
| task_type | Yes | Task type to compare: 'summarization', 'code_generation', 'classification', 'translation', 'analysis', 'chat' |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safe, read-only behavior. The description adds the auth requirement and what results are shown (quality scores, verification rates, rankings). No contradictions. The description provides useful behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with no redundant information. The description is front-loaded with the purpose, then expands on outputs and requirement. Every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 2 parameters with full schema coverage and an existing output schema, the description adequately covers the tool's purpose and behavior. It explains authentication and output contents, leaving no critical gaps for the AI agent to understand.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, both parameters are described. The description mentions 'providers side-by-side' and 'specific task type' which aligns with the schema. It does not add significant new semantic meaning beyond what the schema provides, hence baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the specific verb 'Compare', the resource 'providers', and the context 'side-by-side for a specific task type'. It also lists the outputs: 'quality scores, verification rates, and rankings based on real outcomes'. This clearly differentiates from siblings like thinkneo_benchmark_report, which likely focuses on single provider reports.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description notes 'Requires authentication' but provides no explicit guidance on when to use this tool versus alternatives like thinkneo_benchmark_report. The usage context is implied but not clearly stated with when-not or alternative recommendations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_benchmark_reportARead-onlyIdempotentInspect
View the outcome benchmark matrix — real quality scores per provider/model/task_type based on verified outcomes, not static estimates. Shows verification rates, sample counts, and rankings. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| task_type | No | Filter by task type: 'summarization', 'code_generation', 'classification', 'translation', etc. Leave empty for all. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds the requirement for authentication, which is a behavioral trait not covered by annotations. It also emphasizes 'verified outcomes not static estimates' adding credibility context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences: first sentence states the core purpose and scope, second adds key differentiators and authentication. Every sentence adds value; no fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description covers what the tool shows, its data source, and authentication. With an output schema present, return format is covered. However, it does not differentiate from the closely related sibling thinkneo_benchmark_compare, leaving a minor completeness gap.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%; the input schema already describes the optional 'task_type' parameter with a list of examples. The tool description does not add further parameter-specific meaning, so it meets the baseline without extra value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool views an outcome benchmark matrix with quality scores, verification rates, sample counts, and rankings. It distinguishes from sibling 'thinkneo_benchmark_compare' only indirectly by focusing on 'report' vs 'compare', but does not explicitly contrast them.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description notes 'Requires authentication' but provides no guidance on when to use this tool versus alternatives like thinkneo_benchmark_compare. No when-not or context for preferred use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_bridge_a2a_to_mcpAInspect
Bridge an A2A (Agent-to-Agent Protocol) task to an MCP server. Receives an A2A task, identifies the best matching MCP tool on the target server, executes it, and returns the result wrapped in A2A response format. Enables A2A agents to use any MCP server transparently. Extracts the intent from the A2A task, maps it to an MCP tool, calls the tool, and wraps the result in A2A response format. Use this to let A2A agents interact with any MCP server. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| a2a_task | Yes | JSON string of the A2A task object. Must include at minimum a 'message' field with 'parts'. Example: '{"id": "task-1", "message": {"role": "user", "parts": [{"type": "text", "text": "Check provider status"}]}}' | |
| target_mcp_server_url | No | URL of the target MCP server endpoint (e.g. 'https://mcp.thinkneo.ai/mcp'). Defaults to ThinkNEO's own MCP server. | https://mcp.thinkneo.ai/mcp |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=false and idempotentHint=false, but the description adds valuable context: it discloses the authentication requirement (not in annotations) and describes the multi-step behavioral process (extract, map, call, wrap). It does not contradict annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: first states purpose and process, second gives usage context, third adds critical prerequisite. Each sentence earns its place, and key information is front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (bridging protocols), the description covers purpose, process, usage context, and authentication. With annotations covering safety hints, schema covering parameters fully, and an output schema existing (so return values need not be explained), this is complete enough for an agent to use correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents both parameters. The description does not add meaning beyond the schema (e.g., no extra syntax or format details for 'a2a_task'), meeting the baseline of 3 when schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific verb ('bridge'), resource ('A2A task to MCP server'), and mechanism ('extracts intent, maps to MCP tool, calls tool, wraps result'). It distinguishes from siblings like 'thinkneo_bridge_mcp_to_a2a' (reverse direction) and 'thinkneo_bridge_list_mappings' (list only).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use ('to let A2A agents interact with any MCP server') and includes a key exclusion ('Requires authentication'), which is crucial guidance not obvious from the schema or annotations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_bridge_generate_agent_cardARead-onlyIdempotentInspect
Auto-generate an A2A Agent Card from an MCP server's tool list. Each MCP tool is converted into an A2A skill. The resulting agent.json makes the MCP server discoverable by any A2A-compatible agent in Google's agent ecosystem. Defaults to ThinkNEO's own MCP server. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| mcp_server_url | No | URL of the MCP server to generate an agent card for. Defaults to ThinkNEO's own server (https://mcp.thinkneo.ai/mcp). The server must support the tools/list method. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate read-only and idempotent operations, which the description does not contradict. The description adds valuable behavioral context beyond annotations: it specifies authentication requirements, mentions the server must support tools/list method, and explains the outcome (agent.json for discoverability).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized and front-loaded, with three sentences that efficiently convey purpose, outcome, and key requirements. Every sentence adds value without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (conversion between systems), rich annotations (read-only, idempotent), and existence of an output schema, the description is complete. It covers purpose, usage context, authentication needs, and server requirements, leaving return values to the output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents the single parameter. The description adds minimal semantics by mentioning defaults to ThinkNEO's server, but does not provide additional meaning beyond what's in the schema. Baseline 3 is appropriate given high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Auto-generate an A2A Agent Card') and resource ('from an MCP server's tool list'), explaining that it converts MCP tools into A2A skills. It distinguishes from siblings by focusing on agent card generation rather than mapping, listing, or other operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool: to make an MCP server discoverable in Google's A2A ecosystem. It mentions defaults to ThinkNEO's server and authentication requirements, but does not explicitly state when not to use it or name alternatives among siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_bridge_list_mappingsARead-onlyIdempotentInspect
List all active MCP ↔ A2A bridge mappings and translation statistics. Shows which MCP servers are mapped to which A2A agents, plus 30-day translation stats (total, success rate, average latency). Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
No parameters | |||
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and idempotentHint=true, covering safety and idempotency. The description adds valuable context beyond annotations by specifying authentication requirements ('Requires authentication') and detailing what data is returned ('30-day translation stats (total, success rate, average latency)'), which helps the agent understand operational constraints and output expectations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured in two sentences: the first states the core purpose and output details, and the second adds critical behavioral context (authentication). Every sentence adds value without waste, and it is front-loaded with the main action and resource.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (listing mappings with statistics), rich annotations (readOnlyHint, idempotentHint), 0 parameters, and the presence of an output schema, the description is complete. It covers purpose, output details, and authentication needs, leaving schema-specific details to the structured fields, which is appropriate for this context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0 parameters and 100% schema description coverage, the baseline is 4 as there are no parameters to document. The description does not need to compensate for any parameter gaps, and it appropriately focuses on the tool's purpose and output without redundant parameter explanations.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('List all active MCP ↔ A2A bridge mappings and translation statistics') and resource ('MCP ↔ A2A bridge mappings'), distinguishing it from siblings like thinkneo_bridge_a2a_to_mcp or thinkneo_bridge_mcp_to_a2a which handle translation operations rather than listing mappings. It provides precise scope by mentioning '30-day translation stats' with specific metrics.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implicitly suggests usage when needing to view active mappings and statistics, but does not explicitly state when to use this tool versus alternatives like thinkneo_get_observability_dashboard or thinkneo_list_alerts. It provides clear context ('Shows which MCP servers are mapped to which A2A agents') but lacks explicit exclusions or named alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_bridge_mcp_to_a2aAInspect
Bridge an MCP tool call to an A2A (Agent-to-Agent Protocol) agent. Maps MCP tool name and parameters to the A2A task format, enabling interoperability between MCP servers and A2A agents. Returns a ready-to-send A2A task object with full protocol compliance. Translates the MCP tool_name and arguments into an A2A task, sends it to the target A2A agent, waits for completion, and translates the response back to MCP format. Use this to make any MCP tool accessible to A2A agents (Google's agent ecosystem). Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| arguments | No | JSON string of arguments to pass to the MCP tool (e.g. '{"workspace": "prod", "period": "this-month"}') | {} |
| mcp_tool_name | Yes | Name of the MCP tool to bridge (e.g. 'thinkneo_check_spend') | |
| target_a2a_agent_url | No | URL of the target A2A agent endpoint (e.g. 'https://agent.thinkneo.ai/a2a'). Defaults to ThinkNEO's own A2A agent. | https://agent.thinkneo.ai/a2a |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond annotations: it discloses the workflow (translates, sends, waits for completion, translates back), mentions authentication requirements, and specifies the target ecosystem (Google's agent ecosystem). Annotations only indicate it's not read-only and not idempotent, so the description provides meaningful operational details that help an agent understand how to use it effectively.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured in three sentences: first states the core purpose and process, second provides usage context, third adds authentication requirement. Every sentence adds essential information with zero wasted words, making it easy to parse and understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (bridging between protocols), the description provides complete context: purpose, process, usage scenario, authentication requirement, and target ecosystem. With annotations covering safety aspects and an output schema presumably handling return values, the description focuses appropriately on the bridging functionality without needing to explain technical implementation details.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description doesn't add significant parameter-specific semantics beyond what's already documented in the schema (e.g., it doesn't explain format details for arguments beyond the JSON string example in the schema). It mentions 'MCP tool_name and arguments' generally but doesn't enhance understanding of individual parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Bridge an MCP tool call to an A2A agent'), the resource involved (MCP tool to A2A agent), and distinguishes it from sibling tools by focusing on MCP→A2A translation. It explicitly names the transformation process (translates MCP tool_name and arguments into A2A task) and the outcome (makes MCP tools accessible to A2A agents).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidance: 'Use this to make any MCP tool accessible to A2A agents (Google's agent ecosystem)' clearly states when to use it. It also specifies a prerequisite ('Requires authentication') and distinguishes from the sibling 'thinkneo_bridge_a2a_to_mcp' by focusing on the opposite direction (MCP→A2A vs A2A→MCP).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_business_impactBRead-onlyIdempotentInspect
Executive business impact dashboard. Returns a single view of: total value generated by AI agents, total AI cost, net ROI, risk avoided in dollars, cost per decision, top performing agents, and risk event summary. This is the report a CxO needs to justify AI investment.
| Name | Required | Description | Default |
|---|---|---|---|
| period | No | Time period: 'this-week', 'this-month', 'this-quarter', 'all' | this-month |
| workspace | No | Workspace identifier | default |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and idempotentHint=true, so the agent knows it's safe and repeatable. The description adds 'Executive business impact dashboard' context but doesn't disclose response format, pagination, or data freshness.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is two sentences with key outputs listed concisely. Could front-load 'CxO report' for faster scanning. No fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Tool is a simple read-only query with 2 optional params and output schema present. Given low complexity and rich annotations, the description is nearly complete. Could mention output schema or data freshness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schemas cover both parameters fully (100% coverage). The description does not add parameter-level semantics beyond what the schema provides. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states it returns a business impact dashboard with specific metrics (total value, cost, ROI), and positions it as a CxO report. However, the purpose could be sharper by distinguishing from sibling tools like thinkneo_agent_roi or thinkneo_get_savings_report.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives. Given many sibling reporting tools (thinkneo_agent_roi, thinkneo_decision_cost, thinkneo_get_savings_report), explicit differentiation is needed.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_cache_lookupARead-onlyIdempotentInspect
Look up a cached LLM response by exact prompt match. Returns the cached response if found and not expired. Use before calling an expensive LLM to save cost and latency. Namespaced to prevent collisions across workspaces. Free tier: shared cache. Enterprise: private + semantic similarity matching.
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Model name (e.g. 'gpt-5') | |
| prompt | Yes | The prompt text (exact match lookup) | |
| namespace | No | Cache namespace (e.g. workspace or tenant name) | default |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate safe read operation. Description adds behavioral details: returns cached response if found and not expired, namespacing prevents collisions, and tier differences. No contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with no filler. Each sentence adds value: purpose, usage guidance, namespacing, tier differences. Front-loaded with key action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given output schema exists, return format details are unnecessary. Covers purpose, usage, and behavior. Could mention expiration policy more explicitly, but sufficient overall.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, baseline 3. Description adds meaning: prompt is exact match, model filters, namespace prevents collisions, enhancing understanding beyond schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool looks up a cached LLM response by exact prompt match, with a specific verb and resource. It distinguishes from siblings like thinkneo_cache_store (store operation) and thinkneo_cache_stats.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly recommends using before expensive LLM calls to save cost/latency. Mentions namespacing and tier differences. However, does not explicitly specify when not to use or mention alternatives like thinkneo_cache_store for storing.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_cache_statsARead-onlyIdempotentInspect
Get cache performance stats for a namespace. Shows hit rate, entries, estimated savings. Use to optimize caching strategy.
| Name | Required | Description | Default |
|---|---|---|---|
| namespace | No | Cache namespace | default |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, making it clear this is a safe read operation. The description complements this by listing the metrics returned (hit rate, entries, estimated savings), adding behavioral context without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with no redundant words. The first sentence front-loads the purpose and output, and the second gives usage guidance. Every word adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With an existing output schema (indicated by context signals), the description need not explain return values in detail. It mentions key metrics (hit rate, entries, estimated savings), which is sufficient for an agent to understand the tool's output. The single parameter with a default is fully covered.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% for the single 'namespace' parameter, which has a default and description. The description mentions namespace but doesn't add new semantics beyond linking it to the tool's purpose. The baseline of 3 is appropriate as the schema already explains the parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Get cache performance stats for a namespace' with specific outputs like hit rate, entries, and estimated savings. It distinguishes itself from sibling tools like 'thinkneo_cache_lookup' and 'thinkneo_cache_store' by focusing on performance stats rather than data retrieval or storage.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly says 'Use to optimize caching strategy,' providing a clear use case. While it doesn't mention when not to use or alternatives, the context of many sibling tools and the specific focus on stats makes the intended usage clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_cache_storeAIdempotentInspect
Store an LLM response in the cache for future lookups. Set TTL in seconds (default 24h). Upsert: replaces existing entries. Use this AFTER calling an LLM provider to cache the response for future calls.
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Model name | |
| prompt | Yes | The prompt text | |
| response | Yes | The LLM response to cache | |
| namespace | No | Cache namespace | default |
| ttl_seconds | No | Time-to-live in seconds (default 86400 = 24h) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Description adds 'Upsert: replaces existing entries' which explains the idempotent and destructive behavior hinted by annotations (idempotentHint=true, destructiveHint=true). Also states default TTL. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, each serving a clear purpose: purpose, TTL/upsert, and usage timing. No wasted words, front-loaded with essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequately covers the tool's function for a cache store with 5 parameters and an output schema. Mentions upsert and TTL. Does not detail error handling or specific keying logic, but is sufficient given the tool's simplicity.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. Description adds value by explaining that ttl_seconds is in seconds with a default of 24h and that the tool performs an upsert, giving context to the parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the tool stores an LLM response in cache for future lookups, using specific verbs like 'store', 'cache', and 'upsert'. Distinguishes from sibling thinkneo_cache_lookup which likely retrieves cached data.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly advises to use this tool 'AFTER calling an LLM provider to cache the response for future calls', providing clear timing context. Mentions upsert behavior but does not explicitly describe when not to use it or alternative caching tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_checkARead-onlyIdempotentInspect
Free-tier prompt safety check. Analyzes text for prompt injection patterns and PII (credit card numbers, Brazilian CPF, US SSN, email, phone, passwords). Returns a safety assessment with specific warnings. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | The text or prompt to check for safety issues (max 50,000 characters) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond annotations: it specifies the free-tier nature, lists the types of patterns checked (prompt injection, PII like credit card numbers, CPF, SSN, email, phone, passwords), and notes no authentication required. Annotations cover read-only and idempotent hints, but the description enhances understanding of the tool's scope and limitations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded and concise, with three sentences that efficiently convey purpose, scope, and key behavioral traits without wasted words, making it easy for an AI agent to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity, rich annotations (readOnlyHint, idempotentHint), and the presence of an output schema, the description is complete enough. It covers the tool's function, safety focus, and authentication details, leaving output specifics to the schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the input schema fully documents the 'text' parameter. The description adds no additional parameter details beyond what the schema provides, such as examples or format specifics, meeting the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('analyzes', 'returns') and resources ('text for prompt injection patterns and PII'), explicitly distinguishing it from siblings by focusing on free-tier safety checks rather than policy, spending, or compliance tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
It provides clear context for when to use this tool ('Free-tier prompt safety check') and mentions 'No authentication required', but does not explicitly state when not to use it or name alternatives among siblings like thinkneo_check_policy or thinkneo_evaluate_guardrail.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_check_pii_internationalARead-onlyIdempotentInspect
Detect international PII across 30+ document types from 15+ countries: Brazil (CPF, CNPJ, RG, PIS), USA (SSN, EIN, ITIN, Passport), UK (NINO, UTR), Canada (SIN), EU (IBAN, VAT), Germany (Tax-ID), France (INSEE), Spain (DNI/NIE), Italy (Codice Fiscale), Argentina (CUIT), Mexico (CURP/RFC), Australia (TFN/ABN), India (Aadhaar/PAN), China (ID), Japan (My Number), and credit cards (Luhn validated). Required for LGPD/GDPR/HIPAA compliance. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to scan for PII (max 100,000 chars) | |
| countries | No | Filter by country codes (BR, US, UK, CA, EU, DE, FR, ES, IT, AR, MX, AU, IN, CN, JP, INTL). Empty = all. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds context by stating 'No authentication required' and mentioning Luhn validation for credit cards, going beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is informative but somewhat lengthy due to listing many countries and document types. It is front-loaded with the main purpose, but could be more concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given complexity (30+ document types, 15+ countries) and existing annotations/output schema, the description covers input parameters and compliance context sufficiently, though return format is not detailed (but output schema exists).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds value by listing the valid country codes (e.g., BR, US, UK) that are not constrained by enums in the schema, helping the agent understand parameter options.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool detects international PII across many document types and countries, distinguishing it from sibling tools like thinkneo_check by specifying international scope and listing specific countries.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Description mentions compliance requirements (LGPD/GDPR/HIPAA) implying a use case, but does not explicitly compare to alternatives or state when not to use the tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_check_policyARead-onlyIdempotentInspect
Check if a specific model, provider, or action is allowed by the governance policies configured for a workspace. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | AI model name to check (e.g., gpt-4o, claude-sonnet-4-6, gemini-2.0-flash) | |
| action | No | Specific action to check (e.g., create-completion, use-tool, fine-tune) | |
| provider | No | AI provider to check (e.g., openai, anthropic, google, mistral) | |
| workspace | Yes | Workspace name or ID whose governance policies to check against |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds authentication requirement beyond annotations. Annotations already establish readOnly/idempotent traits, so description carries lower burden. Does not elaborate on response semantics (e.g., boolean vs policy object), but output schema exists to cover this.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. Primary purpose front-loaded in first sentence; constraint (authentication) follows. No redundant or filler content.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a read-only policy check with complete schema coverage and existing output schema. Captures purpose and auth requirements. Minor gap: does not clarify behavior when optional parameters are null (wildcard check vs error).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing baseline 3. Description references the parameter concepts ('specific model, provider, or action') but does not add semantic clarifications beyond the schema descriptions (e.g., no syntax guidance, no enum values, no inter-parameter relationships).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb ('Check') + resource ('governance policies') + scope ('workspace'). Specifies what is being validated (model/provider/action allowance). Distinguishes implicitly from siblings like check_spend (budget) and evaluate_guardrail (runtime content filtering) by specifying 'governance policies', though lacks explicit sibling contrast.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
States prerequisite ('Requires authentication') and implies usage context ('Check if... is allowed' suggests pre-invocation validation). However, lacks explicit when-to-use guidance versus alternatives like evaluate_guardrail or get_compliance_status, and does not specify when all optional parameters should be null vs populated.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_check_spendARead-onlyIdempotentInspect
Check AI spend summary for a workspace, team, or project. Returns cost breakdown by provider, model, and time period. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| period | No | Time period for the report: today, this-week, this-month, last-month, or custom | this-month |
| end_date | No | End date for a custom period in ISO format (YYYY-MM-DD). Only used when period='custom' | |
| group_by | No | Dimension to group costs by: provider, model, team, or project | provider |
| workspace | Yes | Workspace name or ID (e.g., 'prod-engineering', 'finance-team') | |
| start_date | No | Start date for a custom period in ISO format (YYYY-MM-DD). Only used when period='custom' |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare read-only/idempotent safety, allowing the description to focus on output structure ('cost breakdown by provider, model, and time period') and critical runtime requirements ('Requires authentication'). These additions provide valuable context beyond the structured annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: action/scope first, return value second, auth requirement third. Every sentence earns its place and the description is appropriately front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema and comprehensive input annotations, the description provides sufficient high-level context about the return structure and authentication needs without duplicating schema details.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the structured fields already document all parameters. The description mentions 'workspace, team, or project' which loosely maps to the workspace parameter and group_by options, but adds no syntax, format details, or semantic constraints beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states a clear verb ('Check') and specific resource ('AI spend summary'), along with scope ('workspace, team, or project'). It implicitly distinguishes from siblings like thinkneo_get_budget_status (spend vs. budget) and thinkneo_check_policy (cost vs. compliance), though 'Check' is slightly less precise than 'Retrieve'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus alternatives like thinkneo_get_budget_status or how it relates to cost management workflows. The description only states what the tool does, not when to invoke it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_compare_modelsARead-onlyIdempotentInspect
Compare 25+ LLM models across 8 major providers (OpenAI, Anthropic, Google, Meta, Mistral, xAI, DeepSeek, Cohere, NVIDIA) by price, context window, capabilities, and modalities. Optionally estimate cost for a specific workload (input/output token counts). Filter by use case (coding, reasoning, vision, long context, cheap, agentic, multilingual, EU compliant, real-time, open source). No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| use_case | No | Filter by use case: coding, reasoning, writing, vision, long context, cheap, agentic, multilingual, EU compliant, real-time, open source | |
| providers | No | Filter by providers (list of provider names) | |
| modalities | No | Required modalities (e.g. ['text','vision']) | |
| min_context | No | Minimum context window in tokens | |
| estimate_input_tokens | No | For cost estimation: number of input tokens | |
| max_input_price_per_m | No | Maximum input price in USD per 1M tokens | |
| estimate_output_tokens | No | For cost estimation: number of output tokens |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, indicating safe, non-destructive behavior. The description adds that no authentication is required, which is useful context. It also mentions optional cost estimation, providing additional transparency beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is three sentences long, front-loading the main action in the first sentence. Each sentence serves a purpose: stating the tool's core functionality, adding the cost estimation option, and mentioning filtering and authentication. No words are wasted, making it very concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (comparing many models with filters and cost estimation), the description is comprehensive. It covers the main features, optional capabilities, and authentication requirement. With 100% schema coverage and an output schema, the description does not need to detail return values. It provides enough information for an agent to decide when to use this tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% coverage with descriptions for each of the 7 optional parameters. The description adds context by explaining the use case filter and cost estimation feature, linking parameters like estimate_input_tokens and estimate_output_tokens to real-world usage. This adds meaning beyond the raw schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: comparing 25+ LLM models across 8 providers by price, context, capabilities, and modalities. It uses specific verbs like 'Compare', 'Optionally estimate cost', and 'Filter by use case'. This distinguishes it from sibling tools, which are focused on auditing, logging, policies, etc.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains what the tool does and its optional features, but does not explicitly state when to use this tool versus alternatives. There is no mention of exclusions or comparisons to sibling tools like thinkneo_benchmark_compare, which might be similar. However, the context is clear enough for basic guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_compliance_generateAInspect
Generate a compliance report for a regulatory framework. Aggregates data from all ThinkNEO layers (Trust Score, guardrails, PII, observability, policies, outcome validation) into a framework-specific report with compliance scoring, findings, and gap analysis. Supported frameworks: 'lgpd' (LGPD Brazil), 'iso_42001' (ISO 42001 AI Management), 'eu_ai_act' (EU AI Act). Returns SHA-256 signed, tamper-evident report. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| days | No | Number of days to include in the report (default 30, max 365) | |
| framework | Yes | Regulatory framework: 'lgpd', 'iso_42001', or 'eu_ai_act' |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide minimal info (non-readonly, non-idempotent). Description adds value by stating aggregation from all layers, SHA-256 signing, and authentication requirements, which go beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three efficient sentences, front-loaded with purpose, no filler.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, behavior, supported frameworks, output signing, and auth. Lacks details on side effects or overwrite behavior, but sufficient given output schema existence.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions for both parameters. Description does not add meaning beyond schema; baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Generate' and resource 'compliance report', explicitly listing supported frameworks and distinguishing from siblings like thinkneo_compliance_list or thinkneo_get_compliance_status.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies usage when a new compliance report is needed for a supported framework, but lacks explicit when-not-to-use or alternative suggestions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_compliance_listARead-onlyIdempotentInspect
List previously generated compliance reports. Shows framework, period, compliance score, and download URL for each report. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max reports to return (default 20) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint and idempotentHint, which are consistent with the description. The description adds value by mentioning that each report includes framework, period, compliance score, and download URL. It does not contradict annotations and provides useful behavioral context beyond the schema.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, front-loading the purpose and adding essential details (fields, authentication). Every sentence is informative without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the minimal input schema, presence of annotations, and existence of an output schema, the description is reasonably complete. It explains the purpose, key fields returned, and authentication. It does not mention ordering, pagination, or error handling, but that is acceptable for a simple list tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% coverage with a single 'limit' parameter that already includes a description. The tool description does not provide additional meaning beyond the schema, so a baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists previously generated compliance reports, specifying the verb 'list' and the resource 'compliance reports'. It differentiates from sibling tools like thinkneo_compliance_generate by implying it only retrieves existing reports.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implicitly suggests using this tool to retrieve existing reports rather than generating new ones, but it lacks explicit guidance on when to use this tool versus alternatives such as thinkneo_compliance_generate or thinkneo_get_compliance_status. It does include an authentication requirement but no exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_decision_costARead-onlyIdempotentInspect
Analyze cost-per-decision for AI agents. Shows the actual AI cost for each decision, compared to the pre-AI baseline. Answers: 'How much does each AI decision cost?' and 'How does it compare to doing it without AI?'
| Name | Required | Description | Default |
|---|---|---|---|
| period | No | Time period: 'today', 'this-week', 'this-month', 'all' | this-month |
| workspace | No | Workspace identifier | default |
| agent_name | No | Filter by specific agent | |
| process_name | No | Filter by specific process |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, so the description is not required to restate safety. However, it adds value by explaining the output context (actual cost vs baseline) and the posed questions, which is beyond what annotations capture. No contradictions with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, zero wasted words. Front-loaded with verb+noun, followed by a concrete business question that the tool answers. Efficient and direct.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (3 optional params, output schema exists), the description is nearly complete. It clarifies the purpose and comparison, though it could benefit from a note about the return format or what 'actual AI cost' includes (e.g., token usage). However, since an output schema exists, the return format is technically covered there.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, so the schema itself documents all parameters well. The description does not repeat parameter details but adds value by clarifying the overall purpose ('cost-per-decision' and baseline comparison), which contextualizes the parameter values (e.g., period, agent_name).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Analyze', 'Shows', 'Answers') to clearly identify the resource ('cost-per-decision for AI agents') and the comparison ('pre-AI baseline'). It distinguishes itself from siblings like thinkneo_business_impact or thinkneo_get_savings_report by focusing specifically on decision-level cost analysis.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies that this tool is for comparing AI decision costs to baseline, but does not explicitly state when to use it vs. alternatives (e.g., thinkneo_agent_roi or thinkneo_get_savings_report). No exclusions or when-not-to-use guidance is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_detect_injectionARead-onlyIdempotentInspect
Advanced prompt injection detector. Scans text for 50+ known jailbreak techniques: DAN/STAN/DUDE, role-play bypass, system prompt leaks, delimiter injection, safety bypass, indirect injection via documents, base64 smuggling, unicode obfuscation, and chain-of-thought manipulation. Use this BEFORE passing untrusted text to an LLM. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to scan for prompt injection (max 50,000 chars) | |
| strict | No | Strict mode: flag hypothetical/fictional framings as high risk |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds behavioral details beyond annotations, such as scanning for 50+ techniques and no authentication requirement, while the annotations already indicate read-only, idempotent, non-destructive behavior. There is no contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is three sentences, front-loaded with the core purpose, followed by supporting details and usage guidance, with zero extraneous content.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has an output schema (not shown), the description adequately covers purpose, usage, and behavioral notes. It could briefly mention the output type but is sufficiently complete for agent selection.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% (both 'text' and 'strict' are described in the schema). The description does not add parameter-specific meaning beyond the schema, so the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool is an 'advanced prompt injection detector' and lists specific techniques it detects (DAN/STAN/DUDE, role-play bypass, etc.), making the purpose highly specific and distinguishable from sibling tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly advises to use this tool 'BEFORE passing untrusted text to an LLM' and notes that no authentication is required, providing clear context for when to use it. However, it does not explicitly mention when not to use it or list alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_detect_wasteARead-onlyIdempotentInspect
Detect waste and inefficiency in AI operations. Analyzes agent performance, A2A communication overhead, error costs, unused capacity, and cost outliers. Returns specific actionable findings like 'you are losing $3,200/month on error retries' or 'this flow is 5x more expensive than your best-performing flow'. This is the diagnostic tool that creates the buying trigger.
| Name | Required | Description | Default |
|---|---|---|---|
| days | No | Analysis window in days | |
| workspace | No | Workspace identifier | default |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint as true, so the agent knows it's a safe, read-only operation. The description adds context about returning actionable findings with specific dollar amounts, which is useful. However, it doesn't detail any side effects or authentication requirements beyond the parameter descriptions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is concise at 3 sentences. The key information is front-loaded. However, the second sentence lists multiple analysis dimensions which could be slightly trimmed, but overall it's well-structured and not verbose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (2 simple params, no required fields, output schema exists), the description adequately covers its purpose and output format. The mention of specific deliverables ('$3,200/month') adds completeness. Could be improved by linking to sibling tools that handle different aspects (e.g., thinkneo_business_impact), but not strictly necessary.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% for both parameters (days and workspace), so the schema already defines them well. The description adds no extra parameter-level detail beyond what's in the schema. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states what the tool does: detect waste and inefficiency in AI operations. It specifies the domains analyzed (agent performance, A2A communication overhead, error costs, unused capacity, cost outliers) and distinguishes itself from sibling tools like thinkneo_a2a_audit or thinkneo_business_impact by framing it as 'the diagnostic tool that creates the buying trigger.'
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Description implies when to use it: when investigating cost and efficiency issues. It doesn't explicitly contrast with siblings, but the focus on waste and detecting dollar amounts sets it apart. No explicit when-not-to-use guidance is given, but the context is clear enough.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_end_traceAInspect
End an active agent trace and get the session summary. Returns total cost, duration, tool/model call counts, and event count. Triggers post-session anomaly detection (cost spikes, error rate). Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| status | No | Final session status: 'success', 'failure', or 'timeout' | success |
| session_id | Yes | Session ID from thinkneo_start_trace |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond annotations: it discloses that authentication is required, triggers post-session anomaly detection (e.g., cost spikes, error rate), and describes return values (cost, duration, counts). Annotations indicate it's not read-only and not idempotent, which the description doesn't contradict, but it enriches understanding with operational details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by additional details in a structured manner. Every sentence adds value (e.g., return details, anomaly detection, auth requirement) with zero waste, making it efficient and well-organized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (ending a trace with side effects), rich annotations (readOnlyHint: false, idempotentHint: false), and the presence of an output schema, the description is complete. It covers purpose, behavior, authentication, and return aspects without needing to explain output values, making it sufficient for an agent to use the tool effectively.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema fully documents both parameters ('session_id' and 'status'), including defaults and enums. The description does not add any parameter-specific semantics beyond what the schema provides, so it meets the baseline of 3 without compensating for gaps.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('End an active agent trace') and resource ('session summary'), and distinguishes it from sibling tools like 'thinkneo_start_trace' and 'thinkneo_get_trace' by focusing on termination and summary retrieval. It explicitly mentions what the tool does beyond just the name.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool: to end an active trace and get a session summary. It implies usage after starting a trace with 'thinkneo_start_trace' but does not explicitly state when not to use it or name alternatives, such as using 'thinkneo_get_trace' for ongoing trace details without ending it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_estimate_tokensARead-onlyIdempotentInspect
Estimate token count and cost for text across 25+ LLM models. Uses tokenizer-specific char-per-token factors (GPT/Claude/Gemini/Llama/etc.) Returns per-model estimates with input + output cost breakdown. Useful for: budgeting before a large batch job, comparing models by workload, context window planning. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to estimate tokens for | |
| models | No | Specific models to estimate (list of model names). Defaults to all. | |
| expected_output_tokens | No | Expected output tokens (for cost estimation). Defaults to input token count. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and idempotentHint=true. Description adds 'No authentication required' and explains it uses tokenizer-specific factors, providing useful context beyond annotations. Does not contradict.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four well-structured sentences. Each sentence adds distinct value: purpose, methodology, output, use cases. No filler. Front-loaded with the core action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With annotations covering safety/idempotency and output schema assumed to exist, description still provides complete context: what it does, how it works, what it returns, and when to use it. No gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% coverage with clear descriptions for all three parameters. Description adds some extra context (methodology, defaults) but is not needed for parameter understanding. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it estimates token count and cost for text across 25+ LLM models, with specific verb 'estimate' and resource 'token count and cost'. Distinguishes from sibling tools which are about auditing, monitoring, or optimization.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly lists three use cases: budgeting before batch jobs, comparing models by workload, and context window planning. Does not mention when not to use, but no obvious alternative among siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_evaluate_guardrailARead-onlyIdempotentInspect
Evaluate a prompt or text against ThinkNEO guardrail policies before sending it to an AI provider. Returns risk assessment, violations found, and recommendations. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | The prompt or text content to evaluate for policy violations (max 32,000 characters) | |
| workspace | Yes | Workspace whose guardrail policies to apply for this evaluation | |
| guardrail_mode | No | Evaluation mode: 'monitor' (log violations only) or 'enforce' (block the request on violation) | monitor |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true and idempotentHint=true, covering the safety profile. The description adds valuable operational context: it discloses the return structure ('risk assessment, violations found, and recommendations') and critical prerequisites ('Requires authentication') not present in annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly constructed sentences with zero redundancy. The first sentence establishes the core action and timing, the second describes outputs, and the third states prerequisites. Every sentence earns its place with no filler.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (per context signals) and rich annotations, the description provides sufficient context: purpose, return value summary, and authentication requirements. Minor gap: could briefly mention the guardrail_mode default behavior, but completeness is strong for a 3-parameter evaluation tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema adequately documents all three parameters (text, workspace, guardrail_mode). The description implies the text parameter ('Evaluate a prompt or text') but does not add semantic meaning beyond what the structured schema already provides, warranting the baseline score.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Evaluate') and clearly identifies the resource (prompt/text against ThinkNEO guardrail policies). It distinguishes from siblings like 'check_policy' by specifying this evaluates content pre-submission ('before sending it to an AI provider') rather than checking policy configuration.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides temporal context ('before sending it to an AI provider') indicating when to use it, which implies the workflow position. However, it lacks explicit guidance on when NOT to use this versus siblings like 'thinkneo_check_policy' or alternative approaches.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_evaluate_trust_scoreAInspect
Evaluate your organization AI Trust Score (0-100) across 10 dimensions: Guardrails, PII Protection, Injection Defense, Audit Trail, Compliance, Model Governance, Cost Controls, Outcome Validation, Observability, and Smart Routing. Returns a score, detailed breakdown, badge level (Platinum/Gold/Silver/Bronze/Unrated), and actionable recommendations. Score is valid for 30 days. Generates a public badge URL for embedding in websites and documentation. Part of the 'From Prompt to Proof' framework. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| org_name | Yes | Organization name for the trust score badge (e.g., 'Acme Corp') |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is not read-only and not idempotent, but the description adds valuable behavioral context beyond that: it specifies the 30-day validity period, mentions that it generates a public badge URL, and states authentication requirements. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured with zero wasted words. It front-loads the core purpose, then provides essential details about outputs, validity period, badge generation, and authentication requirements in a logical flow.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity, the description provides comprehensive context: it explains what the tool evaluates, what it returns (score, breakdown, badge level, recommendations), validity period, badge generation capability, and authentication needs. With an output schema present, it appropriately doesn't need to detail return values.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the single parameter 'org_name' is already well-documented in the schema. The description doesn't add additional parameter semantics beyond what's in the schema, so it meets the baseline expectation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('evaluate') and resource ('AI Trust Score') with detailed scope (0-100 score across 7 categories). It distinguishes from sibling tools like 'thinkneo_get_trust_badge' by emphasizing evaluation rather than retrieval.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool (to measure governance maturity and generate badges) and mentions authentication requirements. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the many sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_get_budget_statusARead-onlyIdempotentInspect
Get current budget utilization and enforcement status for a workspace. Shows spend vs limit, alert thresholds, and projected overage. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| workspace | Yes | Workspace name or ID to retrieve current budget status for |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations establish readOnlyHint=true, so the description adds value by disclosing 'Requires authentication' (auth needs) and clarifying the computational nature of the data ('projected overage'). It does not contradict annotations, though it could further describe caching behavior or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three distinct sentences efficiently distribute information: purpose (sentence 1), specific outputs (sentence 2), and prerequisites (sentence 3). Minor overlap exists between 'budget utilization' and 'spend vs limit,' but each sentence advances the agent's understanding without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (single parameter, read-only operation), presence of an output schema, and complete schema coverage, the description provides sufficient context. The mention of authentication requirements compensates for the lack of annotation coverage on auth.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the parameter 'workspace' is fully documented in the schema itself ('Workspace name or ID'). The description provides baseline adequacy by not duplicating this information, earning the standard score for high-coverage schemas.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Get') with a clear resource ('budget utilization and enforcement status') and scope ('for a workspace'). It distinguishes from sibling thinkneo_check_spend by detailing specific outputs like 'enforcement status,' 'alert thresholds,' and 'projected overage,' though it doesn't explicitly name the sibling for comparison.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage by detailing what it returns ('spend vs limit, alert thresholds'), allowing an agent to infer when this tool is appropriate. However, it lacks explicit guidance on when to use this versus thinkneo_check_spend or other siblings, and doesn't state exclusion criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_get_compliance_statusBRead-onlyIdempotentInspect
Get compliance and audit readiness status for a workspace. Shows governance score, pending actions, and compliance gaps. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| framework | No | Compliance framework to assess: soc2 (SOC 2 Type II), gdpr (GDPR), hipaa (HIPAA), or general (ThinkNEO AI governance) | general |
| workspace | Yes | Workspace name or ID to evaluate compliance readiness for |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already establish readOnlyHint=true and idempotentHint=true. The description adds value by stating 'Requires authentication' (not in annotations) and previewing output content ('governance score, pending actions, and compliance gaps'). However, it omits rate limits, caching behavior, or framework-specific behaviors.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences total, front-loaded with the core action ('Get compliance...'), followed by output preview and auth requirement. No redundant or wasted language; each sentence provides distinct information not available in structured fields.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (not shown but indicated in context signals) and comprehensive input schema, the description appropriately focuses on purpose and operational requirements rather than replicating return value documentation. It could improve by noting the framework parameter's default value.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema fully documents both parameters (workspace and framework) including valid enum-like values for framework. The description adds no parameter-specific guidance, meeting the baseline expectation for well-schematized tools.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Get[s] compliance and audit readiness status for a workspace' with specific verbs and resource types. It implicitly distinguishes from siblings like check_spend, check_policy, and get_budget_status by focusing on governance/compliance domain, though it doesn't explicitly contrast with similar monitoring tools like evaluate_guardrail.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It does not clarify the distinction between 'compliance status' (this tool) and 'policy checks' (thinkneo_check_policy) or 'guardrail evaluation' (thinkneo_evaluate_guardrail), leaving the agent to infer appropriate usage from the name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_get_observability_dashboardARead-onlyIdempotentInspect
Get the agent observability dashboard — aggregated metrics for your AI agents. Includes total sessions, events, cost, error rate, latency, top agents, top tools, active alerts, and cost trend over time. Like Datadog, but for AI agents. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| period | No | Time period: '1h', '24h', '7d', or '30d' | 24h |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate read-only and idempotent operations, which the description does not contradict. The description adds valuable context beyond annotations by specifying the metrics included, comparing to Datadog, and stating authentication requirements, enhancing behavioral understanding.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose, followed by details on metrics and context. Every sentence adds value, with no redundant information, making it efficient and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity, the description is complete: it explains the purpose, metrics, authentication needs, and provides an analogy. With annotations covering safety and an output schema present, no additional details on return values or behavior are necessary.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, fully documenting the single parameter 'period' with its type, default, and allowed values. The description does not add any parameter-specific information beyond what the schema provides, so it meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Get') and resource ('agent observability dashboard'), listing detailed metrics included. It distinguishes from sibling tools by focusing on aggregated metrics for AI agents, unlike tools for budgets, traces, or memory operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('aggregated metrics for your AI agents') and mentions authentication requirements. However, it does not explicitly state when not to use it or name specific alternatives among the sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_get_proofARead-onlyIdempotentInspect
Retrieve the immutable proof record for a verified claim. Includes the original claim, verification evidence, verifier identity, and a SHA-256 proof hash for tamper detection. This is the 'proof' in 'From Prompt to Proof'. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| claim_id | Yes | UUID of the claim to get proof for |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint as true, establishing safe read behavior. The description adds value by specifying the immutable nature, the exact contents of the proof (including SHA-256 hash for tamper detection), and the authentication requirement. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences long, front-loaded with the action verb 'Retrieve', and every sentence adds important context (contents, connection to proof concept, authentication). No fluff or repetition.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With annotations covering safety and output schema present, the description need not detail return values. It adequately covers what the proof contains, immutability, and auth requirement. Missing error scenarios or pagination, but acceptable for a simple retrieval tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, with the single claim_id parameter already described as a UUID. The description adds no additional parameter-level semantics beyond the schema, so baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Retrieve') and a clear resource ('immutable proof record for a verified claim'). It enumerates the proof contents (original claim, evidence, verifier, hash) and explicitly connects to the 'From Prompt to Proof' concept, distinguishing it from sibling tools like verify_claim or register_claim.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions authentication is required but does not provide explicit guidance on when to use this tool versus siblings (e.g., after verification, not before). The phrase 'This is the 'proof'' gives indirect context but lacks explicit when-to-use / when-not-to-use or alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_get_savings_reportARead-onlyIdempotentInspect
Get your AI cost savings report. Shows total requests routed, original cost (what you'd have paid with premium models), actual cost, total savings, savings percentage, breakdown by task type, and model distribution. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| period | No | Report period: '7d' (7 days), '30d' (30 days), or '90d' (90 days) | 30d |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate read-only and idempotent operations, which the description aligns with by describing a retrieval function. The description adds value beyond annotations by specifying authentication requirements and detailing the report's content (e.g., breakdowns by task type), though it could mention rate limits or data freshness. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose and efficiently lists key output components in a single, well-structured sentence. Every part adds value, such as specifying authentication needs and report details, with no redundant or wasted information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (retrieving detailed savings data), the description is complete: it outlines the report's scope, includes authentication requirements, and lists output components. With annotations covering safety (read-only, idempotent) and an output schema likely detailing return values, no additional explanation of behavior or returns is needed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, fully documenting the 'period' parameter with options and default. The description does not add parameter semantics beyond the schema, as it focuses on output details. With high schema coverage, the baseline score of 3 is appropriate, as the description compensates by explaining what the tool returns rather than inputs.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('Get', 'Shows') and resources ('AI cost savings report'), listing detailed output components like total requests, costs, savings, and breakdowns. It distinguishes from sibling tools like 'thinkneo_check_spend' or 'thinkneo_simulate_savings' by focusing on a comprehensive report rather than checks or simulations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for retrieving cost savings data but does not explicitly state when to use this tool versus alternatives like 'thinkneo_check_spend' or 'thinkneo_simulate_savings'. It mentions authentication requirements, providing some context, but lacks clear exclusions or direct comparisons to sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_get_traceARead-onlyIdempotentInspect
Retrieve the full trace for an agent session. Returns the complete timeline of events (tool calls, model calls, decisions, errors), session metadata, total cost, duration, and any alerts triggered. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session ID to retrieve the trace for |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safe, repeatable operations. The description adds valuable context beyond this: it discloses authentication requirements and details the return content (timeline, metadata, cost, duration, alerts), which is not covered by annotations. No contradictions exist.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by specific return details and authentication requirement. Every sentence adds essential information without redundancy, making it efficient and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (retrieving detailed trace data), the description is complete: it specifies the return content, authentication needs, and purpose. With annotations covering safety, an output schema likely detailing the return structure, and high schema coverage, no significant gaps remain.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the single parameter 'session_id' fully documented in the schema. The description does not add any additional meaning or examples beyond what the schema provides, such as format or validation rules, so it meets the baseline for high coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('retrieve'), resource ('full trace for an agent session'), and scope ('complete timeline of events, session metadata, total cost, duration, alerts triggered'), distinguishing it from siblings like thinkneo_end_trace or thinkneo_start_trace which manage rather than retrieve traces.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states 'Requires authentication' as a prerequisite, providing clear context for usage. However, it does not specify when to use this tool versus alternatives like thinkneo_get_observability_dashboard or thinkneo_list_alerts, which might offer related but different data.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_get_trust_badgeARead-onlyIdempotentInspect
Get a public AI Trust Score badge by report token. Returns the organization name, score, badge level, and validity period. Use the badge URL to embed the trust badge in websites and documentation. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| report_token | Yes | The report token from a trust score evaluation (URL-safe string) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and idempotentHint=true, covering safety and idempotency. The description adds valuable context beyond this: it specifies that no authentication is required (addressing access needs) and describes the return format (organization name, score, badge level, validity period) and the badge's embeddable nature. It doesn't mention rate limits or side effects, but with annotations provided, this is sufficient for a high score.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose, followed by return details and usage guidance, all in three concise sentences with zero waste. Each sentence earns its place: the first defines the action, the second lists outputs, and the third covers embedding and authentication. It's efficiently structured and appropriately sized for the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (1 parameter, read-only/idempotent annotations, and an output schema), the description is complete. It covers purpose, return values, usage context, and authentication needs. With annotations handling safety and an output schema likely detailing the response structure, no additional information is needed for effective agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the parameter 'report_token' fully documented in the schema as 'The report token from a trust score evaluation (URL-safe string).' The description adds no additional parameter details beyond implying the token is used to fetch a badge. Since the schema does the heavy lifting, the baseline score of 3 is appropriate, as the description doesn't significantly enhance parameter understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Get a public AI Trust Score badge') and resource ('by report token'), distinguishing it from siblings like 'thinkneo_evaluate_trust_score' (which likely generates scores) or 'thinkneo_get_compliance_status' (which focuses on compliance). It explicitly mentions what is returned (organization name, score, badge level, validity period) and the badge's use case (embedding in websites/documentation), making the purpose highly specific and differentiated.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: 'Use the badge URL to embed the trust badge in websites and documentation.' It also states 'No authentication required,' which clarifies prerequisites. While it doesn't name specific alternatives, the context implies this is for retrieving an existing badge (vs. evaluating a new score with 'thinkneo_evaluate_trust_score'), offering clear usage context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_list_alertsARead-onlyIdempotentInspect
List active alerts and incidents for a workspace. Includes budget alerts, policy violations, guardrail triggers, and provider issues. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of alerts to return (1–100) | |
| severity | No | Filter alerts by severity level: critical, warning, info, or all | all |
| workspace | Yes | Workspace name or ID to list active alerts for |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Adds valuable behavioral context beyond annotations: specifies 'active' alerts (temporal scope), enumerates content categories included, and states 'Requires authentication.' Annotations already declare readOnlyHint=true, so the description appropriately focuses on content scope and auth requirements rather than safety.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences totaling 21 words. Front-loaded with core purpose first, followed by content scope, then authentication requirement. Zero redundancy—every sentence provides distinct information not available in structured fields.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Appropriate for complexity level: output schema exists (covering return values), input schema is fully documented, and description covers auth requirements and content filtering. Minor gap regarding pagination behavior or default sorting, but 'limit' parameter implies pagination support.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% (all 3 parameters fully documented with types, ranges, and valid values). Description does not explicitly discuss parameters, but baseline 3 is appropriate when schema carries the full semantic load.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Specific verb ('List') + resource ('active alerts and incidents') + scope ('for a workspace'). Distinguishes from siblings (check_policy, check_spend, evaluate_guardrail, provider_status) by explicitly enumerating the alert types it aggregates: 'budget alerts, policy violations, guardrail triggers, and provider issues.'
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies usage through enumeration of included alert categories (suggesting it's an aggregate view), but lacks explicit when-to-use guidance versus the specific check/evaluation sibling tools. No prerequisites or exclusions stated beyond authentication.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_log_decisionAInspect
Log a business decision made by an AI agent. Tracks the AI cost and the business value generated. If a baseline exists for the process, value is auto-calculated from the baseline cost. Example: agent 'support-bot' resolved a 'customer_support_ticket' at $0.03 AI cost, replacing a $12 human-handled ticket. ROI: 400:1.
| Name | Required | Description | Default |
|---|---|---|---|
| outcome | No | Result: 'success', 'escalated', 'rejected', 'error' | success |
| metadata | No | JSON string with additional context | |
| workspace | No | Workspace identifier | default |
| agent_name | Yes | Name of the AI agent that made the decision, e.g. 'support-bot', 'loan-reviewer' | |
| confidence | No | Confidence score 0.0-1.0 | |
| ai_cost_usd | No | Actual AI cost for this decision in USD, e.g. 0.03 | |
| process_name | No | Links to a baseline process for auto ROI calculation | |
| decision_type | Yes | Type of decision, e.g. 'ticket_resolved', 'loan_approved', 'content_reviewed' | |
| value_generated_usd | No | Explicit business value in USD. If omitted and process_name has a baseline, auto-calculated. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readWrite (not readOnly) and non-idempotent, which matches the logging nature. Description adds key behavior: auto-calculation of value when baseline exists. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences plus an example relay all essential info without extra words. Front-loaded with core purpose, then concrete illustration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a logging tool with detailed input schema (9 params, all described) and output schema present, the description covers business logic, example, and scenario, making it complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions for all parameters. Description adds context about auto-calculation for value_generated_usd and process_name linking, beyond schema. Baseline 3 is appropriate, with slight uplift for added business logic.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool logs business decisions by AI agents, tracking AI cost and business value. The example further clarifies purpose and distinguishes it from siblings like thinkneo_decision_cost or thinkneo_log_event.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Includes a concrete example of when to use (logging agent decisions with cost/value), but does not explicitly state when not to use or compare to alternatives. However, the example provides strong usage context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_log_eventAInspect
Log an event within an active agent trace. Supports event types: tool_call, model_call, decision, error, pii_access, guardrail_triggered. Returns event_id and running session cost. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| cost | No | Estimated cost in USD for this event (e.g., 0.003 for an API call) | |
| metadata | No | Optional dict with additional event context | |
| tool_name | No | Name of the tool called (for tool_call events) | |
| event_type | Yes | Event type: 'tool_call', 'model_call', 'decision', 'error', 'pii_access', or 'guardrail_triggered' | |
| latency_ms | No | Latency in milliseconds for this event | |
| model_name | No | Model used (for model_call events, e.g., 'gpt-4o', 'claude-sonnet-4-20250514') | |
| session_id | Yes | Session ID from thinkneo_start_trace | |
| input_summary | No | Brief summary of the input (max 500 chars, truncated if longer) | |
| output_summary | No | Brief summary of the output (max 500 chars, truncated if longer) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is not read-only and not idempotent (write operation, non-idempotent). The description adds valuable context about authentication requirements and return values (event_id and running session cost), which aren't covered by annotations. However, it doesn't mention rate limits, error behavior, or other operational constraints that would be helpful.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three concise sentences that each earn their place: first states purpose and event types, second states return values, third states authentication requirement. No wasted words, front-loaded with core functionality.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (9 parameters, write operation) and the presence of both annotations and output schema, the description provides good context about authentication, event types, and return values. However, it could better explain the relationship with thinkneo_start_trace (mentioned in schema) and thinkneo_end_trace (sibling tool) for a more complete picture.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already documents all 9 parameters thoroughly. The description lists the supported event types (tool_call, model_call, decision, error, pii_access, guardrail_triggered), which provides helpful context beyond the schema's enum-less definition, but doesn't add significant semantic value for other parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Log') and resource ('an event within an active agent trace'), and distinguishes it from siblings by specifying its unique function in the observability/tracing system. It's specific about what it does (logging with specific event types) rather than being generic.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context ('within an active agent trace') and mentions authentication requirements, but doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools. It implies usage for logging events during tracing sessions but lacks explicit exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_log_risk_avoidanceAInspect
Log a risk event that was blocked or avoided by the governance layer. Quantifies the estimated dollar impact of the avoided risk. Examples: PII leak blocked (est. $50K GDPR fine), prompt injection prevented, policy violation caught before production. If estimated_impact_usd is not provided, a default is calculated from severity.
| Name | Required | Description | Default |
|---|---|---|---|
| severity | No | Severity level: 'low', 'medium', 'high', 'critical' | medium |
| risk_type | Yes | Type: 'pii_leak', 'injection_blocked', 'policy_violation', 'spend_limit', 'compliance_breach', 'data_exfiltration' | |
| workspace | No | Workspace identifier | default |
| agent_name | No | Agent involved, if applicable | |
| description | No | Brief description of what was blocked | |
| estimated_impact_usd | No | Estimated cost if this risk had materialized in USD |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Description discloses non-obvious behavior: if estimated_impact_usd is missing, a default is calculated from severity. Annotations already show non-readOnly and non-idempotent, so mutable behavior is expected. No contradictions. Could add detail on what the default calculation entails.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences are each essential: purpose, quantification, examples, and fallback behavior. No redundant or filler content. Front-loaded with the core action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 6 params (most with defaults), high schema coverage, and output schema (not examined but likely sufficient), the description is largely complete. Minor gap: doesn't describe the return value or any side effects beyond logging. Still adequate for an audit log tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions for all 6 parameters. The description adds value by explaining the relationship between severity and impact, and by listing example risk types. However, it doesn't elaborate on the 'workspace' or 'agent_name' beyond schema, which are straightforward.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool logs a risk event that was blocked by the governance layer and quantifies dollar impact. It provides concrete examples (PII leak, injection, policy violation) that disambiguate its use from sibling tools like thinkneo_log_event or thinkneo_log_decision, making the specific purpose unmistakable.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Description implies when to use this tool: after a risk is blocked or avoided. It doesn't explicitly exclude scenarios or name alternative tools, but examples give strong context. Could be improved by stating when not to use it (e.g., for actual incidents).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_optimize_promptARead-onlyIdempotentInspect
Optimize an LLM prompt for clarity, conciseness, and token efficiency. Detects: filler words, verbose phrases, hedging language, repeated content, missing structure (output format, role, constraints), and excessive politeness. Returns suggestions + an automatically rewritten concise version with estimated token savings. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The prompt text to optimize (max 20,000 chars) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true. Description adds beyond annotations: states 'No authentication required', details return format (suggestions + rewritten version with token savings), and lists detection categories. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Concise multi-sentence description: first sentence states purpose, second lists detections, third describes return, fourth notes auth. No wasted words, front-loaded with key info.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given simple one-parameter schema, existing annotations, and presence of output schema, the description covers what the tool does, its inputs, and outputs. No gaps for this complexity level.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Single parameter 'prompt' with schema description 'The prompt text to optimize (max 20,000 chars)'. Schema coverage 100%, but description adds a constraint (max chars) not in schema. Additionally, tool description provides context on what optimization entails.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description uses specific verb 'optimize' on resource 'prompt', lists specific detections (filler words, verbose phrases, etc.) and outputs (suggestions + rewritten version). Clearly distinguishes from siblings like thinkneo_estimate_tokens or thinkneo_detect_waste.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies usage via listed detections (when you want to improve prompt efficiency), but no explicit when-to-use vs alternatives, nor when-not-to-use. Siblings exist but no comparative guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_policy_createAInspect
Create or update a declarative AI governance policy. Policies define rules that AI agents must follow — conditions are evaluated against request context, and effects (block, warn, require_approval, log) are enforced automatically. Supports versioning — updating a policy creates a new version and disables the old one. Part of the Policy Engine — Governance-as-Code. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Policy name (e.g., 'approval_over_10k', 'pii_blocked_models') | |
| scope | No | Scope filter: {agents: ['agent-1', 'finance-*'], actions: ['approve_payment']}. Use ['*'] for all. Default: all agents and actions. | |
| effect | Yes | Effect when all conditions match: 'block', 'warn', 'require_approval', or 'log' | |
| conditions | Yes | List of conditions. Each condition: {field, operator, value}. Fields: 'model', 'provider', 'cost', 'task_type', 'agent_name', 'data_classification', 'action'. Operators: '==', '!=', '>', '>=', '<', '<=', 'in', 'not_in', 'contains', 'matches'. Example: [{"field": "cost", "operator": ">", "value": 10000}] | |
| description | No | Human-readable description of what this policy does |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readWrite and non-idempotent behavior. The description adds critical context: updating creates a new version and disables the old one, policies are enforced automatically, and authentication is required. This goes beyond what annotations state.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is three sentences, each serving a distinct purpose: stating the action, explaining the policy model, and noting versioning and context. No redundant information; front-loaded with the primary function.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (5 parameters, versioning, output schema), the description covers the essential workflow. It does not detail parameter relationships or error conditions, but the schema and output schema fill those gaps. For a create/update tool, it is adequately complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% coverage with detailed descriptions for all 5 parameters (name, scope, effect, conditions, description). The tool description does not add extra semantic meaning beyond referencing 'conditions' and 'effects' generally; schema already suffices.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool creates or updates AI governance policies, specifying the core components (conditions, effects) and versioning behavior. It distinguishes from sibling tools like thinkneo_policy_list and thinkneo_policy_evaluate by focusing on creation/updating rather than evaluation or listing.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions versioning and authentication as prerequisites but does not explicitly contrast with alternative tools for policy management. However, the context ('Create or update') makes the primary use case clear, and the sibling list provides implicit differentiation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_policy_evaluateARead-onlyIdempotentInspect
Evaluate a request context against all active policies. Returns whether the request is allowed, which policies were violated, and what effect each violation triggers. Use this before executing agent actions to enforce governance rules. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| context | Yes | Request context to evaluate. Fields: 'model' (e.g., 'gpt-4o'), 'provider' (e.g., 'openai'), 'cost' (e.g., 15000), 'task_type' (e.g., 'legal_analysis'), 'agent_name' (e.g., 'finance-agent'), 'data_classification' (e.g., 'pii', 'public', 'confidential'), 'action' (e.g., 'approve_payment'). Example: {"cost": 15000, "action": "approve_payment", "agent_name": "finance-bot"} |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate idempotentHint=true and readOnly=false. Description adds authentication requirement and clarifies the return structure. No contradiction with annotations; the behavioral traits are well disclosed.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three concise sentences: purpose/output, usage guidance, authentication. No fluff, well front-loaded with the most important information first.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Completely describes the tool's purpose, return values, usage context, and authentication requirement. Given the presence of an output schema for return details, no further explanation is needed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%. The description reinforces the schema by listing example fields and providing a concrete example value ({"cost": 15000, ...}). This adds marginal value beyond the schema description, but the schema already clearly describes the context parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the verb 'Evaluate', the resource 'request context against all active policies', and the specific output ('allowed status, violated policies, violation effects'). Distinguishes from siblings like thinkneo_check_policy and thinkneo_policy_violations by focusing on evaluating a request rather than checking or listing policies.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'before executing agent actions to enforce governance rules.' Does not explicitly mention when not to use or alternatives, but the context is clear and actionable.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_policy_listARead-onlyIdempotentInspect
List all active AI governance policies for your organization. Shows policy name, version, conditions, effect, and scope. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| include_disabled | No | Include disabled/superseded policy versions |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint and idempotentHint, so the description correctly does not repeat them. It adds important behavioral context: 'Requires authentication' and specifies the data returned (policy details). This goes beyond annotations. However, it does not mention potential rate limits or pagination, which could exist for large policy lists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is only two sentences: the first states the core action and scope, the second lists what is shown and notes authentication. Every sentence is essential; no fluff or repetition. It is highly efficient for an AI agent to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With an output schema present and high schema coverage, the description does not need to explain return values. It covers the purpose, key behavioral requirement (auth), and the data fields. Given the tool's simplicity (one optional parameter), the description is fully adequate for correct invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, so the schema already documents the 'include_disabled' parameter. The description adds meaning by implying its effect: listing only active policies by default and including disabled when set. This provides context beyond the schema's bare description, earning above the baseline of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'List all active AI governance policies', specifying both the action (list) and the resource (policies). It enumerates the fields shown (policy name, version, conditions, effect, scope), making the output concrete. This distinguishes it from sibling tools like thinkneo_policy_create or thinkneo_policy_evaluate.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly indicates when to use the tool: to list active policies. It does not mention when not to use it or provide alternatives, but the context is clear enough given the sibling tool names. A slight gap is the lack of explicit exclusions, so score 4 reflects clear context without exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_policy_violationsARead-onlyIdempotentInspect
View policy violation history — which policies were triggered, by which agents, how many blocks/warnings/approvals occurred. Shows unresolved violations for compliance follow-up. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| days | No | Number of days to look back (default 30, max 365) | |
| unresolved_only | No | Show only unresolved violations |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint. The description adds 'Requires authentication', a behavioral trait beyond annotations. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with front-loaded purpose. Every sentence adds value; no extraneous information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given an output schema exists, the description need not explain return values. It covers the core data and purpose. Missing minor details like date range flexibility, but adequate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% so parameters are already described in the schema. The description does not add additional parameter-specific meaning beyond what is in the input schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it views policy violation history, specifying what data is shown (policies, agents, blocks/warnings/approvals). This distinguishes it from siblings like 'thinkneo_policy_list' which lists policies, not violations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Mentions 'Requires authentication' as a prerequisite and implies use for compliance follow-up, but lacks explicit guidance on when to use this tool versus alternatives like 'thinkneo_check_policy' or 'thinkneo_policy_evaluate'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_provider_statusARead-onlyIdempotentInspect
Get real-time health and performance status of AI providers routed through the ThinkNEO gateway. Shows latency, error rates, and availability. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| provider | No | Specific provider to check: openai, anthropic, google, mistral, xai, cohere, or together. Omit to get status for all providers. | |
| workspace | No | Workspace context for provider routing configuration (optional) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true; description adds valuable behavioral context including 'real-time' data nature, specific metrics returned (latency, error rates, availability), and authentication requirements. No contradictions with annotations. Could enhance further with rate limit or caching details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly crafted sentences with zero waste. Front-loaded with core purpose ('Get real-time health...'), followed by specific metrics, then auth requirements. Every sentence earns its place with no redundant or filler content.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 100% schema coverage, existing output schema, and readOnly annotations, the description adequately covers tool purpose, return data characteristics, and access requirements. Complete for a monitoring tool of this complexity, though explicit mention of the 'all providers' default could strengthen it further.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, documenting both provider options (including the 'omit for all' behavior) and workspace context. Description provides general context about 'AI providers' and 'gateway' but does not add parameter-specific semantics beyond what the schema already provides. Appropriate baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description uses specific verb 'Get' with clear resource 'health and performance status of AI providers routed through the ThinkNEO gateway'. Explicitly distinguishes from siblings (budget/compliance/policy tools) by focusing on operational metrics like latency and availability.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context that this is for monitoring provider health/performance vs financial or compliance concerns. Explicitly states 'No authentication required' which is valuable usage constraint. Lacks explicit 'when to use vs sibling X' guidance, though the functional distinction is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_read_memoryARead-onlyIdempotentInspect
Read Claude Code project memory files. Without arguments, returns the MEMORY.md index listing all available memories. With a filename argument, returns the full content of that specific memory file. Use this to access project context, user preferences, feedback, and reference notes persisted across Claude Code sessions.
| Name | Required | Description | Default |
|---|---|---|---|
| filename | No | Name of the memory file to read (e.g. 'user_fabio.md', 'project_thinkneodo_droplet.md'). Omit to get the MEMORY.md index with all available files. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, so the agent knows this is a safe, repeatable read operation. The description adds valuable context beyond annotations by explaining what types of content are stored ('project context, user preferences, feedback, and reference notes') and that they persist across sessions. It doesn't describe rate limits or authentication needs, but adds meaningful behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured in three sentences: first states the core purpose, second explains the two usage modes, third provides context about what types of memories are accessed. Every sentence earns its place with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has comprehensive annotations (readOnlyHint, idempotentHint), 100% schema description coverage, and an output schema exists (so return values are documented elsewhere), the description provides complete contextual information. It explains the tool's purpose, usage patterns, and what types of data it accesses.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already fully documents the single optional parameter. The description adds marginal value by explaining the behavioral difference when the parameter is omitted vs. provided, but doesn't add semantic details beyond what's in the schema. Baseline 3 is appropriate when schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('read', 'returns') and resources ('Claude Code project memory files', 'MEMORY.md index', 'specific memory file'). It distinguishes from sibling tools by focusing on reading memory files rather than policy checks, budget monitoring, or writing operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: 'Without arguments, returns the MEMORY.md index listing all available memories. With a filename argument, returns the full content of that specific memory file.' It also implicitly distinguishes from the sibling 'thinkneo_write_memory' by being a read operation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_register_claimAInspect
Register an action claim from an AI agent. The agent declares it performed an action (e.g., sent an email, created a PR, wrote a file) and ThinkNEO will verify it actually happened. Returns a claim_id for tracking. Part of the Outcome Validation Loop — 'From Prompt to Proof'. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Type of action claimed: 'email_sent', 'http_request', 'file_written', 'db_insert', 'pr_created', 'payment_processed', 'message_sent', 'api_call', 'task_completed', 'data_exported', 'notification_sent', 'custom' | |
| target | Yes | Target of the action — what was acted upon. Examples: 'user@example.com' (email), 'https://api.example.com/endpoint' (http), '/opt/data/report.pdf' (file), 'usage_log' (db table) | |
| metadata | No | Optional verification context. For http_status: {expected_status: 200, method: 'GET'}. For file_exists: {expected_hash: 'sha256...'}. For db_row_exists: {where_column: 'id', where_value: '123'}. | |
| ttl_hours | No | Hours until claim expires if not verified (default 24, max 168) | |
| agent_name | No | Name of the agent making the claim (e.g., 'marketing-agent') | |
| session_id | No | Optional observability session_id to link this claim to a trace | |
| evidence_type | Yes | How to verify the claim: 'http_status' (check URL response), 'file_exists' (check file path), 'db_row_exists' (check database row), 'webhook' (wait for callback), 'smtp_delivery' (check email delivery), 'manual' (flag for human review) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds context beyond annotations (which only indicate non-read-only and non-idempotent) by stating it returns a claim_id for tracking and requires authentication. However, it does not disclose side effects like record creation, potential triggering of verification, or what happens on failure. The description is moderately transparent but could be more detailed.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is four sentences long, with the primary purpose in the first sentence. It is front-loaded, concise, and contains no redundant information. Every sentence adds value: purpose, agent action, return value, and context.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (mentioned in context signals) and high schema coverage, the description is fairly complete. It covers purpose, input, return, and context. However, it lacks details on error scenarios or verification outcomes, which could be included. Overall, it is adequate for an agent to use correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, so each parameter is well-documented. The tool description adds little beyond that: it mentions the return value (claim_id) and context (validation loop, auth). With high schema coverage, the description's additional value is limited, meeting the baseline of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states that the tool registers an action claim for verification, distinguishing it from sibling tools like thinkneo_verify_claim and thinkneo_check which focus on verification or checking. The verb 'register' and resource 'claim' are specific and unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions it is part of the 'Outcome Validation Loop' and requires authentication, implying when to use it (after an agent action). However, it does not explicitly state when not to use it or provide alternatives, such as using thinkneo_verify_claim for verification. The guidance is implied but not explicit.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_registry_getARead-onlyIdempotentInspect
Get full details for an MCP server package from the ThinkNEO Marketplace. Returns readme, full tools list, version history, reviews, security score, and installation instructions. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Package name (e.g. 'thinkneo-control-plane', 'filesystem', 'github') |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safe, repeatable read operations. The description adds valuable context beyond this by stating 'No authentication required,' which is not covered by annotations and clarifies access requirements. It does not disclose rate limits or other behavioral traits, but the added context justifies a score above baseline.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by specific details in a list format, and ends with authentication info. Every sentence adds value without redundancy, making it efficient and well-structured for quick understanding.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (retrieving detailed package info), the description is complete: it specifies the resource, lists returned details, and notes no authentication needed. With annotations covering safety, 100% schema coverage for the single parameter, and an output schema (implied by 'Has output schema: true'), no additional explanation of return values or parameters is needed, making it fully adequate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with the 'name' parameter fully documented in the schema. The description does not add any additional meaning or details about the parameter beyond what the schema provides (e.g., it doesn't explain format constraints or examples beyond 'Package name'). Given the high schema coverage, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Get') and resource ('full details for an MCP server package from the ThinkNEO Marketplace'), specifying what information is retrieved (readme, tools list, version history, reviews, security score, installation instructions). It distinctly differentiates from sibling tools like thinkneo_registry_search (search vs. get details) and thinkneo_registry_install (install vs. get details), avoiding tautology.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for usage by listing the returned details, which implies it's for retrieving comprehensive package information. However, it does not explicitly state when to use this tool versus alternatives like thinkneo_registry_search (e.g., for searching vs. getting full details) or mention any exclusions, leaving some guidance implicit rather than explicit.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_registry_installAInspect
Get installation config for an MCP server from the ThinkNEO Marketplace. Returns ready-to-use JSON config for Claude Desktop, Cursor, Windsurf, or custom clients. Tracks the download. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Package name to install (e.g. 'thinkneo-control-plane') | |
| client_type | No | Your MCP client: claude-desktop, cursor, windsurf, or custom | claude-desktop |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond annotations: it discloses that the tool 'Tracks the download' (a side effect not indicated by annotations) and explicitly states 'No authentication required' (though annotations don't contradict this). While annotations provide readOnlyHint=false and idempotentHint=false, the description usefully clarifies the tracking behavior and auth requirements.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly front-loaded with the core purpose in the first clause, followed by important behavioral details. Every sentence earns its place: the first states what it does, the second specifies the output format, and the third covers tracking and authentication. No wasted words or redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity, the presence of both annotations and an output schema, and the description's coverage of purpose, behavior, and constraints, this description is complete enough. The output schema will handle return value documentation, and the description covers the essential operational context including tracking behavior and authentication requirements.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already fully documents both parameters. The description doesn't add any additional parameter semantics beyond what's in the schema, so it meets the baseline expectation without providing extra value. The description mentions client types but doesn't elaborate beyond what's in the schema's client_type description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Get installation config'), resource ('MCP server from the ThinkNEO Marketplace'), and output format ('ready-to-use JSON config'). It distinguishes itself from sibling tools like thinkneo_registry_get, thinkneo_registry_search, and thinkneo_registry_publish by focusing on installation configuration rather than general registry operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context about when to use this tool ('Get installation config for an MCP server from the ThinkNEO Marketplace') and mentions specific client types. However, it doesn't explicitly state when NOT to use it or name alternatives among sibling tools like thinkneo_registry_get for general package information.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_registry_publishAInspect
Publish an MCP server to the ThinkNEO Marketplace. Validates the endpoint by calling initialize and tools/list, runs automated security scan for secrets and injection patterns, computes a security score (0-100), and stores the entry with version history. Validates the endpoint (calls initialize + tools/list), runs security scan (secrets detection, injection patterns), and stores the entry. Authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Package name (lowercase, hyphens allowed, e.g. 'my-mcp-server') | |
| tags | No | Tags for discoverability (e.g. ['ai', 'governance', 'security']) | |
| readme | No | Full readme/documentation in markdown | |
| license | No | License (e.g. MIT, Apache-2.0) | MIT |
| repo_url | No | Source code repository URL | |
| transport | No | Transport type: streamable-http, sse, or stdio | streamable-http |
| categories | No | Categories: governance, security, data, development, productivity, communication, analytics, devops, finance, marketing, other | |
| description | Yes | Short description of what this MCP server does (max 500 chars) | |
| display_name | Yes | Human-readable display name | |
| endpoint_url | Yes | MCP server endpoint URL (e.g. https://my-server.com/mcp) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond the annotations: it explains the validation process (initialize + tools/list calls), security scanning (secrets detection, injection patterns), and authentication requirements. While annotations indicate this is not read-only and not idempotent, the description provides operational details that help the agent understand what happens during execution.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise and front-loaded: a single sentence that immediately states the core purpose, followed by key behavioral details (validation, security scan, storage, authentication). Every element earns its place with no wasted words or redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (10 parameters, mutation operation), the description provides excellent context: it explains the multi-step publishing process, security validation, and authentication needs. With annotations covering safety hints and an output schema presumably handling return values, the description focuses appropriately on operational behavior rather than repeating structured data.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already documents all 10 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema, so it meets the baseline expectation without providing additional semantic context about individual parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Publish an MCP server to the ThinkNEO Marketplace') and distinguishes it from siblings like thinkneo_registry_get, thinkneo_registry_install, and thinkneo_registry_search by focusing on the publishing/validation process rather than retrieval or installation operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('Publish an MCP server to the ThinkNEO Marketplace') and mentions authentication requirements, but doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools (e.g., when to use thinkneo_registry_search vs. publish).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_registry_reviewAIdempotentInspect
Rate and review an MCP server in the ThinkNEO Marketplace. One review per user per package (updates on repeat). Rating from 1 (poor) to 5 (excellent) with optional comment. Reviews affect the package average rating shown in search results. One review per user per package (updates on repeat). Authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Package name to review | |
| rating | Yes | Rating from 1 (poor) to 5 (excellent) | |
| comment | No | Review comment (max 2000 chars) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond the annotations: it specifies 'One review per user per package (updates on repeat)', which clarifies idempotent behavior and user constraints, and 'Authentication required', which indicates security needs. The annotations (readOnlyHint: false, idempotentHint: true) are consistent with the description's implications of mutation and repeatability, and the description enriches this with practical details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and front-loaded, consisting of just two sentences that efficiently convey the tool's purpose, constraints, and requirements without any wasted words. Every sentence adds critical information, making it easy to parse and understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (mutation with authentication and idempotency), the description is complete enough. It covers purpose, usage rules, and behavioral traits, while the annotations provide safety and idempotency hints, and the output schema (present) will handle return values. No significant gaps remain for effective agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, providing clear documentation for all parameters (name, rating, comment). The description does not add any additional semantic meaning beyond what the schema already explains, such as format details or usage examples. With high schema coverage, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Rate and review') and resource ('an MCP server in the ThinkNEO Marketplace'), distinguishing it from sibling tools like thinkneo_registry_get, thinkneo_registry_install, thinkneo_registry_publish, and thinkneo_registry_search. It precisely defines the tool's function beyond just the name.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('Rate and review an MCP server in the ThinkNEO Marketplace') and mentions authentication requirements. However, it does not explicitly state when not to use it or name specific alternatives among the sibling tools, such as thinkneo_registry_get for viewing reviews.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_registry_searchARead-onlyIdempotentInspect
Search the ThinkNEO MCP Marketplace — the npm for MCP tools. Discover MCP servers and tools by keyword, category, rating, or verified status. Returns name, description, tools count, rating, downloads, and verified badge. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max results to return (1-100, default 20) | |
| query | No | Search query — matches name, description, tags, and tool names | |
| category | No | Filter by category: governance, security, data, development, productivity, communication, analytics, devops, finance, marketing, other | |
| min_rating | No | Minimum average rating (1.0-5.0) | |
| verified_only | No | If true, return only verified packages |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safe, repeatable operations. The description adds valuable context beyond annotations by specifying 'No authentication required' and describing the return format ('Returns name, description, tools count, rating, downloads, and verified badge'), which helps the agent understand what to expect from the search results.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured in three sentences that each serve distinct purposes: stating the tool's purpose, specifying search criteria, and detailing return values and authentication status. There's no wasted language, and key information is front-loaded appropriately.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that this is a search tool with read-only/idempotent annotations, 100% schema coverage, and an output schema (implied by 'Has output schema: true'), the description provides complete contextual information. It covers purpose, usage context, behavioral traits, and return format without needing to duplicate structured data.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, all parameters are well-documented in the input schema. The description mentions searchable fields ('keyword, category, rating, or verified status') which aligns with parameters like query, category, min_rating, and verified_only, but doesn't add significant semantic value beyond what the schema already provides. The baseline of 3 is appropriate given the comprehensive schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('Search', 'Discover') and resources ('ThinkNEO MCP Marketplace', 'MCP servers and tools'), and distinguishes it from sibling tools like thinkneo_registry_get or thinkneo_registry_install by focusing on search functionality rather than retrieval or installation operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('Discover MCP servers and tools by keyword, category, rating, or verified status'), but doesn't explicitly state when not to use it or name specific alternatives among siblings. The mention of 'No authentication required' is helpful but not a full alternative guide.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_rotate_keyADestructiveInspect
Rotate the current API key. Generates a new key with the same tier and monthly limit, revokes the old one with reason 'rotated'. IMPORTANT: the returned new_key is shown ONCE. Store it securely. The old key will continue to work for a 5-minute grace period to avoid downtime.
| Name | Required | Description | Default |
|---|---|---|---|
| confirm | Yes | Must be true to confirm rotation intent |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond the destructiveHint annotation, the description discloses important behaviors: the new key is shown only once, the old key is revoked with reason 'rotated', and there is a 5-minute grace period. This adds value for safe invocation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is four sentences, front-loaded with the core action, followed by critical details. Every sentence serves a purpose with no fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple tool (1 parameter, destructive, output schema exists), the description covers all key aspects: the rotation action, the grace period, the one-time key display, and the revocation reason. It is complete without needing to explain output structure.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% for the single parameter 'confirm', and its description is already provided. The tool description does not add further parameter-specific details; it focuses on output behavior, so baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'rotate' and resource 'API key', and explains what rotating entails: generating a new key with the same tier/limit and revoking the old one. It is distinct from sibling tools which cover audit, policy, checks, etc.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The tool's purpose is clear, and the context (key rotation) is obvious, but there is no explicit guidance on when to use vs. alternatives or when not to use. Given the narrow scope, it's adequate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_route_modelARead-onlyIdempotentInspect
AI Smart Router — find the cheapest model that meets your quality threshold. Specify your task type and quality requirements, and ThinkNEO will recommend the optimal model with estimated cost and savings vs premium models. Supports 17+ models across Anthropic, OpenAI, Google, Meta, Mistral, DeepSeek, Alibaba, Cohere, and xAI. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| task_type | Yes | The type of AI task: summarization, classification, code_generation, chat, analysis, translation, or embedding | |
| text_sample | No | Optional sample text for better routing. Helps estimate token count and task complexity. Max 500 characters. | |
| max_latency_ms | No | Maximum acceptable latency in milliseconds. Omit for no limit. | |
| estimated_tokens | No | Estimated total tokens for the request (input + output). Default 1000. | |
| quality_threshold | No | Minimum quality score required (0-100). Default 85 = enterprise-grade. | |
| budget_per_request | No | Maximum budget per request in USD. Omit for no limit. | |
| preferred_providers | No | Comma-separated list of preferred providers (e.g., 'openai,anthropic'). These will be prioritized at similar cost. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true and idempotentHint=true, indicating safe, repeatable operations. The description adds valuable behavioral context beyond annotations by specifying the tool 'requires authentication' and that it provides 'estimated cost and savings vs premium models.' It doesn't mention rate limits or detailed response behavior, but adds meaningful operational context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured in three sentences: purpose statement, usage instructions, and scope/authentication details. Every sentence earns its place with no wasted words, and key information is front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (7 parameters, authentication requirement, multi-provider routing), the description is complete enough. With annotations covering safety, 100% schema coverage for inputs, and an output schema present (which handles return values), the description provides adequate context about purpose, scope, and authentication needs.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already fully documents all 7 parameters. The description mentions 'task type and quality requirements' which map to two parameters, but doesn't add significant meaning beyond what the schema provides. The baseline of 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('find the cheapest model that meets your quality threshold') and resource ('17+ models across Anthropic, OpenAI, Google, Meta, Mistral, DeepSeek, Alibaba, Cohere, and xAI'). It distinguishes itself from sibling tools by focusing on model routing rather than bridging, checking, logging, or registry operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('Specify your task type and quality requirements'), but doesn't explicitly mention when not to use it or name specific alternatives among the sibling tools. It implies usage for cost optimization but lacks explicit exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_router_explainARead-onlyIdempotentInspect
Explain why the Smart Router would choose a specific model for a task type. Shows both benchmark-based (real outcomes) and static quality estimates, and explains the reasoning behind the recommendation. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| task_type | Yes | Task type: 'summarization', 'code_generation', 'classification', 'translation', 'analysis', 'chat' | |
| quality_threshold | No | Minimum quality score required (0-100, default 85) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnly and idempotent behavior. The description adds that the tool requires authentication, which is useful. However, it does not disclose other behavioral traits like rate limits, caching, or what happens if the task type is invalid. For a tool with strong annotations, the description provides minimal extra behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise: two sentences that front-load the core purpose and then provide additional details about what is shown and authentication. Every sentence serves a purpose with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (2 parameters, 100% schema coverage, annotations present, output schema exists), the description covers the main aspects: purpose, key details (benchmark vs. static estimates, reasoning), and authentication. It could mention the output format or edge cases but is largely complete for the complexity level.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage for both parameters, so the baseline is 3. The description does not add additional parameter semantics beyond what is already in the schema. It mentions what the tool shows but does not explain the parameters further.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: to explain why the Smart Router would choose a specific model for a task type. It distinguishes itself from sibling tools like thinkneo_route_model (which likely performs the actual routing) by focusing on explanation. The verb 'Explain' combined with the resource 'Smart Router model choice' is specific and unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies the tool is used for understanding routing decisions but does not explicitly state when to use it over alternatives (e.g., thinkneo_route_model for actual routing). No usage scenarios, prerequisites, or exclusions are provided. The context of 'explain' is clear but lacks direct guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_scan_secretsARead-onlyIdempotentInspect
Scan text for leaked secrets, API keys, tokens, credentials, and private keys before sending to an LLM or committing to version control. Detects 40+ secret types across AWS, GCP, Azure, Stripe, OpenAI, Anthropic, GitHub, Slack, Twilio, SendGrid, private keys, JWTs, database URIs and more. Returns partial matches with positions so you can redact before sending. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to scan for secrets (max 100,000 chars) | |
| include_matches | No | Include partial (masked) matches in response |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Disclosures go beyond annotations: returns partial matches with positions, no authentication required. Annotations already indicate read-only, idempotent, non-destructive. Adds value without contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences front-loading the core purpose, listing capabilities, and appending key behavioral details. No redundant words, every sentence adds value.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers primary use case, detected types, output behavior, and auth. Output schema exists so return details aren't needed. Missing explicit mention of max text length (present in schema) and connection to include_matches, but overall sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. Description does not add extra meaning beyond schema; it mentions 'partial matches' but does not explicitly link to the include_matches parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the tool scans text for leaked secrets, API keys, tokens, etc., with a specific use case (before sending to LLM or committing). The verb 'scan' and resource 'text for secrets' are explicit, and the list of 40+ types distinguishes it from sibling tools like thinkneo_detect_injection or thinkneo_check.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context for use: 'before sending to an LLM or committing to version control'. However, it does not explicitly exclude alternative tools or mention when not to use it, leaving some ambiguity with sibling scanning tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_schedule_demoAInspect
Schedule a demo or discovery call with the ThinkNEO team. Collects contact information and preferences. No authentication required.
| Name | Required | Description | Default |
|---|---|---|---|
| role | No | Contact's role: cto, cfo, security, engineering, or other | |
| Yes | Business email address to receive follow-up from the ThinkNEO team | ||
| company | Yes | Company or organization name | |
| context | No | Additional context such as current AI providers used, request volume, or specific use case | |
| interest | No | Primary area of interest: guardrails, finops, observability, governance, or full platform | |
| contact_name | Yes | Full name of the person requesting the demo | |
| preferred_dates | No | Preferred meeting dates, times, and timezone (e.g., 'Tuesdays or Thursdays, 9-11am EST') |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint: false and idempotentHint: false (write operation, not idempotent). The description adds valuable behavioral context about authentication requirements not found in annotations. However, it omits consequences of the non-idempotent nature (duplicate submissions) and what happens after scheduling (e.g., confirmation flow).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, zero waste. Front-loaded with the core action ('Schedule a demo...'), followed by parameter categorization, and closes with the critical auth constraint. Every sentence earns its place with no redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (not shown but indicated in context signals), the description appropriately omits return value details. For a 7-parameter scheduling tool, it covers the essential action and auth requirements, though mentioning the side effect (creating a request/ticket) would strengthen completeness given the idempotentHint: false annotation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is met. The description categorizes parameters as 'contact information and preferences,' which loosely maps to the schema fields, but does not add significant semantic detail, validation rules, or format guidance beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Schedule') and identifies the resource clearly ('demo or discovery call with the ThinkNEO team'). It effectively distinguishes from siblings, which are all read-only query operations (check_, get_, list_), while this is the sole booking/action tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context that 'No authentication required' is needed to invoke this tool, which is critical usage information. While it lacks explicit 'when not to use' guidance, the singular nature of this scheduling tool among query-focused siblings makes the appropriate use case self-evident.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_set_audit_exportAIdempotentInspect
Configure automated audit export to a SIEM integration (Splunk HEC, Elasticsearch, or generic webhook). Validates endpoint URL for SSRF safety. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| enabled | No | Enable or disable the export | |
| auth_token | No | Auth token for the integration | |
| integration | Yes | Integration type: splunk, elastic, webhook | |
| endpoint_url | Yes | SIEM endpoint URL (e.g., https://splunk.example.com:8088/services/collector) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses SSRF safety validation and authentication requirements, adding value beyond annotations (readOnlyHint=false, idempotentHint=true). There is no contradiction with annotations, and the behavioral traits are clearly stated.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences that are front-loaded: first sentence defines the core purpose, second adds critical behavioral details. No unnecessary words or redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema and rich annotations, the description covers purpose, validation, and auth. It omits what happens on validation failure or update semantics, but these are minor given the overall completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so the schema already describes parameters adequately. The description does not add additional meaning to the parameters beyond what is in the schema, resulting in a baseline score of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it configures automated audit export to SIEM integrations, listing specific types (Splunk HEC, Elasticsearch, generic webhook). The verb 'Configure' and resource 'automated audit export' are specific, distinguishing it from sibling tools like thinkneo_audit_export (likely for viewing status).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides context about when to use (setting up SIEM export) and mentions validation and authentication requirements. However, it does not explicitly state when not to use or name alternatives, though the purpose is clear enough to differentiate from siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_set_baselineAIdempotentInspect
Define the pre-AI cost baseline for a business process. Example: 'customer_support_ticket costs $12 per ticket and takes 15 minutes without AI'. This baseline is used to calculate ROI when agents handle the same process. Call this once per process to establish the comparison point.
| Name | Required | Description | Default |
|---|---|---|---|
| notes | No | Additional context about this baseline | |
| workspace | No | Workspace identifier | default |
| unit_label | No | What one unit represents, e.g. 'ticket', 'review', 'decision', 'document' | unit |
| process_name | Yes | Name of the business process, e.g. 'customer_support_ticket', 'loan_review', 'content_moderation' | |
| cost_per_unit_usd | Yes | Pre-AI cost per unit in USD, e.g. 12.00 for a $12 support ticket | |
| avg_duration_minutes | No | Average time in minutes for one unit without AI, e.g. 15 |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations show `readOnlyHint: false` and `idempotentHint: true`, but the description adds that this baseline is 'used to calculate ROI' and implies it's a one-time setup, which aligns with idempotency. It does not contradict annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is only three sentences, front-loaded with the core purpose and example, and no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the schema covers all parameters, annotations are clear, and output schema exists, the description is largely complete. It could mention what happens if the baseline already exists (idempotent behavior), but overall suffices.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% description coverage, so baseline is 3. The description does not add additional meaning beyond the schema, but it reinforces the semantics with an example.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool defines a pre-AI cost baseline for a business process, uses a concrete example, and explains its purpose for ROI calculation. It is distinct from sibling tools like thinkneo_agent_roi or thinkneo_detect_waste which focus on different aspects.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description clearly says 'Call this once per process to establish the comparison point,' indicating when to use it. However, it does not explicitly mention when not to use it or alternatives among siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_simulate_savingsARead-onlyIdempotentInspect
Simulate how much your organization would save on AI costs using ThinkNEO Smart Router. Enter your current monthly AI spend and primary model, and see estimated monthly and annual savings with a recommended model mix. No authentication required — try it now!
| Name | Required | Description | Default |
|---|---|---|---|
| primary_model | No | Your primary model: 'gpt-4o', 'claude-opus-4', 'claude-sonnet-4', 'gpt-4.1', or 'gemini-2.5-pro' | gpt-4o |
| monthly_ai_spend | Yes | Your current monthly AI API spend in USD (e.g., 5000.00) | |
| task_distribution | No | JSON string of task distribution, e.g., '{"chat": 0.3, "summarization": 0.2, "code_generation": 0.2, "classification": 0.15, "analysis": 0.1, "translation": 0.05}'. Values should sum to ~1.0. Omit for default enterprise distribution. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and idempotentHint=true, indicating a safe, repeatable operation. The description adds valuable context beyond this: it specifies 'No authentication required' (clarifying access needs) and 'try it now!' (suggesting it's for quick estimates). It does not mention rate limits or detailed behavioral traits like response format, but with annotations covering safety, this is sufficient.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose, followed by usage instructions and behavioral context. Every sentence earns its place: the first states what it does, the second explains inputs, and the third adds access and encouragement. No wasted words, and it's appropriately sized for a simulation tool.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (3 parameters, 1 required), rich annotations (readOnlyHint, idempotentHint), and the presence of an output schema (implied by context signals), the description is complete enough. It covers purpose, usage context, and access needs, and with the output schema handling return values, no additional explanation is needed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all parameters (monthly_ai_spend, primary_model, task_distribution). The description adds minimal semantic value beyond the schema, only mentioning 'Enter your current monthly AI spend and primary model' without explaining parameter interactions or defaults. Baseline 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Simulate how much your organization would save on AI costs using ThinkNEO Smart Router.' It specifies the action ('simulate'), resource ('AI costs'), and scope ('using ThinkNEO Smart Router'), and distinguishes itself from sibling tools focused on routing, compliance, or observability rather than cost simulation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool: to estimate savings based on current spend and model. It mentions 'No authentication required — try it now!' which implies a low-barrier, exploratory use case. However, it does not explicitly state when not to use it or name alternatives among siblings (e.g., vs. thinkneo_get_savings_report).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_sla_breachesARead-onlyIdempotentInspect
View SLA breach history — which SLAs were breached, by which agents, actual vs threshold values, and resolution status. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| days | No | Days to look back (default 30) | |
| agent_name | No | Filter by agent name |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint, so the description's addition of 'Requires authentication' provides some extra context. It also notes what data is returned, but does not describe other behavioral traits like pagination or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, information-dense sentence followed by an authentication requirement. Every word adds value, and it is front-loaded with the core action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given only 2 optional parameters and an existing output schema, the description sufficiently covers the tool's purpose and return value composition. Context is complete for a simple retrieval tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, with each parameter having a description. The tool description does not add meaningful semantics beyond the schema. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it's for viewing SLA breach history, listing specific data elements (breached SLAs, agents, actual vs threshold, resolution status). It implicitly distinguishes from sibling tools like 'thinkneo_sla_status' or 'thinkneo_sla_dashboard' by focusing on historical breaches.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage via 'View SLA breach history,' but lacks explicit guidance on when to use this tool versus alternatives (e.g., thinkneo_sla_status for current status). No exclusions or when-not-to-use context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_sla_dashboardARead-onlyIdempotentInspect
SLA overview dashboard — all agents, current status, error budgets, and recent breaches (7d). The SRE dashboard for AI agents. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
No parameters | |||
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses the authentication requirement, which adds value beyond annotations. The idempotentHint: true annotation is consistent with a dashboard that can be called multiple times without side effects. The readOnlyHint: false is not contradicted; the description does not claim immutability explicitly. Overall, the behavioral context is adequate.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise, with two sentences that efficiently convey the tool's content and audience (SRE). Every word contributes meaning, and critical information is front-loaded. No redundant or filler content exists.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has no parameters, an output schema exists, and annotations provide idempotency, the description is largely complete. It covers what data is shown (status, error budgets, breaches) and authentication. Minor gaps include refresh cadence and data latency, but these are not critical for a dashboard understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With zero parameters and 100% schema coverage trivially, the description adds no parameter details, which is acceptable. The baseline score of 4 applies as per the calibration for no-parameter tools. No further documentation is needed.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly specifies the tool as an 'SLA overview dashboard' covering 'all agents, current status, error budgets, and recent breaches (7d)'. This distinctively positions it as a holistic dashboard compared to more specific sibling tools like thinkneo_sla_breaches and thinkneo_sla_status, which focus on individual aspects.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies use when a high-level SLA health overview across agents is needed. It provides context ('the SRE dashboard for AI agents') but does not explicitly exclude scenarios or name alternatives. Given sibling tools with narrower scopes, the intended use case is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_sla_defineAIdempotentInspect
Define or update an SLA (Service Level Agreement) for an AI agent. Set accuracy, quality, cost, safety, or latency thresholds with automatic breach detection and configurable actions (alert, escalate, disable, switch_model). Like SRE SLOs but for AI agent outcomes. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| metric | Yes | Metric to monitor: 'accuracy' (outcome verification rate %), 'response_quality' (avg quality score), 'cost_efficiency' (cost per verified outcome), 'safety' (guardrail pass rate %), 'latency' (avg response ms) | |
| window | No | Rolling window: '1h', '24h', '7d', or '30d' | 7d |
| threshold | Yes | Target threshold value (e.g., 95.0 for 95% accuracy) | |
| agent_name | Yes | Agent name to set SLA for (e.g., 'support-bot', 'finance-agent') | |
| breach_action | No | Action on breach: 'alert' (notify), 'escalate' (notify + flag), 'disable' (stop agent), 'switch_model' (fallback model) | alert |
| threshold_direction | No | 'min' = actual must be >= threshold (for accuracy, quality). 'max' = actual must be <= threshold (for cost, latency). | min |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false and idempotentHint=true. The description adds value by explaining automatic breach detection and configurable actions (alert, escalate, disable, switch_model), providing behavioral context beyond the structured fields.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences: first defines purpose, second provides context and authentication requirement. No fluff, front-loaded, and every sentence contributes meaningfully.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With an output schema present and full parameter coverage, the description adequately covers the tool's behavior. It mentions breach detection and actions. Could be more explicit about idempotent update behavior, but the annotation covers that.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with all parameters described. The description restates the high-level categories (accuracy, quality, cost, safety, latency) but does not add new parameter-specific details beyond the schema. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Define or update an SLA (Service Level Agreement) for an AI agent' with specific verb+resource. It distinguishes from sibling tools like thinkneo_sla_breaches and thinkneo_sla_status by focusing on definition/update rather than monitoring or breach listing.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides context (like SRE SLOs but for AI agents) and mentions authentication requirement. It does not explicitly state when to use vs alternatives, but the sibling set and clear purpose make the intended use case evident.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_sla_statusARead-onlyIdempotentInspect
Check current SLA status for all agents or a specific agent. Shows actual metric values vs thresholds, healthy/breached status, and error budget remaining. Automatically records breaches. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_name | No | Optional: specific agent name. Leave empty for all agents. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses side effect 'Automatically records breaches' beyond annotations; requires authentication. Annotations show readOnlyHint=false consistent with recording, idempotentHint=true but side effect may be idempotent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three concise sentences: purpose, what it shows, side effects/auth. No redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, parameter usage, side effect, auth needs. Output schema exists so return values not needed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% and description adds no extra parameter info beyond the schema's own description of agent_name. Baseline 3 applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states 'Check current SLA status' with verb and resource, and distinguishes from siblings like thinkneo_sla_breaches or thinkneo_sla_dashboard by focusing on current status.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Specifies usage for all agents or a specific agent, and that leaving empty queries all. However, no explicit when-not-to-use or comparison with alternatives, though implied by sibling names.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_start_traceAInspect
Start a new agent observability trace. Creates a session that tracks all tool calls, model calls, decisions, and errors for an AI agent run. Returns a session_id to use with thinkneo_log_event and thinkneo_end_trace. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| metadata | No | Optional dict with additional context (e.g., {"task": "email-draft", "user_id": "u123"}) | |
| agent_name | Yes | Name of the agent being traced (e.g., 'marketing-agent', 'support-bot') | |
| agent_type | No | Type of agent: 'assistant', 'autonomous', 'workflow', 'pipeline', or 'generic' | generic |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is not read-only and not idempotent, which the description aligns with by describing a creation action. The description adds valuable context beyond annotations: it specifies that authentication is required, describes what the trace tracks (tool calls, model calls, decisions, errors), and mentions the return value (session_id) for subsequent operations. This provides good behavioral insight for an agent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is three concise sentences with zero waste: first states the purpose, second explains what it creates and tracks, third covers prerequisites and output usage. Every sentence earns its place by providing essential information without redundancy, and it's front-loaded with the core action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that this tool has annotations covering safety aspects, 100% schema description coverage, and an output schema (implied by 'Returns a session_id'), the description provides complete contextual information. It explains the tool's role in the observability system, what it tracks, authentication requirements, and how the output connects to other tools, making it fully adequate for agent understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all three parameters (agent_name, agent_type, metadata) with their purposes and formats. The description doesn't add any parameter-specific information beyond what's in the schema, which is acceptable given the comprehensive schema coverage. Baseline 3 is appropriate when the schema handles parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Start a new agent observability trace'), the resource involved ('creates a session'), and the scope ('tracks all tool calls, model calls, decisions, and errors for an AI agent run'). It distinguishes itself from sibling tools like thinkneo_end_trace and thinkneo_log_event by being the initiation point for tracing sessions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool: to begin tracking an AI agent run, with returns a session_id for use with thinkneo_log_event and thinkneo_end_trace. It doesn't explicitly state when not to use it or name alternatives, but the context is sufficiently clear for an agent to understand its role in the observability workflow.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_usageARead-onlyIdempotentInspect
Returns usage statistics for your ThinkNEO API key. Shows calls today, this week, this month, monthly limit, remaining calls, top tools used, estimated cost, and current tier. Works without authentication (returns general info).
| Name | Required | Description | Default |
|---|---|---|---|
No parameters | |||
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only and idempotent behavior, but the description adds valuable context beyond this: it specifies that it works without authentication and returns general info, which clarifies access requirements and data scope. It also lists the specific metrics returned (e.g., calls today, top tools used), enhancing behavioral understanding. No contradictions with annotations are present.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by specific details in a concise list. Every sentence adds value: the first states what it does, the second enumerates returned statistics, and the third provides authentication context. There is no wasted text, making it highly efficient and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (0 parameters, read-only/idempotent annotations, and an output schema), the description is complete. It explains the purpose, usage context, behavioral traits, and output details (e.g., statistics like top tools used and estimated cost), which aligns well with the structured data. No gaps are present for this low-complexity tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0 parameters with 100% coverage, so the schema fully documents the lack of inputs. The description compensates by explaining that no parameters are needed and clarifies the tool's behavior (e.g., returns general info without authentication). This adds meaningful context beyond the schema, though it's not required for parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('Returns usage statistics') and resources ('ThinkNEO API key'), and distinguishes it from siblings by focusing on usage metrics rather than checks, policies, budgets, or other functions. It explicitly lists the specific statistics returned, making the purpose highly specific and differentiated.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context on when to use this tool: for retrieving usage statistics, including details like calls, limits, costs, and tier. It mentions it 'Works without authentication,' which is useful guidance. However, it does not explicitly state when not to use it or name alternatives among the sibling tools, such as for budget or compliance checks, which prevents a perfect score.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_verification_dashboardARead-onlyIdempotentInspect
Aggregated outcome verification metrics — verification rates, failure patterns, agent reliability rankings, and daily trends. Shows how reliably your AI agents are delivering verified outcomes. 'Datadog for AI outcomes'. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| period | No | Time period: '24h', '7d', or '30d' | 7d |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only and idempotent behavior. The description adds that authentication is required and describes the kind of data returned (rates, patterns, rankings, trends). This provides useful behavioral context beyond annotations. No contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences plus a tagline. Front-loaded with key metrics. No redundant information. Every sentence contributes meaning.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has one optional parameter, an output schema, and strong annotations, the description is sufficient. It mentions authentication and summarizes the metrics. It does not explain the output schema details, but that is covered by the output schema itself.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with a single parameter 'period' having a clear description ('24h', '7d', or '30d'). The tool description does not add extra parameter information, but the schema already fully documents it. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it provides aggregated outcome verification metrics like rates, failure patterns, reliability rankings, and trends. The tagline 'Datadog for AI outcomes' reinforces purpose. It distinguishes itself from siblings like thinkneo_get_observability_dashboard (general) and thinkneo_a2a_audit (detailed audit) by focusing specifically on verification outcomes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives such as thinkneo_get_observability_dashboard or thinkneo_a2a_audit. Lacks explicit context about prerequisites or when not to use it. The description implies use for verification metrics but does not help the agent choose among similar tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_verify_claimAIdempotentInspect
Trigger verification of a registered action claim. Runs the appropriate verification adapter (HTTP check, file check, database check, etc.) and returns the result with evidence. If already verified, returns cached result (use force=true to re-verify). Part of the Outcome Validation Loop — 'From Prompt to Proof'. Requires authentication.
| Name | Required | Description | Default |
|---|---|---|---|
| force | No | Force re-verification even if already verified/failed | |
| claim_id | Yes | UUID of the claim to verify (from thinkneo_register_claim) |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description clarifies behavioral traits beyond annotations: it runs adapters (side effect), caches results (idempotency), and requires authentication. This aligns with idempotentHint=true and readOnlyHint=false, adding value by explaining the underlying mechanism.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is four sentences long, each providing unique value: purpose, adapter/result behavior, caching with force, and validation loop context with auth requirement. It is front-loaded with the core purpose and avoids redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (not shown), the description adequately covers the main use case. It mentions the validation loop and adapter varieties. However, it does not specify whether verification failures are cached (requiring force to retry), which is a minor gap.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema already provides full coverage (100%) for both parameters. The description adds slight extra context by linking claim_id to register_claim and explaining the force parameter's purpose in re-verification, but does not introduce new semantic details.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description starts with 'Trigger verification of a registered action claim', which clearly states the action and target resource. It differentiates from sibling tools like thinkneo_register_claim (registration) and thinkneo_get_proof (result retrieval) by focusing on the verification trigger. Mentioning adapters (HTTP, file, database) further specifies the resource behavior.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides context on when to use the tool: after registering a claim, as part of the 'Outcome Validation Loop', and with authentication required. It also explains caching and the force parameter for re-verification. However, it does not explicitly compare to alternatives or state when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
thinkneo_write_memoryAIdempotentInspect
Write or update a Claude Code project memory file (.md). Persists project context, user preferences, feedback, and reference notes across Claude Code sessions. Filename must end in .md with lowercase alphanumeric characters. Path traversal is blocked. Requires authentication.Use this to persist project context, user preferences, feedback, and reference notes across Claude Code sessions. The filename must end in .md and contain only lowercase letters, digits, underscores, and hyphens (e.g. 'user_fabio.md', 'project_new_feature.md'). Path traversal is blocked.
| Name | Required | Description | Default |
|---|---|---|---|
| content | Yes | Full markdown content to write to the file. | |
| filename | Yes | Name of the memory file to write (e.g. 'user_fabio.md', 'project_thinkneodo_droplet.md'). Must end in .md. |
Output Schema
| Name | Required | Description |
|---|---|---|
| result | Yes |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false and idempotentHint=true, and the description adds valuable context beyond this: it specifies filename constraints ('must end in .md and contain only lowercase letters...'), mentions path traversal blocking, and clarifies that it handles both writing and updating. This enriches behavioral understanding without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by essential details like usage context, filename rules, and security note. Every sentence adds value without waste, making it efficient and well-structured for quick comprehension.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (write/update operation), rich annotations (readOnlyHint, idempotentHint), and the presence of an output schema, the description is complete. It covers purpose, usage, behavioral traits, and parameter context adequately, leaving no significant gaps for the agent to understand and invoke the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents both parameters fully. The description adds minimal extra meaning by reinforcing the filename format with examples and noting that content is 'markdown', but this is largely redundant with the schema. Baseline 3 is appropriate as the schema carries the primary burden.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Write or update') and resource ('a Claude Code project memory file (.md)'), distinguishing it from sibling tools like 'thinkneo_read_memory'. It specifies the exact type of file and its purpose ('persist project context, user preferences, feedback, and reference notes across Claude Code sessions'), making the purpose unambiguous and distinct.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('to persist project context... across Claude Code sessions') and includes a sibling tool ('thinkneo_read_memory') that serves as an alternative for reading. However, it does not explicitly state when not to use it or compare it to other write-related tools, which slightly limits guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.
Discussions
No comments yet. Be the first to start the discussion!
Your Connectors
Sign in to create a connector for this server.