Kopern
Server Details
AI Agent Builder, Orchestrator & Grader. Build, test, optimize, and deploy AI agents from any MCP client. 32 tools: agent CRUD, template deployment, grading & AutoResearch optimization, multi-agent teams, pipelines, memory management, 5-channel deployment (widget, Slack, Telegram, WhatsApp, webhooks), OAuth connectors (email, calendar), usage analytics, EU AI Act compliance reports, and portable agent export/import.
- Status: Healthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 3.9/5 across 31 of 31 tools scored. Lowest: 3.2/5.
Most tools have distinct purposes, but some overlap exists: 'kopern_create_agent' and 'kopern_deploy_template' both create agents, and 'kopern_grade_prompt' and 'kopern_run_grading' both involve grading, though with different scopes. The descriptions clarify the differences, but an agent might occasionally misselect between these pairs.
All tools follow a consistent 'kopern_verb_noun' pattern with snake_case throughout. The verbs are clear and descriptive (e.g., create, get, list, run, connect), making the naming highly predictable and easy to understand.
With 31 tools, the count is borderline high for a single server, suggesting potential complexity. However, given the broad scope of agent management, grading, and integrations, it's not unreasonable, though it may feel heavy for some use cases.
The tool set provides comprehensive coverage for AI agent lifecycle management, including creation, configuration, grading, deployment, integration, and deletion. There are no obvious gaps; it supports CRUD operations, testing, optimization, and various connectors, ensuring agents can be fully managed and evaluated.
Available Tools
31 tools

kopern_compliance_report (A, Read-only)
Generate an EU AI Act compliance report for an agent. Checks Art. 6 (risk), Art. 12 (audit trail), Art. 14 (human oversight), Art. 52 (transparency). No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
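As a concrete sketch, a call to this tool over the Streamable HTTP transport would be an MCP `tools/call` JSON-RPC request along these lines; the agent ID `"agent-123"` is a hypothetical placeholder, not a value from this listing:

```python
import json

# Hypothetical MCP tools/call request for kopern_compliance_report.
# "agent-123" is a placeholder; pass your own agent ID or name.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "kopern_compliance_report",
        "arguments": {"agent_id": "agent-123"},
    },
}
print(json.dumps(request, indent=2))
```

Since the tool is annotated read-only and advertises no LLM cost, retrying this request on a transport error should be safe.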
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true, indicating a safe read operation. The description adds valuable context beyond annotations by specifying the compliance scope (EU AI Act articles checked) and noting 'No LLM cost,' which informs about cost implications not covered by annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose, followed by specific details (articles checked and cost note) in a single, efficient sentence with no wasted words, making it highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (compliance reporting), annotations cover safety (read-only), and schema fully documents the single parameter, the description adds useful context (scope and cost). However, without an output schema, it lacks details on return values (e.g., report format), leaving a minor gap.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% with one parameter (agent_id) fully documented in the schema. The description does not add any additional meaning or details about the parameter beyond what the schema provides, so it meets the baseline of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Generate an EU AI Act compliance report') and resource ('for an agent'), and it distinguishes from sibling tools by focusing on compliance reporting rather than connection, creation, deletion, or other operations listed among siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by specifying 'for an agent' and listing the articles checked (Art. 6, 12, 14, 52), but it does not explicitly state when to use this tool versus alternatives (e.g., other compliance or reporting tools) or provide any exclusions or prerequisites.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_connect_calendar (A)
Connect an agent to Google Calendar or Microsoft Calendar for scheduling tools (list_events, create_event, etc.). Requires OAuth in browser. Enables the service_calendar builtin tool.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
| provider | Yes | Calendar provider | |
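The arguments object for this tool is small; a sketch is below. The provider value `"google"` is an assumption inferred from the "Google Calendar or Microsoft Calendar" wording in the description; check the schema's enum for the exact accepted values.

```python
# Hypothetical arguments for a kopern_connect_calendar tools/call request.
# "agent-123" is a placeholder; "google" is an assumed enum value.
arguments = {
    "agent_id": "agent-123",
    "provider": "google",
}
```

Both fields are required, and the call will trigger a browser-based OAuth flow rather than completing silently.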
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations only indicate readOnlyHint=false (implying mutation). The description adds valuable behavioral context beyond annotations: it discloses the OAuth requirement ('Requires OAuth in browser') and the downstream effect ('Enables the service_calendar builtin tool'), which are crucial for understanding tool behavior. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and front-loaded: it states the core purpose in the first clause, adds critical behavioral constraints (OAuth requirement), and notes the enabling effect in just two sentences. Every sentence earns its place with essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given this is a 2-parameter mutation tool with no output schema, the description is reasonably complete: it covers purpose, prerequisites (OAuth), and downstream effects. However, it doesn't detail potential side effects, error conditions, or what 'connect' entails operationally (e.g., permissions scope, token storage), leaving some gaps for a mutation tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing clear documentation for both parameters (agent_id and provider with enum). The description adds minimal semantic context by mentioning 'agent' and 'Google Calendar or Microsoft Calendar', which loosely maps to parameters but doesn't provide additional syntax or format details beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Connect an agent to Google Calendar or Microsoft Calendar') and the resource involved ('agent', 'calendar'). It distinguishes this tool from siblings by specifying it's for scheduling tools like list_events and create_event, unlike other connect tools for email, slack, etc.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('for scheduling tools') and mentions a prerequisite ('Requires OAuth in browser'). However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools for different purposes.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_connect_email (A)
Connect an agent to Gmail or Outlook for email tools (read_emails, send_email, reply_email). Requires OAuth in browser. Enables the service_email builtin tool.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
| provider | Yes | Email provider | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond the annotations: it discloses the OAuth requirement (authentication need) and that it enables another tool (service_email). The annotations only indicate it's not read-only (readOnlyHint: false), so the description compensates by providing operational details. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded and efficient: two sentences with zero waste. The first sentence states the purpose and enabled tools, the second covers prerequisites and side effects. Every element adds necessary information without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (OAuth setup, enabling other tools) and lack of output schema, the description is reasonably complete. It covers the core purpose, prerequisites, and consequences. However, it doesn't detail error conditions or what happens post-connection, leaving some gaps for a setup tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents both parameters (agent_id and provider with enum). The description adds minimal semantic value by implying the provider connects to specific email services (Gmail/Outlook), but doesn't elaborate on parameter usage beyond what the schema provides. Baseline 3 is appropriate given high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Connect an agent to Gmail or Outlook') and resource ('email tools'), distinguishing it from sibling tools like connect_calendar or connect_slack by specifying its purpose for email functionality. It explicitly names the enabled tools (read_emails, send_email, reply_email), making the purpose highly specific and differentiated.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: it requires OAuth in browser and enables the service_email builtin tool. It implicitly distinguishes from alternatives by focusing on email providers (google/microsoft) rather than other communication platforms like Slack or WhatsApp, though it doesn't explicitly name when-not scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_connect_slack (A, Read-only)
Connect an agent to Slack. Returns an OAuth install URL to authorize in your browser (Slack requires interactive OAuth).
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=true, suggesting a safe read operation. The description adds behavioral context by explaining that it returns an OAuth install URL for browser authorization, which is useful beyond the annotations. However, it does not detail potential side effects, rate limits, or authentication requirements, leaving some behavioral aspects uncovered.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence and adds essential behavioral detail in the second. Both sentences earn their place by providing clear, actionable information without redundancy or fluff, making it highly efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (OAuth flow), lack of output schema, and annotations covering safety, the description is mostly complete. It explains the purpose and key behavior (returns OAuth URL), but could improve by detailing the return format or error handling. However, it adequately covers the essential context for an agent to use the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with the single parameter 'agent_id' fully documented. The description does not add any parameter-specific semantics beyond what the schema provides, such as format examples or constraints. Baseline 3 is appropriate since the schema handles parameter documentation effectively.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Connect an agent to Slack') and resource ('agent'), distinguishing it from sibling tools like kopern_connect_calendar or kopern_connect_whatsapp by specifying the Slack platform. It uses a precise verb ('Connect') and identifies the target service.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool: for Slack authorization via OAuth. It mentions that Slack requires interactive OAuth, implying this is for browser-based setup. However, it does not explicitly state when not to use it or name alternatives among siblings, such as other connect_* tools for different platforms.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_connect_telegram (A, Idempotent)
Connect an agent to Telegram via a bot. Requires a bot token from @BotFather.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
| bot_token | Yes | Telegram bot token from @BotFather | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is not read-only (readOnlyHint: false) and is idempotent (idempotentHint: true). The description adds context about the required bot token, which is useful beyond annotations. However, it doesn't disclose other behavioral traits like potential side effects, authentication needs beyond the token, or rate limits. With annotations covering basic safety, the description adds some value but not rich behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences with zero waste: the first states the purpose, and the second provides a key prerequisite. It's front-loaded and appropriately sized, earning its place efficiently.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has annotations (readOnlyHint: false, idempotentHint: true) and 100% schema coverage, but no output schema, the description is somewhat complete. It covers the purpose and a prerequisite, but lacks details on what happens after connection (e.g., success response, error handling). For a mutation tool with no output schema, more context on expected outcomes would improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters (agent_id and bot_token). The description mentions the bot token requirement, which aligns with the schema but doesn't add significant meaning beyond it. Since the schema does the heavy lifting, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Connect an agent to Telegram via a bot') and specifies the resource (Telegram). It distinguishes from siblings like kopern_connect_slack or kopern_connect_whatsapp by specifying the platform. However, it doesn't explicitly differentiate from other 'connect' tools beyond mentioning Telegram, which is slightly less specific than ideal for a 5.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides a prerequisite ('Requires a bot token from @BotFather'), which gives some context for when to use it. However, it doesn't explicitly state when to use this tool versus alternatives (e.g., other 'connect' tools for different platforms) or mention any exclusions. The guidance is implied rather than explicit.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_connect_webhook (A)
Create an inbound or outbound webhook for an agent. Inbound: receive messages via HTTP POST. Outbound: send events to your URL (n8n, Zapier, Make compatible).
| Name | Required | Description | Default |
|---|---|---|---|
| name | No | Webhook name | |
| type | No | Webhook direction. Default: inbound | |
| events | No | Events to subscribe to (outbound only): message_sent, tool_call_completed, session_ended, error | |
| secret | No | HMAC secret for signature verification (optional) | |
| agent_id | Yes | The agent ID or name | |
| target_url | No | Target URL (required for outbound) | |
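The parameter interaction here (target_url is required only when type is outbound) is the kind of constraint the schema cannot express structurally. A small sketch of the rule, with illustrative helper and argument names not taken from the server itself:

```python
# Illustrative sketch of kopern_connect_webhook's documented parameter
# interaction: 'target_url' is required only for outbound webhooks.
def build_webhook_args(agent_id, type="inbound", target_url=None,
                       events=None, name=None, secret=None):
    if type == "outbound" and not target_url:
        raise ValueError("target_url is required for outbound webhooks")
    args = {"agent_id": agent_id, "type": type}
    for key, value in (("name", name), ("target_url", target_url),
                       ("events", events), ("secret", secret)):
        if value is not None:
            args[key] = value
    return args

# Inbound: receives messages via HTTP POST; no target URL needed.
inbound = build_webhook_args("agent-123")

# Outbound: pushes events to your URL (placeholder values shown).
outbound = build_webhook_args(
    "agent-123",
    type="outbound",
    target_url="https://example.com/hook",
    events=["message_sent", "error"],
)
```

The event names (`message_sent`, `tool_call_completed`, `session_ended`, `error`) come from the schema above; anything else in the sketch is hypothetical.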
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is a non-readOnly, non-idempotent operation (create action). The description adds useful context about HTTP POST for inbound and event subscription details for outbound, but doesn't disclose important behavioral traits like authentication requirements, rate limits, error handling, or what happens if webhook creation fails. With annotations covering basic safety profile, the description adds moderate value.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two sentences that each earn their place: the first establishes the core purpose, the second explains the two variants with practical details. No wasted words, front-loaded information, and excellent structure.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a creation tool with no output schema, the description provides good context about what gets created (webhooks with specific directions) and compatibility notes. However, it doesn't explain what the tool returns upon success/failure or provide examples of typical use cases. Given the complexity of webhook configuration, some additional guidance would be helpful.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already documents all 6 parameters thoroughly. The description mentions inbound/outbound types and event compatibility but doesn't add significant semantic meaning beyond what's in the schema. The baseline score of 3 reflects adequate but not exceptional value addition.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Create an inbound or outbound webhook') and resource ('for an agent'), with explicit differentiation between inbound (receive messages) and outbound (send events). It distinguishes this from sibling tools like kopern_connect_slack or kopern_connect_email by focusing on webhook functionality rather than other integration types.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context about when to use each type (inbound for receiving messages, outbound for sending events to external services), including compatibility notes for n8n/Zapier/Make. However, it doesn't explicitly state when NOT to use this tool or mention alternatives among the sibling tools, which prevents a perfect score.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_connect_whatsapp (A, Idempotent)
Connect an agent to WhatsApp Business. Requires Meta Cloud API credentials.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
| access_token | Yes | WhatsApp Cloud API access token | |
| phone_number | No | Display phone number (optional) | |
| verify_token | No | Webhook verify token (optional) | |
| phone_number_id | Yes | WhatsApp phone number ID (from Meta dashboard) | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is a non-read-only (readOnlyHint: false) and idempotent (idempotentHint: true) operation. The description adds that it 'requires Meta Cloud API credentials,' which provides useful authentication context beyond the annotations. However, it doesn't describe potential side effects, rate limits, or what 'connect' entails behaviorally.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise (two short sentences) with zero wasted words. It's front-loaded with the core purpose and follows with a critical prerequisite. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a connection/mutation tool with no output schema and minimal annotations, the description provides basic purpose and credential requirements. However, it lacks details about what 'connect' means operationally, what happens on success/failure, or how this differs from other connect tools. Given the complexity of connecting to WhatsApp Business, more context would be helpful.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all 5 parameters. The description doesn't add any parameter-specific details beyond what's in the schema. According to guidelines, when schema coverage is high (>80%), the baseline score is 3 even with no param info in the description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Connect an agent to WhatsApp Business') and specifies the resource (WhatsApp Business). It distinguishes from siblings like kopern_connect_calendar or kopern_connect_slack by naming WhatsApp specifically, but doesn't explicitly differentiate beyond the platform name.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('Connect an agent to WhatsApp Business') and mentions a prerequisite ('Requires Meta Cloud API credentials'). However, it doesn't explicitly state when not to use it or name alternatives among sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_connect_widget (A, Idempotent)
Enable the embeddable chat widget for an agent. Returns the embed code for your website.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
| position | No | Widget position. Default: bottom-right | |
| allowed_origins | No | Allowed website domains (CORS). Empty = all origins. | |
| welcome_message | No | Greeting message shown in widget | |
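A sketch of a fully specified arguments object for this tool, using hypothetical values; per the schema, omitting `position` falls back to `bottom-right`, and an empty `allowed_origins` list permits embedding from any origin (CORS):

```python
# Hypothetical arguments for kopern_connect_widget. All values below are
# placeholders; only agent_id is required.
arguments = {
    "agent_id": "agent-123",
    "position": "bottom-right",                  # schema default
    "allowed_origins": ["https://example.com"],  # restrict embedding to one site
    "welcome_message": "Hi! How can I help?",
}
```

Locking `allowed_origins` down to your own domains is the safer choice for a production widget, since the default (all origins) lets any site embed the agent.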
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false (mutation) and idempotentHint=true (safe to retry), which the description does not repeat. The description adds value by specifying the output ('Returns the <script> embed code'), but does not disclose other behavioral traits like rate limits, authentication needs, or side effects beyond the basic operation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two concise sentences with zero waste: the first states the action and resource, the second specifies the return value. It is front-loaded with the core purpose and efficiently communicates essential information without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (mutation with idempotency) and lack of output schema, the description adequately covers the basic operation and return value. However, it lacks details on error conditions, side effects, or integration context, which would be helpful for an agent invoking this tool in practice.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are fully documented in the schema. The description does not add any additional meaning or context for the parameters (e.g., explaining how agent_id is used or what allowed_origins entails), relying entirely on the schema. Baseline score of 3 is appropriate given high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Enable the embeddable chat widget') and the resource ('for an agent'), distinguishing it from sibling tools that handle other connection types like calendar, email, or slack. It also specifies the return value ('Returns the <script> embed code for your website'), making the purpose unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites (e.g., needing an existing agent), exclusions, or comparisons to sibling tools like kopern_connect_webhook or kopern_connect_telegram, leaving the agent to infer usage context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_create_agent (Grade A)
Create a new AI agent with a system prompt, model, and optional skills. Returns the agentId.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Agent name | |
| model | No | Model ID | claude-sonnet-4-6 |
| domain | No | Domain (e.g. 'customer_support', 'coding', 'other') | other |
| skills | No | Optional skills (domain knowledge blocks) | |
| provider | No | LLM provider | anthropic |
| description | No | Short description | |
| builtin_tools | No | Built-in tools to enable: web_fetch, memory, github_read, github_write, bug_management, datagouv, piste, service_email, service_calendar | |
| system_prompt | Yes | The agent's system prompt | |
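Since no example payload is published, here is a minimal sketch of the arguments an MCP client might send to kopern_create_agent, assembled only from the parameter table above. The agent name, prompt text, and field values are illustrative assumptions, not confirmed examples.

```python
# Sketch of a kopern_create_agent argument payload. Only name and
# system_prompt are required; omitted fields fall back to the documented
# defaults (provider anthropic, model claude-sonnet-4-6, domain other).
create_agent_args = {
    "name": "support-triage",
    "system_prompt": "You triage incoming customer tickets and route them.",
    "domain": "customer_support",                   # one of the example domains
    "builtin_tools": ["memory", "service_email"],   # subset of the allowed list
}

# The description says the call returns an agentId; with no output schema,
# the exact response shape is an assumption.
assert {"name", "system_prompt"} <= set(create_agent_args)
```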
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is not read-only and not idempotent, which the description doesn't contradict. However, the description adds minimal behavioral context beyond the annotations: it mentions the return value (agentId) but doesn't disclose required permissions, rate limits, or what happens if creation fails. With annotations covering basic safety, it adds some value but lacks depth.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core action and key parameters. Every word earns its place, making it clear and direct, though it could briefly note the defaults for optional parameters.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (8 parameters, creation operation) and lack of output schema, the description is somewhat incomplete. It mentions the return value but doesn't cover error cases, side effects, or dependencies. With annotations providing basic hints and schema covering parameters, it's adequate but leaves gaps for a mutation tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are well-documented in the schema. The description adds no additional parameter semantics beyond implying that system prompt, model, and skills are key components. It doesn't explain parameter interactions or provide examples, so it meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Create a new AI agent') and specifies key components (system prompt, model, optional skills), distinguishing it from sibling tools like kopern_get_agent, kopern_update_agent, and kopern_delete_agent. It explicitly mentions the return value (agentId), which is unique to creation operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when creating an agent, but provides no explicit guidance on when to use this tool versus alternatives like kopern_import_agent or kopern_deploy_template. It mentions optional skills but doesn't clarify prerequisites or constraints for successful agent creation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_create_grading_suite (Grade B)
Create a grading suite with test cases on an agent. Each case has an input prompt and expected behavior for evaluation.
| Name | Required | Description | Default |
|---|---|---|---|
| name | No | Suite name | |
| cases | Yes | Test cases | |
| agent_id | Yes | The agent ID or name | |
| description | No | Suite description | |
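A sketch of a kopern_create_grading_suite payload, for orientation. The cases array is nested in the schema, but the per-case field names used here (input, expected) are assumptions inferred from the tool description, not confirmed by a published schema.

```python
# Hypothetical kopern_create_grading_suite arguments; case field names assumed.
grading_suite_args = {
    "agent_id": "support-triage",          # agent ID or name
    "name": "refund-policy-suite",
    "cases": [
        {"input": "Can I get a refund after 30 days?",
         "expected": "Cites the 14-day refund policy and offers store credit."},
        {"input": "Where is my order?",
         "expected": "Asks for the order number before answering."},
    ],
}
# Every case pairs an input prompt with the behavior to evaluate against.
assert all({"input", "expected"} <= set(case) for case in grading_suite_args["cases"])
```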
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations mark this as a non-read-only, non-idempotent operation (readOnlyHint=false, idempotentHint=false), implying it creates new resources and may have side effects. The description, which says the tool creates a grading suite with test cases, is consistent with those annotations but adds no further behavioral detail such as required permissions, rate limits, or what happens on duplicate suite names.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence that efficiently conveys the core action and components. It's front-loaded with the main purpose and avoids unnecessary details, though it could be slightly more structured by explicitly mentioning optional parameters or linking to sibling tools.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (creating a suite with multiple test cases), annotations cover safety (non-readOnly), and schema fully describes inputs, but there's no output schema. The description lacks details on return values, error conditions, or how the suite integrates with other grading tools. It's minimally adequate but leaves gaps for an agent to understand full context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all parameters (agent_id, cases, name, description) and their nested properties. The description mentions 'cases' with input and expected behavior, which mirrors the schema but doesn't add extra meaning or clarify nuances like case naming conventions. Baseline 3 is appropriate as the schema handles parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Create a grading suite with test cases') and the target ('on an agent'), specifying that each case includes an input prompt and expected behavior. It distinguishes from siblings like 'kopern_grade_prompt' or 'kopern_run_grading' by focusing on suite creation rather than execution or grading. However, it doesn't explicitly differentiate from 'kopern_create_agent' or 'kopern_create_pipeline' beyond the resource type.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'kopern_grade_prompt' (for single prompts) or 'kopern_run_grading' (for executing grading). It mentions the tool's function but offers no context about prerequisites, timing, or comparisons with sibling tools, leaving the agent to infer usage scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_create_pipeline (Grade B)
Create a multi-step pipeline on an agent. Steps chain agents sequentially with configurable input mapping.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Pipeline name | |
| steps | Yes | Pipeline steps | |
| agent_id | Yes | The parent agent ID | |
| description | No | Pipeline description | |
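A sketch of a kopern_create_pipeline payload. The quality assessment for this tool mentions an 'input_mapping' parameter on steps; the step shape and the "previous_output" mapping value shown here are illustrative assumptions.

```python
# Hypothetical kopern_create_pipeline arguments; step fields assumed.
pipeline_args = {
    "agent_id": "support-triage",   # the parent agent ID
    "name": "draft-then-review",
    "steps": [
        {"agent_id": "drafter-agent"},
        {"agent_id": "reviewer-agent", "input_mapping": "previous_output"},
    ],
}
# Steps chain sequentially in list order, per the tool description.
assert [s["agent_id"] for s in pipeline_args["steps"]] == ["drafter-agent", "reviewer-agent"]
```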
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is a non-read-only and non-idempotent operation (readOnlyHint: false, idempotentHint: false), which the description aligns with by implying creation of a new pipeline. The description adds some behavioral context by mentioning 'configurable input mapping', but it does not disclose other traits like error handling, permissions needed, or rate limits, which are not covered by annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently conveys the core functionality without redundancy. It is front-loaded with the main action and includes essential details about step chaining and input mapping, making it easy to understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of creating a pipeline with multiple steps and configurable inputs, the description is somewhat complete but lacks details on output, error handling, or dependencies. With no output schema and annotations only covering read-only and idempotency hints, the description should ideally provide more context about what happens after creation, such as success indicators or pipeline state.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the input schema fully documents all parameters. The description adds minimal value beyond the schema by hinting at 'configurable input mapping', which relates to the 'input_mapping' parameter, but it does not provide additional semantics or examples. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Create a multi-step pipeline on an agent') and the mechanism ('Steps chain agents sequentially with configurable input mapping'), which is specific and informative. However, it does not explicitly differentiate this tool from its sibling 'kopern_run_pipeline', which might cause confusion about when to create versus run a pipeline.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as 'kopern_run_pipeline' or 'kopern_create_agent'. It lacks context about prerequisites, typical use cases, or exclusions, leaving the agent to infer usage from the tool name and parameters alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_create_team (Grade B)
Create a multi-agent team. Agents work together in parallel, sequential (chain), or conditional (router) mode.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Team name | |
| agents | Yes | Team members | |
| description | No | Team description | |
| execution_mode | No | How agents collaborate | sequential |
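A sketch of a kopern_create_team payload. The description names parallel, sequential (chain), and conditional (router) modes; the exact enum strings and the shape of the agents list are assumptions.

```python
# Hypothetical kopern_create_team arguments; member format assumed.
team_args = {
    "name": "research-team",
    "agents": ["searcher", "summarizer"],   # member shape is an assumption
    "execution_mode": "parallel",           # omit to get the default, sequential
}
# Here the default sequential mode is explicitly overridden.
assert team_args["execution_mode"] != "sequential"
```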
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is a non-read-only and non-idempotent tool, implying it performs a mutable creation operation. The description adds context by specifying execution modes (parallel, sequential, conditional), which helps understand team behavior. However, it lacks details on permissions, rate limits, or error handling, leaving gaps in behavioral understanding despite the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose and key behavioral aspect (execution modes). It is front-loaded with the main action and avoids unnecessary details, making it highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of an output schema and annotations covering only basic hints, the description should provide more context on what happens after creation (e.g., team ID, status). It adequately covers the creation action and modes but falls short in explaining outcomes or integration with sibling tools, leaving room for improvement in completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already documents all parameters thoroughly. The description adds minimal value by mentioning execution modes, which are covered in the schema's enum for 'execution_mode'. No additional semantic insights beyond the schema are provided, meeting the baseline for high coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Create a multi-agent team') and specifies the resource ('team'), making the purpose evident. However, it does not explicitly differentiate from sibling tools like 'kopern_create_agent' or 'kopern_run_team', which could cause confusion about when to use each tool for team-related operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as 'kopern_create_agent' for individual agents or 'kopern_run_team' for executing an existing team. It mentions execution modes but does not clarify prerequisites, dependencies, or contextual triggers for choosing this tool over others.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_delete_agent (Grade A, Destructive)
Permanently delete an agent and all its data (skills, tools, grading suites, sessions, connectors).
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID to delete | |
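Because deletion is permanent and removes all dependent data, a cautious workflow takes a portable backup first via kopern_export_agent (documented further down this page). The call() helper below is a stand-in for whatever tool-invocation mechanism your MCP client exposes, not a real API.

```python
# Cautious delete flow: export a portable backup before the irreversible
# kopern_delete_agent call. call() is a placeholder for the MCP client's
# tool-invocation function.
def delete_with_backup(call, agent_id: str):
    backup = call("kopern_export_agent", {"agent_id": agent_id})  # read-only, no LLM cost
    call("kopern_delete_agent", {"agent_id": agent_id})           # permanent
    return backup

# Smoke test with a fake client that records calls instead of hitting a server.
calls = []
fake_call = lambda tool, args: calls.append((tool, args)) or {"agent": args["agent_id"]}
delete_with_backup(fake_call, "support-triage")
assert [tool for tool, _ in calls] == ["kopern_export_agent", "kopern_delete_agent"]
```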
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide destructiveHint=true and readOnlyHint=false, indicating a destructive write operation. The description adds valuable context beyond the annotations by specifying what gets destroyed ('all its data', with examples like skills and tools) and emphasizing permanence ('permanently delete'), which helps the agent understand the severity. It does not contradict the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the key action ('permanently delete') and includes essential details without waste. Every part (action, target, scope of deletion) earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a destructive tool with no output schema, the description is reasonably complete: it covers purpose, scope of deletion, and behavioral context. However, it lacks details on error conditions, confirmation steps, or return values, which could be helpful given the high-stakes nature of deletion.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% with one parameter (agent_id) fully documented in the schema. The description doesn't add any parameter-specific details beyond what the schema provides (e.g., format or validation rules for agent_id), so it meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('permanently delete') and the target resource ('an agent and all its data'), with explicit listing of what data is included (skills, tools, grading suites, sessions, connectors). It distinguishes from siblings like kopern_get_agent (read) and kopern_update_agent (modify).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context through 'permanently delete' and the list of data removed, suggesting this is for complete removal. However, it doesn't explicitly state when to use this vs. alternatives (e.g., kopern_export_agent for backup first) or prerequisites (e.g., ensure agent isn't in use).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_deploy_template (Grade A)
Deploy an agent from a template (28 general + 9 vertical). Creates agent + skills + tools + grading suite in one shot. Use kopern_list_templates to see available slugs.
| Name | Required | Description | Default |
|---|---|---|---|
| slug | Yes | Template slug (from kopern_list_templates) | |
| answers | No | Onboarding answers to personalize the template (e.g. { businessName: 'Plomberie Dupont', zone: 'Paris 12-15' }) | |
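A sketch of the two-step workflow the description prescribes: fetch available slugs with kopern_list_templates, then deploy one. The slug value below is an illustrative assumption; the answer keys mirror the example in the table above.

```python
# Hypothetical kopern_deploy_template arguments; slug assumed for illustration.
deploy_args = {
    "slug": "artisan-plumber",   # a real slug would come from kopern_list_templates
    "answers": {
        "businessName": "Plomberie Dupont",
        "zone": "Paris 12-15",
    },
}
assert "slug" in deploy_args  # slug is the only required field
```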
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is a non-readOnly, non-idempotent operation (mutation with potential side effects). The description adds valuable context beyond annotations by specifying what gets created ('agent + skills + tools + grading suite in one shot') and the template scope ('28 general + 9 vertical'), though it doesn't mention authentication needs, rate limits, or error conditions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly front-loaded with the core purpose in the first sentence, followed by a practical usage note. Both sentences earn their place by providing essential information without any wasted words or redundant explanations.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a mutation tool with no output schema, the description provides good context about what gets created and template scope. However, it doesn't describe the response format, success/failure conditions, or potential side effects beyond the creation statement, leaving some behavioral aspects unspecified.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already fully documents both parameters. The description adds minimal value beyond the schema by mentioning template slugs come from kopern_list_templates and giving an example of answers format, but doesn't provide additional syntax, constraints, or usage details for the parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Deploy an agent from a template') and resource ('agent + skills + tools + grading suite'), with explicit scope details ('28 general + 9 vertical' templates). It distinguishes from sibling tools by referencing kopern_list_templates for available slugs, differentiating it from other agent-related tools like kopern_create_agent or kopern_import_agent.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('Deploy an agent from a template') and when to use an alternative ('Use kopern_list_templates to see available slugs'), creating a clear workflow. It also implies this is for comprehensive agent creation versus simpler alternatives like kopern_create_agent.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_export_agent (Grade A, Read-only)
Export an agent as a portable JSON object (agent config, skills, tools, extensions, grading suites with cases). Use kopern_import_agent to re-import. No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID to export | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotations already provide readOnlyHint=true, indicating this is a safe read operation. The description adds valuable behavioral context beyond annotations by specifying the export format ('portable JSON object'), what components are included, and the 'No LLM cost' constraint. However, it doesn't mention potential limitations like export size or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely efficient with three short sentences that each serve distinct purposes: stating the core functionality, providing usage guidance, and disclosing a key constraint. There's zero wasted text, and the most important information appears first.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a single-parameter read operation with readOnlyHint annotation, the description provides excellent context about what gets exported and the complementary import tool. The main gap is the lack of output schema, so the agent doesn't know the structure of the returned JSON object, but the description compensates well by specifying what components are included.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema fully documents the single required parameter (agent_id). The description doesn't add any parameter-specific information beyond what's already in the schema, but the baseline score of 3 is appropriate when schema coverage is complete.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Export an agent as a portable JSON object') and resource ('agent'), with explicit details about what's included (agent config, skills, tools, extensions, grading suites with cases). It distinguishes from sibling tools by naming the complementary import tool (kopern_import_agent).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('Export an agent as a portable JSON object') and when to use an alternative ('Use kopern_import_agent to re-import'), creating a clear usage context. It also mentions a key constraint ('No LLM cost') that helps determine appropriate usage scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_get_agent (Grade A, Read-only)
Get full details of an agent: system prompt, model, skills count, tools count, grading suites count.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotations already declare readOnlyHint=true, indicating a safe read operation. The description adds value by specifying the types of details returned (e.g., system prompt, counts), which helps the agent understand the output format. However, it does not disclose additional behavioral traits like error handling, rate limits, or authentication needs beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the purpose ('Get full details of an agent') and lists specific details without unnecessary words. Every element earns its place by clarifying the scope of information returned.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple read operation with one parameter and readOnlyHint annotation, the description is mostly complete. It specifies the details returned, compensating for the lack of an output schema. However, it could be more complete by mentioning potential errors (e.g., if agent_id is invalid) or response structure.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with the single parameter 'agent_id' documented as 'The agent ID or name'. The description does not add further semantic details about the parameter, such as format examples or constraints. Baseline score of 3 is appropriate given high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get full details') and the resource ('an agent'), specifying what details are returned (system prompt, model, skills count, tools count, grading suites count). It distinguishes from sibling tools like 'kopern_list_agents' (which likely lists agents without details) and 'kopern_update_agent' (which modifies agents).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when detailed information about a specific agent is needed, but does not explicitly state when to use this tool versus alternatives like 'kopern_list_agents' or 'kopern_get_session'. No exclusions or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_get_grading_results (Grade A, Read-only)
Get detailed results of a grading run: per-case scores, agent outputs, criteria evaluations, improvement notes. No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| run_id | Yes | The grading run ID | |
| agent_id | Yes | The agent ID or name | |
| suite_id | Yes | The grading suite ID | |
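A sketch of a kopern_get_grading_results call. All three IDs are required, which implies the typical order of operations: create a suite, run the grading, then fetch results with the run's ID. The ID values below are placeholders.

```python
# Hypothetical kopern_get_grading_results arguments; all three IDs required.
results_args = {
    "agent_id": "support-triage",
    "suite_id": "suite_123",
    "run_id": "run_456",
}
assert {"run_id", "agent_id", "suite_id"} <= set(results_args)  # no optional params
```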
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotation 'readOnlyHint: true' already indicates it's a safe read operation. The description adds valuable behavioral context by specifying the types of results returned (e.g., 'per-case scores, agent outputs') and noting 'No LLM cost', which informs about cost implications not covered by annotations. It does not disclose rate limits, authentication needs, or pagination behavior, but adds meaningful details beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently conveys purpose, details of results, and a key behavioral note ('No LLM cost'). It is front-loaded with the main action and avoids unnecessary words, making it highly concise and easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (retrieving detailed grading results), the description provides good context on what results to expect, complemented by annotations indicating read-only safety. However, there is no output schema, so the description doesn't fully explain return values (e.g., format, structure). It adequately covers the tool's purpose and key constraints but could benefit from more detail on output behavior.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all three parameters ('run_id', 'agent_id', 'suite_id') documented in the schema. The description does not add any parameter-specific semantics beyond what the schema provides (e.g., it doesn't explain relationships between parameters). Baseline score of 3 is appropriate as the schema fully covers parameter descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Get detailed results') and resource ('of a grading run'), with explicit details about what results are included ('per-case scores, agent outputs, criteria evaluations, improvement notes'). It distinguishes from siblings like 'kopern_list_grading_runs' (which likely lists runs rather than details) and 'kopern_grade_prompt' (which initiates grading).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when detailed grading results are needed, but does not explicitly state when to use this tool versus alternatives. It mentions 'No LLM cost' as a benefit, which provides some context, but lacks guidance on prerequisites (e.g., needing a completed grading run) or exclusions compared to other tools like 'kopern_get_agent' or 'kopern_list_grading_runs'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_get_session (read-only)
Get full details of a session including message events, tool calls, and metrics. No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
| session_id | Yes | The session ID | |
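Since both parameters are required, a client-side sketch might validate them before building the call. The helper function and argument values below are illustrative, not part of the server:

```python
def build_get_session_call(agent_id: str, session_id: str) -> dict:
    """Build a hypothetical tools/call params block for kopern_get_session."""
    if not agent_id or not session_id:
        # Both fields are marked required in the parameter table.
        raise ValueError("agent_id and session_id are both required")
    return {
        "name": "kopern_get_session",
        "arguments": {"agent_id": agent_id, "session_id": session_id},
    }

call = build_get_session_call("my-agent", "sess_42")
```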
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true, indicating it's a safe read operation. The description adds value by disclosing behavioral traits beyond annotations: it specifies what details are included (message events, tool calls, metrics) and notes 'No LLM cost', which is useful context for cost considerations. However, it lacks details on rate limits or error handling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence and adds a useful behavioral note in the second. At two sentences with zero waste, it efficiently communicates the essential information without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (read-only with two parameters) and annotations covering safety, the description is mostly complete. It adds context on included details and cost, but lacks an output schema, so return values are undocumented. For a read tool with good annotations, this is sufficient but not fully comprehensive.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for agent_id and session_id. The description does not add meaning beyond the schema, as it doesn't explain parameter usage or constraints. With high schema coverage, the baseline score of 3 is appropriate, as the schema carries the parameter documentation burden.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Get' and the resource 'full details of a session', specifying the scope with 'including message events, tool calls, and metrics'. It distinguishes from sibling tools like kopern_list_sessions (which likely lists sessions) and kopern_get_agent (which gets agent details), making the purpose specific and differentiated.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage by mentioning 'No LLM cost', suggesting it's a low-cost operation, but does not explicitly state when to use this tool versus alternatives like kopern_list_sessions or provide context on prerequisites. The guidance is implied rather than explicit.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_get_usage (read-only)
Get token usage and cost metrics. Shows input/output tokens, cost, grading runs, and per-agent breakdown. No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| year_month | No | Period in YYYY-MM format. Default: current month | |
| include_history | No | Include last 6 months history. Default: false | |
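The `year_month` parameter uses a `YYYY-MM` format and defaults to the current month. A hedged sketch of client-side argument assembly that applies those documented defaults (the helper name is hypothetical):

```python
import datetime
import re

def usage_arguments(year_month=None, include_history=False):
    """Assemble arguments for kopern_get_usage, applying the documented defaults."""
    if year_month is None:
        # Documented default: the current month.
        year_month = datetime.date.today().strftime("%Y-%m")
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", year_month):
        raise ValueError(f"year_month must be YYYY-MM, got {year_month!r}")
    return {"year_month": year_month, "include_history": include_history}

args = usage_arguments("2024-06")
```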
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=true, and the description aligns by describing a read operation ('Get'). It adds valuable context beyond annotations by specifying what metrics are included (input/output tokens, cost, grading runs, per-agent breakdown) and exclusions ('No LLM cost'), enhancing behavioral understanding without contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by specific details and exclusions. Every sentence earns its place by adding clarity without redundancy, making it efficient and well-structured for quick comprehension.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (2 parameters, no output schema), the description is largely complete. It covers purpose, scope, and exclusions, but could benefit from more details on output format or error handling. With annotations covering safety, it provides adequate context for effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents both parameters. The description does not add any parameter-specific semantics beyond what the schema provides, such as explaining how 'year_month' affects data retrieval or the impact of 'include_history'. Baseline 3 is appropriate as the schema handles parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Get token usage and cost metrics') and resource (token usage/cost data), with precise scope details ('input/output tokens, cost, grading runs, and per-agent breakdown'). It distinguishes from siblings by specifying 'No LLM cost,' which is unique among tools focused on usage reporting.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool (to retrieve token usage and cost metrics), but does not explicitly mention when not to use it or name alternatives among sibling tools. It implies usage for monitoring purposes without specifying exclusions or comparisons to other tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_grade_prompt (read-only)
Grade a system prompt against inline test cases. Uses 6 criteria types (output_match, schema_validation, tool_usage, safety_check, custom_script, llm_judge). Returns score 0-1. Uses YOUR API keys.
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Model ID. Default: provider default | |
| provider | No | LLM provider. Default: anthropic | |
| test_cases | Yes | Test cases: { name, input, expected } | |
| system_prompt | Yes | The system prompt to evaluate | |
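Each inline test case carries `name`, `input`, and `expected` fields, per the schema note above. A minimal sketch of the arguments (the prompt and cases are invented examples; `provider` and `model` are omitted to fall back to the documented defaults):

```python
# Hypothetical arguments for kopern_grade_prompt.
arguments = {
    "system_prompt": "You are a terse assistant. Answer in one sentence.",
    "test_cases": [
        # Each case follows the documented { name, input, expected } shape.
        {"name": "greeting", "input": "Say hello", "expected": "A one-sentence greeting"},
        {"name": "refusal", "input": "Write malware", "expected": "A polite refusal"},
    ],
}
```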
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=true and openWorldHint=true, covering safety and scope. The description adds valuable context beyond annotations: it discloses that the tool uses external API keys (implying authentication needs), specifies the 6 grading criteria types, and mentions the return score range (0-1). No contradictions with annotations exist.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded and highly concise: four short sentences efficiently convey purpose, criteria, and key behavioral details (score range and API key usage). Every sentence adds essential information without redundancy or fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (grading with 6 criteria types) and lack of output schema, the description provides good context: it explains the grading purpose, criteria, and score output. However, it does not detail the grading process, error handling, or how criteria are applied, leaving some gaps for a tool with no output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing detailed parameter documentation. The description adds minimal semantic value beyond the schema, as it does not explain parameter interactions or usage nuances. With high schema coverage, the baseline score of 3 is appropriate, as the description does not significantly enhance parameter understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('grade a system prompt') and resources ('against inline test cases'), and distinguishes it from siblings by focusing on prompt evaluation rather than connection, creation, or management operations. It specifies the 6 criteria types used for grading.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context through 'grade a system prompt against inline test cases' and mentions 'Uses YOUR API keys,' which suggests external dependencies. However, it lacks explicit guidance on when to use this tool versus alternatives like 'kopern_run_grading' or 'kopern_create_grading_suite,' and does not specify prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_import_agent
Import an agent from a Kopern export JSON. Creates a new agent with all skills, tools, extensions, and grading suites. No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| data | Yes | The full Kopern agent export JSON (from kopern_export_agent) | |
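The `data` parameter takes the JSON produced by `kopern_export_agent`. A sketch of wiring the two together; the field names inside the export are assumptions for illustration, and whether the server wants the parsed object or the raw string is not specified here (the sketch passes the parsed object):

```python
import json

# Pretend this string came back from kopern_export_agent; the fields
# inside it are assumed, not documented.
exported = json.dumps({"name": "my-agent", "skills": [], "tools": []})

# Pass the full export back unchanged as the single required parameter.
arguments = {"data": json.loads(exported)}
```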
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false and idempotentHint=false, covering mutation and non-idempotency. The description adds valuable context beyond annotations: it specifies what gets created (skills, tools, etc.), mentions 'No LLM cost' (a behavioral trait not in annotations), and implies creation of a new agent (aligning with annotations). No contradiction exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by specifics and a cost note. Every sentence earns its place: the first defines the action, the second details the creation scope, and the third adds behavioral context ('No LLM cost'). No wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given one parameter with full schema coverage, annotations covering mutation/non-idempotency, and no output schema, the description is mostly complete. It adds creation details and cost information, but lacks output expectations or error handling context, which would be helpful for a mutation tool without output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the schema fully documenting the single 'data' parameter as 'The full Kopern agent export JSON (from kopern_export_agent)'. The description adds minimal semantics beyond this, only referencing the export source. Baseline 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Import an agent') with the resource ('from a Kopern export JSON') and details what gets created ('new agent with all skills, tools, extensions, and grading suites'). It distinguishes from siblings like kopern_export_agent (export vs. import) and kopern_create_agent (import vs. manual creation).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by specifying the source ('Kopern export JSON') and referencing kopern_export_agent, but doesn't explicitly state when to use this versus alternatives like kopern_create_agent or when not to use it (e.g., for partial imports). It provides clear prerequisites but lacks explicit exclusions or comparisons.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_list_agents (read-only)
List all your Kopern agents (name, description, model, domain, grading score). No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
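With no parameters, the call reduces to the tool name and an empty argument object:

```python
# Hypothetical zero-argument tools/call payload for kopern_list_agents.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "kopern_list_agents", "arguments": {}},
}
```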
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true, indicating a safe read operation. The description adds value beyond annotations by specifying the exact fields returned and noting 'No LLM cost,' which informs about cost implications not covered by annotations. However, it doesn't disclose other behavioral traits like rate limits, pagination, or error conditions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in a single, efficient sentence, and the second sentence adds valuable operational detail ('No LLM cost') without redundancy. Every word serves a clear purpose, making it highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (0 parameters, read-only operation), annotations cover safety, and the description specifies output fields and cost. However, without an output schema, the description doesn't detail the return format (e.g., array structure), leaving a minor gap in completeness for an agent invoking the tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With zero parameters there is nothing for the schema to document, so the baseline is high. The description aligns with the empty schema by making clear that listing all agents requires no inputs. No additional parameter details are needed, making this sufficient.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('List') and resource ('Kopern agents'), specifies the exact fields returned (name, description, model, domain, grading score), and distinguishes it from sibling tools like 'kopern_get_agent' by indicating it lists all agents rather than retrieving a specific one. The phrase 'No LLM cost' adds further specificity about operational characteristics.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by stating it lists 'all your Kopern agents,' suggesting it's for retrieving a comprehensive overview rather than details of a single agent. However, it doesn't explicitly state when to use this tool versus alternatives like 'kopern_get_agent' or provide exclusions, though the sibling tool names help differentiate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_list_grading_runs (read-only)
List grading runs for a suite. Shows score history, pass rates, and versions over time. No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
| suite_id | Yes | The grading suite ID | |
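A sketch of the arguments with both required identifiers (the values are hypothetical):

```python
# Hypothetical arguments for kopern_list_grading_runs.
arguments = {
    "agent_id": "my-agent",  # an agent ID or name is accepted
    "suite_id": "suite_01",  # the suite whose run history to list
}
request = {"name": "kopern_list_grading_runs", "arguments": arguments}
```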
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true, indicating a safe read operation. The description adds valuable behavioral context beyond this: it specifies what data is returned (score history, pass rates, versions over time) and explicitly states 'No LLM cost,' which informs about cost implications not covered by annotations. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and front-loaded: the first clause states the core purpose, followed by specific details on data shown and cost benefit. Every sentence earns its place with no wasted words, making it easy for an AI agent to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (list operation with 2 parameters), annotations cover safety (read-only), and the description adds key behavioral details (data returned, cost). However, without an output schema, the description could benefit from more specifics on return format (e.g., pagination, structure). It's mostly complete but has minor gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters (agent_id, suite_id) fully documented in the schema. The description doesn't add any parameter-specific semantics beyond what's in the schema, such as format examples or constraints. Baseline score of 3 is appropriate given high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'List grading runs for a suite' specifies the verb (list) and resource (grading runs), with additional detail on what information is shown (score history, pass rates, versions over time). It distinguishes from siblings like kopern_get_grading_results by focusing on historical runs rather than current results, but doesn't explicitly name alternatives.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context through 'for a suite' and mentions 'No LLM cost' as a benefit, suggesting this is a low-cost operation. However, it doesn't provide explicit guidance on when to use this tool versus alternatives like kopern_get_grading_results or kopern_run_grading, nor does it mention prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_list_sessions (read-only)
List conversation sessions for an agent. Shows purpose, source, token usage, cost, timestamps. No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max sessions to return (1-50). Default: 20 | |
| agent_id | Yes | The agent ID or name | |
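`limit` is bounded to 1-50 with a default of 20. A client-side sketch that clamps out-of-range values instead of letting the server reject them (the clamping policy is an assumption; the server's actual behavior on out-of-range values is not documented here):

```python
def sessions_arguments(agent_id, limit=20):
    """Arguments for kopern_list_sessions; limit is clamped into 1-50."""
    limit = max(1, min(50, int(limit)))  # documented range: 1-50, default 20
    return {"agent_id": agent_id, "limit": limit}

args = sessions_arguments("my-agent", limit=200)  # clamped down to 50
```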
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true, indicating this is a safe read operation. The description adds value by specifying what fields are returned (purpose, source, token usage, cost, timestamps) and clarifying 'No LLM cost'—useful context not covered by annotations. However, it doesn't disclose behavioral traits like pagination, sorting, default ordering, or error conditions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is three short sentences that front-load the core purpose and key details. Every word earns its place: 'List conversation sessions for an agent' establishes the action, followed by the specific fields shown and the 'No LLM cost' clarification. No redundant or vague phrasing.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only list tool with good annotations (readOnlyHint) and full schema coverage, the description is reasonably complete. It specifies the resource scope (agent sessions) and output fields. However, without an output schema, it could benefit from more detail on return structure (e.g., array format, field definitions) or behavioral context like pagination.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear documentation for both parameters (agent_id and limit). The description doesn't add any parameter-specific semantics beyond what's in the schema—it doesn't explain format for agent_id or implications of the limit. Baseline 3 is appropriate since the schema fully covers parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('List') and resource ('conversation sessions for an agent'), specifying what fields are shown (purpose, source, token usage, cost, timestamps). It distinguishes from sibling 'kopern_get_session' by indicating this lists multiple sessions rather than retrieving a single one. However, it doesn't explicitly differentiate from other list tools like 'kopern_list_agents' or 'kopern_list_templates' beyond the resource type.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage by specifying it's for listing sessions for an agent, but doesn't provide explicit guidance on when to use this versus alternatives like 'kopern_get_session' (for single session details) or 'kopern_get_usage' (which might overlap in cost/token reporting). No when-not-to-use scenarios or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_list_templates (read-only)
List all 37 AI agent templates (28 general + 9 vertical/business). Returns slug, title, domain, tagline. No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| category | No | Filter by category. Default: all | |
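`category` is optional; omitting it means all templates are returned. A sketch of argument assembly (the category value shown is a guess, since the schema's enum values are not reproduced here):

```python
def templates_arguments(category=None):
    """Arguments for kopern_list_templates; omit category for the 'all' default."""
    return {} if category is None else {"category": category}

all_args = templates_arguments()             # server default: all 37 templates
vertical = templates_arguments("vertical")   # hypothetical category value
```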
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond the readOnlyHint annotation: it specifies the exact count of items returned (37 templates), breaks down the composition (28 general + 9 vertical), describes the return fields (slug, title, domain, tagline), and explicitly states 'No LLM cost' - which is important operational information not captured in annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise and well structured: three short sentences cover the core action, resource scope, return fields, and cost information. Every element serves a clear purpose with zero wasted words, and the most critical information (what it lists and what it returns) is front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple read-only listing tool with good annotations and a well-documented single parameter, the description provides excellent context: it specifies the exact resource count and composition, return fields, and cost implications. The main gap is the lack of output schema, but the description compensates by explicitly stating what fields are returned.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage and only one optional parameter with clear enum values documented in the schema, the description doesn't need to add parameter semantics. It mentions filtering by category implicitly through the template breakdown, but doesn't provide additional parameter context beyond what the schema already documents adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('List all 37 AI agent templates') and resource ('templates'), with precise scope details (28 general + 9 vertical/business templates). It effectively distinguishes this from sibling tools like kopern_list_agents or kopern_list_sessions by specifying it's about templates rather than agents or sessions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning the template categories and what's returned, but doesn't explicitly state when to use this tool versus alternatives. No guidance is provided about prerequisites, timing, or comparisons with other list operations in the sibling set.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_manage_memory
Manage an agent's persistent memory. Actions: remember (save key-value), recall (search by query), forget (delete by key), list (all memories). No LLM cost.
| Name | Required | Description | Default |
|---|---|---|---|
| key | No | Memory key (for remember/forget) | |
| query | No | Search query (for recall) | |
| value | No | Memory value (for remember) | |
| action | Yes | Memory action | |
| agent_id | Yes | The agent ID or name | |
| category | No | Memory category (for remember). Default: custom | |
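The action-to-parameter mapping above can be sketched as a small validator. This is illustrative only — the function, agent ID, and key names are invented, not part of the server's API:

```python
# Which optional parameters each kopern_manage_memory action needs,
# per the tool's description and parameter table.
REQUIRED_BY_ACTION = {
    "remember": {"key", "value"},  # save a key-value pair
    "recall": {"query"},           # search stored memories by query
    "forget": {"key"},             # delete a memory by key
    "list": set(),                 # list all memories, no extras needed
}

def build_memory_call(agent_id: str, action: str, **params) -> dict:
    """Assemble arguments for a kopern_manage_memory call, checking that
    the action-specific parameters are present before sending."""
    missing = REQUIRED_BY_ACTION[action] - params.keys()
    if missing:
        raise ValueError(f"action {action!r} requires: {sorted(missing)}")
    return {"agent_id": agent_id, "action": action, **params}

call = build_memory_call("support-bot", "remember",
                         key="preferred_tone", value="formal",
                         category="custom")
```

A client-side check like this catches the most likely first-attempt failure (a `recall` without a `query`, say) before the call ever reaches the server.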
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotation (readOnlyHint: false) already indicates this is not a read-only operation. The description adds useful context by specifying the four distinct actions and noting 'No LLM cost,' but doesn't disclose other behavioral traits like authentication requirements, rate limits, or what happens when operations fail. No contradiction with annotations exists.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise and front-loaded: it starts with the core purpose, lists all actions with their functions, and ends with a relevant operational note. Every sentence earns its place with zero wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (6 parameters, 4 distinct actions) and the absence of an output schema, the description provides good coverage of what the tool does. However, it doesn't explain return values or error conditions, which would be helpful since no output schema exists. The 100% schema coverage and clear action enumeration compensate somewhat.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already documents all 6 parameters thoroughly. The description adds minimal value beyond the schema by mentioning the four action types, but doesn't provide additional semantic context about parameter usage or relationships beyond what's already in the structured fields.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs and resources: 'Manage an agent's persistent memory' followed by detailed action enumeration (remember, recall, forget, list). It distinguishes itself from sibling tools by focusing on memory management rather than connectivity, reporting, or agent lifecycle operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context about when to use each action (save key-value, search by query, delete by key, list all memories) and mentions 'No LLM cost' as a relevant consideration. However, it doesn't explicitly state when NOT to use this tool or name specific alternatives among siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_run_autoresearch
Run AutoTune optimization on an agent. Iteratively mutates the system prompt, re-grades, and keeps improvements. Returns the optimized score. Uses YOUR API keys. Can take several minutes.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
| suite_id | Yes | The grading suite ID to optimize against | |
| target_score | No | Stop when this score is reached (0-1). Optional | |
| max_iterations | No | Max optimization iterations (1-20). Default: 5 | |
| max_token_budget | No | Max total tokens to spend. Optional | |
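The mutate-regrade-keep cycle the description implies can be sketched as a simple hill-climbing loop. This is a hedged reconstruction, not the server's implementation: the `mutate` and `grade` callables stand in for server internals that are not public, and `max_token_budget` is omitted for brevity.

```python
def autoresearch(prompt, mutate, grade, max_iterations=5, target_score=None):
    """Iteratively mutate a system prompt, re-grade each candidate,
    and keep only improvements; stop early once target_score is met."""
    best_prompt, best_score = prompt, grade(prompt)
    for _ in range(max_iterations):
        candidate = mutate(best_prompt)
        score = grade(candidate)
        if score > best_score:            # keep improvements only
            best_prompt, best_score = candidate, score
        if target_score is not None and best_score >= target_score:
            break                         # early stop at the target
    return best_prompt, best_score

# Toy run: "grading" rewards longer prompts, "mutation" appends a hint.
prompt, score = autoresearch(
    "Be helpful.",
    mutate=lambda p: p + " Cite sources.",
    grade=lambda p: min(len(p) / 100, 1.0),
    target_score=0.5,
)
```

The default of 5 iterations matches the `max_iterations` default in the table; in the real tool each `grade` call runs a full suite against your API keys, which is why the description warns it can take minutes.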
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false and openWorldHint=true, which the description does not contradict. The description adds valuable behavioral context beyond annotations: it discloses that the tool 'iteratively mutates the system prompt' (a destructive action), 'Uses YOUR API keys' (authentication needs), 'Can take several minutes' (performance/rate limit implication), and returns 'the optimized score.' This compensates well for the lack of output schema.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized with three sentences that are front-loaded with the core purpose. Each sentence adds value: the first explains the action and process, the second states the return value, and the third provides critical behavioral context (API keys and duration). There is minimal waste, though it could be slightly more structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (iterative optimization with mutations), lack of output schema, and annotations that only cover read/write and open-world hints, the description does a good job of completeness. It explains the process, return value, authentication, and performance implications. However, it could mention error handling or specific constraints on mutations to be fully complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all 5 parameters. The description does not add any parameter-specific semantics beyond what the schema provides (e.g., it doesn't explain the format of agent_id or suite_id). The iterative process it describes relates to max_iterations, but that relationship is already clear from the schema. Baseline 3 is appropriate given high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Run AutoTune optimization'), target resource ('on an agent'), and process details ('iteratively mutates the system prompt, re-grades, and keeps improvements'). It distinguishes this tool from all sibling tools which involve connection, creation, deletion, listing, or other operations, not iterative optimization.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning 'Uses YOUR API keys' and 'Can take several minutes,' which suggests when to consider this tool. However, it does not explicitly state when to use it versus alternatives like 'kopern_grade_prompt' or 'kopern_update_agent,' nor does it provide exclusions or prerequisites beyond the implied time and resource considerations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_run_grading
Run a grading suite on an agent. Executes all test cases, evaluates with configured criteria, returns detailed scores. Uses YOUR API keys.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The agent ID or name | |
| suite_id | Yes | The grading suite ID | |
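One plausible shape for the "detailed scores" a suite run returns: per-case, per-criterion scores plus a simple average. The case names, criteria, and aggregation scheme here are assumptions for illustration, not the server's documented output format.

```python
def run_suite(agent, cases, criteria):
    """Run every test case through the agent and score its output
    against each configured criterion, then aggregate."""
    results = []
    for case in cases:
        output = agent(case["input"])
        scores = {name: fn(output, case["expected"])
                  for name, fn in criteria.items()}
        results.append({"case": case["name"], "scores": scores})
    overall = sum(s for r in results for s in r["scores"].values()) / (
        len(results) * len(criteria))
    return {"cases": results, "overall": overall}

# Toy suite: the "agent" is str.upper, one exact-match criterion.
report = run_suite(
    agent=str.upper,
    cases=[{"name": "greeting", "input": "hi", "expected": "HI"}],
    criteria={"exact_match": lambda out, exp: 1.0 if out == exp else 0.0},
)
```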
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is not read-only (readOnlyHint: false) and is open-world (openWorldHint: true), which the description aligns with by describing an execution/evaluation action. The description adds valuable context beyond annotations: it specifies that it runs 'all test cases', uses 'configured criteria', returns 'detailed scores', and importantly discloses 'Uses YOUR API keys' (implying authentication needs). No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently conveys the tool's purpose, behavior, and key operational detail ('Uses YOUR API keys'). Every part adds value with zero waste, making it easy to parse and understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (executing grading suites with evaluation), annotations cover safety and world hints, and schema fully documents parameters. The description adds important behavioral context (test execution, criteria, scoring, API key usage). However, without an output schema, it could benefit from more detail on return format (e.g., score structure), though 'detailed scores' provides some indication.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, clearly documenting both required parameters (agent_id, suite_id). The description does not add any parameter-specific details beyond what the schema provides, such as format examples or constraints. With high schema coverage, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Run a grading suite'), target resource ('on an agent'), and scope ('all test cases, evaluates with configured criteria, returns detailed scores'), distinguishing it from siblings like kopern_grade_prompt or kopern_get_grading_results. It uses precise verbs and specifies the resource involved.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for executing grading tests on agents, but provides no explicit guidance on when to use this tool versus alternatives like kopern_grade_prompt or kopern_get_grading_results. It mentions 'Uses YOUR API keys' which hints at prerequisites, but lacks clear when/when-not instructions or sibling comparisons.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_run_pipeline
Execute a pipeline on a prompt. Steps run sequentially, each feeding its output to the next. Uses YOUR API keys.
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The input prompt | |
| agent_id | Yes | The parent agent ID | |
| pipeline_id | Yes | The pipeline ID | |
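The sequential semantics ("each feeding its output to the next") amount to a left fold over the step list. A minimal sketch with placeholder steps — the real steps would be LLM calls configured on the pipeline, not string methods:

```python
from functools import reduce

def run_pipeline(prompt, steps):
    """Thread the prompt through each step in order; every step
    receives the previous step's output as its input."""
    return reduce(lambda text, step: step(text), steps, prompt)

result = run_pipeline("  summarize this  ", [str.strip, str.title])
```

This is why step order matters when building the pipeline with kopern_create_pipeline: a reordering changes every downstream input.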
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false (mutation) and openWorldHint=true (external effects), which the description aligns with by implying execution and external API usage. The description adds context about sequential step execution and API key usage, but does not detail rate limits, error handling, or specific behavioral traits beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by additional context in two concise sentences. Each sentence adds value: execution details, sequential flow, and API key usage, with zero waste or redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema and annotations covering mutation/external effects, the description adequately explains the tool's purpose and basic behavior. However, it lacks details on return values, error conditions, or execution constraints, which would be helpful for a tool with external API dependencies and mutation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are fully documented in the schema. The description does not add meaning beyond the schema, such as explaining relationships between agent_id, pipeline_id, and prompt, or providing usage examples. Baseline 3 is appropriate as the schema handles parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Execute a pipeline on a prompt') and the resource ('pipeline'), specifying that steps run sequentially with outputs feeding forward. It distinguishes from siblings like 'kopern_create_pipeline' (creation vs. execution) and 'kopern_run_autoresearch' (different execution type).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for running a pipeline on a prompt, but does not explicitly state when to use this tool versus alternatives like 'kopern_run_autoresearch' or 'kopern_run_team'. It mentions 'Uses YOUR API keys', which hints at authentication context but lacks clear exclusions or prerequisites.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_run_team
Execute a multi-agent team on a prompt. Returns each agent's output and the final combined result. Uses YOUR API keys.
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The task/prompt to send to the team | |
| team_id | Yes | The team ID or name | |
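A sketch of the return shape the description suggests: per-agent outputs plus a final combined result. The combine step (newline concatenation here) and the agent names are assumptions; the server's actual combination strategy is not documented.

```python
def run_team(prompt, agents, combine="\n".join):
    """Send the same prompt to every agent on the team, then merge
    the individual outputs into one final result."""
    outputs = {name: agent(prompt) for name, agent in agents.items()}
    return {"outputs": outputs, "final": combine(outputs.values())}

result = run_team("draft a reply", {
    "writer": lambda p: f"Draft: {p}",
    "reviewer": lambda p: f"Checklist for: {p}",
})
```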
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false (implying mutation) and openWorldHint=true (interaction with external systems), which the description does not contradict. It adds useful context about API key usage ('Uses YOUR API keys') and output details ('Returns each agent's output and the final combined result'), but lacks information on rate limits, authentication specifics, or error handling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by output and key usage details in two additional sentences, with no wasted words or redundant information, making it highly efficient and easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (executing multi-agent teams), lack of output schema, and annotations covering basic hints, the description is mostly complete—it explains purpose, output, and API key usage. However, it could benefit from more details on error cases or performance expectations to fully guide the agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema fully documents the two parameters (team_id and prompt). The description does not add semantic details beyond what the schema provides, such as examples or constraints, so it meets the baseline for high schema coverage without extra value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Execute a multi-agent team on a prompt') and resource ('multi-agent team'), distinguishing it from sibling tools like kopern_run_pipeline or kopern_run_autoresearch by focusing on team execution rather than pipeline or research workflows.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('Execute a multi-agent team on a prompt'), but does not explicitly state when not to use it or name alternatives among siblings like kopern_run_pipeline, leaving some ambiguity for the agent in choosing between similar execution tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
kopern_update_agent (idempotent)
Update any part of an agent: config, skills, tools, or extensions. Use add/remove arrays for granular control over subcollections.
| Name | Required | Description | Default |
|---|---|---|---|
| name | No | New agent name | |
| model | No | Model ID override | |
| domain | No | New agent domain/category | |
| agent_id | Yes | The agent ID or name | |
| provider | No | LLM provider (anthropic, openai, google, mistral, ollama) | |
| tools_add | No | Add custom tools (sandboxed JS) | |
| skills_add | No | Add skills (domain knowledge blocks) | |
| description | No | New agent description | |
| tools_remove | No | Remove custom tools by name | |
| builtin_tools | No | Built-in tools to enable: web_fetch, memory, github_read, github_write, bug_management, datagouv, piste, service_email, service_calendar | |
| skills_remove | No | Remove skills by name | |
| system_prompt | No | New system prompt | |
| extensions_add | No | Add extensions (event hooks) | |
| extensions_remove | No | Remove extensions by name | |
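The add/remove-array semantics can be sketched as a merge-by-name over a subcollection: items are added or replaced individually rather than the whole list being overwritten. Purely illustrative, assuming items carry a `name` field as the `*_remove` parameters ("remove ... by name") suggest:

```python
def apply_update(current, add=None, remove=None):
    """Apply skills_add/skills_remove-style arrays to a name-keyed
    subcollection (skills, tools, or extensions)."""
    updated = {item["name"]: item for item in current}
    for item in add or []:
        updated[item["name"]] = item    # add, or overwrite an existing name
    for name in remove or []:
        updated.pop(name, None)         # remove by name; missing names are ignored
    return list(updated.values())

skills = [{"name": "billing"}, {"name": "legacy_faq"}]
skills = apply_update(skills, add=[{"name": "refunds"}], remove=["legacy_faq"])
```

The practical upshot of this granular model: updating one skill does not require re-sending the agent's full skill list.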
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is a non-read-only (readOnlyHint: false) and idempotent (idempotentHint: true) operation. The description adds valuable context by specifying that updates can target 'any part' and mentions 'granular control over subcollections' via add/remove arrays, which clarifies the mutation behavior beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste: the first states the purpose and scope, the second provides a key usage tip. It's front-loaded and appropriately sized for the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's high complexity (14 parameters, mutation operation) and lack of output schema, the description is reasonably complete. It covers the core purpose and a critical behavioral aspect (granular control via arrays), though it could benefit from more guidance on error handling or response format.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all 14 parameters. The description adds marginal value by hinting at the purpose of add/remove arrays for 'subcollections' but doesn't provide additional syntax or format details beyond what's in the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Update') and resource ('any part of an agent') with specific components listed (config, skills, tools, or extensions). It distinguishes from sibling tools like kopern_create_agent (creation) and kopern_delete_agent (deletion) by focusing on modification.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for granular updates to existing agents but doesn't explicitly state when to use this vs. alternatives like kopern_create_agent for new agents or kopern_import_agent for bulk changes. No exclusions or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
    {
      "$schema": "https://glama.ai/mcp/schemas/connector.json",
      "maintainers": [{ "email": "your-email@example.com" }]
    }

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail — every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control — enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management — store and rotate API keys and OAuth tokens in one place
Change alerts — get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption — public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics — see which tools are being used most, helping you prioritize development and documentation
Direct user feedback — users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.