
icme-preflight

Server Details

Jailbreak-proof AI guardrails. Automated Reasoning SMT solver, not an LLM. ZK proofs included.

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:
Repository: ICME-Lab/jolt-atlas
GitHub Stars: 55

Tool Descriptions (Grade: A)

Average 4.2/5 across 14 of 14 tools scored. Lowest: 3.3/5.

Server Coherence (Grade: A)
Disambiguation: 4/5

Most tools have distinct purposes, such as check_action for formal verification, make_rules for policy creation, and get_scenarios for testing. However, check_action and quick_check both provide guardrail verdicts, with quick_check being a lightweight version, which could cause some confusion if an agent doesn't carefully read the descriptions. The paid vs. free variants (e.g., check_action vs. check_action_paid) are clearly differentiated by payment method.

Naming Consistency: 5/5

Tool names follow a consistent snake_case pattern throughout, with clear verb_noun structures like check_action, create_account, and run_tests. This predictability makes it easy for an agent to understand and navigate the toolset without confusion from mixed naming conventions.

Tool Count: 5/5

With 14 tools, the set is well-scoped for a policy verification and account management server. It covers core workflows from policy creation (make_rules) to verification (check_action, quick_check), testing (run_tests, get_scenarios), and account operations (create_account, top_up), with each tool serving a clear purpose without redundancy.

Completeness: 5/5

The toolset provides complete coverage for the domain of policy enforcement and verification. It includes policy creation (make_rules), verification with different options (check_action, quick_check, check_logic), testing (run_tests, get_scenarios), account management (create_account, top_up), and proof verification (verify_proof). There are no obvious gaps, and agents can handle the full lifecycle from setup to enforcement.
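The lifecycle the coherence review describes can be sketched as an ordered plan of tool calls. The tool names match the server's toolset; the argument values are illustrative placeholders, not real inputs:

```python
# Ordered plan covering the full lifecycle: account setup, policy
# compilation, scenario review, testing, and enforcement.
# All argument values below are placeholders.
lifecycle = [
    ("create_account", {"username": "demo-agent"}),           # obtain an api_key
    ("make_rules", {"policy": "No refunds above $100."}),     # compile a policy
    ("get_scenarios", {"policy_id": "<from make_rules>"}),    # review generated tests
    ("run_tests", {"policy_id": "<from make_rules>"}),        # exercise the policy
    ("check_action", {"action": "Refund $20 to the user."}),  # enforce at runtime
]

tool_order = [name for name, _ in lifecycle]
```

The ordering matters: `make_rules` must run before any tool that takes a `policy_id`, since that ID is only issued at compilation time.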

Available Tools

14 tools
check_action (Grade: A)

Enforce a guardrail: verify an agent action against a compiled policy using formal verification. An SMT solver — not an LLM — determines whether the action satisfies every rule. Returns SAT (allowed) or UNSAT (blocked) with extracted values and a cryptographic ZK proof that the check was performed correctly. Cannot be jailbroken. 1 credit ($0.01). Requires api_key. Tip: end the action with an explicit claim like 'I assert this complies with the policy' for best extraction.

Parameters (JSON Schema):
- action (required): The agent action to verify against the policy (max 2000 chars)
- api_key (required): Your ICME API key
- policy_id (required): Policy ID from make_rules

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does so effectively by disclosing key behavioral traits: it uses an SMT solver for verification, returns SAT/UNSAT outcomes with extracted values and a ZK proof, cannot be jailbroken, costs 1 credit ($0.01), and requires an API key. It doesn't contradict any annotations, but could add more on rate limits or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose and key differentiators. Each sentence adds value, such as cost, requirements, and tips, with no wasted words. It could be slightly more structured by separating behavioral details into distinct clauses, but remains efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (formal verification tool with cryptographic proofs), no annotations, and no output schema, the description does well by explaining the verification method, return values (SAT/UNSAT with extracted values and proof), cost, and requirements. It could improve by detailing the output format more explicitly, but is largely complete for agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description adds minimal value beyond the schema by implying the 'action' parameter should include an explicit claim for best extraction, but doesn't provide additional syntax or format details. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific verb ('enforce a guardrail: verify') and resource ('agent action against a compiled policy'), and distinguishes from siblings by specifying formal verification with an SMT solver rather than LLM-based checks. It explicitly contrasts with tools like 'check_logic' or 'quick_check' by emphasizing cryptographic proof generation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('verify an agent action against a compiled policy using formal verification') and when not to use alternatives (e.g., not for LLM-based checks). It includes a tip for best practices ('end the action with an explicit claim') and mentions prerequisites ('Requires api_key'), offering comprehensive usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
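A hypothetical MCP `tools/call` request for check_action might look like the sketch below. The request envelope follows the standard JSON-RPC shape MCP uses; the `api_key` and `policy_id` values are placeholders:

```python
import json

# Hypothetical tools/call payload for check_action. The action text ends
# with an explicit compliance claim, as the description's tip suggests.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "check_action",
        "arguments": {
            "action": (
                "Transfer $50 to the approved vendor account. "
                "I assert this complies with the policy."
            ),
            "api_key": "icme_xxxxxxxx",     # placeholder, not a real key
            "policy_id": "pol_example123",  # placeholder, from make_rules
        },
    },
}

payload = json.dumps(request)
```

The server responds with SAT (allowed) or UNSAT (blocked), extracted values, and a ZK proof; the exact response schema is not published on this page.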

check_action_paid (Grade: A)

Same formal verification as check_action, but pay per call with x402 ($0.10 USDC on Base) instead of using credits. No API key or account needed — any agent with a wallet can verify actions on the fly. Returns SAT (allowed) or UNSAT (blocked) with extracted values and optional ZK proof.

Parameters (JSON Schema):
- action (required): The agent action to verify against the policy (max 2000 chars)
- policy_id (required): Policy ID from make_rules

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key traits: payment requirement ($0.10 USDC on Base), no API key or account needed, and returns SAT/UNSAT outcomes with extracted values and optional ZK proof. However, it doesn't mention rate limits, error handling, or transaction confirmation times.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in two sentences: the first establishes purpose and differentiation, the second covers payment, accessibility, and return values. Every phrase adds value with zero wasted words, making it highly concise and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 2-parameter tool with no annotations and no output schema, the description does well by explaining the payment model, accessibility, and return format. However, it could benefit from mentioning error cases or confirmation details for the payment transaction, slightly reducing completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description doesn't add any additional meaning about the parameters beyond what's in the schema descriptions, maintaining the baseline score of 3 for adequate but not enhanced parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs 'formal verification' of actions against policies, specifically distinguishing it from sibling 'check_action' by noting it uses payment via x402 instead of credits. It explicitly mentions the verb 'verify' and resource 'actions' with clear differentiation from alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'instead of using credits' and 'any agent with a wallet can verify actions on the fly.' It clearly contrasts with 'check_action' by specifying the payment mechanism and lack of API key/account requirements, offering clear alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
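Structurally, the only difference from a check_action call is the missing `api_key`: payment travels with the request via x402 instead. A minimal sketch, with placeholder values:

```python
# check_action_paid arguments: same as check_action minus api_key,
# since x402 payment ($0.10 USDC on Base) replaces the credit system.
# Values are placeholders.
paid_args = {
    "action": "Post the weekly summary to the public channel.",
    "policy_id": "pol_example123",  # placeholder, from make_rules
}

# No account credential is present in the arguments:
has_api_key = "api_key" in paid_args
```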

check_logic (Grade: A)

Catch contradictions in reasoning before acting on it. FREE — no account needed. Extracts quantitative and logical claims from any plan, calculation, or chain of thought, then uses a Z3 SAT solver to mathematically prove whether they contradict each other. This is formal verification, not an LLM second-guessing itself. Returns CONSISTENT, CONTRADICTION, or UNKNOWN with the extracted claims.

Parameters (JSON Schema):
- show_smt (optional): Include the generated SMT-LIB2 formula in the response for inspection (default: false)
- reasoning (required): The reasoning, plan, or chain of thought to check. Be specific — include numbers, conditions, and constraints for the best results (max 2000 chars)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing key behaviors: it's free/no account needed, uses formal verification with Z3 SAT solver, extracts claims automatically, and returns three specific outcomes. However, it doesn't mention rate limits, computational constraints, or error handling for invalid inputs, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured and front-loaded: first sentence states core purpose, second explains methodology, third clarifies what it's not, fourth specifies outputs. Every sentence earns its place with no wasted words, and it fits complex information into four concise sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 2 parameters, 100% schema coverage, and no output schema, the description is mostly complete. It explains the verification approach, accessibility, and return values well. However, without annotations or output schema, it could better describe error cases or limitations (e.g., character limit mentioned in schema but not description).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description adds minimal value beyond the schema—it emphasizes specificity for the 'reasoning' parameter but doesn't explain parameter interactions or provide additional semantic context. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('catch contradictions', 'extracts quantitative and logical claims', 'uses a Z3 SAT solver to mathematically prove') and resources ('any plan, calculation, or chain of thought'). It distinguishes from siblings by emphasizing formal verification rather than LLM second-guessing, and explicitly names what it returns (CONSISTENT, CONTRADICTION, UNKNOWN).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'before acting on it' indicates when to use (pre-action validation), 'FREE — no account needed' clarifies accessibility, and 'not an LLM second-guessing itself' distinguishes it from alternatives like check_action or quick_check. It also specifies input requirements ('Be specific — include numbers, conditions, and constraints').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
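A toy illustration of the kind of contradiction check_logic catches. The real tool extracts claims from free text and hands them to Z3; this sketch just evaluates two hard-coded numeric claims directly, so it shows the verdict logic, not the extraction or solving:

```python
# Claims as they might be extracted from a plan's reasoning:
# "the budget is $100", "item A costs $60", "item B costs $55".
claims = {
    "budget": 100,
    "spend_item_a": 60,
    "spend_item_b": 55,
}

# Claim under test: "I can buy both items within budget."
consistent = claims["spend_item_a"] + claims["spend_item_b"] <= claims["budget"]
verdict = "CONSISTENT" if consistent else "CONTRADICTION"
```

Here 60 + 55 = 115 exceeds the stated budget of 100, so the plan's claims cannot all hold at once, which is exactly the class of error a SAT solver can prove rather than guess.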

check_relevance (Grade: A)

Free relevance screen — checks whether an action touches any of your policy variables before running a paid check. Returns a relevance score and which variables matched. If should_check is true, run check_action. If false, the action is unrelated to your policy — skip the paid check. No credits charged. Requires api_key.

Parameters (JSON Schema):
- action (required): The agent action to screen (max 2000 chars)
- api_key (required): Your ICME API key
- policy_id (required): Policy ID from make_rules
- threshold (optional): Relevance threshold (0.0 to 1.0). Default 0.0 — any match triggers should_check.

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key traits: it's a free screen (no credits charged), returns a relevance score and matched variables, and requires an api_key. However, it lacks details on rate limits, error handling, or response format, leaving some behavioral aspects unclear.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, followed by usage guidelines and key behavioral details. Every sentence adds value—explaining the tool's role, output, decision logic, cost, and prerequisites—without redundancy or unnecessary elaboration, making it highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description does well to cover purpose, usage, and key behaviors like cost and authentication. However, it lacks details on the output format (e.g., structure of relevance score and matched variables) and error cases, which could be important for a tool with no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description adds minimal value beyond the schema, mentioning 'api_key' as a requirement but not elaborating on parameter interactions or semantics. Baseline 3 is appropriate as the schema handles most of the parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Free relevance screen — checks whether an action touches any of your policy variables before running a paid check.' It specifies the verb ('checks'), resource ('policy variables'), and distinguishes it from sibling tools like check_action (paid check) and check_action_paid, making it specific and differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'If should_check is true, run check_action. If false, the action is unrelated to your policy — skip the paid check.' It names the alternative tool (check_action) and specifies the decision logic, offering clear context for usage versus alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
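The screen-then-verify pattern the description recommends can be sketched as a small routing function: run the free check_relevance first, and only spend a credit on check_action when `should_check` is true. The response field names follow the description; the full response shape is an assumption, since no output schema is published:

```python
def decide(relevance_response: dict) -> str:
    """Pick the next step based on a check_relevance result."""
    if relevance_response.get("should_check"):
        return "check_action"  # action touches policy variables: run the paid check
    return "skip"              # unrelated to the policy: no paid check needed

# Example responses (shapes are assumptions, not a documented schema):
next_step_hit = decide({"should_check": True, "relevance_score": 0.8})
next_step_miss = decide({"should_check": False, "relevance_score": 0.0})
```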

create_account (Grade: A)

Create an ICME Preflight account with x402 USDC payment ($5.00 on Base). Returns an API key and 325 starting credits immediately. Save the api_key — it is shown only once. Use create_account_card instead if paying by credit card.

Parameters (JSON Schema):
- username (required): Unique username (1-32 chars, alphanumeric + hyphens/underscores)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: the payment amount ($5.00 on Base), immediate returns (API key and 325 starting credits), and a critical warning ('Save the api_key — it is shown only once'). However, it lacks details on error conditions, rate limits, or authentication requirements, which would be helpful for a creation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose and key details (payment, returns, warning), followed by the alternative tool mention. Every sentence adds value—none are redundant or vague—making it efficiently structured and appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (account creation with payment), lack of annotations, and no output schema, the description does well by covering payment details, immediate outcomes, and a critical warning. However, it could be more complete by mentioning potential errors (e.g., duplicate username) or the format of the returned API key, though the warning about saving it mitigates some of this gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the 'username' parameter fully documented in the schema. The description does not add any parameter-specific information beyond what the schema provides, such as examples or usage context for the username. This meets the baseline of 3 when schema coverage is high.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Create an ICME Preflight account') and the resource involved ('with x402 USDC payment'), distinguishing it from the sibling tool 'create_account_card' which is mentioned as an alternative for credit card payments. It goes beyond just restating the name by specifying the payment method and immediate outcomes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly provides when to use this tool ('with x402 USDC payment') and when to use an alternative ('Use create_account_card instead if paying by credit card'), offering clear guidance on tool selection based on payment method. This directly addresses sibling differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
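Since the call costs $5.00, it is worth validating the username constraint from the schema locally before sending it. A sketch of that check, derived directly from the documented constraint (1-32 chars, alphanumeric plus hyphens/underscores):

```python
import re

# Pattern transcribed from the schema's username constraint.
USERNAME_RE = re.compile(r"^[A-Za-z0-9_-]{1,32}$")

def valid_username(name: str) -> bool:
    """Return True if the name satisfies the documented constraint."""
    return USERNAME_RE.fullmatch(name) is not None
```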

create_account_card (Grade: A)

Create an ICME Preflight account with a credit card ($5.00 via Stripe). Returns a checkout_url — open it in a browser to pay. After payment completes, call poll_session with the session_id to retrieve your API key.

Parameters (JSON Schema):
- username (required): Unique username (1-32 chars, alphanumeric + hyphens/underscores)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: it's a mutation tool (creates an account), involves a payment of $5.00 via Stripe, returns a checkout_url that must be opened in a browser, and requires a follow-up call to 'poll_session' with the session_id. However, it doesn't mention error handling, rate limits, or authentication needs, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core action and efficiently structured in two sentences: one for the creation and payment process, and another for the follow-up steps. Every sentence provides essential information without waste, making it highly concise and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a paid account creation tool with no annotations and no output schema, the description is mostly complete. It covers the purpose, usage flow, and key behaviors like payment and follow-up actions. However, it lacks details on error cases, response formats beyond the checkout_url, or what happens if payment fails, which could be important for full contextual understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the single parameter 'username' fully documented in the schema. The description adds no additional parameter information beyond what the schema provides, such as format examples or constraints beyond the schema's description. This meets the baseline of 3 for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Create an ICME Preflight account with a credit card') and resource ('account'), distinguishing it from siblings like 'create_account' (which presumably doesn't involve payment) and 'top_up_card' (which is for existing accounts). It specifies the payment method and amount, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: to create a paid account via credit card. It provides clear alternatives by directing the user to call 'poll_session' after payment to retrieve the API key, distinguishing it from other account-related tools like 'create_account' (likely unpaid) and 'top_up_card' (for topping up).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
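The two-step card flow, create_account_card returning a `checkout_url` and then poll_session yielding the API key after payment, can be sketched as a simulated poll loop. The response field names (`status`, `api_key`) are assumptions about the unpublished response shape:

```python
def poll_until_key(responses):
    """Scan successive poll_session results (simulated) for an api_key."""
    for r in responses:
        if "api_key" in r:
            return r["api_key"]
    return None  # payment never completed in the polled window

# Simulated poll_session responses while the user pays in the browser:
simulated = [
    {"status": "pending"},
    {"status": "pending"},
    {"status": "complete", "api_key": "icme_demo_key"},  # placeholder key
]
key = poll_until_key(simulated)
```

A real client would call poll_session with the `session_id` between attempts and back off between polls rather than iterate a prefetched list.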

get_scenarios (Grade: A)

Review auto-generated test scenarios for a compiled policy. Shows example actions that should pass and fail, so you can verify the policy behaves as intended before deploying it. Requires api_key.

Parameters (JSON Schema):
- api_key (required): Your ICME API key
- policy_id (required): Policy ID from make_rules

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool 'shows example actions that should pass and fail' and 'requires api_key', adding useful context about output format and authentication. However, it lacks details on rate limits, pagination, error handling, or whether it's read-only (implied but not explicit).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the core purpose, the second adds key behavioral context, and the third specifies a prerequisite. Every sentence earns its place with no wasted words or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 required parameters, no output schema), the description is reasonably complete: it explains what the tool does, when to use it, and key behavioral traits. However, without annotations or output schema, it could benefit from more details on response format or error cases, though it covers the essentials adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters fully. The description adds no additional meaning beyond what's in the schema (e.g., it doesn't explain how policy_id relates to 'make_rules' or provide examples). With high schema coverage, the baseline is 3 even without param info in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('review auto-generated test scenarios'), the target resource ('for a compiled policy'), and the purpose ('verify the policy behaves as intended before deploying it'). It distinguishes itself from siblings like 'run_tests' or 'check_action' by focusing on reviewing pre-generated scenarios rather than executing tests or checking specific actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool ('before deploying' a policy to verify behavior), but does not explicitly state when not to use it or name specific alternatives. It implies usage for pre-deployment verification, which helps differentiate from tools like 'run_tests' that might be for execution rather than review.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
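Since the description says scenarios show "example actions that should pass and fail", an agent reviewing them might split the list accordingly. The field names below are hypothetical, invented purely for illustration, as the server publishes no output schema:

```python
# Hypothetical get_scenarios result; "action" and "expected" are
# assumed field names, not a documented schema.
scenarios = [
    {"action": "Refund $20 with manager approval.", "expected": "pass"},
    {"action": "Refund $500 with no approval.", "expected": "fail"},
]

should_pass = [s for s in scenarios if s["expected"] == "pass"]
should_fail = [s for s in scenarios if s["expected"] == "fail"]
```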

make_rules (Grade: A)

Turn a plain-English policy into a jailbreak-proof guardrail. Write rules the way you would explain them to a colleague — ICME compiles them into formal logic (SMT-LIB) that a mathematical solver enforces. No prompt engineering. No LLM judges. The solver either proves an action complies or it doesn't. Returns a policy_id and test scenarios for review. Costs 300 credits. Requires api_key.

Parameters (JSON Schema):
- policy (required): Your policy in plain English, e.g. 'No action may access user data without explicit consent. External API calls require approval above $100.' Up to 50 rules.
- api_key (required): Your ICME API key (from create_account or create_account_card)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: the compilation process (ICME compiles to SMT-LIB), enforcement mechanism (mathematical solver), deterministic outcome (proves compliance or doesn't), cost implications (300 credits), and authentication requirement (api_key). It doesn't mention rate limits or error handling, but covers the essential operational characteristics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with front-loaded core purpose, followed by implementation details, constraints, and requirements. Every sentence adds value: the first establishes the transformation process, the second explains the compilation/enforcement, the third clarifies what it's NOT, the fourth describes the deterministic outcome, and the final two cover outputs and requirements. It could be slightly more concise by combining some elements.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 2-parameter tool with no annotations and no output schema, the description provides substantial context about the transformation process, enforcement mechanism, cost, and authentication. It explains the compilation to SMT-LIB and mathematical solver enforcement well. The main gap is the lack of information about return values beyond 'policy_id and test scenarios'; without an output schema, more detail about the response format would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents both parameters. The description adds minimal additional context about the policy parameter ('plain-English policy' and 'Up to 50 rules'), but doesn't provide meaningful semantic information beyond what's in the schema descriptions. The baseline of 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: converting plain-English policies into formal guardrails using SMT-LIB logic. It specifies the output (policy_id and test scenarios), distinguishes it from prompt engineering/LLM approaches, and differentiates from siblings like check_action or verify_proof by focusing on rule creation rather than validation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool: when you need to create jailbreak-proof guardrails from natural language policies. It mentions the 300-credit cost and api_key requirement as prerequisites. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
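To make the call shape concrete, here is a sketch of the MCP `tools/call` payload an agent might send for make_rules. This assumes the standard MCP JSON-RPC framing; the policy text and API key are placeholders, not values from this server, and the sample policy string simply reuses the example given in the schema description above.

```python
import json

# Hypothetical MCP tools/call payload for make_rules.
# Parameter names come from the schema above; values are placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "make_rules",
        "arguments": {
            "policy": (
                "No action may access user data without explicit consent. "
                "External API calls require approval above $100."
            ),
            "api_key": "icme_sk_placeholder",
        },
    },
}

print(json.dumps(request, indent=2))
```

Note the description's constraints that the payload itself cannot express: the call costs 300 credits, and the policy may contain at most 50 rules.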

poll_session (Grade: A)

Check the status of a Stripe card payment. Returns pending or complete. On completion after signup, returns the api_key. Call this after create_account_card or top_up_card once the user has paid.

Parameters (JSON Schema)
- session_id (required): The session_id returned by create_account_card or top_up_card

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by explaining the return behavior ('Returns pending or complete... returns the api_key') and the completion condition ('after signup'). However, it doesn't mention error conditions, rate limits, or authentication requirements, which would be helpful for a payment-related tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with zero waste: first states purpose and return values, second adds completion detail, third provides clear usage guidance. Every sentence earns its place by adding distinct, necessary information in a logical flow.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no annotations and no output schema, the description provides good coverage of purpose, usage context, and return behavior. However, as a payment status tool, it could benefit from mentioning error handling or what happens with invalid sessions to be fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the session_id parameter thoroughly. The description adds minimal value by mentioning that session_id comes from 'create_account_card or top_up_card,' but doesn't provide additional format or validation details beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Check the status'), resource ('Stripe card payment'), and outcome ('Returns pending or complete... returns the api_key'). It distinguishes from siblings by mentioning specific triggering tools (create_account_card, top_up_card), making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance is provided on when to use this tool: 'Call this after create_account_card or top_up_card once the user has paid.' This clearly defines the triggering condition and temporal relationship with sibling tools, leaving no ambiguity about proper usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
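The "call this after create_account_card or top_up_card once the user has paid" guidance implies a polling loop. The sketch below is illustrative only: `poll` stands in for the actual poll_session invocation, and the `{"status", "api_key"}` result shape is an assumption, since the tool publishes no output schema.

```python
import time

def wait_for_payment(poll, session_id, interval=2.0, max_attempts=10):
    """Poll a Stripe checkout session until it completes.

    `poll` is any callable that performs the poll_session tool call and
    returns its result as a dict (hypothetical shape: status plus api_key).
    """
    for _ in range(max_attempts):
        result = poll(session_id)
        if result.get("status") == "complete":
            return result.get("api_key")
        time.sleep(interval)
    raise TimeoutError(f"session {session_id} still pending")

# Stubbed transport for illustration: completes on the third poll.
calls = {"n": 0}
def fake_poll(session_id):
    calls["n"] += 1
    if calls["n"] < 3:
        return {"status": "pending"}
    return {"status": "complete", "api_key": "icme_sk_demo"}

print(wait_for_payment(fake_poll, "cs_test_123", interval=0))
```

A bounded attempt count matters here because the description names no timeout or error behavior for abandoned sessions.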

quick_check (Grade: A)

Fast yes/no guardrail verdict — returns only ALLOWED or BLOCKED with no proof details. Use when you need a lightweight gate and don't need the full SAT/UNSAT report or ZK proof. Same formal verification under the hood, just a minimal response. 1 credit. Requires api_key.

Parameters (JSON Schema)
- action (required): The agent action to check (max 2000 chars)
- api_key (required): Your ICME API key
- policy_id (required): Policy ID from make_rules

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses key behavioral traits: it's a fast/lightweight check, returns only ALLOWED/BLOCKED outcomes, omits proof details, uses formal verification internally, costs 1 credit, and requires an api_key. This covers performance, output format, internal mechanism, cost, and authentication needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and well-structured in three sentences. Each sentence adds distinct value: purpose, usage guidelines, and additional context (verification method, credit cost, authentication). There is no wasted text, and information is front-loaded effectively.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description does a good job covering purpose, usage, behavior, and constraints. It could be more complete by specifying error conditions or response formats, but for a tool with 100% schema coverage and clear behavioral disclosure, it's largely adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters. The description adds no specific parameter semantics beyond implying 'action' relates to agent actions and 'policy_id' connects to 'make_rules', but this is minimal enhancement over the schema's descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Fast yes/no guardrail verdict — returns only ALLOWED or BLOCKED with no proof details.' It specifies the verb ('check'), resource ('guardrail verdict'), and distinguishes it from siblings by mentioning it's a lightweight alternative to tools that provide full reports or proofs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: 'Use when you need a lightweight gate and don't need the full SAT/UNSAT report or ZK proof.' It also distinguishes it from alternatives by mentioning what it lacks compared to other tools, providing clear context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
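A minimal gate built on quick_check might look like the following sketch. `call_tool` is a stand-in for the MCP client's invocation; the ALLOWED/BLOCKED strings and the 2000-character limit come from the description and schema above, while the stub's blocking logic is invented purely for illustration.

```python
def guard(call_tool, policy_id, api_key, action):
    """Lightweight yes/no gate over quick_check's verdict.

    `call_tool` stands in for the MCP client's tool invocation and is
    assumed to return the bare verdict string.
    """
    if len(action) > 2000:  # schema limit on the action parameter
        raise ValueError("action exceeds 2000 characters")
    verdict = call_tool("quick_check", {
        "action": action,
        "policy_id": policy_id,
        "api_key": api_key,
    })
    return verdict == "ALLOWED"

# Stub that blocks anything touching user data, for demonstration only.
def fake_call(name, args):
    return "BLOCKED" if "user data" in args["action"] else "ALLOWED"

print(guard(fake_call, "pol_123", "icme_sk_demo", "send marketing email"))
```

Because quick_check returns no proof details, a gate like this is suited to high-volume screening, with check_action reserved for calls that need the auditable report.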

run_tests (Grade: A)

Run saved test cases against a policy to confirm it blocks what it should block and allows what it should allow. Run this after make_rules and before using the policy in production. Requires api_key.

Parameters (JSON Schema)
- api_key (required): Your ICME API key
- policy_id (required): Policy ID to test

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the requirement for an API key, which is useful context, but lacks details on what the tool returns (e.g., test results format), error handling, or execution time. It adequately describes the action but misses richer behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with two sentences that efficiently convey purpose, usage guidelines, and prerequisites without waste. Every sentence earns its place by adding value beyond the tool name.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (testing tool with no annotations and no output schema), the description is fairly complete for purpose and usage but lacks details on return values or error scenarios. It covers key aspects like timing and prerequisites, but could benefit from more on behavioral outcomes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('api_key' and 'policy_id'). The description adds no meaning beyond what the schema provides; it does not, for example, explain how the policy_id relates to the saved test cases or give any API key usage specifics. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('run saved test cases against a policy') and resources ('policy'), and distinguishes it from siblings by mentioning 'after make_rules' and 'before using the policy in production'. It explicitly explains what the tool does: confirm blocking and allowing behaviors.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('after make_rules and before using the policy in production'), which distinguishes it from alternatives like 'make_rules' or production deployment. It also mentions prerequisites ('Requires api_key'), offering clear context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
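The "after make_rules and before production" ordering implied by the description can be sketched as a small workflow. This is a hypothetical composition: the `policy_id` field on the make_rules result matches its description, but the `passed` field on the run_tests report is an assumption, since no output schema is published.

```python
def pre_deploy(call_tool, api_key, policy_text):
    """Hypothetical make_rules -> run_tests ordering before deployment."""
    compiled = call_tool("make_rules", {"policy": policy_text, "api_key": api_key})
    report = call_tool("run_tests", {
        "policy_id": compiled["policy_id"],
        "api_key": api_key,
    })
    # The report shape is an assumption; adapt to the real response format.
    if not report.get("passed", False):
        raise RuntimeError("policy failed its saved test cases")
    return compiled["policy_id"]

# Stub transport for illustration only.
def fake_call(name, args):
    if name == "make_rules":
        return {"policy_id": "pol_demo"}
    return {"passed": True}

print(pre_deploy(fake_call, "icme_sk_demo", "No refunds above $100 without approval."))
```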

top_up (Grade: A)

Add 500 credits to your account via x402 USDC payment ($5.00 on Base). Use top_up_card instead for credit card payment with volume discounts. Requires api_key.

Parameters (JSON Schema)
- api_key (required): Your ICME API key

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the required authentication ('Requires api_key') and implies a financial transaction, but doesn't mention rate limits, error conditions, or what happens after payment (e.g., immediate credit application). It adds some behavioral context but lacks completeness for a payment tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely efficient: three sentences that each serve a distinct purpose, stating the tool's function, providing alternative guidance, and noting authentication requirements. No wasted words, and the core functionality is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter payment tool with no annotations and no output schema, the description provides good context about payment method, amount, pricing, and authentication. However, it doesn't explain what the tool returns (e.g., transaction ID, new balance) or potential errors, leaving some gaps in completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the single parameter 'api_key'. The description adds no additional parameter information beyond what's in the schema. With high schema coverage, the baseline is 3 even without extra param details in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Add 500 credits to your account') and the payment method ('via x402 USDC payment ($5.00 on Base)'), distinguishing it from the sibling tool 'top_up_card' which uses credit card payment. This provides a precise verb+resource+method combination.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('via x402 USDC payment') and when to use an alternative ('Use top_up_card instead for credit card payment with volume discounts'). This provides clear guidance on tool selection based on payment method preferences.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
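The trade-off against top_up_card reduces to price per credit. A short calculation, using only the figures quoted in the two tool descriptions:

```python
# Effective price per credit: fixed x402 USDC top-up vs the largest card tier.
usdc_rate = 5.00 / 500            # top_up: $5.00 for 500 credits
card_top_rate = 100.00 / 12_000   # top_up_card: $100 for 12,000 credits

print(f"x402 USDC:      ${usdc_rate:.4f} per credit")
print(f"card $100 tier: ${card_top_rate:.4f} per credit")
```

At the $5 level both paths cost the same per credit, so the choice there is purely about payment rails; the card route only becomes cheaper at the higher tiers.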

top_up_card (Grade: B)

Add credits via credit card with volume bonuses. Returns a checkout_url. Tiers: $5 = 500 credits, $10 = 1,050 (+5%), $25 = 2,750 (+10%), $50 = 5,750 (+15%), $100 = 12,000 (+20%). Credits never expire. Requires api_key.

Parameters (JSON Schema)
- api_key (required): Your ICME API key
- amount_usd (required): Top-up amount in USD: 5, 10, 25, 50, or 100

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses key behavioral aspects: returns a checkout_url, includes specific pricing tiers with volume bonuses, states credits never expire, and mentions api_key requirement. However, it doesn't cover important aspects like error conditions, rate limits, authentication scope, or what happens after checkout_url is used.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with key information front-loaded: purpose, return value, pricing tiers, expiration policy, and requirement. The pricing tier list is detailed but necessary for understanding the tool's behavior. Some sentences could be more concise, but overall it's well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a payment tool with no annotations and no output schema, the description provides good coverage of the core functionality but has gaps. It explains the pricing model and return value (checkout_url) well, but doesn't describe error cases, response format beyond the URL, or post-checkout workflow. The absence of output schema means the description should ideally explain what happens after the checkout_url is provided.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds meaningful context beyond the schema by explaining the specific dollar amounts correspond to credit tiers with bonus percentages, which helps the agent understand the business logic behind the parameter values. However, it doesn't explain parameter interactions or constraints beyond what's in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Add credits via credit card') and resource ('card'), and specifies it's for top-ups with volume bonuses. However, it doesn't explicitly differentiate itself from the sibling 'top_up' tool, which appears to serve a similar function based on its name.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'Requires api_key' as a prerequisite, but provides no guidance on when to use this tool versus the sibling 'top_up' tool or other payment-related tools like 'create_account_card'. No explicit when/when-not instructions or alternative recommendations are included.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
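The tier list can be sanity-checked against the base rate of 100 credits per USD implied by the $5 tier. The short calculation below uses only the figures from the description and reproduces the advertised bonus percentages:

```python
# Check the advertised volume bonuses against the base rate of
# 100 credits per USD implied by the $5 tier.
BASE_RATE = 100  # credits per USD with no bonus
tiers = {5: 500, 10: 1_050, 25: 2_750, 50: 5_750, 100: 12_000}

bonuses = {usd: credits / (usd * BASE_RATE) - 1 for usd, credits in tiers.items()}
for usd, bonus in bonuses.items():
    print(f"${usd}: {tiers[usd]} credits ({bonus:+.0%})")
```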

verify_proof (Grade: A)

Independently verify a ZK proof from a prior check_action call. Confirms the guardrail check was performed correctly without re-running it — any third party or monitoring agent can verify in under one second. No additional cost. Wait a few minutes after the check for the proof to be generated. Single-use per proof.

Parameters (JSON Schema)
- proof_id (required): The proof_id returned by a prior check_action or check_action_paid call

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and adds valuable behavioral context: it discloses that verification is fast (under one second), has no additional cost, requires waiting after a check, and is single-use per proof. However, it doesn't mention error handling, rate limits, or authentication needs, leaving some gaps in behavioral transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with every sentence earning its place: the first states the core purpose, followed by key behavioral details (speed, cost, timing, usage). No redundant information, and it efficiently covers essential aspects in a compact form.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (verification with timing constraints), no annotations, and no output schema, the description is mostly complete: it covers purpose, usage, and key behaviors. However, it lacks details on output format, error cases, or specific prerequisites beyond waiting, leaving minor gaps for full contextual understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents proof_id as the value returned by a prior check_action or check_action_paid call. The description adds minimal semantics beyond this, only implying the parameter's source without detailing its format or constraints. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('verify a ZK proof') and resources ('from a prior check_action call'), distinguishing it from siblings like check_action (which creates proofs) and poll_session (which might check status). It explicitly mentions independent verification by third parties, which clarifies its unique role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidelines: use after a prior check_action call, wait a few minutes for proof generation, and it's single-use per proof. It distinguishes from alternatives by specifying this is for verification only, not re-running checks, and mentions no cost, which contrasts with paid siblings like check_action_paid.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
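For completeness, a sketch of the corresponding `tools/call` payload, assuming standard MCP JSON-RPC framing. The proof_id value is a placeholder for an id returned by a prior check_action call; per the description above, the caller should wait a few minutes after the check and expect only one verification per proof.

```python
import json

# Hypothetical MCP tools/call payload for verify_proof.
# The proof_id is a placeholder, not a real identifier.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "verify_proof",
        "arguments": {"proof_id": "proof-id-from-check_action"},
    },
}
print(json.dumps(request, indent=2))
```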
