
agent-safety-mcp

by LuciferForge

Server Quality Checklist

Profile completion: 50%
A complete profile improves this server's visibility in search results.
  • This repository includes a README.md file.

  • This repository includes a LICENSE file.

  • Latest release: v0.2.1

  • No tool usage detected in the last 30 days. Usage tracking helps demonstrate server value.

    Tip: use the "Try in Browser" feature on the server page to seed initial usage.

  • Add a glama.json file to provide metadata about your server.

  • This server provides 17 tools.
  • No known security issues or vulnerabilities reported.

  • Add related servers to improve discoverability.

Tool Scores

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It adds valuable behavioral constraints (agent_id format 'org/agent-name', purpose min 10 chars, comma-separated capabilities) but lacks mutation semantics (idempotency, overwrite behavior, return value) or auth requirements.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
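The constraints the description does document (agent_id in 'org/agent-name' form, purpose of at least 10 characters, comma-separated capabilities) can be sketched as a client-side pre-check. The function name and regex below are assumptions drawn from the review text, not the server's actual validation code:

```python
import re

# Hypothetical pre-check mirroring the constraints quoted in the review:
#   - agent_id must look like 'org/agent-name'
#   - purpose must be at least 10 characters
#   - capabilities is a comma-separated list
AGENT_ID_RE = re.compile(r"^[\w-]+/[\w-]+$")

def validate_card_args(agent_id, purpose, capabilities):
    """Return a list of validation errors; an empty list means the args look valid."""
    errors = []
    if not AGENT_ID_RE.match(agent_id):
        errors.append("agent_id must use the 'org/agent-name' format")
    if len(purpose) < 10:
        errors.append("purpose must be at least 10 characters")
    if not [c.strip() for c in capabilities.split(",") if c.strip()]:
        errors.append("capabilities must be a non-empty comma-separated list")
    return errors

print(validate_card_args("acme/helper-bot", "Summarize support tickets", "read,summarize"))
# → []
```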

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses an Args section efficiently to document parameters without redundancy. The structure is clear and scannable, though the docstring format slightly differs from typical prose descriptions.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given no output schema exists, the description should ideally specify what the tool returns (card ID, success boolean, full card object). Parameter coverage is complete, but the omission of return value description leaves a gap for a creation tool.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Excellent compensation for 0% schema description coverage. The description documents all 6 parameters with formats, examples, and constraints (e.g., 'min 10 chars for validity', 'Comma-separated list') that the schema completely omits.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the action (Create) and resource (KYA identity card) with helpful expansion of the acronym. It distinguishes from non-KYA siblings, though it could better clarify the relationship with kya_sign_card and kya_verify_card in the workflow.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit guidance on when to use this tool versus kya_sign_card or kya_verify_card, or the sequence of operations (create vs sign vs verify). Given the sibling relationships, workflow guidance is missing.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the blocking side-effect and enumerates the severity scale ('LOW' through 'CRITICAL'), but omits critical behavioral details: the type of injection detected (prompt injection, SQL, etc.), the return value format on pass/fail, and whether blocking throws an exception or returns a status object.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is efficiently structured with a single purpose statement followed by a clear Args section. There is no redundant text, and every sentence provides necessary information not found in the structured fields.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a tool with no output schema and no annotations, the description adequately covers the input parameters but leaves a significant gap regarding the return behavior. Since this is a security gate that 'blocks', specifying whether it returns a boolean, a detailed object, or raises an exception is necessary for correct agent invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The Args section effectively compensates for the 0% schema description coverage by documenting both parameters: 'text' is defined as the input to check, and 'threshold' includes the valid severity strings. It loses a point for failing to mention the default value 'MEDIUM' specified in the schema, which is essential for optional parameter usage.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool scans text and blocks based on a severity threshold, using specific verbs ('scan', 'block') and identifying the resource ('text'). However, it fails to differentiate from the sibling tool 'injection_scan', leaving ambiguity about when to use this blocking variant versus the scanning alternative.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    There is no explicit guidance on when to use this tool versus alternatives like 'injection_scan' or 'safety_check'. The description implies usage through the word 'block' but does not specify prerequisites, when-not-to-use conditions, or comparative scenarios with sibling tools.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses what data is returned (step count, errors, timing) but omits critical safety/disposition details: whether calling this clears the trace, if it requires an active session, or idempotency.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Single sentence, front-loaded with the action and resource. The em-dash efficiently lists summary contents without wasted words. Every clause earns its place.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a zero-parameter tool but incomplete given no output schema exists. The description hints at return content (step count, errors, timing) but doesn't clarify error conditions (e.g., behavior if no trace is active) or output structure.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Input schema has zero parameters. Per rubric, zero-parameter tools baseline at 4. The description requires no parameter clarification.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Clear verb ('Get') and resource ('summary of the current trace session'). The enumerated fields (step count, errors, timing) help distinguish it from sibling trace tools like trace_start or trace_step, though 'Get' is slightly generic.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit guidance on when to use this versus trace_start, trace_step, or trace_save. The description implies a reporting function but doesn't state prerequisites (e.g., 'use after starting a trace') or workflow position.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
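The step count, errors, and timing fields called out above suggest a simple session object behind the trace family of tools. The class below is a hypothetical sketch of that lifecycle; the names mirror the tool family, but none of this is the server's published code:

```python
import time

class TraceSession:
    """Illustrative stand-in for the trace state; not the server's actual implementation."""

    def __init__(self, agent, model=""):
        self.agent, self.model = agent, model
        self.steps = []
        self.started = time.time()

    def step(self, description, error=""):
        # Each recorded step keeps its description, any error, and relative timing.
        self.steps.append({"description": description, "error": error,
                           "at": round(time.time() - self.started, 3)})

    def summary(self):
        # Mirrors the fields the review mentions: step count, errors, timing.
        return {"agent": self.agent,
                "step_count": len(self.steps),
                "errors": sum(1 for s in self.steps if s["error"]),
                "elapsed_s": round(time.time() - self.started, 3)}

trace = TraceSession(agent="demo-agent")
trace.step("fetch data")
trace.step("parse response", error="timeout")
report = trace.summary()
print(report["step_count"], report["errors"])  # → 2 1
```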

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Without annotations, the description carries the full burden. It partially compensates by specifying the return content ('per-token pricing'), indicating the data structure includes cost metrics. However, it lacks critical behavioral context: no indication of side effects (though 'List' implies read-only), rate limits, caching behavior, or whether the data is real-time versus static.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Single sentence, zero waste. Front-loaded with the verb and core object. No redundant phrases or unnecessary verbosity given the tool's simplicity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given zero parameters and no output schema, the description adequately compensates by specifying what information is returned (model list with pricing). For a simple discovery tool, this is sufficient context, though explicit mention of return format (array, object) would improve it further.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Input schema contains zero parameters. Per scoring rules, zero-parameter tools receive a baseline score of 4. The description appropriately does not invent parameters where none exist.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the action ('List') and resource ('supported models'), including specific scope ('per-token pricing'). It effectively distinguishes from siblings like cost_guard_check (validation) and cost_guard_configure (settings management) by focusing on discovery/retrieval of pricing data.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit guidance on when to use this tool versus alternatives (e.g., when to query this vs. cost_guard_check). No mention of prerequisites or typical workflow position. The description only states what the tool does, not when to invoke it.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It partially compensates by disclosing the return content (budget remaining, percentage used, call details), which helps since there is no output schema. However, it lacks disclosure of safety properties (read-only vs. mutation), rate limits, or authentication requirements.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, front-loaded sentence with zero waste. The em-dash efficiently separates the action from the return value details. Every word earns its place.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's simplicity (zero parameters, no output schema, no annotations), the description adequately compensates by verbally describing the output content. It successfully communicates what the tool does and returns without excess verbosity.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The tool has zero parameters, which establishes a baseline score of 4 per the scoring guidelines. No parameter description is needed or provided.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('Check') and resource ('budget spend') and clarifies what information is returned ('how much is left, percentage used, call details'). However, it fails to differentiate from the sibling tool 'cost_guard_check', which has a nearly synonymous name.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus siblings like 'cost_guard_check' or 'cost_guard_record'. There are no prerequisites, conditions, or exclusions mentioned.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It successfully discloses the algorithm (Ed25519) and storage location (~/.kya/keys/), which are critical behavioral traits. However, it fails to mention idempotency rules (overwriting existing keys), required permissions, or what the tool returns (file paths vs key content).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is efficiently structured with a clear single-sentence purpose followed by Args documentation. No sentences are wasted. However, the docstring-style 'Args:' formatting is slightly rigid compared to natural language descriptions, though still perfectly readable.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a single-parameter tool without output schema or annotations, the description covers the essential purpose and parameter semantics. However, it leaves significant gaps regarding the return value (does it return key material, file paths, or just status?) and side-effect behavior (overwrite policies), which are important for a cryptographic generation tool.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, requiring the description to compensate. It successfully explains that 'name' refers to the 'Key name,' documents the default value 'mcp-session,' and provides critical context that keys are stored at ~/.kya/keys/, giving semantic meaning to the parameter beyond the bare schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Clearly states the specific action (Generate), resource (Ed25519 keypair), and intended use case (for signing agent identity cards). The mention of 'signing' helps distinguish it from sibling tools like `kya_create_card` and `kya_verify_card`, though it doesn't explicitly contrast with `kya_sign_card` which likely consumes these keys.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides implicit context by stating the keys are 'for signing agent identity cards,' suggesting it should be used before signing operations. However, it lacks explicit guidance on when to use this versus existing keys, prerequisites (e.g., directory permissions), or workflow sequence relative to `kya_sign_card`.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses file system side effects ('Directory to save trace files'), indicating this is a write operation. However, it omits critical behavioral details such as whether existing traces are overwritten, required directory permissions, return value format, or error handling behavior.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is appropriately sized with a clear purpose statement followed by a structured Args block. Every sentence earns its place. The docstring-style Args format is readable, though slightly informal; it efficiently packs necessary parameter details without redundancy.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a three-parameter tool with no output schema and no annotations, the description adequately covers parameter semantics but leaves gaps regarding the return value (success indication, session ID, or handle) and error conditions (invalid paths, permission failures). It mentions side effects but doesn't fully characterize the session lifecycle.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the description fully compensates by documenting all three parameters in the Args section: 'agent' explains filename usage, 'trace_dir' clarifies the save location purpose, and 'model' notes the metadata attachment. This provides essential semantic meaning beyond the bare types in the schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the verb 'Start' and resource 'trace session for an AI agent', providing immediate understanding. However, it does not explicitly differentiate from siblings like trace_step, trace_save, or trace_summary, which could help the agent understand the workflow sequence.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No guidance is provided on when to use this tool versus alternatives, nor are prerequisites mentioned (e.g., whether to check for existing traces first). The description lacks 'when-not' conditions or sequencing guidance relative to the other trace-related sibling tools.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, so description carries full burden. It successfully discloses that dry_run mode raises BudgetExceededError and describes the alert threshold behavior. However, it omits whether configuration is persistent/session-based, atomic, or if multiple calls overwrite previous settings.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses an efficient 'Args:' docstring format with zero wasted words. Each line earns its place by providing type hints (USD, percentage range) or behavioral context. Slightly unconventional structure for MCP but highly scannable.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a 3-parameter configuration tool with no output schema. Documents all inputs including defaults behavior, but lacks mention of return values or side effects (e.g., whether it validates the configuration immediately or only on subsequent cost operations).

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 0% schema description coverage, the description fully compensates by documenting all three parameters: weekly_budget_usd (units), alert_at_pct (range 0.0-1.0), and dry_run (behavioral effect). This prevents the agent from guessing parameter semantics.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
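The documented parameters (weekly_budget_usd in dollars, alert_at_pct as a 0.0-1.0 fraction, dry_run raising BudgetExceededError) imply semantics along these lines. This is a hedged sketch: the class, the record method, and the 0.8 default are assumptions for illustration, not the server's implementation:

```python
class BudgetExceededError(Exception):
    """Per the review, raised in dry_run mode when the budget would be exceeded."""

class CostGuard:
    """Illustrative stand-in; the real server's internals are not published."""

    def configure(self, weekly_budget_usd, alert_at_pct=0.8, dry_run=False):
        # alert_at_pct is documented as a fraction in the 0.0-1.0 range.
        if not 0.0 <= alert_at_pct <= 1.0:
            raise ValueError("alert_at_pct must be in the 0.0-1.0 range")
        self.budget, self.alert_at, self.dry_run = weekly_budget_usd, alert_at_pct, dry_run
        self.spent = 0.0

    def record(self, cost_usd):
        """Record spend; return True once the alert threshold is crossed."""
        self.spent += cost_usd
        if self.dry_run and self.spent > self.budget:
            raise BudgetExceededError(f"${self.spent:.2f} > ${self.budget:.2f}")
        return self.spent >= self.budget * self.alert_at

guard = CostGuard()
guard.configure(weekly_budget_usd=10.0, alert_at_pct=0.8)
print(guard.record(7.9))   # below the 80% alert threshold → False
print(guard.record(0.2))   # cumulative spend crosses the threshold → True
```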

    Purpose: 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description states 'Configure the cost guard budget' - a specific verb (configure) with clear resource (cost guard budget). Combined with the name, it clearly distinguishes from siblings like cost_guard_check (monitoring) and cost_guard_status (reporting).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    No explicit guidance on when to use this tool versus alternatives like cost_guard_check or cost_guard_record. No mention of prerequisites (e.g., whether to call this before recording costs) or configuration persistence scope.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It successfully indicates the tool is non-blocking and returns a risk assessment, but omits details about the return structure, error behaviors, or side effects that would be necessary for a mutation-sensitive operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is efficiently structured with a clear purpose statement followed by an Args section. Every sentence earns its place; there is no redundant or wasted text.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's low complexity (2 simple parameters, no nested objects) and lack of output schema, the description provides adequate coverage by explaining the inputs and general return type. It appropriately meets the minimum requirements for this simple tool, though more detail on the assessment format would strengthen it.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Despite 0% schema description coverage, the Args section effectively documents both parameters: it explains 'text' as 'The text to scan for injection attempts' and clarifies 'threshold' accepts specific enum values ('LOW', 'MEDIUM', 'HIGH', 'CRITICAL'), fully compensating for the schema's lack of descriptions.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
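The enum values documented for threshold suggest ordered severity gating. A minimal sketch, assuming the ordering LOW < MEDIUM < HIGH < CRITICAL and borrowing the 'MEDIUM' default noted in the blocking sibling's review; nothing here is the server's actual code:

```python
# Assumed severity ordering, lowest to highest, per the enum values in the review.
SEVERITY = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

def meets_threshold(detected, threshold="MEDIUM"):
    """True when a detected severity is at or above the configured threshold."""
    return SEVERITY.index(detected) >= SEVERITY.index(threshold)

print(meets_threshold("HIGH"))          # HIGH >= MEDIUM → True
print(meets_threshold("LOW", "HIGH"))   # LOW < HIGH → False
```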

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'Scan[s] text for prompt injection patterns' with a specific verb and resource. However, it does not explicitly differentiate from siblings like 'injection_check' or 'injection_patterns', though 'Returns risk assessment' hints at its analytical nature.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The phrase 'without blocking' implies this is for assessment rather than enforcement, providing implicit usage context. However, it lacks explicit when-to-use guidance or named alternatives compared to sibling tools like 'injection_check'.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior: 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses the output formats (JSON and Markdown) and disk write behavior, but omits safety-critical details like file overwrite behavior, required disk permissions, or what the operation returns upon completion.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is a single, front-loaded sentence with zero waste. Every word contributes essential information: the action, target, destination, and file formats.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's simplicity (no parameters) and lack of output schema, the description adequately covers the core functionality. It could be improved by indicating where files are saved or what success feedback looks like, but it is sufficient for basic invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema contains zero parameters. Per the baseline rules for zero-parameter tools, this earns a default score of 4. The description correctly implies no configuration is needed by omitting any parameter references.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description provides a specific verb (Save), resource (trace), destination (disk), and output formats (JSON and Markdown). It clearly distinguishes from siblings like trace_start, trace_step, and trace_summary by emphasizing the persistence aspect.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description implies usage context (saving a trace that presumably exists), but lacks explicit guidance on when to use this versus trace_summary or prerequisites such as having an active trace in progress.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full disclosure burden. It successfully documents return values ('safe/blocked status') but omits other behavioral traits like whether the operation is read-only, idempotent, or if it consumes rate limit quotas.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is efficiently structured with the purpose front-loaded in the first sentence, return values immediately following, and necessary parameter documentation in a compact Args section. No redundant or wasted text.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a 3-parameter tool with no output schema, the description adequately covers the essential contract: input parameters, operation purpose, and return value categories. A score of 5 would require additional detail on the return format structure or error conditions.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the description fully compensates by documenting all three parameters: it provides concrete examples for the model identifier and clarifies the semantics of the token count parameters ('Expected input token count').

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the specific action ('Pre-check') and resource ('model call is within budget'), distinguishing it from sibling tools like cost_guard_configure or cost_guard_status. It also specifies the return value semantics ('safe/blocked status').

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The prefix 'Pre-check' implies when to use the tool (before making a model call), but there is no explicit guidance on when NOT to use it versus alternatives like cost_guard_status (current budget) or cost_guard_record (post-hoc recording).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It establishes this is a post-call write operation ('record'), but omits details about persistence guarantees, idempotency, error behavior, or return values (no output schema exists to supplement this).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is well-structured with the core purpose front-loaded in the first sentence, followed by a clean Args section. Every sentence provides value with no redundancy or waste.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a simple 4-parameter recording tool with basic types, the description adequately covers inputs and operation context. A minor gap exists regarding return behavior or success confirmation, though the absence of an output schema suggests a simple acknowledgement return.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Given 0% schema description coverage, the Args section fully compensates by documenting all 4 parameters: 'model' (identifier), 'input_tokens'/'output_tokens' (actual consumed counts), and 'purpose' (optional label with examples), adding necessary semantics missing from the schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the specific action ('Record') and target ('completed LLM call's token usage and cost'), distinguishing it from siblings like cost_guard_check, cost_guard_configure, and cost_guard_status through the distinct 'record' verb.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description implies timing by specifying 'completed LLM call,' suggesting post-call usage, but lacks explicit guidance on when to use this versus alternatives like cost_guard_check (likely for pre-call estimation) or cost_guard_status.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, so description carries full burden. Adds valuable context about output structure (mentions 'categories and weights' fields). However, lacks disclosure on pagination, caching behavior, rate limits, or whether patterns are mutable. 'Built-in' implies read-only system patterns, but this could be explicit.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Single efficient sentence with zero waste. Front-loaded with action verb 'List'. Every clause earns its place by specifying scope (all, built-in) and output characteristics (categories, weights).

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Appropriately complete for a zero-parameter tool with output schema. Description hints at output structure (categories/weights) without needing to fully document the return values since output schema exists. No gaps given the tool's simplicity.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Zero parameters present (baseline 4). Description appropriately focuses on operation behavior rather than non-existent parameters. No parameter documentation needed or provided.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Specific verb 'List' + resource 'built-in injection detection patterns' + scope detail 'with categories and weights'. Effectively distinguishes from siblings 'injection_check' and 'injection_scan' by clarifying this retrieves the pattern definitions/catalog rather than checking content against them.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Usage is implied by the description (retrieving pattern catalog vs checking content), but lacks explicit guidance on when to use this versus 'injection_check' or 'injection_scan'. No mention of prerequisites like 'use this first to see available patterns before configuring checks'.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It adds valuable behavioral constraints (confidence range 0.0-1.0, outcome enum 'ok'/'error') but omits side effects, persistence guarantees, or error conditions.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Follows an efficient two-part structure: single-sentence purpose followed by an Args block. No redundancy; every sentence earns its place by providing unique information not present in structured fields.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the 6 parameters and lack of output schema/annotations, the description adequately covers input semantics. Minor gap: could clarify the relationship to trace_start (is it a prerequisite?) and what the tool returns.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, but the description fully compensates by documenting all 6 parameters with semantic meaning, examples ('analyze_signal'), and value constraints that the schema lacks.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description opens with a specific verb ('Log') and clear resource ('decision step in the current trace session'), precisely distinguishing it from siblings like trace_start (initiates session) and trace_save (persists session).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    While the phrase 'current trace session' implies this should be called during an active tracing workflow, there is no explicit guidance on when to use this versus trace_save or trace_summary, nor prerequisites like calling trace_start first.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    Discloses what verification entails (structure, completeness, signature check). However, with no annotations provided, the description fails to confirm read-only safety, describe return values (boolean vs. report), or explain failure modes (exceptions vs. error codes).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Four lines with zero redundancy. Purpose is front-loaded in the first sentence; Args section efficiently documents parameter semantics. Every sentence earns its place.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 3/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Adequate for a two-parameter tool with simple types, covering inputs well. However, given no output schema and no annotations, the description should explain return values and safety profile (read-only status). Current coverage is minimum viable but has clear gaps.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Fully compensates for 0% schema description coverage. The Args section explains agent_id references session-created cards and card_json accepts external raw JSON, clarifying they are alternative input methods—critical semantics entirely missing from the schema titles.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    States specific verb 'Verify' and resource 'KYA identity card', and details what is checked (structure, completeness, signature). Clearly distinguishes from siblings kya_create_card and kya_sign_card by specifying the verification function.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 4/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Describes two mutually exclusive usage patterns: looking up 'a card created in this session' (referencing sibling kya_create_card workflow) versus passing external card JSON. Lacks explicit guidance on when to verify vs. other KYA operations, but provides clear context for the two input methods.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully discloses that the tool 'logs the decision' (revealing a write side effect) and lists the three distinct operations performed. However, it lacks critical safety profile information (idempotency, error handling, failure modes) that would be necessary for a safety-critical tool without annotation coverage.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is optimally structured with three distinct sections: a one-line functional summary, a usage context sentence, and a structured Args block. There is no redundant or wasted prose; every line provides essential information not available in the structured schema fields.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the tool's moderate complexity (5 parameters, 3-in-1 functionality) and lack of output schema, the description adequately covers the operational scope. It could be improved by describing the return value format (boolean, object, or exception behavior) since no output schema exists to provide this information, but the parameter and behavioral documentation is comprehensive.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 5/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    The input schema has 0% description coverage (titles only). The Args section in the description compensates perfectly by documenting all 5 parameters (text, model, estimated_input_tokens, estimated_output_tokens, step_name) with clear semantics, including noting that 'model' is optional and explaining the purpose of each token estimate field.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description explicitly states the tool performs a 'unified safety check' combining 'injection scan + cost check + trace step', clearly distinguishing it from sibling tools like injection_scan, cost_guard_check, and trace_step which perform individual functions. The verb 'Run' and resource 'safety check' are specific and unambiguous.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 4/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    The description identifies this as the 'recommended single tool for pre-flight safety', providing clear contextual guidance on when to use it. However, it lacks explicit 'when-not-to-use' guidance or direct mention of the specific sibling alternatives (though the three-component breakdown implies their existence).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full burden. It discloses the use of Ed25519 private keys and the prerequisite call requirement. However, it lacks disclosure of mutation semantics (idempotency, what happens on re-sign), return value structure, or error conditions (e.g., if agent_id doesn't exist), which are critical for a cryptographic signing operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is efficiently structured with a clear action statement followed by an Args block. There is no redundant text; every sentence serves to clarify purpose, mechanism, or parameter usage. The length is appropriate for the tool's complexity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the low complexity (single string parameter, no output schema), the description adequately covers the essential workflow context and cryptographic method. It is missing only edge case documentation (e.g., already-signed cards), which prevents a perfect score given the lack of structured metadata.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 0%, leaving the description to compensate. The Args section effectively documents the single parameter by explaining it represents 'the card to sign' and reinforcing the prerequisite constraint. It successfully adds necessary semantic meaning absent from the schema.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the specific action ('Sign'), the resource ('existing KYA card'), and the mechanism ('session's Ed25519 private key'). The mention of 'existing' effectively distinguishes this from the sibling kya_create_card, while 'Sign' distinguishes it from kya_verify_card.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Explicitly states the prerequisite workflow constraint: 'Must call kya_create_card first.' This provides clear sequencing guidance and explicitly names the required preceding sibling tool, removing ambiguity about when to use this tool.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

GitHub Badge

Glama performs regular codebase and documentation scans to:

  • Confirm that the MCP server is working as expected.
  • Confirm that there are no obvious security issues.
  • Evaluate tool definition quality.

Our badge communicates server capabilities, safety, and installation instructions.

Card Badge

agent-safety-mcp MCP server

Copy to your README.md:

Score Badge

agent-safety-mcp MCP server

Copy to your README.md:

How to claim the server?

If you are the author of the server, you simply need to authenticate using GitHub.

However, if the MCP server belongs to an organization, you need to first add glama.json to the root of your repository.

{
  "$schema": "https://glama.ai/mcp/schemas/server.json",
  "maintainers": [
    "your-github-username"
  ]
}

Then, authenticate using GitHub.

Browse examples.
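
Before committing glama.json, a minimal local sanity check can catch obvious mistakes. This sketch is illustrative and only verifies the shape of the maintainers field; the authoritative validation is against the schema at the $schema URL shown above:

```python
import json

def check_glama_json(path="glama.json"):
    """Lightweight pre-commit check for glama.json (not a full schema validation)."""
    with open(path) as f:
        data = json.load(f)
    maintainers = data.get("maintainers")
    # The claiming flow needs at least one GitHub username in "maintainers".
    assert isinstance(maintainers, list) and maintainers, \
        "maintainers must be a non-empty list"
    assert all(isinstance(m, str) for m in maintainers), \
        "maintainers must be GitHub usernames (strings)"
    return maintainers
```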

How to make a release?

A "release" on Glama is not the same as a GitHub release. To create a Glama release:

  1. Claim the server if you haven't already.
  2. Go to the Dockerfile admin page, configure the build spec, and click Deploy.
  3. Once the build test succeeds, click Make Release, enter a version, and publish.

This process allows Glama to run security checks on your server and enables users to deploy it.

How to add a LICENSE?

Please follow the instructions in the GitHub documentation.

Once GitHub recognizes the license, the system will automatically detect it within a few hours.

If the license does not appear on the server after some time, you can manually trigger a new scan using the MCP server admin interface.

How to sync the server with GitHub?

Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.

To manually sync the server, click the "Sync Server" button in the MCP server admin interface.

How is the quality score calculated?

The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).

Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool is scored 1–5 across six weighted dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%); the weighted result is the tool definition quality score (TDQS). The server-level definition quality score is calculated as 60% mean TDQS + 40% minimum TDQS, so a single poorly described tool pulls the score down.

Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).

Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
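
The weighting scheme above can be sketched in code. The dimension weights, blend ratios, and tier cutoffs come from the description; the function names and sample data below are hypothetical:

```python
# Per-tool dimension weights, as published (must sum to 1.0).
DIM_WEIGHTS = {
    "purpose": 0.25, "usage": 0.20, "behavior": 0.20,
    "parameters": 0.15, "conciseness": 0.10, "completeness": 0.10,
}

def tool_tdqs(dims):
    """Weighted 1-5 score for one tool across the six dimensions."""
    return sum(DIM_WEIGHTS[d] * score for d, score in dims.items())

def definition_quality(tool_scores):
    """Server-level definition quality: 60% mean TDQS + 40% minimum TDQS,
    so one poorly described tool drags the whole server down."""
    return 0.6 * (sum(tool_scores) / len(tool_scores)) + 0.4 * min(tool_scores)

def overall(definition_quality_score, coherence_score):
    """Overall quality: 70% tool definition quality, 30% server coherence."""
    return 0.7 * definition_quality_score + 0.3 * coherence_score

def tier(score):
    """Map an overall score to its letter tier."""
    for cutoff, grade in [(3.5, "A"), (3.0, "B"), (2.0, "C"), (1.0, "D")]:
        if score >= cutoff:
            return grade
    return "F"
```

For example, a server whose tools score 5, 5, and 3 gets a definition quality of 0.6 × (13/3) + 0.4 × 3 = 3.8, so the single weak tool costs it far more than the mean alone would suggest.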


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/LuciferForge/agent-safety-mcp'
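
The same request can be scripted. As a sketch, a small Python helper that builds the endpoint URL and decodes the JSON response; the response fields are not documented on this page, so the caller should inspect the returned dict rather than rely on any particular keys:

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"

def server_endpoint(owner, name):
    """Build the MCP directory API URL for a given server."""
    return f"{API_BASE}/{owner}/{name}"

def fetch_server(owner, name, timeout=10):
    """Fetch and decode a server record from the directory API."""
    with urllib.request.urlopen(server_endpoint(owner, name), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```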

If you have feedback or need assistance with the MCP directory API, please join our Discord server.