Server Quality Checklist
- Disambiguation 5/5
Each tool targets a distinct scope: individual message classification (check_message_safety), session-level risk trajectory (get_session_risk), and historical audit logs (list_recent_escalations). No functional overlap exists between the three operations (a consolidated signature sketch follows this checklist).
- Naming Consistency 5/5
All tools follow a consistent verb_noun pattern using snake_case (check_message_safety, get_session_risk, list_recent_escalations). The verbs (check, get, list) clearly indicate the operation type and align with standard CRUD conventions.
- Tool Count 4/5
Three tools is at the lower bound of the ideal range for a focused safety classification proxy. While sufficient for core classification, monitoring, and audit retrieval workflows, the surface is minimal and leaves little room for operational management capabilities.
- Completeness 3/5
The read-only surface covers classification and observation but lacks crucial lifecycle operations: there is no tool to act on the 'should_escalate' flag (e.g., escalate, resolve), no way to reset or clear session risk trajectories, and no feedback mechanism for false positives/negatives (a hypothetical escalate/resolve sketch follows the tool scores below).
Average 3.8/5 across 3 of 3 tools scored.
See the tool scores section below for per-tool breakdowns.
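For orientation, the tool surface can be sketched as follows. This is a reconstruction from the per-tool notes below, not the server's actual code; parameter defaults (e.g., limit=10) and exact type annotations are assumptions.

# Hypothetical reconstruction of the server's three tools, inferred from
# the descriptions quoted in this report. Defaults and types are assumed.

def check_message_safety(message: str, session_id: str | None = None) -> dict:
    """Classify a user message for self-harm or criminal intent.
    Returns a dict including should_escalate and stage_reached."""
    ...

def get_session_risk(session_id: str) -> dict:
    """Return the current risk trajectory for a session.
    Returns a dict including spike_detected, trend, and window_scores."""
    ...

def list_recent_escalations(limit: int = 10, category: str | None = None) -> list[dict]:
    """Return recent escalation events from the audit log.
    category accepts "self_harm" or "criminal_intent"; omitted means all."""
    ...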
This repository includes a README.md file.
This repository includes a LICENSE file.
Latest release: v0.3.2
No tool usage detected in the last 30 days. Usage tracking helps demonstrate server value.
Tip: use the "Try in Browser" feature on the server page to seed initial usage.
This repository includes a glama.json configuration file.
- This server provides 3 tools.
No known security issues or vulnerabilities reported.
This server has been verified by its author.
Tool Scores
get_session_risk
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It compensates partially by detailing the return structure (spike_detected, trend, window_scores, etc.), which helps agents understand the data shape. However, it omits operational details: whether the call is read-only, what rate limits apply, and how results are cached.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Uses a standard docstring format (Parameters/Returns) that efficiently organizes information. The first sentence states purpose immediately, and every subsequent line adds specific technical detail about inputs or outputs without redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple single-parameter input and documented return structure (a dict with specific fields), the description is functionally complete. The return-value documentation suffices because the description text spells out the output structure explicitly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, so the description must compensate. The Parameters section explicitly defines session_id as 'The session identifier to query,' which clarifies the required input. It could be improved with format expectations, but adequately covers the single parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The opening sentence 'Return the current risk trajectory for a session' clearly identifies the action (Return) and resource (risk trajectory). It implicitly distinguishes from siblings check_message_safety (message-level) and list_recent_escalations (historical list) by focusing on session-level trajectory, though explicit differentiation is absent.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this versus check_message_safety (which likely evaluates individual messages) or list_recent_escalations. No prerequisites, error conditions, or filtering guidance is mentioned (an example revision follows this breakdown).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
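To illustrate what the 2/5 Usage Guidelines score is asking for, a revised get_session_risk description might open with explicit tool-selection guidance. This is a suggested sketch, not the server's actual docstring:

def get_session_risk(session_id: str) -> dict:
    """Return the current risk trajectory for a session.

    Use this tool to monitor session-level risk over time. Use
    check_message_safety instead for a verdict on a single message,
    and list_recent_escalations for historical audit queries.

    Parameters
    ----------
    session_id : str
        The session identifier to query.

    Returns
    -------
    dict
        Includes spike_detected, trend, and window_scores.
    """
    ...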
list_recent_escalations
- Behavior 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It establishes the read-only nature implicitly by describing a 'Return' operation from an 'audit log,' but fails to define the time window for 'recent' events or disclose rate limits, permission requirements, or the specific fields contained within the returned `list[dict]`.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description uses a clean docstring format with 'Parameters' and 'Returns' sections that front-load critical information without redundancy. Every line serves a distinct purpose, efficiently documenting functionality, constraints, and return type in minimal space.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple two-parameter structure and the presence of an output schema, the description adequately covers the tool's contract. The only notable gap is the undefined timeframe for 'recent' events, though the parameter documentation and return type declaration provide sufficient context for invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage, requiring the description to provide all parameter semantics. It fully compensates by documenting both `limit` (with default value) and `category` (including the specific enum values `"self_harm"` and `"criminal_intent"` and the behavior when omitted).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Return[s] recent escalation events from the audit log,' specifying the verb (return), resource (escalation events), and source (audit log). While this implicitly distinguishes it from siblings like `check_message_safety` (which likely assesses individual messages) and `get_session_risk` (which retrieves risk scores), it does not explicitly clarify when to prefer this tool over those alternatives.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no explicit guidance on when to use this tool versus its siblings (`check_message_safety`, `get_session_risk`). It lacks statements about prerequisites, such as needing audit log access permissions, or scenarios where this historical query is preferred over real-time safety checks (a sketch of a tightened description follows this breakdown).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
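Similarly, the gaps flagged above for list_recent_escalations (an undefined 'recent' window, no behavioral disclosure) could be closed with a few added sentences. The 24-hour window below is a hypothetical placeholder, not the server's documented behavior:

def list_recent_escalations(limit: int = 10, category: str | None = None) -> list[dict]:
    """Return recent escalation events from the audit log.

    Read-only: never modifies the audit log. "Recent" means events
    from the last 24 hours (hypothetical placeholder value). Prefer
    this tool for retrospective review; use check_message_safety or
    get_session_risk for real-time checks.
    """
    ...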
check_message_safety
- Behavior 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It documents the detailed return structure, including decision flags (should_escalate, stage_reached), and explains the trajectory-tracking purpose of session_id. It could improve by noting privacy implications or side effects of classification.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Uses an efficient structured format (Parameters/Returns sections) with zero waste. Every line provides essential information not present in structured fields.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Comprehensive for a classification tool: it covers the input parameters despite the undocumented schema, documents the return dict structure even though an output schema exists (adding value), and specifies the safety categories. A minor gap remains in explaining the tool's relationship to the escalation workflow implied by its siblings.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage, but the description fully compensates by documenting both parameters: 'message' is defined as 'user message to classify' and 'session_id' as 'Optional session identifier for trajectory tracking' with default behavior implied.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses the specific verb 'Classify' with a clear resource ('message') and explicit categories ('self-harm or criminal intent'), clearly distinguishing it from siblings get_session_risk (session-level) and list_recent_escalations (retrospective listing).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Usage is implied by the specific classification domain (self-harm/criminal intent), but the description lacks explicit when-to-use guidance versus get_session_risk, and does not say whether to check every message or sample.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
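To make the Completeness critique from the checklist concrete, the missing lifecycle operations could look something like the stubs below. The names, parameters, and return shapes are invented for illustration only:

# Hypothetical lifecycle tools suggested by the Completeness critique;
# nothing like these exists on the server today.

def escalate_session(session_id: str, reason: str) -> dict:
    """Act on a should_escalate flag by opening an escalation record."""
    ...

def resolve_escalation(escalation_id: str, resolution: str) -> dict:
    """Close an escalation, e.g. marking it a false positive so the
    outcome can feed back into classification quality."""
    ...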
GitHub Badge
Glama performs regular codebase and documentation scans to:
- Confirm that the MCP server is working as expected.
- Confirm that there are no obvious security issues.
- Evaluate tool definition quality.
Our badge communicates server capabilities, safety, and installation instructions.
Card Badge
Copy the card badge snippet into your README.md.
Score Badge
Copy the score badge snippet into your README.md.
How to claim the server?
If you are the author of the server, you simply need to authenticate using GitHub.
However, if the MCP server belongs to an organization, you need to first add glama.json to the root of your repository.
{
"$schema": "https://glama.ai/mcp/schemas/server.json",
"maintainers": [
"your-github-username"
]
}
Then, authenticate using GitHub.
Browse examples.
How to make a release?
A "release" on Glama is not the same as a GitHub release. To create a Glama release:
- Claim the server if you haven't already.
- Go to the Dockerfile admin page, configure the build spec, and click Deploy.
- Once the build test succeeds, click Make Release, enter a version, and publish.
This process allows Glama to run security checks on your server and enables users to deploy it.
How to add a LICENSE?
Please follow the instructions in the GitHub documentation.
Once GitHub recognizes the license, the system will automatically detect it within a few hours.
If the license does not appear on the server after some time, you can manually trigger a new scan using the MCP server admin interface.
How to sync the server with GitHub?
Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.
To manually sync the server, click the "Sync Server" button in the MCP server admin interface.
How is the quality score calculated?
The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).
Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool is scored 1–5 across six dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The server-level definition quality score is calculated as 60% mean TDQS + 40% minimum TDQS, so a single poorly described tool pulls the score down.
Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).
Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
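A worked example of this formula using the scores reported above; the per-tool weighted values are computed from the dimension breakdowns, and the final rounding is illustrative rather than the site's official output:

# Worked example of the published scoring formula.
weights = {"purpose": .25, "usage": .20, "behavior": .20,
           "parameters": .15, "conciseness": .10, "completeness": .10}

tools = {  # dimension scores transcribed from the tool breakdowns above
    "get_session_risk":        {"purpose": 4, "usage": 2, "behavior": 3,
                                "parameters": 4, "conciseness": 5, "completeness": 4},
    "list_recent_escalations": {"purpose": 4, "usage": 2, "behavior": 3,
                                "parameters": 5, "conciseness": 5, "completeness": 4},
    "check_message_safety":    {"purpose": 5, "usage": 3, "behavior": 4,
                                "parameters": 5, "conciseness": 5, "completeness": 4},
}

tdqs = {name: sum(weights[d] * s for d, s in scores.items())
        for name, scores in tools.items()}   # 3.50, 3.65, 4.30

definition_quality = 0.6 * (sum(tdqs.values()) / len(tdqs)) + 0.4 * min(tdqs.values())
coherence = (5 + 5 + 4 + 3) / 4              # checklist scores at the top

overall = 0.7 * definition_quality + 0.3 * coherence
print(round(overall, 2))  # definition_quality ~3.69, coherence 4.25 -> ~3.86, tier A (>=3.5)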