PQS - Prompt Quality Score
Official
Server Quality Checklist
- Disambiguation: 5/5
Each tool targets a distinct operation: basic scoring (score_prompt), optimization (optimize_prompt), and model comparison (compare_models). While optimize_prompt includes scoring, its primary purpose is improvement, clearly distinguishing it from the free evaluation-only tier. No functional overlap exists.
- Naming Consistency: 5/5
Perfect adherence to verb_noun snake_case pattern across all three tools: compare_models, optimize_prompt, score_prompt. Naming is predictable and semantically clear.
- Tool Count: 5/5
Three tools is ideal for this narrowly scoped prompt evaluation service. Each tool represents a distinct tier (free evaluation, paid optimization, premium comparison) without bloat. Well within the optimal 3-15 range.
- Completeness: 4/5
Covers the core prompt quality lifecycle: evaluation, optimization, and benchmarking. Minor gap: no retrieval of past scores or batch processing capabilities, though the stateless API design suggests these may be out of scope. Core workflows are fully supported.
Average 4.2/5 across 3 of 3 tools scored.
See the tool scores section below for per-tool breakdowns.
This repository includes a README.md file.
This repository includes a LICENSE file.
Latest release: v1.0.1
No tool usage detected in the last 30 days. Usage tracking helps demonstrate server value.
Tip: use the "Try in Browser" feature on the server page to seed initial usage.
This repository includes a glama.json configuration file.
- This server provides 3 tools.
No known security issues or vulnerabilities reported.
This server has been verified by its author.
Add related servers to improve discoverability.
Tool Scores
score_prompt
- Behavior: 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, description carries full burden. Discloses cost model ('Free tier — no payment required') and output format (grade, score, percentile) which substitutes for missing output schema. However, lacks disclosure on idempotency, data privacy (what happens to submitted prompts), rate limits, or side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness: 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, zero waste. First sentence defines operation and returns, second states cost, third provides usage timing. Front-loaded with core functionality; excellent information density.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness: 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequately compensates for missing output schema by detailing return structure (grade, numeric score, percentile). Simple 2-parameter tool with 100% schema coverage requires minimal additional context. Lacks only minor behavioral details (rate limits, data retention) that would elevate to 5.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters: 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with both 'prompt' and 'vertical' fully documented in the schema. Description adds 'PQS (Prompt Quality Score)' acronym context but does not need to elaborate further given schema completeness. Baseline 3 appropriate for high-coverage schemas.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose: 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
States specific action (Score), resource (LLM prompt), methodology (PQS), and return values (grade A-F, score/40, percentile). Verb clearly distinguishes from siblings 'compare_models' (comparison) and 'optimize_prompt' (modification).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines: 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit temporal guidance: 'Use this before sending any prompt to an LLM to check if it is worth running.' Clearly signals pre-validation use case. Does not explicitly name sibling alternatives but strongly implies evaluation-only intent.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
compare_models
- Behavior: 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Strong disclosure for a paid operation with no annotations: explicitly states cost ($0.50 USDC), payment mechanism (x402), judging methodology, and return structure. Would benefit from mentioning if the charge is immediate/reversible or if the operation is idempotent, but covers critical financial and methodological traits well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness: 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: sentence 1 defines the comparison methodology, sentence 2 specifies outputs, sentence 3 discloses cost. Information is front-loaded and logically ordered. Excellent density without verbosity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness: 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Comprehensive for a 3-parameter tool with no output schema: description covers return values (winner, scores, recommendation) that would otherwise be unknown, and discloses financial requirements. Minor gap regarding error handling or async status, but adequate for safe invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters: 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline applies. The description mentions 'prompt' in context but does not elaborate on parameter semantics beyond what the schema already documents (prompt content, vertical enum values, API key source). No additional parameter guidance provided in description prose.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose: 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
Excellent clarity: specifies exact action ('Compare'), specific models ('Claude vs GPT-4o'), methodology ('PQS', 'third model judge'), and output types. Clearly distinguishes from siblings optimize_prompt (single model optimization) and score_prompt (single model scoring) by emphasizing the head-to-head nature.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines: 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implicitly guides usage through return value description ('winner', 'recommendation on which model to use'), indicating this is for decision-making between models. Lacks explicit 'when not to use' or direct comparison to siblings, but context is clear enough to infer appropriate use cases.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
optimize_prompt
- Behavior: 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. Discloses the return structure ('Returns the original score, an optimized version...'), the cost, and the evaluation frameworks (PEEM, RAGAS, etc.). Lacks rate limiting or retry behavior details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness: 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
Three efficient sentences with zero waste: action/returns first, cost second (critical barrier), usage guideline third. Front-loads the dual capability immediately.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness: 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Dense description adequately compensates for missing output schema by detailing return values (score, optimized version, breakdown). Covers cost and methodology. Could briefly define PQS or explain error states for full marks.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters: 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing baseline 3. Description focuses on tool behavior rather than expanding parameter semantics, which is appropriate since schema fully documents 'prompt', 'vertical', and 'api_key'.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose: 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
Clear specific verbs ('Score AND optimize') and resource ('LLM prompt'). Capitalized 'AND' effectively distinguishes from sibling 'score_prompt' by emphasizing the dual action. Mentions specific methodology (PQS) and frameworks.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines: 5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicit when-to-use clause ('Use this when you want to improve a prompt before running it'). Critically includes cost warning ('Costs $0.025 USDC via x402') which acts as a guideline for when NOT to use or to prepare payment.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
GitHub Badge
Glama performs regular codebase and documentation scans to:
- Confirm that the MCP server is working as expected.
- Confirm that there are no obvious security issues.
- Evaluate tool definition quality.
Our badge communicates server capabilities, safety, and installation instructions.
Card Badge
Copy to your README.md:
Score Badge
Copy to your README.md:
How to claim the server?
If you are the author of the server, you simply need to authenticate using GitHub.
However, if the MCP server belongs to an organization, you need to first add glama.json to the root of your repository.
{
"$schema": "https://glama.ai/mcp/schemas/server.json",
"maintainers": [
"your-github-username"
]
}
Then, authenticate using GitHub.
Browse examples.
How to make a release?
A "release" on Glama is not the same as a GitHub release. To create a Glama release:
- Claim the server if you haven't already.
- Go to the Dockerfile admin page, configure the build spec, and click Deploy.
- Once the build test succeeds, click Make Release, enter a version, and publish.
This process allows Glama to run security checks on your server and enables users to deploy it.
How to add a LICENSE?
Please follow the instructions in the GitHub documentation.
Once GitHub recognizes the license, the system will automatically detect it within a few hours.
If the license does not appear on the server after some time, you can manually trigger a new scan using the MCP server admin interface.
How to sync the server with GitHub?
Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.
To manually sync the server, click the "Sync Server" button in the MCP server admin interface.
How is the quality score calculated?
The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).
Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool is scored 1–5 across six dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The server-level definition quality score is calculated as 60% mean TDQS + 40% minimum TDQS, so a single poorly described tool pulls the score down.
Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).
Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
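As a concrete illustration, here is a minimal Python sketch that recomputes this server's score from the per-dimension values listed above. The weights and the 60/40 and 70/30 blends come from this section; averaging the four coherence dimensions follows from "scoring four dimensions equally", while the final rounding is an assumption.

# Recompute this server's quality score from the dimension scores above (illustrative only).
TDQ_WEIGHTS = {"purpose": 0.25, "usage": 0.20, "behavior": 0.20,
               "parameters": 0.15, "conciseness": 0.10, "completeness": 0.10}

tools = {
    "score_prompt":    {"purpose": 5, "usage": 4, "behavior": 3, "parameters": 3, "conciseness": 5, "completeness": 4},
    "compare_models":  {"purpose": 5, "usage": 4, "behavior": 4, "parameters": 3, "conciseness": 5, "completeness": 4},
    "optimize_prompt": {"purpose": 5, "usage": 5, "behavior": 4, "parameters": 3, "conciseness": 5, "completeness": 4},
}

# Per-tool TDQS: weighted sum of the six dimensions.
tdqs = {name: sum(TDQ_WEIGHTS[dim] * score for dim, score in scores.items())
        for name, scores in tools.items()}
# -> score_prompt 4.0, compare_models 4.2, optimize_prompt 4.4 (mean 4.2, matching the checklist above)

definition_quality = 0.6 * (sum(tdqs.values()) / len(tdqs)) + 0.4 * min(tdqs.values())  # 4.12
coherence = (5 + 5 + 5 + 4) / 4  # Disambiguation, Naming, Tool Count, Completeness -> 4.75

overall = 0.7 * definition_quality + 0.3 * coherence
print(round(overall, 2))  # ~4.31, which lands in tier A (>= 3.5)

Note how the 40% weight on the minimum TDQS means the weakest tool (score_prompt at 4.0) pulls the definition-quality component below the 4.2 mean.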
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/OnChainAIIntel/pqs-mcp-server'
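If you would rather call the API from a script, a standard-library Python equivalent of the curl command above might look like this; it assumes the endpoint is publicly readable and returns JSON, as the curl example suggests.

import json
import urllib.request

# Same endpoint as the curl example above; assumes an unauthenticated, JSON-returning GET.
url = "https://glama.ai/api/mcp/v1/servers/OnChainAIIntel/pqs-mcp-server"
with urllib.request.urlopen(url) as response:
    server = json.load(response)

# Pretty-print whatever the API returns; no specific field names are assumed here.
print(json.dumps(server, indent=2))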
If you have feedback or need assistance with the MCP directory API, please join our Discord server.