Server Details

The world's first named AI prompt quality score. Score, optimize, and compare LLM prompts before they hit any model. Free tier available. Built on PEEM, RAGAS, G-Eval, and MT-Bench frameworks. x402-native on Base.

Status: Healthy
Transport: Streamable HTTP

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4/5 across 3 of 3 tools scored.

Server Coherence: A

Disambiguation: 5/5

Each tool has a clearly distinct purpose: compare_models is for head-to-head model comparison, optimize_prompt is for scoring and optimization, and score_prompt is for scoring only. There is no overlap in functionality, making tool selection unambiguous.

Naming Consistency: 5/5

All tool names follow a consistent verb_noun pattern (compare_models, optimize_prompt, score_prompt) with clear, descriptive verbs. There are no deviations in naming conventions, making the set predictable and readable.
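The naming claim above can be checked mechanically. The following is a minimal sketch, assuming "verb_noun" means two lowercase words joined by a single underscore; the regular expression and the helper name `follows_verb_noun` are illustrative and not part of the server. It checks only the shape of the name, not whether the first word is actually a verb.

```python
import re

# Assumed reading of "verb_noun": two lowercase words joined by one underscore.
VERB_NOUN = re.compile(r"[a-z]+_[a-z]+")


def follows_verb_noun(name: str) -> bool:
    """Return True when the whole name has the shape word_word."""
    return VERB_NOUN.fullmatch(name) is not None


# Tool names as listed in the coherence review.
tool_names = ["compare_models", "optimize_prompt", "score_prompt"]

for name in tool_names:
    print(name, follows_verb_noun(name))
```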

Tool Count: 3/5

With only 3 tools, the server feels thin for the domain of prompt quality scoring and optimization, as it lacks broader operations like managing prompt history or batch processing. However, the core functions are covered, making it borderline appropriate.

Completeness: 4/5

The tools cover key workflows: scoring, optimization, and model comparison, with no dead ends. A minor gap exists in lacking tools for managing or storing prompts over time, but agents can work around this for basic use cases.

Available Tools

2 tools
optimize_prompt: A

Rewrites your prompt to fix the issues score_prompt found. Returns the improved version, what changed, and why. Run score_prompt first (free) to see what is broken, then use this tool to fix it. Requires an API key from https://promptqualityscore.com/?utm_source=mcp&utm_medium=tool_description&utm_campaign=2026-05-mcp-tools

Parameters

Name | Required | Description | Default
prompt | Yes | The prompt to optimize | -
api_key | Yes | PQS API key. Get one at https://promptqualityscore.com?utm_source=mcp&utm_medium=schema_description&utm_campaign=2026-05-mcp-tools | -
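As a rough illustration of how a client might call this tool, the sketch below builds an MCP `tools/call` request (JSON-RPC 2.0) carrying the two required parameters from the table above. The helper name `build_optimize_request` and the environment variable `PQS_API_KEY` are assumptions made for this example. The server URL is not shown on this page, so the sketch stops at constructing the payload and does not send it.

```python
import os


def build_optimize_request(prompt: str, api_key: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` payload for optimize_prompt.

    Both parameters are marked as required in the listing, so empty
    values are rejected before anything is sent.
    """
    if not prompt:
        raise ValueError("prompt is required")
    if not api_key:
        raise ValueError("api_key is required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "optimize_prompt",
            "arguments": {"prompt": prompt, "api_key": api_key},
        },
    }


# PQS_API_KEY is a hypothetical variable name; the fallback is a placeholder.
api_key = os.environ.get("PQS_API_KEY", "example-key")
request = build_optimize_request("Summarize this report for executives.", api_key)

# Print only non-secret fields so the key does not end up in logs.
print(request["method"], request["params"]["name"])
```

Reading the key from the environment rather than hard-coding it keeps it out of source control. If the Glama gateway's managed credentials are used instead, the key handling may differ.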
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively adds context beyond the input schema by specifying the return values ('score, optimized prompt, and 8-dimension breakdown'), cost details ('Costs $0.025 USDC via x402'), and the frameworks used ('based on PEEM, RAGAS, G-Eval, MT-Bench'). However, it lacks information on rate limits, error handling, or authentication requirements beyond the API key.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and front-loaded, packing essential information into two sentences: the core functionality and the cost. Every sentence earns its place by providing critical details without redundancy, making it efficient for an AI agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (scoring and optimization with multiple frameworks) and the absence of annotations and output schema, the description does a good job of covering key aspects like return values and cost. However, it could improve by mentioning potential limitations or the format of the 8-dimension breakdown, leaving some gaps in completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description does not add any meaningful semantic details beyond what the schema provides (e.g., it doesn't explain the significance of the 'vertical' parameter or provide examples). Baseline 3 is appropriate as the schema handles parameter documentation adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('score AND optimize') and resource ('any LLM prompt using PQS'), distinguishing it from sibling tools like 'score_prompt' (which only scores) and 'compare_models' (which compares models rather than optimizing prompts). It explicitly mentions the dual functionality of scoring and optimization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by mentioning 'any LLM prompt' and the cost, but it does not explicitly state when to use this tool versus alternatives like 'score_prompt' or 'compare_models'. No exclusions or prerequisites are provided, leaving the agent to infer usage based on the need for both scoring and optimization.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

score_prompt: A

Checks prompt quality before Claude answers. Returns an A-F grade in 2 seconds, catches vague instructions, missing context, and ambiguity that produce bad answers. Free, no API key. Ask 'score this prompt' or 'check this before answering' when you want better output.

Parameters

Name | Required | Description | Default
prompt | Yes | The prompt to score | -
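The tool descriptions suggest a two-step workflow: score a prompt for free, then call optimize_prompt only when the grade shows a problem. A minimal sketch of that gating decision follows. The response format of score_prompt is not documented on this page, so the sketch assumes the letter grade has already been extracted as a string. The `GRADES` ordering, the default threshold of "B", and the helper name `needs_optimization` are all assumptions made for illustration.

```python
# Assumed ordering from best to worst across the A-F range named in the
# description; whether the service ever reports "E" is not stated.
GRADES = ["A", "B", "C", "D", "E", "F"]


def needs_optimization(grade: str, threshold: str = "B") -> bool:
    """Return True when `grade` is strictly worse than `threshold`.

    Only the leading letter is compared, so a value such as "B+" is
    treated as "B" (an assumption about possible grade formats).
    """
    letter = grade.strip().upper()[:1]
    limit = threshold.strip().upper()[:1]
    if letter not in GRADES or limit not in GRADES:
        raise ValueError(f"unrecognized grade: {grade!r} / {threshold!r}")
    return GRADES.index(letter) > GRADES.index(limit)


for grade in ["A", "B", "C", "F"]:
    action = "optimize" if needs_optimization(grade) else "send as-is"
    print(grade, action)
```

Gating on the free score first avoids spending a paid optimize_prompt call on prompts that already grade well.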
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the tool is 'free — no payment required' (useful context) and describes the output format (grade, score, percentile), but lacks details on rate limits, error handling, or performance characteristics.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with three concise sentences that each add value: stating the purpose, describing the output, and providing usage guidance with no wasted words.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no output schema, no annotations), the description is reasonably complete—covering purpose, output format, and usage context—though it could benefit from more behavioral details like error cases or limitations.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description adds no additional parameter semantics beyond what the schema provides, maintaining the baseline score.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('score any LLM prompt for quality using PQS') and resources ('prompt'), and distinguishes it from siblings by specifying its unique function of quality scoring rather than comparison or optimization.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool ('use before sending any prompt to an LLM'), but does not explicitly mention when not to use it or name alternatives like sibling tools compare_models or optimize_prompt.
