PQS - Prompt Quality Score
Server Details
The world's first named AI prompt quality score. Score, optimize, and compare LLM prompts before they hit any model. Free tier available. Built on PEEM, RAGAS, G-Eval, and MT-Bench frameworks. x402-native on Base.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Score is being calculated. Check back soon.
Available Tools
3 tools

compare_models
Compare Claude vs GPT-4o on the same prompt. Scored head-to-head by a third model judge. Returns winner, scores, and recommendation. Costs $0.50 USDC via x402.
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The prompt to compare | |
| api_key | Yes | PQS API key. Get one at pqs.onchainintel.net | |
| vertical | No | Domain context | general |
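The description and parameter table above are enough to assemble a call. A minimal sketch of the JSON-RPC `tools/call` request an MCP client would send for this tool (the `pqs_example_key` value and the sample prompt are placeholders, not real values; the request shape follows the standard MCP tools/call convention):

```python
import json

def build_compare_models_call(prompt: str, api_key: str, vertical: str = "general") -> dict:
    """Assemble an MCP tools/call request for compare_models.

    Argument names mirror the parameter table above.
    """
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "compare_models",
            "arguments": {
                "prompt": prompt,      # required
                "api_key": api_key,    # required; paid tool ($0.50 USDC via x402)
                "vertical": vertical,  # optional; defaults to "general"
            },
        },
    }

request = build_compare_models_call("Summarize this contract in plain English.", "pqs_example_key")
print(json.dumps(request, indent=2))
```

Sending this over Streamable HTTP (and handling the x402 payment flow) is left to the client library; only the payload shape is sketched here.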
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the process (comparison with a third-model judge), output (winner, scores, recommendation), and a critical behavioral trait (cost of $0.50 USDC via x402). However, it does not cover other potential behaviors like rate limits, error handling, or execution time.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is highly concise and front-loaded, with every sentence adding value: it explains the comparison process, the output, and the cost. There is no wasted text, and the structure efficiently communicates essential information in a compact form.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of a comparative AI evaluation tool with no annotations and no output schema, the description does a good job by covering the process, output, and cost. However, it lacks details on the output format (e.g., structure of scores, what 'recommendation' entails) and any prerequisites beyond the api_key, leaving some gaps for full contextual understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds no meaning or context beyond what the schema provides, such as explaining the significance of the 'vertical' enum choices or how the 'api_key' is used. Baseline 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('compare Claude vs GPT-4o on the same prompt'), the resource (the two AI models), and the mechanism ('scored head-to-head by a third model judge'). It distinguishes itself from sibling tools like 'optimize_prompt' and 'score_prompt' by focusing on comparative evaluation rather than optimization or scoring of a single prompt.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for comparing two specific AI models, but does not explicitly state when to use this tool versus alternatives like 'score_prompt' or 'optimize_prompt'. It mentions the cost, which provides some context, but lacks explicit guidance on scenarios where this comparison is preferred over other methods.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
optimize_prompt
Score AND optimize any LLM prompt using PQS. Returns score, optimized prompt, and 8-dimension breakdown based on PEEM, RAGAS, G-Eval, MT-Bench. Costs $0.025 USDC via x402.
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The prompt to optimize | |
| api_key | Yes | PQS API key. Get one at pqs.onchainintel.net | |
| vertical | No | Domain context | general |
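Since score_prompt is free and optimize_prompt costs $0.025 per call, an agent might score first and only pay for optimization when the score falls short. A sketch of that gating logic (the 75% threshold is an arbitrary cutoff chosen for illustration, not part of the API):

```python
def should_optimize(score: int, max_score: int = 40, threshold: float = 0.75) -> bool:
    """Decide whether a paid optimize_prompt call is worth it.

    score is the 0-40 value returned by the free score_prompt tool;
    threshold is an illustrative cutoff, not a PQS recommendation.
    """
    return (score / max_score) < threshold

# A prompt scoring 24/40 (60%) falls below the 75% cutoff; 36/40 (90%) does not.
print(should_optimize(24))  # True
print(should_optimize(36))  # False
```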
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively adds context beyond the input schema by specifying the return values ('score, optimized prompt, and 8-dimension breakdown'), cost details ('Costs $0.025 USDC via x402'), and the frameworks used ('based on PEEM, RAGAS, G-Eval, MT-Bench'). However, it lacks information on rate limits, error handling, or authentication requirements beyond the API key.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is highly concise and front-loaded, packing essential information into two sentences: the core functionality and the cost. Every sentence earns its place by providing critical details without redundancy, making it efficient for an AI agent to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (scoring and optimization with multiple frameworks) and the absence of annotations and output schema, the description does a good job of covering key aspects like return values and cost. However, it could improve by mentioning potential limitations or the format of the 8-dimension breakdown, leaving some gaps in completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description does not add any meaningful semantic details beyond what the schema provides (e.g., it doesn't explain the significance of the 'vertical' parameter or provide examples). Baseline 3 is appropriate as the schema handles parameter documentation adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('score AND optimize') and resource ('any LLM prompt using PQS'), distinguishing it from sibling tools like 'score_prompt' (which only scores) and 'compare_models' (which compares models rather than optimizing prompts). It explicitly mentions the dual functionality of scoring and optimization.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning 'any LLM prompt' and the cost, but it does not explicitly state when to use this tool versus alternatives like 'score_prompt' or 'compare_models'. No exclusions or prerequisites are provided, leaving the agent to infer usage based on the need for both scoring and optimization.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
score_prompt
Score any LLM prompt for quality using PQS. Returns a grade (A-F), score out of 40, and percentile. Free — no payment required. Use before sending any prompt to an LLM.
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The prompt to score | |
| vertical | No | Domain context for scoring | general |
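The tool returns a grade (A-F), a score out of 40, and a percentile, but publishes no output schema. A sketch of how a client might defensively parse that result (the field names `grade`, `score`, and `percentile` are assumptions based on the description, not a documented contract):

```python
def parse_score_result(result: dict) -> tuple[str, float, float]:
    """Extract grade, score-out-of-40, and percentile from a score_prompt
    result, validating ranges since no output schema is published."""
    grade = result.get("grade")
    score = result.get("score")
    percentile = result.get("percentile")
    if grade not in {"A", "B", "C", "D", "F"}:
        raise ValueError(f"unexpected grade: {grade!r}")
    if not (0 <= score <= 40):
        raise ValueError(f"score out of range: {score!r}")
    if not (0 <= percentile <= 100):
        raise ValueError(f"percentile out of range: {percentile!r}")
    return grade, score, percentile

# Example with a hypothetical result payload.
grade, score, pct = parse_score_result({"grade": "B", "score": 31, "percentile": 72.5})
```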
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the tool is 'free — no payment required' (useful context) and describes the output format (grade, score, percentile), but lacks details on rate limits, error handling, or performance characteristics.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized and front-loaded, with three concise sentences that each add value: stating the purpose, describing the output, and providing usage guidance with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (2 parameters, no output schema, no annotations), the description is reasonably complete—covering purpose, output format, and usage context—though it could benefit from more behavioral details like error cases or limitations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description adds no additional parameter semantics beyond what the schema provides, maintaining the baseline score.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('score any LLM prompt for quality using PQS') and resources ('prompt'), and distinguishes it from siblings by specifying its unique function of quality scoring rather than comparison or optimization.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use the tool ('use before sending any prompt to an LLM'), but does not explicitly mention when not to use it or name alternatives like sibling tools compare_models or optimize_prompt.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

```json
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
```

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
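The claim file can also be generated programmatically. A small sketch that writes the structure shown above under a web root (the email and the `/tmp/site` path are placeholders; the email must match your Glama account):

```python
import json
from pathlib import Path

def write_glama_claim(root: Path, email: str) -> Path:
    """Write /.well-known/glama.json under the given web root with the
    structure Glama expects for connector claims."""
    claim = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }
    well_known = root / ".well-known"
    well_known.mkdir(parents=True, exist_ok=True)
    path = well_known / "glama.json"
    path.write_text(json.dumps(claim, indent=2))
    return path

claim_path = write_glama_claim(Path("/tmp/site"), "your-email@example.com")
```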
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.
Discussions
No comments yet. Be the first to start the discussion!