Glama

Server Details

The world's first named AI prompt quality score. Score, optimize, and compare LLM prompts before they hit any model. Free tier available. Built on PEEM, RAGAS, G-Eval, and MT-Bench frameworks. x402-native on Base.

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

[Diagram: MCP client -> Glama -> MCP server]

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.9/5 across 2 of 2 tools scored.

Server Coherence: A
Disambiguation: 5/5

The two tools have clearly distinct purposes: score_prompt evaluates a prompt's quality, while optimize_prompt rewrites it to improve the score. There is no ambiguity between them.

Naming Consistency: 5/5

Both tool names follow a consistent verb_noun pattern (score_prompt, optimize_prompt), making them predictable and easy to understand.

Tool Count: 2/5

With only 2 tools, the set feels very thin for a server dedicated to prompt quality. While the tools themselves are appropriate, users might benefit from additional tools for tasks like comparing multiple prompts or generating prompt templates.

Completeness: 3/5

The set covers the two primary actions (scoring and optimizing), but lacks ancillary functionality like batch processing, history, or comparison across multiple prompts. This could limit agents in more complex workflows.

Available Tools

2 tools
optimize_prompt (grade: A)

Rewrite a prompt to score higher on the PQS rubric, AND show before/after output comparisons so the user can see the impact. Returns the optimized prompt, the original PQS score, the optimized PQS score, and side-by-side sample outputs from a frontier model using both versions.

USE WHEN:

  • The user got a low score from score_prompt and asks how to improve.

  • The user explicitly asks to "improve" / "rewrite" / "fix" / "optimize" a prompt they pasted.

  • The user is dissatisfied with output quality from a previous prompt and asks how to get better results.

  • score_prompt returned a suggestion to invoke this tool.

DO NOT USE WHEN:

  • The user just asked for a score (use score_prompt only — don't double up).

  • The user wants you to write a new prompt from scratch (write it directly).

REQUIRES: A PQS API key from a Pro subscription ($19.99/month, 1,000 calls/mo, includes batch + A/B comparison). If the user has not provided one, the tool returns a clear subscription URL — pass that response to the user verbatim. Do not invent or guess API keys. There is no free trial of this tool; the user must subscribe before the first call.

COST: Counted against your Pro subscription's monthly call quota.

LATENCY: ~6-8 seconds.
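
The USE WHEN / DO NOT USE WHEN rules above amount to a small routing decision. The sketch below is one way an agent could encode it; the `Intent` enum and `choose_tool` helper are hypothetical names introduced for illustration and are not part of the server.

```python
from enum import Enum
from typing import Optional


class Intent(Enum):
    """Hypothetical labels for what the user asked for."""
    SCORE_ONLY = "score_only"                   # "is this prompt good?"
    IMPROVE_EXISTING = "improve_existing"       # "rewrite / fix / optimize this"
    WRITE_FROM_SCRATCH = "write_from_scratch"   # "write me a prompt for X"


def choose_tool(intent: Intent) -> Optional[str]:
    """Return the tool name to call, or None if the agent should answer directly."""
    if intent is Intent.IMPROVE_EXISTING:
        return "optimize_prompt"
    if intent is Intent.SCORE_ONLY:
        # The description says not to double up: score only, no optimize call.
        return "score_prompt"
    # Writing a new prompt from scratch is done by the agent itself.
    return None


if __name__ == "__main__":
    for intent in Intent:
        print(intent.value, "->", choose_tool(intent))
```

How the agent classifies a user message into one of these intents is left open here; the point is only that each intent maps to at most one tool call.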

Parameters (JSON Schema)
Name | Required | Description | Default
prompt | Yes | The prompt to optimize. Max 8000 characters. | -
api_key | No | PQS API key from a Pro subscription. Required. Format: pqs_live_… (32+ characters). Subscribe at https://promptqualityscore.com/pricing?utm_source=mcp&utm_medium=schema_description_v140&utm_campaign=2026-05-mcp-tools-v140 if you don't have one, or look up an existing key at https://promptqualityscore.com/account?utm_source=mcp&utm_medium=schema_description_v140&utm_campaign=2026-05-mcp-tools-v140. | -
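
As a rough illustration of the two parameters above, the following sketch validates the inputs client-side and builds an MCP `tools/call` JSON-RPC message. The helper name `build_optimize_request` is hypothetical, and the key check assumes the "32+ characters" rule counts the whole key including the `pqs_live_` prefix, which the listing does not specify.

```python
import json
from typing import Any, Dict

MAX_PROMPT_CHARS = 8000   # from the parameter description
KEY_PREFIX = "pqs_live_"  # from the parameter description
MIN_KEY_CHARS = 32        # assumption: length counted over the whole key


def build_optimize_request(prompt: str, api_key: str, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` message for optimize_prompt."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if not api_key.startswith(KEY_PREFIX) or len(api_key) < MIN_KEY_CHARS:
        raise ValueError("api_key does not look like a pqs_live_ key")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "optimize_prompt",
            "arguments": {"prompt": prompt, "api_key": api_key},
        },
    }


if __name__ == "__main__":
    # Placeholder key for illustration only; never invent or guess real keys.
    placeholder_key = KEY_PREFIX + "x" * 32
    print(json.dumps(build_optimize_request("Summarize this report.", placeholder_key), indent=2))
```

Because the schema marks `api_key` as not required, a call without it is expected to return the subscription URL rather than fail validation on the server; this sketch is stricter and rejects a missing or malformed key before any request is made.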
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, but the description fully discloses behavior: the tool returns a before/after comparison, requires an API key with no free trial, counts each call against the subscription quota, takes about 6-8 seconds, and returns a subscription URL when no API key is supplied. There are no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main action and output, and structured into clear sections (USE WHEN, DO NOT USE WHEN, REQUIRES, COST, LATENCY). Every sentence adds value, so nothing is wasted despite the length, and the layout is easy to scan.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Although there is no output schema, the description covers return values, prerequisites, cost, latency, and the failure mode. Parameter meaning is fully explained, which is sufficient for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions. The description adds value by explaining the tool's purpose (optimizing to score higher) and return values, and provides context on api_key format and subscription URL. Significantly enhances understanding beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool rewrites a prompt to improve PQS score and shows before/after output comparisons. It lists specific return values (optimized prompt, scores, side-by-side outputs) and distinguishes from sibling tool score_prompt.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit USE WHEN conditions (low score from score_prompt, user asks to improve/rewrite/optimize) and DO NOT USE conditions (just score, write from scratch). Also clarifies prerequisite (API key) and subscription requirement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

score_prompt (grade: A)

Score a prompt's quality across 8 dimensions BEFORE sending it to an expensive model. Returns a 0-80 score, an A-F grade, the per-dimension breakdown (clarity, specificity, context, constraints, output_format, role_definition, examples, cot_structure), and the weakest dimension.
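
To make the return values concrete, here is a minimal sketch that takes a per-dimension breakdown and derives the total and the weakest dimension. The response shape (a flat mapping of dimension name to number) and the equal 0-10 weighting per dimension are assumptions inferred from "8 dimensions" and "0-80"; the listing does not document the actual payload, and `summarize` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Dimension names as listed in the tool description.
DIMENSIONS = (
    "clarity", "specificity", "context", "constraints",
    "output_format", "role_definition", "examples", "cot_structure",
)


def summarize(breakdown: Dict[str, float]) -> Tuple[float, str]:
    """Return (total score, weakest dimension) for an assumed breakdown mapping."""
    missing = [d for d in DIMENSIONS if d not in breakdown]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    total = sum(breakdown[d] for d in DIMENSIONS)
    # On ties, min() keeps the first dimension in DIMENSIONS order.
    weakest = min(DIMENSIONS, key=lambda d: breakdown[d])
    return total, weakest


if __name__ == "__main__":
    example = {
        "clarity": 8, "specificity": 7, "context": 6, "constraints": 5,
        "output_format": 9, "role_definition": 3, "examples": 4, "cot_structure": 6,
    }
    print(summarize(example))  # (48, 'role_definition')
```

The A-F grade thresholds are not published in the listing, so the sketch deliberately does not try to map a total to a letter.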

USE WHEN:

  • The user is workshopping a prompt and asks "is this good?" / "will this work?" / "should I add more detail?"

  • The user is about to send a long or expensive prompt to GPT-4, Claude Opus, or any frontier model, especially in a batch or automation context where rework is costly.

  • The user mentions iterating on a prompt that produced poor output and wants to diagnose what's missing.

  • The user pastes a prompt and asks for feedback on it.

DO NOT USE WHEN:

  • The user is asking you to write a prompt for them (write it yourself first, then optionally call score_prompt to verify).

  • The prompt is conversational chat (this scores task-shaped prompts).

COST: Free, no API key required. Rate-limited per IP: 5/min, 10/day, 100/month. If the user exceeds the limit, the response will include a structured upgrade path with subscribe and account URLs.

LATENCY: ~2 seconds.
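
A client that calls score_prompt in a loop may want to respect the free-tier limits locally instead of discovering them from the upgrade response. The sketch below is a simple sliding-window counter for the 5/min, 10/day, 100/month figures quoted above. It assumes rolling windows and a 30-day month; the server's actual window boundaries are not documented here, and `ScoreBudget` is a hypothetical name.

```python
import time
from collections import deque
from typing import Callable, Deque

# (window length in seconds, max calls) from the COST line.
# Assumption: rolling windows, with "month" approximated as 30 days.
LIMITS = ((60.0, 5), (86_400.0, 10), (30 * 86_400.0, 100))


class ScoreBudget:
    """Client-side guard that tracks recent call timestamps."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if every window still has room."""
        now = self._clock()
        longest = max(window for window, _ in LIMITS)
        # Drop timestamps that no longer fall inside any window.
        while self._calls and now - self._calls[0] >= longest:
            self._calls.popleft()
        for window, limit in LIMITS:
            recent = sum(1 for t in self._calls if now - t < window)
            if recent >= limit:
                return False
        self._calls.append(now)
        return True


if __name__ == "__main__":
    budget = ScoreBudget()
    print([budget.try_acquire() for _ in range(6)])
```

Since the server limits are per IP, a local counter like this only helps when a single process is the sole caller from that address.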

Parameters (JSON Schema)
Name | Required | Description | Default
prompt | Yes | The prompt text to score. Single prompt, not a conversation. Max 8000 characters. | -
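
Since the server uses the Streamable HTTP transport, a score_prompt call is a JSON-RPC `tools/call` message sent by HTTP POST. The sketch below only constructs the request object and does not send it. The endpoint is passed in by the caller because the listing does not show the URL, `build_score_request` is a hypothetical helper, and a real client would first complete the MCP `initialize` handshake (an MCP client library normally handles that).

```python
import json
import urllib.request

MAX_PROMPT_CHARS = 8000  # from the parameter description


def build_score_request(server_url: str, prompt: str, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a score_prompt tools/call."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt must be 1-{MAX_PROMPT_CHARS} characters")
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "score_prompt", "arguments": {"prompt": prompt}},
    }
    return urllib.request.Request(
        server_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept either a JSON body or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint for illustration; substitute the server's real URL.
    req = build_score_request("https://example.invalid/mcp", "Summarize the attached report in 5 bullets.")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

No `api_key` argument appears here because the description states that score_prompt is free and needs no key.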
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description fully discloses behavioral traits: it's free, with rate limits (5/min, 10/day, 100/month), latency (~2 seconds), and a structured upgrade path when limits are exceeded. No annotation contradiction exists, and the description adds value beyond what annotations would provide.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured, starting with the main function and output details, then use cases and limitations. It is front-loaded and every sentence contributes meaning. While comprehensive, it could be slightly more concise, but overall it efficiently conveys necessary information.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 dimensions, rate limits, free nature), the description is complete. It covers what the tool does, what it returns, when to use, when not to use, cost, rate limits, latency, and behavior on limit exceedance. No output schema is provided, but the description adequately describes the return format.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter `prompt`. The description adds context beyond the schema: it specifies that the input should be a single prompt, not a conversation, and reiterates the max length (8000 characters). This adds enough value to raise the baseline from 3 to 4.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool scores a prompt's quality across 8 dimensions before sending it to an expensive model. It specifies the output format (0-80 score, A-F grade, per-dimension breakdown, weakest dimension) and distinguishes itself from the sibling tool `optimize_prompt` by focusing on scoring rather than optimization.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit USE WHEN and DO NOT USE WHEN sections, offering clear guidance on appropriate scenarios (e.g., workshopping a prompt, about to send expensive prompt) and when to avoid (e.g., asking to write a prompt, conversational chat). It also suggests an alternative action for excluded cases, such as writing the prompt first then optionally scoring.
