Glama
LumabyteCo

Clarifyprompt-MCP

critique_prompt

Score a prompt on clarity, specificity, and other dimensions, receive per-criterion rationale and concrete suggestions, and get an improved version if the overall score falls below the threshold.

Instructions

LLM-as-judge for a prompt. Scores it 0–10 across 5 default dimensions (clarity, specificity, intent_alignment, format_fitness, length_appropriateness), or across your own custom criteria, and returns per-dimension rationale and concrete suggestions, an overall score, and a verdict (accept / revise / reject). When the overall score is below revise_threshold (default 7.0), the tool also returns an improvedPrompt you can use as a drop-in replacement. Use it pre-flight (is this prompt good enough for the expensive model?), postmortem (was the prompt the cause of a bad output?), or to A/B-pick the best of N optimization variants. Pass original_prompt when critiquing an optimized version so the judge can verify intent was preserved.
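The pre-flight and A/B-pick workflows above can be sketched on the client side as follows. This is a minimal sketch under stated assumptions: the tool publishes no output schema, so `CritiqueResult` and its field names are hypothetical (only `improvedPrompt` is named in the description), and `preflight` / `best_of_n` are illustrative helpers, not part of Clarifyprompt-MCP.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class CritiqueResult:
    # Hypothetical client-side view of the tool's reply. The tool has no
    # output schema; only the name improvedPrompt appears in its description.
    overall_score: float                   # 0-10
    verdict: str                           # "accept", "revise" or "reject"
    improved_prompt: Optional[str] = None  # improvedPrompt, present only below revise_threshold


def preflight(candidate: str, result: CritiqueResult) -> str:
    """Pre-flight use: swap in the judge's rewrite when one was returned."""
    return result.improved_prompt if result.improved_prompt else candidate


def best_of_n(variants: Sequence[str],
              critique: Callable[[str], CritiqueResult]) -> str:
    """A/B-pick: critique every variant and keep the highest overall score.

    max() returns the first maximal item, so the earliest variant wins ties.
    """
    if not variants:
        raise ValueError("best_of_n needs at least one variant")
    return max(variants, key=lambda v: critique(v).overall_score)


if __name__ == "__main__":
    # Stand-in judge for illustration only: longer prompts score higher.
    def fake_critique(p: str) -> CritiqueResult:
        return CritiqueResult(overall_score=min(10.0, len(p) / 5), verdict="accept")

    print(best_of_n(["Fix it.", "Fix the failing date-parsing test in utils.py."], fake_critique))
```

In a real client, `critique` would wrap an actual call to critique_prompt and map its reply onto whatever fields the server really returns.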

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The candidate prompt to critique. | |
| original_prompt | No | If `prompt` is an optimized version, the user's original ask. Used for the intent_alignment dimension. | |
| category | No | | |
| cwd | No | | |
| file_path | No | | |
| file_language | No | | |
| file_excerpt | No | | |
| user_locale | No | | |
| criteria | No | Override the default 5 criteria. Up to ~8 dimensions; more bloats the judge call. | |
| revise_threshold | No | Overall score below this triggers the rewrite pass. Default 7.0. | |
| skip_rewrite | No | Skip the rewrite pass even when below threshold (faster; just returns scores). | |
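MCP clients invoke a tool with a JSON-RPC `tools/call` request whose `params` carry the tool `name` and an `arguments` object. The sketch below assembles such a request for critique_prompt from the parameters above and encodes the documented rewrite rule. The helper names are hypothetical, and because the schema does not say what a `criteria` entry looks like, that value is passed through unchanged.

```python
import json
from typing import Any, Dict, List, Optional


def build_critique_request(
    request_id: int,
    prompt: str,
    original_prompt: Optional[str] = None,
    criteria: Optional[List[Any]] = None,
    revise_threshold: Optional[float] = None,
    skip_rewrite: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build an MCP JSON-RPC tools/call request for critique_prompt.

    Only `prompt` is required; optional arguments are sent only when set,
    so the server's own defaults (e.g. revise_threshold 7.0) apply otherwise.
    """
    arguments: Dict[str, Any] = {"prompt": prompt}
    optional = {
        "original_prompt": original_prompt,
        "criteria": criteria,  # entry shape not given by the schema; passed through as-is
        "revise_threshold": revise_threshold,
        "skip_rewrite": skip_rewrite,
    }
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "critique_prompt", "arguments": arguments},
    }


def rewrite_expected(overall_score: float, revise_threshold: float = 7.0,
                     skip_rewrite: bool = False) -> bool:
    """Documented rule: a score strictly below revise_threshold triggers the
    rewrite pass (and thus an improvedPrompt), unless skip_rewrite is set."""
    return overall_score < revise_threshold and not skip_rewrite


if __name__ == "__main__":
    request = build_critique_request(1, "Summarize the attached log.", skip_rewrite=True)
    print(json.dumps(request, indent=2))
```

Omitting unset optional arguments, rather than sending nulls, keeps the call minimal and leaves defaulting to the server.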
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations present, the description discloses the key behaviors: it returns per-dimension rationale, concrete suggestions, an overall score, and a verdict. It explains that when the score falls below 'revise_threshold' (default 7.0), the tool also returns an 'improvedPrompt'. It also mentions custom criteria, although 'skip_rewrite' is documented only in the schema. It does not discuss side effects or costs, but for a critique tool the disclosure is thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core action and results, then explains use cases and special parameters. Each sentence adds value without redundancy. It is appropriately sized for the tool's complexity, neither too terse nor too verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has 11 parameters and no output schema, the description covers the main function, the return values (rationale, suggestions, score, verdict, improvedPrompt), and the key optional parameters. It does not explain every parameter, but the core functionality is well documented. The output structure is described well enough for an agent to understand what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 45% (5 of 11 parameters), so the description must add meaning. It does so for key parameters: 'prompt' (candidate prompt), 'original_prompt' (intent preservation for optimized versions), 'criteria' (custom dimensions), and 'revise_threshold'; 'skip_rewrite' is described only in the schema. However, 'category', 'cwd', 'file_path', 'file_language', 'file_excerpt', and 'user_locale' are explained in neither the description nor the schema, leaving gaps.
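The 45% figure can be reproduced from the input-schema table, assuming "coverage" means the share of parameters whose schema entry carries a description (the page does not define the metric). The dictionary below simply transcribes the table; the helper names are illustrative.

```python
# Input-schema table transcribed as: parameter name -> does the schema give a description?
SCHEMA_HAS_DESCRIPTION = {
    "prompt": True,
    "original_prompt": True,
    "category": False,
    "cwd": False,
    "file_path": False,
    "file_language": False,
    "file_excerpt": False,
    "user_locale": False,
    "criteria": True,
    "revise_threshold": True,
    "skip_rewrite": True,
}


def description_coverage(params: dict) -> float:
    """Fraction of parameters whose schema entry carries a description."""
    return sum(params.values()) / len(params)  # True counts as 1, False as 0


def undocumented(params: dict) -> list:
    """Parameters with no schema description, in table order."""
    return [name for name, described in params.items() if not described]


if __name__ == "__main__":
    print(f"{description_coverage(SCHEMA_HAS_DESCRIPTION):.0%}")  # 45%
    print(undocumented(SCHEMA_HAS_DESCRIPTION))
```

Five described parameters out of eleven gives 5/11 ≈ 0.4545, which rounds to 45%; the six undocumented ones are exactly the parameters the rationale flags as gaps.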

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'LLM-as-judge for a prompt', clearly stating the tool's core purpose. It specifies that the tool scores a prompt 0–10 across dimensions and returns rationale, suggestions, an overall score, and a verdict (accept/revise/reject). The name and the verb 'critique' align, and the description distinguishes the tool from its siblings by naming pre-flight, postmortem, and A/B testing use cases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool: 'pre-flight', 'postmortem', or 'to A/B-pick the best of N optimization variants'. It also advises passing 'original_prompt' when critiquing an optimized version. However, it does not mention when not to use it or provide explicit alternatives among the sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/LumabyteCo/clarifyprompt-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.