Provides direct API integration for prompt refinement using Google's Gemini models (2.0 Flash, 1.5 Pro), enabling LLM-powered prompt optimization and analysis.
Provides direct API integration for prompt refinement using OpenAI models (GPT-4o, GPT-4o-mini, GPT-4 Turbo), enabling LLM-powered prompt optimization and analysis.
Provides session store support for distributed multi-instance deployments and caching of refined prompts to improve performance.
PromptTuner MCP
An MCP server that helps you write better prompts for AI assistants. It analyzes, refines, and optimizes prompts to improve AI understanding and response quality.
Performance: LLM refinement 1-5s • Batch processing 100+ prompts/min
🔑 API Key Required
PromptTuner uses direct API integration with LLM providers. You'll need an API key from one of:
OpenAI (gpt-4o, gpt-4o-mini, gpt-4-turbo) - Get API key
Anthropic (Claude 3.5 Sonnet/Haiku) - Get API key
Google (Gemini 2.0 Flash, Gemini 1.5 Pro) - Get API key
Set environment variables:
Why Use PromptTuner?
Poor prompts lead to poor AI responses. PromptTuner helps by:
✅ Fixing typos and grammar - Catches 50+ common misspellings
✅ Improving clarity - Removes vague language, adds specificity
✅ Applying best practices - Chain-of-thought, few-shot, role-based prompting
✅ Scoring your prompts - Get actionable feedback with 0-100 scores
✅ Multi-provider support - Works with OpenAI, Anthropic, and Google
🎯 Production Ready
New in v1.0.0:
✅ Security Hardening: Request timeouts, X-Forwarded-For validation, LLM output validation
✅ Performance: Parallel technique application (60% faster multi-technique optimization)
✅ Testing: Comprehensive test suite with 70%+ coverage
✅ Distributed: Redis session store for multi-instance deployments
✅ Observability: Structured JSON logging, health checks, ready probes
✅ Docker: Production-ready containers with health checks
Quick Example
Before:
After (with
Installation
Usage
With Claude Desktop
Add to your claude_desktop_config.json:
Note: Replace OPENAI_API_KEY with ANTHROPIC_API_KEY or GOOGLE_API_KEY depending on your provider choice.
With MCP Inspector
HTTP Mode (Experimental)
For testing or integration with HTTP-based clients:
Custom port/host:
Docker (Recommended for Production)
Run with Docker for easy deployment and Redis caching:
The Docker setup includes:
PromptTuner MCP server on port 3000
Redis cache for improved performance
Automatic health checks
Volume persistence
Configure via environment variables in .env (see .env.example).
Tools
refine_prompt
Fix grammar, improve clarity, and apply optimization techniques to any prompt. Includes intelligent caching to speed up repeated refinements.
Parameter | Type | Default | Description |
| string | required | Prompt text to improve (plain text, Markdown, or XML) |
| string |
| Technique to apply |
| string |
| Output format |
Performance:
Caching: Identical refinements are cached (LRU, 500 entries, 1-hour TTL)
Cache Key: Based on prompt + technique + format (SHA-256 hash)
fromCache: Response includes
fromCache: truewhen served from cache
Techniques:
Technique | Description | Best For |
| Grammar/clarity | Quick fixes |
| Step-by-step reasoning | Math, logic, analysis |
| Examples | Classification, translation |
| Persona | Domain-specific tasks |
| Formatting (XML/Markdown) | Complex instructions |
| All techniques | Maximum improvement |
Target Formats:
Format | Description | Best For |
| Detect | Unknown target |
| XML tags | Claude models |
| Markdown | GPT models |
| Schema | Data extraction |
Example:
analyze_prompt
Score prompt quality across 5 dimensions and get actionable improvement suggestions.
Input:
prompt(string, required): Prompt text to improve
Returns:
Score (0-100): clarity, specificity, completeness, structure, effectiveness, overall
Characteristics: detected format, word count, complexity level
Suggestions: actionable improvements
Flags: hasTypos, isVague, missingContext
optimize_prompt
Apply multiple techniques sequentially for comprehensive prompt improvement. Returns before/after scores and diff.
Parameter | Type | Default | Description |
| string | required | Prompt text to improve |
| string[] |
| Techniques to apply in order |
| string |
| Output format |
Example:
Returns: Before/after scores, diff of changes.
detect_format
Identify target AI format (Claude XML, GPT Markdown, JSON) with confidence score.
Returns:
detectedFormat: claude, gpt, json, or autoconfidence: 0-100recommendation: Format-specific advice
compare_prompts
Compare two prompt versions side-by-side with scoring, diff, and recommendations.
Input:
Parameter | Type | Default | Description |
| string | required | First prompt |
| string | required | Second prompt |
| string |
| Label for first |
| string |
| Label for second |
Returns:
Scores: Both prompts scored across 5 dimensions (clarity, specificity, completeness, structure, effectiveness)
Winner: Which prompt is better (A, B, or tie)
Score Deltas: Numerical differences for each dimension
Improvements: What got better in Prompt B vs A
Regressions: What got worse in Prompt B vs A
Recommendation: Actionable advice on which to use
Diff: Character-level comparison
Example:
Use Cases:
A/B testing prompts
Evaluating refinement effectiveness
Tracking prompt iterations
Choosing between versions
validate_prompt
Pre-flight validation: check for issues, estimate tokens, detect anti-patterns and security risks before using a prompt.
Input:
Parameter | Type | Default | Description |
| string | required | Prompt to validate |
| string |
| AI model (claude/gpt/gemini) |
| boolean |
| Check for prompt injection attacks |
Returns:
Is Valid: Boolean (true if no errors)
Token Estimate: Approximate token count (1 token ≈ 4 chars)
Issues: Array of validation issues (error/warning/info)
Type: error, warning, or info
Message: What the issue is
Suggestion: How to fix it
Checks Performed:
Anti-patterns (vague language, missing context)
Token limits (model-specific)
Security (prompt injection patterns)
Typos (common misspellings)
Token Limits by Model:
Model | Limit |
| 200,000 |
| 128,000 |
| 1,000,000 |
| 8,000 |
Example:
Use Cases:
Pre-flight checks before sending prompts to LLMs
Security audits for user-provided prompts
Token budget planning
Quality assurance in prompt pipelines
Resources
Browse and use prompt templates:
URI | Description |
| List all available templates |
| Code review template |
| Debugging template |
| Summarization template |
| Pro/con analysis template |
| Expert persona template |
| JSON extraction template |
Categories: coding, writing, analysis, system-prompts, data-extraction
Prompts (Workflows)
Pre-built workflows for common tasks:
Prompt | Description |
| One-step optimization with single technique |
| Comprehensive optimization with all techniques |
| Score quality and get improvement suggestions |
| Educational feedback against best practices |
| Identify top 3 issues and fix iteratively |
| Suggest best techniques for prompt + task |
| Detect common prompt mistakes |
Scoring Explained
Dimension | What It Measures |
Clarity | Clear language, no vague terms ("something", "stuff") |
Specificity | Concrete details, examples, numbers |
Completeness | Role context, output format, all requirements |
Structure | Organization, formatting, sections |
Effectiveness | Overall likelihood of good AI response |
Score Interpretation:
80-100: Excellent - Minor refinements only
60-79: Good - Some improvements recommended
40-59: Fair - Notable gaps to address
0-39: Needs Work - Significant improvements needed
LLM Sampling vs Rule-Based
PromptTuner works in two modes:
LLM Sampling (when available): Uses the MCP client's LLM for intelligent refinement
Rule-Based Fallback (automatic): Uses pattern matching and dictionaries when sampling unavailable
The tool automatically falls back to rule-based refinement if your MCP client doesn't support sampling.
Development
Troubleshooting
"LLM sampling is not supported"
This is normal! The tool automatically uses rule-based refinement. For full LLM-powered refinement, use Claude Desktop or another MCP client that supports sampling.
"Prompt too long"
Maximum prompt length is 10,000 characters. Split longer prompts into sections.
HTTP mode not connecting
Check that:
Port 3000 (default) is not in use
You're using POST to
/mcpendpointHeaders include
Content-Type: application/json
Contributing
Contributions welcome! Please:
Run
npm run lint && npm run type-checkbefore committingAdd tests for new features
Update README for user-facing changes
License
MIT
Credits
Built with the Model Context Protocol SDK.