enhance_prompt
Transforms brief text descriptions into detailed image generation prompts, supporting realistic, anime, or illustration styles.
Instructions
Transform a simple idea into a professional image generation prompt. Use when the user provides a brief description (e.g., "a cat in a garden") and needs a detailed, high-quality prompt. Combine with gallery inspiration for best results. Free, no API key needed.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | The simple prompt to enhance (e.g., "a cat in a garden") | |
| style | No | Target visual style: realistic (photorealistic), anime (2D/Japanese), illustration (concept art). Use "realistic" for general/photorealistic generation (GPT Image, Nanobanana, Seedream, Midjourney V8.1 in default mode, etc.). Use "anime" when the user wants anime/illustration output — V8.1 and most general-purpose models follow the prompt and benefit from explicit anime trigger words; the default "realistic" produces prompts poorly suited for stylized output. | realistic |
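
As a rough illustration of the schema above, the following TypeScript sketch builds the JSON-RPC `tools/call` request that an MCP client might send for this tool. The helper name `buildEnhancePromptCall`, the request `id`, and the client-side empty-prompt check are hypothetical and not part of the server. The sketch leaves `style` out when the caller does not supply it, so the server-side default of `realistic` applies.

```typescript
// Hypothetical client-side helper; not part of the enhance_prompt server code.
type PromptStyle = 'realistic' | 'anime' | 'illustration'

interface EnhancePromptArgs {
  prompt: string
  style?: PromptStyle // optional: the schema defaults it to 'realistic'
}

interface ToolCallRequest {
  jsonrpc: '2.0'
  id: number
  method: 'tools/call'
  params: { name: 'enhance_prompt'; arguments: EnhancePromptArgs }
}

function buildEnhancePromptCall(id: number, prompt: string, style?: PromptStyle): ToolCallRequest {
  // Assumption: reject blank prompts on the client; the schema itself only requires a string.
  if (prompt.trim().length === 0) {
    throw new Error('prompt is required')
  }
  const args: EnhancePromptArgs = { prompt }
  if (style !== undefined) {
    args.style = style
  }
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'enhance_prompt', arguments: args },
  }
}

// Example: request an anime-style enhancement of a brief idea.
const request = buildEnhancePromptCall(1, 'a cat in a garden', 'anime')
console.log(JSON.stringify(request, null, 2))
```

Passing `style` explicitly matters mostly for stylized output, since the default `realistic` guidelines are described above as poorly suited to anime or illustration prompts.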
Implementation Reference
- src/tools/enhance-prompt.ts:16-32 (handler) The registerEnhancePrompt function registers the 'enhance_prompt' tool on the MCP server. The handler (lines 22-31) receives a prompt string and style, calls getSystemPrompt() to get a style-specific system prompt, and returns a text response instructing the host LLM to enhance the user's prompt using those guidelines.

      export function registerEnhancePrompt(server: McpServer) {
        server.tool(
          'enhance_prompt',
          'Transform a simple idea into a professional image generation prompt. Use when the user provides a brief description (e.g., "a cat in a garden") and needs a detailed, high-quality prompt. Combine with gallery inspiration for best results. Free, no API key needed.',
          enhancePromptSchema,
          { readOnlyHint: true },
          async ({ prompt, style }) => {
            const systemPrompt = getSystemPrompt(style as PromptStyle)
            return {
              content: [{
                type: 'text' as const,
                text: `Please enhance the following prompt using these guidelines:\n\n---\n${systemPrompt}\n---\n\nUser's prompt to enhance:\n"${prompt}"\n\nGenerate the enhanced prompt now. Then show it to the user and ask if they'd like to generate an image with it (call generate_image if they confirm).`,
              }],
            }
          }
        )
      }

- src/tools/enhance-prompt.ts:10-14 (schema) Input schema for enhance_prompt using Zod. Defines 'prompt' (string describing the simple idea) and 'style' (enum: realistic, anime, illustration; defaults to 'realistic') with detailed descriptions.

      export const enhancePromptSchema = {
        prompt: z.string().describe('The simple prompt to enhance (e.g., "a cat in a garden")'),
        style: z.enum(['realistic', 'anime', 'illustration']).optional().default('realistic')
          .describe('Target visual style: realistic (photorealistic), anime (2D/Japanese), illustration (concept art). Use "realistic" for general/photorealistic generation (GPT Image, Nanobanana, Seedream, Midjourney V8.1 in default mode, etc.). Use "anime" when the user wants anime/illustration output — V8.1 and most general-purpose models follow the prompt and benefit from explicit anime trigger words; the default "realistic" produces prompts poorly suited for stylized output.'),
      }

- src/server.ts:264-265 (registration) Registration call in server.ts: registerEnhancePrompt(server) is invoked as a free feature requiring no configuration.

      // Free features (no configuration required)
      registerEnhancePrompt(server)

- src/lib/prompts.ts:109-119 (helper) The getSystemPrompt helper function selects and returns the appropriate prompt enhancement system prompt (REALISTIC_SYSTEM_PROMPT, ANIME_SYSTEM_PROMPT, or ILLUSTRATION_SYSTEM_PROMPT) based on the requested style.

      export function getSystemPrompt(style: PromptStyle): string {
        switch (style) {
          case 'anime':
            return ANIME_SYSTEM_PROMPT
          case 'illustration':
            return ILLUSTRATION_SYSTEM_PROMPT
          case 'realistic':
          default:
            return REALISTIC_SYSTEM_PROMPT
        }
      }

- src/lib/prompts.ts:11-56 (helper) REALISTIC_SYSTEM_PROMPT: the system prompt used for the default 'realistic' style. Provides detailed guidelines for photorealistic/general-purpose image generation models.

      export const REALISTIC_SYSTEM_PROMPT = `# Role
      You are a Senior Visual Logic Analyst specializing in reverse-engineering imagery for next-generation, high-reasoning AI models (like Gemini 3 Pro Image).

      # The Paradigm Shift (Crucial)
      Unlike older models (e.g., Midjourney) that rely on "vibe tags," next-gen models require **logical, coherent, and physically accurate specifications.** Your goal is not just to describe *what* is in the image, but to explain the **visual logic** of *how* the scene is constructed.

      # Analysis Protocol (The "Blueprint" Method)
      When analyzing an image, apply these four dimensions derived from professional prompt engineering logic:

      1. **Technical Precision over Feeling (Rule 1):**
         * *Avoid vague vibes:* Don't just say "cinematic" or "sad."
         * *Describe the technical cause:* Translate vibes into lighting and composition techniques. (e.g., instead of "sad," use "overcast diffused lighting, desaturated cool color palette, isolated composition").
         * *Use Terminology:* Use specific terms like "chiaroscuro," "atmospheric haze," "subsurface scattering," "photorealistic rendering."
      2. **Quantifiable & Spatial Logic (Rule 2):**
         * Define spatial relationships clearly (foreground, middle ground, background).
         * Estimate technical parameters: "Shot on a 50mm prime lens at f/1.4" (if shallow depth of field), "Iso-metric view," "Three-point lighting setup."
      3. **Material & Sensory Physics (Rule 4):**
         * Describe how materials interact with light and environment.
         * *Stack senses:* Not just "wet ground," but "asphalt slick with rain, reflecting distorted neon signs, paved texture visible."
         * *Describe textures:* "Brushed aluminum," "worn leather patina," "translucent biological membrane."
      4. **Cohesive Narrative Structure:**
         * The final prompt must read like a coherent, detailed paragraph from a novel or a director's script, ensuring the reasoning model understands the *context* of every element.

      # Output Structure (The Hybrid Blueprint)
      To maximize clarity for a reasoning model, output the prompt in two parts: a dense narrative, followed by a structured technical breakdown.

      **Part 1: The Narrative Specification (A detailed, coherent paragraph):**
      [Describe the main subject, action, and their immediate interaction with the environment. Detail the textures, the specific lighting source and its effect on the materials, and the overall mood created by these technical choices. Ensure logical flow between sentences.]

      **Part 2: Structured Technical Metadata (The "Cheat Sheet"):**
      * **Visual Style:** [e.g., Photorealistic, 3D Render (Octane), Oil Painting]
      * **Key Elements:** [List 3-5 crucial objects/subjects]
      * **Lighting & Color:** [e.g., Softbox side-lighting, warm tungsten palette]
      * **Composition/Camera:** [e.g., Low-angle, 35mm lens, high detail]

      # Strict Output Protocol
      1. Output **ONLY** the structured response as shown above.
      2. Do NOT add any conversational filler text.
      3. Start directly with the Narrative Specification paragraph.`
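
To make the flow through the handler and getSystemPrompt easier to follow, here is a minimal, self-contained TypeScript sketch of the same idea: pick a style-specific system prompt, then wrap it and the user's prompt into a single text response for the host LLM. It does not use the MCP SDK or Zod. The names `PLACEHOLDER_PROMPTS`, `pickSystemPrompt`, and `buildEnhanceResponse` are hypothetical, and the short placeholder strings stand in for the much longer prompts defined in src/lib/prompts.ts.

```typescript
type PromptStyle = 'realistic' | 'anime' | 'illustration'

// Placeholders only: the real system prompts are long guideline documents.
const PLACEHOLDER_PROMPTS: Record<PromptStyle, string> = {
  realistic: '(realistic guidelines placeholder)',
  anime: '(anime guidelines placeholder)',
  illustration: '(illustration guidelines placeholder)',
}

// Same shape as getSystemPrompt: 'realistic' shares the default branch.
function pickSystemPrompt(style: PromptStyle): string {
  switch (style) {
    case 'anime':
      return PLACEHOLDER_PROMPTS.anime
    case 'illustration':
      return PLACEHOLDER_PROMPTS.illustration
    case 'realistic':
    default:
      return PLACEHOLDER_PROMPTS.realistic
  }
}

// Builds the text payload; the tool does not call an image model itself,
// it only hands instructions back to the host LLM.
function buildEnhanceResponse(prompt: string, style: PromptStyle = 'realistic') {
  const systemPrompt = pickSystemPrompt(style)
  return {
    content: [{
      type: 'text' as const,
      text: `Please enhance the following prompt using these guidelines:\n\n---\n${systemPrompt}\n---\n\nUser's prompt to enhance:\n"${prompt}"`,
    }],
  }
}

const response = buildEnhanceResponse('a cat in a garden')
console.log(response.content[0].text)
```

Because the tool only returns instructions and never contacts an external service, this design is consistent with the "free, no API key needed" note: the enhancement itself is performed by whichever LLM is hosting the MCP client.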