generate_image_from_text

Create images from text descriptions using Google's Gemini AI model. Provide a text prompt to generate visual content that matches your description.

Instructions

Generate an image based on the given text prompt using Google's Gemini model.

Args:
    prompt: User's text prompt describing the desired image to generate
    
Returns:
    Path to the generated image file using Gemini's image generation capabilities

Input Schema


Name	Required	Description	Default
`prompt`	Yes
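
The JSON Schema view of this table was not captured here. As a rough sketch, the schema for a tool with a single required string argument might look like the following. The exact schema the server publishes is an assumption, and `validate_arguments` is a hypothetical helper used only for illustration, not part of the server.

```python
from typing import Any

# Assumed shape of the tool's input schema: one required string field.
# The schema actually published by the server may carry extra keys.
INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"prompt": {"type": "string"}},
    "required": ["prompt"],
}


def validate_arguments(arguments: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments are valid."""
    problems = []
    for name in INPUT_SCHEMA["required"]:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = INPUT_SCHEMA["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"argument {name} must be a string")
    return problems
```

Calling `validate_arguments({"prompt": "a red fox"})` yields an empty list, while an empty dict reports the missing `prompt`.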

Implementation Reference

src/gemini_image_mcp/server.py:246-270 (handler)

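The core handler for the `generate_image_from_text` tool, registered with the `@mcp.tool()` decorator. It translates the prompt to English, wraps it in the detailed generation prompt, calls Gemini through `process_image_with_gemini`, and returns the path of the saved image. On any exception it logs the error and returns the error message as a string instead of raising.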

@mcp.tool()
async def generate_image_from_text(prompt: str) -> str:
    """Generate an image based on the given text prompt using Google's Gemini model.

    Args:
        prompt: User's text prompt describing the desired image to generate
        
    Returns:
        Path to the generated image file using Gemini's image generation capabilities
    """
    try:
        # Translate the prompt to English
        translated_prompt = await translate_prompt(prompt)
        
        # Create detailed generation prompt
        contents = get_image_generation_prompt(translated_prompt)
        
        # Process with Gemini and return the result
        return await process_image_with_gemini([contents], prompt)
        
    except Exception as e:
        error_msg = f"Error generating image: {str(e)}"
        logger.error(error_msg)
        return error_msg
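
One consequence of the `try`/`except` in this handler is that the tool returns a string on both paths: a file path on success, or a message starting with `Error generating image:` on failure. A caller therefore has to inspect the value. The sketch below reproduces that control flow with a stub pipeline so it runs without a Gemini API key. `fake_pipeline`, `generate_image_from_text_sketch`, and `is_error_result` are hypothetical names, not part of the server.

```python
import asyncio

ERROR_PREFIX = "Error generating image:"  # prefix used by the handler's error message


async def fake_pipeline(prompt: str) -> str:
    """Hypothetical stand-in for translate -> build prompt -> call Gemini -> save."""
    if not prompt.strip():
        raise ValueError("empty prompt")
    return "/tmp/generated/example.png"  # illustrative path only


async def generate_image_from_text_sketch(prompt: str) -> str:
    """Mirror the handler's shape: return a path, or an error string on failure."""
    try:
        return await fake_pipeline(prompt)
    except Exception as e:
        return f"{ERROR_PREFIX} {e}"


def is_error_result(result: str) -> bool:
    """Tell an error message apart from a file path."""
    return result.startswith(ERROR_PREFIX)


ok = asyncio.run(generate_image_from_text_sketch("a red fox"))
failed = asyncio.run(generate_image_from_text_sketch("   "))
```

Here `is_error_result(ok)` is false and `is_error_result(failed)` is true. Checking the prefix is a workaround for the string-only return type rather than a documented contract.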

src/gemini_image_mcp/server.py:152-183 (helper)

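Helper that makes the Gemini API call, by default against the `gemini-2.0-flash-preview-image-generation` model and requesting both text and image response modalities. It then derives a filename from the original (untranslated) prompt and saves the image.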

async def process_image_with_gemini(
    contents: List[Any], 
    prompt: str, 
    model: str = "gemini-2.0-flash-preview-image-generation"
) -> str:
    """Process an image request with Gemini and save the result.
    
    Args:
        contents: List containing the prompt and optionally an image
        prompt: Original prompt for filename generation
        model: Gemini model to use
        
    Returns:
        Path to the saved image file
    """
    # Call the Gemini image-generation model, requesting text and image output
    gemini_response = await call_gemini(
        contents,
        model=model,
        config=types.GenerateContentConfig(
            response_modalities=['Text', 'Image']
        )
    )
    
    # Generate a filename for the image
    filename = await convert_prompt_to_filename(prompt)
    
    # Save the image and return the path
    saved_image_path = await save_image(gemini_response, filename)

    return saved_image_path
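
The excerpt does not show what `call_gemini` returns, while `save_image` expects raw bytes. If `call_gemini` returned the SDK response object rather than bytes, the image would first have to be pulled out of the response parts. The following is a minimal sketch of that step, assuming the `candidates[0].content.parts[*].inline_data.data` layout used by the google-genai SDK. `extract_image_bytes` is a hypothetical helper, and the `SimpleNamespace` objects only imitate a response so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any


def extract_image_bytes(response: Any) -> bytes:
    """Return the data of the first inline image part in a Gemini-style response."""
    for part in response.candidates[0].content.parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            return inline.data
    raise ValueError("response contained no image part")


# Imitation of a response holding one text part and one image part.
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[
                    SimpleNamespace(text="Here is your image.", inline_data=None),
                    SimpleNamespace(
                        text=None,
                        inline_data=SimpleNamespace(
                            mime_type="image/png", data=b"\x89PNG..."
                        ),
                    ),
                ]
            )
        )
    ]
)

image_bytes = extract_image_bytes(fake_response)
```

Because the request asks for both `Text` and `Image` modalities, a response can carry text parts alongside the image, which is why the loop skips parts without inline data.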

src/gemini_image_mcp/prompts.py:29-110 (helper)

Helper function that creates the detailed, no-text prompt template sent to Gemini for image generation.

def get_image_generation_prompt(prompt: str) -> str:
    """Create a detailed prompt for image generation.
    
    Args:
        prompt: text prompt
        
    Returns:
        A comprehensive prompt for Gemini image generation
    """
    return f"""You are an expert image generation AI assistant specialized in creating visuals based on user requests. Your primary goal is to generate the most appropriate image without asking clarifying questions, even when faced with abstract or ambiguous prompts.

## CRITICAL REQUIREMENT: NO TEXT IN IMAGES

**ABSOLUTE PROHIBITION ON TEXT INCLUSION**
- Under NO CIRCUMSTANCES render ANY text from user queries in the generated images
- This is your HIGHEST PRIORITY requirement that OVERRIDES all other considerations
- Text from prompts must NEVER appear in any form, even stylized, obscured, or partial
- This includes words, phrases, sentences, or characters from the user's input
- If the user requests text in the image, interpret this as a request for the visual concept only
- The image should be 100% text-free regardless of what the prompt contains

## Core Principles

1. **Prioritize Image Generation Over Clarification**
   - When given vague requests, DO NOT ask follow-up questions
   - Instead, infer the most likely intent and generate accordingly
   - Use your knowledge to fill in missing details with the most probable elements

2. **Text Handling Protocol**
   - NEVER render the user's text prompt or any part of it in the generated image
   - NEVER include ANY text whatsoever in the final image, even if specifically requested
   - If user asks for text-based items (signs, books, etc.), show only the visual item without readable text
   - For concepts typically associated with text (like "newspaper" or "letter"), create visual representations without any legible writing

3. **Interpretation Guidelines**
   - Analyze context clues in the user's prompt
   - Consider cultural, seasonal, and trending references
   - When faced with ambiguity, choose the most mainstream or popular interpretation
   - For abstract concepts, visualize them in the most universally recognizable way

4. **Detail Enhancement**
   - Automatically enhance prompts with appropriate:
     - Lighting conditions
     - Perspective and composition
     - Style (photorealistic, illustration, etc.) based on context
     - Color palettes that best convey the intended mood
     - Environmental details that complement the subject

5. **Technical Excellence**
   - Maintain high image quality
   - Ensure proper composition and visual hierarchy
   - Balance simplicity with necessary detail
   - Maintain appropriate contrast and color harmony

6. **Handling Special Cases**
   - For creative requests: Lean toward artistic, visually striking interpretations
   - For informational requests: Prioritize clarity and accuracy
   - For emotional content: Focus on conveying the appropriate mood and tone
   - For locations: Include recognizable landmarks or characteristics

## Implementation Protocol

1. Parse user request
2. **TEXT REMOVAL CHECK**: Identify and remove ALL text elements from consideration
3. Identify core subjects and actions
4. Determine most likely interpretation if ambiguous
5. Enhance with appropriate details, style, and composition
6. **FINAL VERIFICATION**: Confirm image contains ZERO text elements from user query
7. Generate image immediately without asking for clarification
8. Present the completed image to the user

## Safety Measure

Before finalizing ANY image:
- Double-check that NO text from the user query appears in the image
- If ANY text is detected, regenerate the image without the text
- This verification is MANDATORY for every image generation

Remember: Your success is measured by your ability to produce satisfying images without requiring additional input from users AND without including ANY text from queries in the images. Be decisive and confident in your interpretations while maintaining absolute adherence to the no-text requirement.

Query: {prompt}
"""

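Structurally, `get_image_generation_prompt` is a single f-string: the fixed instructions come first and the query is interpolated on the final `Query:` line. A shortened sketch of the same pattern follows, with the instruction text abbreviated for illustration. `build_prompt` is a hypothetical name.

```python
def build_prompt(prompt: str) -> str:
    """Wrap a user query in fixed instructions, as the full template does."""
    return f"""You are an expert image generation AI assistant.

## CRITICAL REQUIREMENT: NO TEXT IN IMAGES

Query: {prompt}
"""


wrapped = build_prompt("a red fox in snow")
```

Values interpolated into an f-string are inserted verbatim, so braces inside the user's query are not re-evaluated as format fields.
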
src/gemini_image_mcp/utils.py:39-65 (helper)

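Utility helper that decodes the generated image bytes with PIL, saves them as a PNG under `OUTPUT_IMAGE_PATH`, opens the image in the system's default viewer via `image.show()`, and returns the file path. Errors are logged and re-raised.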

async def save_image(image_data: bytes, filename: str) -> str:
    """Save image data to disk with a descriptive filename.
    
    Args:
        image_data: Raw image data
        filename: Base string to use for generating filename
        
    Returns:
        Path to the saved image file
    """
    try:
        # Open image from bytes
        image = PIL.Image.open(io.BytesIO(image_data))
        
        # Save the image
        image_path = os.path.join(OUTPUT_IMAGE_PATH, f"{filename}.png")
        image.save(image_path)
        logger.info(f"Image saved to {image_path}")
        
        # Display the image
        image.show()
        
        return image_path
    except Exception as e:
        logger.error(f"Error saving image: {str(e)}")
        raise
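
`save_image` joins `OUTPUT_IMAGE_PATH` and the filename directly, so it relies on the directory already existing and on the filename being safe to use on disk. Neither guarantee is visible in this excerpt. As a sketch under those assumptions, a small helper could prepare the path before saving. `build_image_path` and its sanitising rules are hypothetical, not the project's behaviour.

```python
import re
from pathlib import Path


def build_image_path(output_dir: str, filename: str) -> Path:
    """Return a non-clashing .png path inside output_dir, creating the directory."""
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)

    # Keep letters, digits, dashes and underscores; collapse everything else.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", filename).strip("_") or "image"

    candidate = directory / f"{stem}.png"
    counter = 1
    while candidate.exists():
        # Append a numeric suffix rather than overwrite an earlier image.
        candidate = directory / f"{stem}_{counter}.png"
        counter += 1
    return candidate
```

The returned path could then be passed to `image.save(...)` in place of the `os.path.join` result.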
