fetch_images

Retrieve images from URLs or local files and convert them into LLM-compatible formats, with automatic compression for large files.

Instructions

Fetch and process images from URLs or local file paths, returning them in a format suitable for LLMs.

This tool accepts a list of image sources which can be either:
1. URLs pointing to web-hosted images (http:// or https://)
2. Local file paths pointing to images stored on the local filesystem (e.g., "C:/images/photo1.jpg")

For a single image, provide a one-element list. The function will process images in parallel
when multiple sources are provided. Images that exceed the size limit (1MB) will be automatically 
compressed while maintaining aspect ratio and reasonable quality.

Args:
    image_sources: A list of image URLs or local file paths. For a single image, provide a one-element list.
    
Returns:
    A list of Image objects or None values (if processing failed) in the same order as the input sources.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| image_sources | Yes | | |
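The schema itself carries no per-parameter description, so a concrete payload is the quickest orientation. A sketch of the tool's arguments; both sources are illustrative, not real files:

```python
# Illustrative arguments for a fetch_images call. Both entries are
# hypothetical examples of the two accepted source kinds.
arguments = {
    "image_sources": [
        "https://example.com/photos/cat.jpg",  # web-hosted image (http/https)
        "C:/images/photo1.jpg",                # local filesystem path
    ]
}
```

Even for a single image, the value must be a one-element list, never a bare string.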

Implementation Reference

  • The main handler for the 'fetch_images' tool. It validates input, calls process_images_async to handle URLs and local files concurrently, extracts the resulting Image objects, and returns a list of processed Images (or None for failures). The handler is registered via the @mcp.tool() decorator, and its docstring doubles as the tool's schema description.
    @mcp.tool()
    async def fetch_images(image_sources: List[str], ctx: Context) -> List[Image | None]:
        """
        Fetch and process images from URLs or local file paths, returning them in a format suitable for LLMs.
        
        This tool accepts a list of image sources which can be either:
        1. URLs pointing to web-hosted images (http:// or https://)
        2. Local file paths pointing to images stored on the local filesystem (e.g., "C:/images/photo1.jpg")
        
        For a single image, provide a one-element list. The function will process images in parallel
        when multiple sources are provided. Images that exceed the size limit (1MB) will be automatically 
        compressed while maintaining aspect ratio and reasonable quality.
        
        Args:
            image_sources: A list of image URLs or local file paths. For a single image, provide a one-element list.
            
        Returns:
            A list of Image objects or None values (if processing failed) in the same order as the input sources.
        """
        try:
            start_time = asyncio.get_event_loop().time()
            
            # Validate input
            if not image_sources:
                await ctx.error("No image sources provided")
                logger.error("fetch_images called with empty source list")
                return []
            
            # Log the types of sources we're processing
            url_count = sum(1 for src in image_sources if is_url(src))
            local_count = len(image_sources) - url_count
            logger.debug(f"Processing {len(image_sources)} image sources: {url_count} URLs and {local_count} local files")
            
            # Process all images
            results = await process_images_async(image_sources, ctx)
            
            # Extract just the Image objects or None values
            image_results = []
            for result in results:
                if "image" in result:
                    image_results.append(result["image"])
                else:
                    image_results.append(None)
            
            elapsed = asyncio.get_event_loop().time() - start_time
            success_count = sum(1 for r in image_results if r is not None)
            
            logger.debug(
                f"Processed {len(image_sources)} images in {elapsed:.2f} seconds. "
                f"Success: {success_count}, Failed: {len(image_sources) - success_count}"
            )
            
            return image_results
        except Exception as e:
            logger.exception("Error in fetch_images")
            await ctx.error(f"Failed to process images: {str(e)}")
            return [None] * len(image_sources)
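The handler relies on an `is_url` helper that is referenced but not shown in this excerpt. A minimal sketch of what it plausibly does, assuming a scheme check via the standard library:

```python
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Heuristic: treat a source as a URL only if it has an http(s) scheme."""
    try:
        return urlparse(source).scheme in ("http", "https")
    except ValueError:
        return False
```

A plain scheme check conveniently classifies Windows drive paths such as "C:/images/photo1.jpg" as local, because the drive letter parses as a one-character scheme that is neither http nor https.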
  • Key helper function that separates URLs from local paths, processes them concurrently using asyncio.gather (fetch_single_image for URLs, process_local_image for locals), and preserves input order in results.
    async def process_images_async(image_sources: List[str], ctx: Context) -> List[Dict[str, Any]]:
        """Process multiple images (URLs or local files) concurrently."""
        if not image_sources:
            raise ValueError("No image sources provided")
        
        # Separate URLs from local file paths
        urls = [src for src in image_sources if is_url(src)]
        local_paths = [src for src in image_sources if not is_url(src)]
        
        results = []
        
        # Process URLs if any
        if urls:
            logger.debug(f"Processing {len(urls)} URLs")
            async with httpx.AsyncClient() as client:
                url_tasks = [fetch_single_image(url, client, ctx) for url in urls]
                url_results = await asyncio.gather(*url_tasks)
                results.extend(url_results)
        
        # Process local files if any
        if local_paths:
            logger.debug(f"Processing {len(local_paths)} local files")
            local_tasks = [process_local_image(path, ctx) for path in local_paths]
            local_results = await asyncio.gather(*local_tasks)
            results.extend(local_results)
        
        # Re-order results to match the input sources. Each result dict
        # carries its source under a "url" or "path" key; a dict lookup
        # handles missing results and avoids re-matching duplicates.
        by_source = {}
        for result in results:
            key = result.get("url") or result.get("path")
            if key is not None:
                by_source[key] = result
        ordered_results = [
            by_source.get(src, {"path": src, "error": "No result produced"})
            for src in image_sources
        ]
        
        return ordered_results
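Each helper is assumed to return a small result dict identifying its source, which is what makes the final re-ordering possible. An illustrative contract, with field names inferred from the ordering code above (the real helpers may attach more fields):

```python
# Hypothetical result shapes inferred from the ordering step.
url_ok     = {"url": "https://example.com/cat.jpg", "image": "<Image>"}
url_failed = {"url": "https://example.com/dog.jpg", "error": "HTTP 404"}
local_ok   = {"path": "C:/images/photo1.jpg", "image": "<Image>"}


def source_of(result: dict) -> str:
    """Recover the original source from a result dict, whichever kind it is."""
    return result.get("url") or result.get("path")
```

Because failures produce a dict with an "error" key instead of raising, one bad source cannot abort the whole batch.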
  • Core helper for processing image bytes: logs dimensions, handles small images directly, for large ones compresses iteratively by reducing quality or scaling to fit under 800KB, converts to JPEG.
    async def process_image_data(data: bytes, content_type: str, image_source: str, ctx: Context) -> Image | None:
        """Process image data and return an MCP Image object."""
        try:
            # If image is not large, try to log dimensions without processing
            if len(data) <= 1048576:
                try:
                    with PILImage.open(BytesIO(data)) as img:
                        width, height = img.size
                        logger.debug(f"Original image dimensions from {image_source}: {width}x{height}")
                        logger.debug(f"Image format from PIL: {img.format}, mode: {img.mode}")
                except Exception as e:
                    logger.debug(f"Could not determine dimensions for {image_source}: {e}")
                
                # Ensure content_type is valid and doesn't include 'image/'
                if content_type.startswith('image/'):
                    content_type = content_type.split('/')[-1]
                
                logger.debug(f"Creating Image object with format: {content_type}")
                return Image(data=data, format=content_type)
    
            # For large images, save to temp file and process
            temp_path = os.path.join(TEMP_DIR, f"temp_image_{hash(image_source)}." + content_type.split('/')[-1])
            with open(temp_path, "wb") as f:
                f.write(data)
            
            try:
                # First pass: get dimensions and basic info
                with PILImage.open(temp_path) as img:
                    orig_width, orig_height = img.size
                    orig_format = img.format
                    orig_mode = img.mode
                    logger.debug(f"Original image dimensions from {image_source}: {orig_width}x{orig_height}")
                    logger.debug(f"Large image format from PIL: {orig_format}, mode: {orig_mode}")
                
                # Calculate optimal resize factor if image is very large
                max_dimension = max(orig_width, orig_height)
                initial_scale = 1.0
                if max_dimension > 3000:
                    initial_scale = 3000 / max_dimension
                    logger.debug(f"Very large image detected ({max_dimension}px), will start with scale factor: {initial_scale}")
                
                # Second pass: process the image
                with PILImage.open(temp_path) as img:
                    if img.mode in ('RGBA', 'P'):
                        img = img.convert('RGB')
                    
                    # Apply initial scale if needed
                    if initial_scale < 1.0:
                        width = int(orig_width * initial_scale)
                        height = int(orig_height * initial_scale)
                        img = img.resize((width, height), PILImage.LANCZOS)
                    else:
                        width, height = img.size
                    
                    quality = 85
                    scale_factor = 1.0
                    
                    while True:
                        img_byte_arr = BytesIO()
                        
                        # Create a copy for this iteration to avoid accumulating transforms
                        if scale_factor < 1.0:
                            current_width = int(width * scale_factor)
                            current_height = int(height * scale_factor)
                            current_img = img.resize((current_width, current_height), PILImage.LANCZOS)
                        else:
                            current_img = img
                            current_width, current_height = width, height
                        
                        current_img.save(img_byte_arr, format='JPEG', quality=quality, optimize=True)
                        processed_data = img_byte_arr.getvalue()
                        
                        # Clean up the temporary image if we created one
                        if scale_factor < 1.0 and hasattr(current_img, 'close'):
                            current_img.close()
                        
                        # Target 800KB to leave buffer for any MCP overhead
                        if len(processed_data) <= 819200:  # 800KB
                            logger.debug(f"Processed image dimensions from {image_source}: {current_width}x{current_height} (quality={quality})")
                            logger.debug(f"Returning processed image with format: jpeg, size: {len(processed_data)} bytes")
                            return Image(data=processed_data, format='jpeg')
                        
                        # Try reducing quality first
                        if quality > 20:
                            quality -= 10
                            logger.debug(f"Reducing quality to {quality} for {image_source}, current size: {len(processed_data)} bytes")
                        else:
                            # Then try scaling down
                            scale_factor *= 0.8
                            if current_width * scale_factor < 200 or current_height * scale_factor < 200:
                                await ctx.error("Unable to compress image to acceptable size while maintaining quality")
                                logger.error(f"Failed processing image from {image_source}: dimensions too small")
                                return None
                            logger.debug(f"Applying scale factor {scale_factor} to image from {image_source}")
                            quality = 85  # Reset quality when changing size
            except MemoryError as e:
                await ctx.error(f"Out of memory processing large image: {str(e)}")
                logger.error(f"MemoryError processing image from {image_source}: {str(e)}")
                return None
            except Exception as e:
                await ctx.error(f"Image processing error: {str(e)}")
                logger.exception(f"Exception processing image from {image_source}")
                return None
            finally:
                if os.path.exists(temp_path):
                    os.remove(temp_path)
                    
        except Exception as e:
            await ctx.error(f"Error processing image: {str(e)}")
            logger.exception(f"Unexpected error processing {image_source}")
            return None
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and discloses key behavioral traits: parallel processing for multiple images, automatic compression for images over 1MB with aspect ratio and quality preservation, and failure handling (returns None for failed processing). It doesn't cover aspects like rate limits or authentication needs, but provides substantial operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by detailed input specifications, processing behavior, and return values. Every sentence adds value without redundancy, and it's appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (1 parameter, no output schema, no annotations), the description is largely complete: it covers purpose, input semantics, processing behavior, and return format. However, it lacks details on the 'Image objects' structure (e.g., format, metadata) and any error specifics, which would enhance completeness for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate fully. It clearly explains the single parameter 'image_sources' as a list of URLs or file paths, specifies format examples (http/https URLs, local paths like 'C:/images/photo1.jpg'), and clarifies handling for single images (one-element list). This adds comprehensive meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('fetch and process images') and resources ('from URLs or local file paths'), and distinguishes its output format ('suitable for LLMs'). With no sibling tools, it fully defines its scope without redundancy.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by specifying input types (URLs or file paths) and handling of single vs. multiple images, but lacks explicit guidance on when to use this tool versus alternatives (e.g., other image tools or direct file handling). With no siblings, this is less critical but still a gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/IA-Programming/mcp-images'
