Glama

process_images

Process images in a directory with operations like enhancement, OCR text extraction, resizing, and deduplication to organize and extract information from visual content.

Instructions

Process images in a directory with various operations.

Args:
    image_dir: Directory containing images to process
    operations: List of operations (enhance, ocr, resize, deduplicate)
    ocr_language: Language for OCR processing (default: eng)

Returns:
    JSON string with processing results.
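
As a rough illustration of consuming that return value: the handler in the Implementation Reference wraps its output in an envelope with status, image_directory, operations and results keys, or status and error on failure. The sketch below decodes that envelope. parse_process_images_result is a hypothetical helper, not part of the server, and the sample payload values are made up.

```python
import json
from typing import Any, Dict


def parse_process_images_result(raw: str) -> Dict[str, Any]:
    """Decode the tool's JSON string; raise if it reports an error."""
    payload = json.loads(raw)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "unknown error"))
    return payload["results"]


# Made-up response in the shape the handler builds.
sample = json.dumps({
    "status": "success",
    "image_directory": "/tmp/scans",
    "operations": ["ocr"],
    "results": {
        "processed_files": ["/tmp/scans/page1.png"],
        "enhanced_files": [],
        "ocr_results": [
            {"file": "/tmp/scans/page1.png", "text": "Hello", "language": "eng"}
        ],
        "resized_files": [],
        "duplicates": {},
    },
})

results = parse_process_images_result(sample)
for item in results["ocr_results"]:
    print(item["file"], item["language"])
```

Because the handler returns a JSON string even on failure, checking the status field is the only way for a caller to distinguish success from error.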

Input Schema

Name          Required  Description  Default
image_dir     Yes
operations    No
ocr_language  No                     eng
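
The schema does not enumerate valid values for operations, and the process_batch code in the Implementation Reference only tests membership, so an unrecognized name appears to be skipped without any error. A caller may want to check arguments before invoking the tool. A minimal sketch, assuming the four names in the tool description are the complete set; ALLOWED_OPERATIONS and validate_operations are hypothetical names, not part of the server:

```python
from typing import Iterable, List

# Assumption: these four names, taken from the tool description, are exhaustive.
ALLOWED_OPERATIONS = ("enhance", "ocr", "resize", "deduplicate")


def validate_operations(operations: Iterable[str]) -> List[str]:
    """Return the operations as a list, rejecting any unknown name."""
    ops = list(operations)
    unknown = [op for op in ops if op not in ALLOWED_OPERATIONS]
    if unknown:
        raise ValueError(f"Unsupported operations: {unknown}")
    return ops


arguments = {
    "image_dir": "/tmp/scans",  # made-up path
    "operations": validate_operations(["ocr", "resize"]),
    "ocr_language": "eng",
}
```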

Output Schema

Name    Required  Description  Default
result  Yes

Implementation Reference

  • The primary handler for the 'process_images' MCP tool, decorated with @mcp.tool() for automatic registration and schema inference from type hints. Implements the tool logic by delegating to ImageProcessor.process_batch.
    @mcp.tool()
    async def process_images(
        image_dir: str,
        operations: List[str] = ["enhance"],
        ocr_language: str = "eng"
    ) -> str:
        """
        Process images in a directory with various operations.
        
        Args:
            image_dir: Directory containing images to process
            operations: List of operations (enhance, ocr, resize, deduplicate)
            ocr_language: Language for OCR processing (default: eng)
        
        Returns:
            JSON string with processing results.
        """
        try:
            ip = get_image_processor()
            results = ip.process_batch(image_dir, operations)
            
            # Tag each OCR result with the requested language. Note that
            # ocr_language is only recorded here; it is not passed to process_batch.
            if "ocr" in operations:
                for ocr_result in results.get("ocr_results", []):
                    ocr_result["language"] = ocr_language
            
            result = {
                "status": "success",
                "image_directory": image_dir,
                "operations": operations,
                "results": results
            }
            
            return json.dumps(result, indent=2)
            
        except Exception as e:
            logger.error(f"Failed to process images: {e}")
            return json.dumps({
                "status": "error",
                "error": str(e),
                "image_directory": image_dir,
                "operations": operations
            })
  • Core implementation of batch image processing (enhance, OCR, resize, deduplicate) in ImageProcessor.process_batch, directly called by the tool handler to perform the specified operations on images in a directory.
    def process_batch(self, image_dir: str, operations: Optional[List[str]] = None) -> Dict[str, Any]:
        """
        Process a batch of images in a directory.
        
        Args:
            image_dir: Directory containing images
            operations: List of operations ('enhance', 'ocr', 'resize', 'deduplicate')
        
        Returns:
            Dictionary with results of operations
        """
        if operations is None:
            operations = ['enhance']
        
        image_paths = []
        for ext in self.supported_formats:
            image_paths.extend(Path(image_dir).glob(f"*{ext}"))
            image_paths.extend(Path(image_dir).glob(f"*{ext.upper()}"))
        
        image_paths = [str(p) for p in image_paths]
        
        results = {
            'processed_files': [],
            'enhanced_files': [],
            'ocr_results': [],
            'resized_files': [],
            'duplicates': {}
        }
        
        try:
            # Find duplicates first
            if 'deduplicate' in operations:
                results['duplicates'] = self.find_duplicates(image_paths)
            
            for image_path in image_paths:
                results['processed_files'].append(image_path)
                
                try:
                    # Enhance image
                    if 'enhance' in operations:
                        enhanced_path = self.enhance_image(image_path)
                        results['enhanced_files'].append(enhanced_path)
                    
                    # Extract text
                    if 'ocr' in operations:
                        text = self.extract_text(image_path)
                        results['ocr_results'].append({
                            'file': image_path,
                            'text': text
                        })
                    
                    # Resize image
                    if 'resize' in operations:
                        resized_path = self.resize_image(image_path)
                        results['resized_files'].append(resized_path)
                        
                except Exception as e:
                    logger.error(f"Failed to process {image_path}: {e}")
                    continue
            
            logger.info(f"Batch processed {len(results['processed_files'])} images")
            return results
            
        except Exception as e:
            logger.error(f"Failed to process batch in {image_dir}: {e}")
            raise
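
The discovery loop in process_batch globs each extension once in lower case and once in upper case. That would miss mixed-case suffixes such as .Png, and on platforms where pathlib matching is case-insensitive (likely Windows) the same file may be collected twice. One alternative, offered as a sketch rather than the project's method, compares lowercased suffixes instead. list_images is a hypothetical helper, and it assumes supported_formats holds dotted extensions such as ".png".

```python
from pathlib import Path
from typing import Iterable, List


def list_images(image_dir: str, supported_formats: Iterable[str]) -> List[str]:
    """Return sorted paths of files whose suffix matches, ignoring case."""
    wanted = {ext.lower() for ext in supported_formats}
    return sorted(
        str(p)
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )
```

Each file is visited exactly once, so no path can appear twice in the result, and sorting makes the processing order deterministic.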

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions operations like 'enhance', 'ocr', 'resize', and 'deduplicate', but doesn't explain what these entail (e.g., how enhancement works, what deduplication criteria are). It also lacks details on permissions, rate limits, or side effects (e.g., whether images are modified in-place or copied). The return format is mentioned but not elaborated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded with a clear purpose statement, followed by structured sections for Args and Returns. Every sentence adds value, such as listing operations and specifying defaults, with no redundant information. However, it could be slightly more concise by integrating the operations list into the main sentence.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (three parameters, no annotations, but an output schema), the description is moderately complete. It covers parameters well but lacks behavioral context and usage guidelines. Because an output schema exists, the description doesn't need to detail return values, but it should still address operational nuances. For a tool with multiple operations, the description needs to say more about behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the input schema, which has 0% schema description coverage. It explains that 'image_dir' contains images to process, 'operations' is a list of specific operations with examples, and 'ocr_language' is for OCR processing with a default. This compensates well for the schema's lack of descriptions, though it could provide more detail on operation specifics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool 'processes images in a directory with various operations', which provides a basic purpose but is vague about what 'process' entails. It doesn't differentiate from sibling tools like 'convert_to_pdf' or 'directory_to_pdf', which might involve similar image handling. The verb 'process' is generic, and while it lists operations, it lacks specificity about the overall goal.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. For example, it doesn't specify if this is for batch processing versus single images, or how it compares to sibling tools like 'convert_to_pdf' for PDF conversion or 'full_document_workflow' for more comprehensive tasks. The description implies usage through the operations list but offers no explicit context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/PovedaAqui/auto-snap-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.