by snussik

process_batch_local_files

Process multiple local files concurrently to extract text and tables into structured markdown and HTML formats using optimized OCR processing.

Instructions

Process multiple local files concurrently.

Input Schema

Name         Required    Description    Default
arguments    Yes
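The schema leaves the `arguments` object opaque. Judging from the handler source in the Implementation Reference below, a request body might look like the following; the key names are taken from the handler's `arguments.get(...)` calls, and the values shown here are illustrative only:

```python
# Example `arguments` payload, inferred from the handler's arguments.get() calls.
# Only `patterns` is validated as required; the rest are optional, with the
# defaults shown in the handler source.
example_arguments = {
    "patterns": ["*.pdf", "invoices/**/*.png"],  # glob patterns, required
    "max_files": 20,             # optional cap on discovered files
    "table_format": "markdown",  # optional; passed through to the OCR processor
    "extract_header": False,     # optional, defaults to False
    "extract_footer": False,     # optional, defaults to False
    "include_images": False,     # optional, defaults to False
}
```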

Output Schema

Name      Required    Description    Default
result    Yes
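The `result` field is likewise undocumented in the schema. From the handler source below, a concurrent-mode response body has roughly this shape (batch mode instead returns `mode`, `batch_type`, `job_id`, `files_processed`, and a `message`); the counts here are illustrative:

```python
# Shape of the concurrent-mode result, mirroring the dict built in the handler.
example_result = {
    "mode": "concurrent",
    "files_processed": 3,
    "successful": 2,
    "failed": 1,
    "results": [
        # one entry per file; failed entries carry an "error" key
    ],
}
```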

Implementation Reference

  • The `process_batch_local_files` handler is implemented in `src/mcp_mistral_ocr_opt/main.py` and registered with `@app.tool("process_batch_local_files")`. It discovers matching files, chooses between inline batch, file batch, and concurrent processing based on the file count, and runs the OCR task.
    # Excerpt from src/mcp_mistral_ocr_opt/main.py. The imports below are the
    # standard MCP Python SDK names the handler relies on; project-local helpers
    # (app, config, discover_files, BatchProcessor, ocr_processor) are defined
    # elsewhere in the module.
    import json
    from typing import Any, Dict, List

    from mcp.shared.exceptions import McpError
    from mcp.types import INTERNAL_ERROR, INVALID_PARAMS, ErrorData, TextContent

    @app.tool("process_batch_local_files")
    async def process_batch_local_files(arguments: Dict[str, Any]) -> List[TextContent]:
        """Process multiple local files concurrently."""
        patterns = arguments.get("patterns")
        if not patterns or not isinstance(patterns, list):
            raise McpError(
                ErrorData(code=INVALID_PARAMS, message="patterns array is required")
            )
    
        try:
            files = await discover_files(
                directory=config.ocr_dir,
                patterns=patterns,
                max_files=arguments.get("max_files"),
            )
    
            if not files:
                return [
                    TextContent(
                        type="text",
                        text=json.dumps(
                            {"message": "No files found matching patterns"}, indent=2
                        ),
                    )
                ]
    
            mode = config.select_processing_mode(len(files))
            if mode in {"inline", "file"}:
                # Use batch processing
                from mistralai import Mistral
    
                client = Mistral(api_key=config.api_key)
                batch_proc = BatchProcessor(client=client, config=config)
    
                if mode == "inline":
                    # Inline batch
                    requests = await batch_proc.prepare_inline_batch(
                        files=files,
                        table_format=arguments.get("table_format"),
                        extract_header=arguments.get("extract_header", False),
                        extract_footer=arguments.get("extract_footer", False),
                        include_images=arguments.get("include_images", False),
                    )
                    job_id = await batch_proc.process_inline_batch(requests)
    
                    result = {
                        "mode": "batch",
                        "batch_type": "inline",
                        "job_id": job_id,
                        "files_processed": len(files),
                        "message": f"Batch job created with {len(files)} files. Use check_batch_status to monitor progress.",
                    }
                else:
                    # File batch
                    batch_file_id = await batch_proc.prepare_file_batch(
                        files=files,
                        table_format=arguments.get("table_format"),
                        extract_header=arguments.get("extract_header", False),
                        extract_footer=arguments.get("extract_footer", False),
                        include_images=arguments.get("include_images", False),
                    )
                    job_id = await batch_proc.process_file_batch(batch_file_id)
    
                    result = {
                        "mode": "batch",
                        "batch_type": "file",
                        "job_id": job_id,
                        "files_processed": len(files),
                        "message": f"Batch job created with {len(files)} files. Use check_batch_status to monitor progress.",
                    }
            else:
                # Use concurrent processing
                results = await ocr_processor.process_batch_local_files(
                    file_paths=files,
                    table_format=arguments.get("table_format"),
                    extract_header=arguments.get("extract_header", False),
                    extract_footer=arguments.get("extract_footer", False),
                    include_images=arguments.get("include_images", False),
                )
    
                successful = sum(1 for r in results if "error" not in r)
                failed = len(results) - successful
    
                result = {
                    "mode": "concurrent",
                    "files_processed": len(files),
                    "successful": successful,
                    "failed": failed,
                    "results": results,
                }
    
            return [
                TextContent(
                    type="text", text=json.dumps(result, indent=2, ensure_ascii=False)
                )
            ]
    
        except ValueError as e:
            raise McpError(ErrorData(code=INVALID_PARAMS, message=str(e)))
        except Exception as e:
            raise McpError(
                ErrorData(code=INTERNAL_ERROR, message=f"Error processing batch: {str(e)}")
            )
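The handler delegates mode selection to `config.select_processing_mode(len(files))`, whose thresholds are not shown in this excerpt. A minimal sketch of how such a selector might behave, with entirely hypothetical cutoffs (the real thresholds live in the server's config and may differ):

```python
def select_processing_mode(file_count: int,
                           concurrent_limit: int = 5,
                           inline_limit: int = 50) -> str:
    """Hypothetical selector: small batches run concurrently, mid-sized
    batches use inline batch requests, and large batches upload a batch file.
    The actual thresholds in mcp_mistral_ocr_opt's config may differ."""
    if file_count <= concurrent_limit:
        return "concurrent"
    if file_count <= inline_limit:
        return "inline"
    return "file"
```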
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Mentions 'concurrently' indicating parallel execution behavior. However, with no annotations provided, the description fails to disclose error handling (partial failure vs atomic), resource costs, rate limits, or what 'processing' actually entails. Missing safety/profile info that annotations would normally cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely terse (6 words) and front-loaded, but given the high complexity (nested objects, opaque schema, batch operation), this constitutes under-specification rather than appropriate conciseness. Critical information is missing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While an output schema exists (reducing the need to describe returns), the combination of zero annotation coverage and a completely undocumented parameter object leaves critical gaps. Does not explain relationship to batch job lifecycle siblings or required setup.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage and a completely opaque 'arguments' object (additionalProperties: true), the description provides zero compensation. No indication of required keys, structure, or what data to pass in the arguments object.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

States the resource (local files) and quantity (multiple) with a distinguishing behavioral trait (concurrently), which helps differentiate from siblings like process_local_file. However, the verb 'Process' is vague and doesn't specify what operation is performed (transform, validate, upload, etc.).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The term 'multiple' implies usage for batch operations versus single-file processing, but provides no explicit when-to-use guidance, prerequisites, or named alternatives. No mention of when to prefer create_batch_job or other batch siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/snussik/mcp_mistral_ocr_opt'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.