by snussik

create_batch_job

Process large document sets efficiently by creating batch OCR jobs to extract text and tables into structured formats.

Instructions

Create a batch processing job for large file sets.

Input Schema

Name        Required   Description   Default
arguments   Yes        –             –

Output Schema

Name        Required   Description   Default
result      Yes        –             –
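The result object is not described here, but the implementation below shows the two shapes it can take. A sketch of a plausible inline-batch result, mirroring the keys the handler assembles (the concrete job_id and file count are invented for illustration):

```python
import json

# Plausible inline-batch result, mirroring the keys the handler builds;
# job_id and files_queued values are invented for illustration.
result = {
    "batch_type": "inline",
    "job_id": "batch-abc123",
    "files_queued": 42,
    "message": "Inline batch job created with 42 files. "
               "Use check_batch_status to monitor progress.",
}
print(json.dumps(result, indent=2, ensure_ascii=False))
```

The file-batch variant adds a batch_file_id key and sets batch_type to "file".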

Implementation Reference

  • The implementation of the create_batch_job tool, which processes file batches based on provided patterns.
    @app.tool("create_batch_job")
    async def create_batch_job(arguments: Dict[str, Any]) -> List[TextContent]:
        """Create a batch processing job for large file sets."""
        patterns = arguments.get("patterns")
        if not patterns or not isinstance(patterns, list):
            raise McpError(
                ErrorData(code=INVALID_PARAMS, message="patterns array is required")
            )
    
        try:
            files = await discover_files(
                directory=config.ocr_dir,
                patterns=patterns,
            )
    
            if not files:
                return [
                    TextContent(
                        type="text",
                        text=json.dumps(
                            {"message": "No files found matching patterns"}, indent=2
                        ),
                    )
                ]
    
            batch_proc = await get_batch_processor()
    
            use_inline = arguments.get(
                "use_inline", len(files) < config.inline_batch_threshold
            )
    
            if use_inline:
                # Inline batch
                requests = await batch_proc.prepare_inline_batch(
                    files=files,
                    table_format=arguments.get("table_format"),
                    extract_header=arguments.get("extract_header", False),
                    extract_footer=arguments.get("extract_footer", False),
                    include_images=arguments.get("include_images", False),
                )
                job_id = await batch_proc.process_inline_batch(requests)
    
                result = {
                    "batch_type": "inline",
                    "job_id": job_id,
                    "files_queued": len(files),
                    "message": f"Inline batch job created with {len(files)} files. Use check_batch_status to monitor progress.",
                }
            else:
                # File batch
                batch_file_id = await batch_proc.prepare_file_batch(
                    files=files,
                    table_format=arguments.get("table_format"),
                    extract_header=arguments.get("extract_header", False),
                    extract_footer=arguments.get("extract_footer", False),
                    include_images=arguments.get("include_images", False),
                )
                job_id = await batch_proc.process_file_batch(batch_file_id)
    
                result = {
                    "batch_type": "file",
                    "job_id": job_id,
                    "batch_file_id": batch_file_id,
                    "files_queued": len(files),
                    "message": f"File batch job created with {len(files)} files. Use check_batch_status to monitor progress.",
                }
    
            return [
                TextContent(
                    type="text", text=json.dumps(result, indent=2, ensure_ascii=False)
                )
            ]
        except McpError:
            raise
        except Exception as exc:
            # Surface unexpected failures as MCP internal errors.
            raise McpError(
                ErrorData(
                    code=INTERNAL_ERROR,
                    message=f"Failed to create batch job: {exc}",
                )
            ) from exc
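Because the input schema leaves arguments unspecified, the only reliable guide to its shape is the handler itself. A minimal sketch of a plausible payload, using only the keys the implementation reads (the "markdown" value for table_format is an assumption, since the set of valid values is not documented):

```python
import json

# Example create_batch_job payload, inferred from the keys the handler reads.
# Only "patterns" is required; the rest fall back to the defaults visible in
# the implementation above.
arguments = {
    "patterns": ["invoices/*.pdf", "receipts/*.png"],  # required glob patterns
    "use_inline": True,          # optional; defaults to a size-based heuristic
    "table_format": "markdown",  # optional; assumed value, passed through as-is
    "extract_header": False,     # optional; default False
    "extract_footer": False,     # optional; default False
    "include_images": False,     # optional; default False
}

print(json.dumps(arguments, indent=2))
```

When use_inline is omitted, the handler picks inline mode whenever the number of matched files is below config.inline_batch_threshold.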
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full disclosure burden. 'Create' implies mutation, but the description lacks critical behavioral details: job persistence duration, polling requirements, failure modes, and the lifecycle relationship to its cancel_batch_job and check_batch_status siblings. With an output schema present, return values need not be described, but the operational semantics are missing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence is efficient and front-loaded, but severely undersized for the tool's complexity. With seven sibling tools in the batch ecosystem and a completely unspecified parameter schema, the description needs expansion before it can earn full conciseness credit: a description cannot be concise about information it omits entirely.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Inadequate for a workflow-anchored tool with lifecycle implications. While the presence of an output schema reduces the need for return-value documentation, the combination of 0% schema coverage, a catch-all parameter pattern, and multiple related lifecycle tools demands more contextual scaffolding than a single short sentence provides.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, and the single parameter uses additionalProperties: true (a catch-all pattern). The description provides no guidance on the expected argument structure, required fields, or validation rules for the 'arguments' object; an agent cannot determine what keys and values to populate without external knowledge.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

States a specific verb (Create) and resource (batch processing job) with a scope modifier (large file sets). The 'job' terminology implicitly distinguishes it from the immediate-processing siblings (process_local_file, process_url_file) by suggesting deferred, asynchronous execution. However, 'large file sets' is vague, and the description does not explicitly clarify the relationship to process_batch_local_files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use guidance despite significant sibling ambiguity. The process_batch_local_files tool appears to perform similar batch operations, but the description provides no criteria for choosing between immediate processing and creating a job. No prerequisites or workflow context are provided (e.g., that check_batch_status follows creation).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
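The missing workflow context amounts to a create-then-poll loop. A hypothetical client-side sketch of that lifecycle, with the MCP call shape and the check_batch_status status payload stubbed out, since neither is documented on this page:

```python
import asyncio
import json

# Hypothetical create-then-poll workflow. call_tool is a stub standing in for
# a real MCP client session; the tool names come from this page, but the
# request/response shapes here are assumptions for illustration only.

async def call_tool(name: str, arguments: dict) -> dict:
    # Stubbed responses approximating what the server might return.
    if name == "create_batch_job":
        return {"batch_type": "inline", "job_id": "job-123", "files_queued": 2}
    return {"job_id": arguments["job_id"], "status": "completed"}

async def run_batch(patterns: list) -> dict:
    created = await call_tool("create_batch_job", {"patterns": patterns})
    status = {"status": "pending"}
    while status["status"] not in ("completed", "failed"):
        await asyncio.sleep(0)  # real code would back off between polls
        status = await call_tool("check_batch_status",
                                 {"job_id": created["job_id"]})
    return status

result = asyncio.run(run_batch(["docs/*.pdf"]))
print(json.dumps(result))
```

A description that spelled out this create/poll/cancel lifecycle would resolve most of the guidance gaps noted above.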
