
Unstructured API MCP Server

Official

create_workflow

Configure automated document processing pipelines by defining source, destination, and custom nodes like partitioners, chunkers, prompters, and embedders.

Instructions

Create a new workflow.

Args:
    workflow_config: A typed dictionary containing the required fields (name; source_id and
    destination_id, each a valid UUID; and workflow_type) and the optional fields (schedule
    and workflow_nodes). Note that workflow_nodes is honored only when workflow_type is
    `custom`, and is a list of WorkflowNodeTypedDict entries of type partition, prompter,
    chunk, or embed.
    Below is an example of a partition workflow node:
        {
            "name": "vlm-partition",
            "type": "partition",
            "subtype": "vlm",
            "settings": {
                        "provider": "your favorite provider",
                        "model": "your favorite model"
                        }
        }


Returns:
    String containing the created workflow information
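For orientation, a minimal workflow_config might look like the following sketch. The name, the UUIDs, and the `basic` workflow type are hypothetical placeholders, not values from this server; substitute the IDs of connectors you have actually created.

```python
import uuid

# A minimal workflow_config sketch. The UUIDs below are hypothetical
# placeholders; use the source/destination connector IDs from your account.
workflow_config = {
    "name": "my-basic-workflow",
    "source_id": "11111111-1111-1111-1111-111111111111",
    "destination_id": "22222222-2222-2222-2222-222222222222",
    # workflow_nodes only applies when workflow_type is "custom"
    "workflow_type": "basic",
}

# Sanity-check that the IDs parse as UUIDs before calling the tool.
uuid.UUID(workflow_config["source_id"])
uuid.UUID(workflow_config["destination_id"])
```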

Custom workflow DAG nodes

  • If workflow_type is set to custom, you must also specify the settings for the workflow's directed acyclic graph (DAG) nodes. These nodes' settings are specified in the workflow_nodes array.

  • A Source node is automatically created when you specify the source_id value outside of the workflow_nodes array.

  • A Destination node is automatically created when you specify the destination_id value outside of the workflow_nodes array.

  • You can specify Partitioner, Chunker, Prompter, and Embedder nodes.

  • The order of the nodes in the workflow_nodes array will be the same order that these nodes appear in the DAG, with the first node in the array added directly after the Source node. The Destination node follows the last node in the array.

  • Be sure to specify nodes in the allowed order. The following DAG placements are all allowed:

    • Source -> Partitioner -> Destination,

    • Source -> Partitioner -> Chunker -> Destination,

    • Source -> Partitioner -> Chunker -> Embedder -> Destination,

    • Source -> Partitioner -> Prompter -> Chunker -> Destination,

    • Source -> Partitioner -> Prompter -> Chunker -> Embedder -> Destination
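The ordering rules above can be sketched as a small pre-flight check. `ALLOWED_ORDERS` and `node_order_is_valid` are illustrative names, not part of the server; Source and Destination nodes are implicit because they come from source_id and destination_id.

```python
# The five allowed node sequences listed above, with the implicit
# Source and Destination nodes omitted.
ALLOWED_ORDERS = [
    ["partition"],
    ["partition", "chunk"],
    ["partition", "chunk", "embed"],
    ["partition", "prompter", "chunk"],
    ["partition", "prompter", "chunk", "embed"],
]

def node_order_is_valid(workflow_nodes: list[dict]) -> bool:
    """Return True if the node types appear in an allowed DAG order."""
    return [node["type"] for node in workflow_nodes] in ALLOWED_ORDERS
```

For example, `[{"type": "partition"}, {"type": "chunk"}]` passes, while a chunker placed before the partitioner does not.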

Partitioner node

A Partitioner node has a type of partition and a subtype of auto, vlm, hi_res, or fast.

Examples:

  • auto strategy:

    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "vlm",
        "settings": {
            "provider": "anthropic",                 (required)
            "model": "claude-sonnet-4-20250514",     (required)
            "output_format": "text/html",
            "user_prompt": null,
            "format_html": true,
            "unique_element_ids": true,
            "is_dynamic": true,
            "allow_fast": true
        }
    }

  • vlm strategy: Allowed settings are provider and model. Examples:

    • "provider": "anthropic", "model": "claude-sonnet-4-20250514"

    • "provider": "openai", "model": "gpt-4o"

  • hi_res strategy:

    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "unstructured_api",
        "settings": {
            "strategy": "hi_res",
            "include_page_breaks": <true|false>,
            "pdf_infer_table_structure": <true|false>,
            "exclude_elements": ["", ""],
            "xml_keep_tags": <true|false>,
            "encoding": "",
            "ocr_languages": ["", ""],
            "extract_image_block_types": ["image", "table"],
            "infer_table_structure": <true|false>
        }
    }

  • fast strategy:

    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "unstructured_api",
        "settings": {
            "strategy": "fast",
            "include_page_breaks": <true|false>,
            "pdf_infer_table_structure": <true|false>,
            "exclude_elements": ["", ""],
            "xml_keep_tags": <true|false>,
            "encoding": "",
            "ocr_languages": ["", ""],
            "extract_image_block_types": ["image", "table"],
            "infer_table_structure": <true|false>
        }
    }

Chunker node

A Chunker node has a type of chunk and a subtype of chunk_by_character or chunk_by_title.

  • chunk_by_character:

    {
        "name": "Chunker",
        "type": "chunk",
        "subtype": "chunk_by_character",
        "settings": {
            "include_orig_elements": <true|false>,
            "new_after_n_chars": <n>,    (required; if not provided, set same as max_characters)
            "max_characters": <n>,       (required)
            "overlap": <n>,              (required; if not provided, set default to 0)
            "overlap_all": <true|false>,
            "contextual_chunking_strategy": "v1"
        }
    }

  • chunk_by_title:

    {
        "name": "Chunker",
        "type": "chunk",
        "subtype": "chunk_by_title",
        "settings": {
            "multipage_sections": <true|false>,
            "combine_text_under_n_chars": <n>,
            "include_orig_elements": <true|false>,
            "new_after_n_chars": <n>,    (required; if not provided, set same as max_characters)
            "max_characters": <n>,       (required)
            "overlap": <n>,              (required; if not provided, set default to 0)
            "overlap_all": <true|false>,
            "contextual_chunking_strategy": "v1"
        }
    }
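The fallback rules noted in the parentheticals above (new_after_n_chars falling back to max_characters, overlap falling back to 0) can be sketched as a small helper. `fill_chunker_defaults` is a hypothetical name for illustration, not part of the client.

```python
def fill_chunker_defaults(settings: dict) -> dict:
    """Apply the chunker fallback rules described above:
    new_after_n_chars defaults to max_characters, and overlap defaults
    to 0. max_characters itself is required and has no fallback."""
    out = dict(settings)  # leave the caller's dict untouched
    out.setdefault("new_after_n_chars", out["max_characters"])
    out.setdefault("overlap", 0)
    return out
```

For example, passing only `{"max_characters": 1000}` yields settings with `new_after_n_chars` of 1000 and `overlap` of 0.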

Prompter node

A Prompter node has a type of prompter and a subtype of:

  • openai_image_description,

  • anthropic_image_description,

  • bedrock_image_description,

  • vertexai_image_description,

  • openai_table_description,

  • anthropic_table_description,

  • bedrock_table_description,

  • vertexai_table_description,

  • openai_table2html,

  • openai_ner

Example:

    {
        "name": "Prompter",
        "type": "prompter",
        "subtype": "",
        "settings": {}
    }

Embedder node

An Embedder node has a type of embed.

Allowed values for subtype and model_name include:

  • "subtype": "azure_openai"

    • "model_name": "text-embedding-3-small"

    • "model_name": "text-embedding-3-large"

    • "model_name": "text-embedding-ada-002"

  • "subtype": "bedrock"

    • "model_name": "amazon.titan-embed-text-v2:0"

    • "model_name": "amazon.titan-embed-text-v1"

    • "model_name": "amazon.titan-embed-image-v1"

    • "model_name": "cohere.embed-english-v3"

    • "model_name": "cohere.embed-multilingual-v3"

  • "subtype": "togetherai"

    • "model_name": "togethercomputer/m2-bert-80M-2k-retrieval"

    • "model_name": "togethercomputer/m2-bert-80M-8k-retrieval"

    • "model_name": "togethercomputer/m2-bert-80M-32k-retrieval"

Example:

    {
        "name": "Embedder",
        "type": "embed",
        "subtype": "",
        "settings": {
            "model_name": ""
        }
    }
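The subtype-to-model pairs listed above can be captured as a lookup table, so an embed node can be sanity-checked before the workflow is created. `EMBEDDER_MODELS` and `embedder_node_is_valid` are illustrative names for this sketch, not part of the client.

```python
# The allowed subtype/model_name pairs from the list above.
EMBEDDER_MODELS = {
    "azure_openai": {
        "text-embedding-3-small",
        "text-embedding-3-large",
        "text-embedding-ada-002",
    },
    "bedrock": {
        "amazon.titan-embed-text-v2:0",
        "amazon.titan-embed-text-v1",
        "amazon.titan-embed-image-v1",
        "cohere.embed-english-v3",
        "cohere.embed-multilingual-v3",
    },
    "togetherai": {
        "togethercomputer/m2-bert-80M-2k-retrieval",
        "togethercomputer/m2-bert-80M-8k-retrieval",
        "togethercomputer/m2-bert-80M-32k-retrieval",
    },
}

def embedder_node_is_valid(node: dict) -> bool:
    """Return True if an embed node pairs a known subtype with one of its models."""
    models = EMBEDDER_MODELS.get(node.get("subtype"), set())
    return (
        node.get("type") == "embed"
        and node.get("settings", {}).get("model_name") in models
    )
```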

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| workflow_config | Yes | | |

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| result | Yes | | |

Implementation Reference

  • The handler function for the 'create_workflow' tool, registered via @mcp.tool() decorator. It receives workflow_config as input, constructs a CreateWorkflow object, calls the UnstructuredClient to create the workflow, and returns detailed information about the created workflow.
    @mcp.tool()
    @add_custom_node_examples  # Note: This documentation is added due to lack of typing in
    # WorkflowNode.settings. It can be safely deleted when typing is added.
    async def create_workflow(ctx: Context, workflow_config: CreateWorkflowTypedDict) -> str:
        """Create a new workflow.
    
        Args:
            workflow_config: A typed dictionary containing the required fields (name; source_id and
            destination_id, each a valid UUID; and workflow_type) and the optional fields (schedule
            and workflow_nodes). Note that workflow_nodes is honored only when workflow_type is
            `custom`, and is a list of WorkflowNodeTypedDict entries of type partition, prompter,
            chunk, or embed.
            Below is an example of a partition workflow node:
                {
                    "name": "vlm-partition",
                    "type": "partition",
                    "subtype": "vlm",
                    "settings": {
                                "provider": "your favorite provider",
                                "model": "your favorite model"
                                }
                }
    
    
        Returns:
            String containing the created workflow information
        """
        client = ctx.request_context.lifespan_context.client
    
        try:
            workflow = CreateWorkflow(**workflow_config)
            response = await client.workflows.create_workflow_async(
                request=CreateWorkflowRequest(create_workflow=workflow),
            )
    
            info = response.workflow_information
            return await get_workflow_info(ctx, info.id)
        except Exception as e:
            return f"Error creating workflow: {str(e)}"
  • Import of CreateWorkflowTypedDict, which serves as the input schema/type definition for the workflow_config parameter in create_workflow.
    from unstructured_client.models.shared.createworkflow import CreateWorkflowTypedDict
  • The @mcp.tool() decorator registers the create_workflow function as an MCP tool.
    @mcp.tool()
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. While it explains what happens when creating workflows (automatic source/destination nodes, DAG ordering), it lacks critical information about permissions required, whether this is a mutating operation, error handling, rate limits, or what happens to existing workflows. The description doesn't contradict annotations since none exist, but fails to provide sufficient behavioral context for a creation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is excessively long (over 1500 words) with redundant examples and formatting issues. While the initial section is reasonably structured, the extensive examples for partitioner strategies, chunker types, prompter subtypes, and embedder models could be summarized more concisely. The description front-loads key information but then buries the reader in repetitive examples that don't all earn their place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of workflow creation with multiple node types and configurations, the description provides substantial context about parameter usage, DAG construction, and node specifications. With an output schema present, it doesn't need to explain return values. However, for a creation tool with no annotations, it should ideally include more about behavioral aspects like permissions, idempotency, or error cases to be fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage and only 1 parameter (workflow_config), the description provides extensive semantic information beyond the bare schema. It explains required vs optional fields, UUID requirements, conditional dependencies (workflow_nodes only for custom type), detailed examples for different node types, and DAG ordering rules. This fully compensates for the lack of schema descriptions and adds substantial value for understanding parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a new workflow with specific configuration requirements. It distinguishes itself from siblings like 'update_workflow' and 'delete_workflow' by focusing on creation rather than modification or deletion. However, it doesn't explicitly contrast with 'run_workflow' which executes existing workflows versus creating new ones.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through parameter explanations and examples, particularly for custom workflows with DAG nodes. It mentions that 'workflow_nodes' is only enabled when workflow_type is 'custom', providing some conditional guidance. However, there's no explicit guidance on when to use this tool versus alternatives like 'update_workflow' or 'run_workflow', nor any prerequisites or error conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
