Unstructured-IO

Unstructured API MCP Server

Official

Server Configuration

Describes the environment variables required to run the server.

Name                    Required  Description
UNSTRUCTURED_API_KEY    Yes       Your Unstructured API key, fetched from https://platform.unstructured.io/app/account/api-keys
FIRECRAWL_API_KEY       No        API key for using Firecrawl web crawling API features
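For reference, an MCP client such as Claude Desktop might register this server with a stanza like the following; the command and server name here are hypothetical placeholders, so adapt them to your install:

```json
{
  "mcpServers": {
    "unstructured": {
      "command": "uv",
      "args": ["run", "uns_mcp"],
      "env": {
        "UNSTRUCTURED_API_KEY": "<your-unstructured-api-key>",
        "FIRECRAWL_API_KEY": "<optional-firecrawl-key>"
      }
    }
  }
}
```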


Tools

Functions exposed to the LLM to take actions

create_source_connector

Create a source connector based on type.

Args:
    ctx: Context object with the request and lifespan context
    name: A unique name for this connector
    source_type: The type of source being created (e.g., 'azure', 'onedrive',
                 'salesforce', 'gdrive', 's3', 'sharepoint')

    type_specific_config:
        azure:
            remote_url: The Azure Storage remote URL with the format
                        az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
            recursive: (Optional[bool]) Whether to access subfolders
        gdrive:
            drive_id: The Drive ID for the Google Drive source
            recursive: (Optional[bool]) Whether to access subfolders
            extensions: (Optional[list[str]]) File extensions to filter
        onedrive:
            path: The path to the target folder in the OneDrive account
            user_pname: The User Principal Name (UPN) for the OneDrive user account
            recursive: (Optional[bool]) Whether to access subfolders
            authority_url: (Optional[str]) The authentication token provider URL
        s3:
            remote_url: The S3 URI to the bucket or folder (e.g., s3://my-bucket/)
            recursive: (Optional[bool]) Whether to access subfolders
        salesforce:
            username: The Salesforce username
            categories: (Optional[list[str]]) The names of the Salesforce
                        categories (objects) that you want to access, specified as
                        a comma-separated list. Available categories include Account,
                        Campaign, Case, EmailMessage, and Lead.
        sharepoint:
            site: The SharePoint site to connect to
            user_pname: The username for the SharePoint site
            path: (Optional[str]) The path within the SharePoint site
            recursive: (Optional[bool]) Whether to access subfolders
            authority_url: (Optional[str]) The authority URL for authentication

Returns:
    String containing the created source connector information
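To make the argument shape concrete, here is an illustrative sketch of what an MCP client would pass for an S3 source; the bucket URL and connector name are placeholders, and how the call is actually dispatched depends on your client:

```python
# Illustrative only: mirrors the documented create_source_connector parameters.
def build_source_args(name: str, source_type: str, type_specific_config: dict) -> dict:
    """Assemble the arguments an MCP client would pass to create_source_connector."""
    return {
        "name": name,
        "source_type": source_type,
        "type_specific_config": type_specific_config,
    }

# Hypothetical S3 source: crawl a bucket prefix, including subfolders.
s3_args = build_source_args(
    name="quarterly-reports",
    source_type="s3",
    type_specific_config={
        "remote_url": "s3://my-bucket/reports/",
        "recursive": True,  # descend into subfolders
    },
)
```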
update_source_connector

Update a source connector based on type.

Args:
    ctx: Context object with the request and lifespan context
    source_id: ID of the source connector to update
    source_type: The type of source being updated (e.g., 'azure', 'onedrive',
                 'salesforce', 'gdrive', 's3', 'sharepoint')

    type_specific_config:
        azure:
            remote_url: (Optional[str]) The Azure Storage remote URL with the format
                        az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
            recursive: (Optional[bool]) Whether to access subfolders
        gdrive:
            drive_id: (Optional[str]) The Drive ID for the Google Drive source
            recursive: (Optional[bool]) Whether to access subfolders
            extensions: (Optional[list[str]]) File extensions to filter
        onedrive:
            path: (Optional[str]) The path to the target folder in the OneDrive account
            user_pname: (Optional[str]) The User Principal Name (UPN) for the OneDrive
                        user account
            recursive: (Optional[bool]) Whether to access subfolders
            authority_url: (Optional[str]) The authentication token provider URL
        s3:
            remote_url: (Optional[str]) The S3 URI to the bucket or folder
                        (e.g., s3://my-bucket/)
            recursive: (Optional[bool]) Whether to access subfolders
        salesforce:
            username: (Optional[str]) The Salesforce username
            categories: (Optional[list[str]]) The names of the Salesforce
                        categories (objects) that you want to access, specified as
                        a comma-separated list. Available categories include Account,
                        Campaign, Case, EmailMessage, and Lead.
        sharepoint:
            site: (Optional[str]) The SharePoint site to connect to
            user_pname: (Optional[str]) The username for the SharePoint site
            path: (Optional[str]) The path within the SharePoint site
            recursive: (Optional[bool]) Whether to access subfolders
            authority_url: (Optional[str]) The authority URL for authentication

Returns:
    String containing the updated source connector information
delete_source_connector

Delete a source connector.

Args:
    source_id: ID of the source connector to delete

Returns:
    String containing the result of the deletion
create_destination_connector

Create a destination connector based on type.

Args:
    ctx: Context object with the request and lifespan context
    name: A unique name for this connector
    destination_type: The type of destination being created

    type_specific_config:
        astradb:
            collection_name: The AstraDB collection name
            keyspace: The AstraDB keyspace
            batch_size: (Optional[int]) The batch size for inserting documents
        databricks_delta_table:
            catalog: Name of the catalog in Databricks Unity Catalog
            database: The database in Unity Catalog
            http_path: The cluster’s or SQL warehouse’s HTTP Path value
            server_hostname: The Databricks cluster’s or SQL warehouse’s Server Hostname value
            table_name: The name of the table in the schema
            volume: Name of the volume associated with the schema.
            schema: (Optional[str]) Name of the schema associated with the volume
            volume_path: (Optional[str]) Any target folder path within the volume, starting
                        from the root of the volume.
        databricks_volumes:
            catalog: Name of the catalog in Databricks
            host: The Databricks host URL
            volume: Name of the volume associated with the schema
            schema: (Optional[str]) Name of the schema associated with the volume. The default
                     value is "default".
            volume_path: (Optional[str]) Any target folder path within the volume,
                        starting from the root of the volume.
        mongodb:
            database: The name of the MongoDB database
            collection: The name of the MongoDB collection
        neo4j:
            database: The Neo4j database, e.g. "neo4j"
            uri: The Neo4j URI e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
            batch_size: (Optional[int]) The batch size for the connector
        pinecone:
            index_name: The Pinecone index name
            namespace: (Optional[str]) The Pinecone namespace, a folder inside the
                       Pinecone index
            batch_size: (Optional[int]) The batch size
        s3:
            remote_url: The S3 URI to the bucket or folder
        weaviate:
            cluster_url: URL of the Weaviate cluster
            collection: Name of the collection in the Weaviate cluster

            Note: A minimal schema is required for the collection, e.g. record_id: Text

Returns:
    String containing the created destination connector information
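As with sources, the destination arguments follow the per-type fields documented above. The sketch below shows two hypothetical destinations; all names and URLs are placeholders:

```python
# Illustrative argument shapes for create_destination_connector.
weaviate_dest = {
    "name": "search-index",
    "destination_type": "weaviate",
    "type_specific_config": {
        "cluster_url": "https://example-cluster.weaviate.network",
        # The collection needs at least a minimal schema, e.g. record_id: Text
        "collection": "Documents",
    },
}

s3_dest = {
    "name": "processed-output",
    "destination_type": "s3",
    "type_specific_config": {"remote_url": "s3://my-bucket/processed/"},
}
```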
update_destination_connector

Update a destination connector based on type.

Args:
    ctx: Context object with the request and lifespan context
    destination_id: ID of the destination connector to update
    destination_type: The type of destination being updated

    type_specific_config:
        astradb:
            collection_name: (Optional[str]) The AstraDB collection name
            keyspace: (Optional[str]) The AstraDB keyspace
            batch_size: (Optional[int]) The batch size for inserting documents
        databricks_delta_table:
            catalog: (Optional[str]) Name of the catalog in Databricks Unity Catalog
            database: (Optional[str]) The database in Unity Catalog
            http_path: (Optional[str]) The cluster’s or SQL warehouse’s HTTP Path value
            server_hostname: (Optional[str]) The Databricks cluster’s or SQL warehouse’s
                             Server Hostname value
            table_name: (Optional[str]) The name of the table in the schema
            volume: (Optional[str]) Name of the volume associated with the schema
            schema: (Optional[str]) Name of the schema associated with the volume
            volume_path: (Optional[str]) Any target folder path within the volume, starting
                        from the root of the volume
        databricks_volumes:
            catalog: (Optional[str]) Name of the catalog in Databricks
            host: (Optional[str]) The Databricks host URL
            volume: (Optional[str]) Name of the volume associated with the schema
            schema: (Optional[str]) Name of the schema associated with the volume. The default
                     value is "default".
            volume_path: (Optional[str]) Any target folder path within the volume,
                        starting from the root of the volume
        mongodb:
            database: (Optional[str]) The name of the MongoDB database
            collection: (Optional[str]) The name of the MongoDB collection
        neo4j:
            database: (Optional[str]) The Neo4j database, e.g. "neo4j"
            uri: (Optional[str]) The Neo4j URI,
                  e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
            batch_size: (Optional[int]) The batch size for the connector
        pinecone:
            index_name: (Optional[str]) The Pinecone index name
            namespace: (Optional[str]) The Pinecone namespace, a folder inside the
                       Pinecone index
            batch_size: (Optional[int]) The batch size
        s3:
            remote_url: (Optional[str]) The S3 URI to the bucket or folder
        weaviate:
            cluster_url: (Optional[str]) URL of the Weaviate cluster
            collection: (Optional[str]) Name of the collection in the Weaviate cluster

            Note: A minimal schema is required for the collection, e.g. record_id: Text

Returns:
    String containing the updated destination connector information
delete_destination_connector

Delete a destination connector.

Args:
    destination_id: ID of the destination connector to delete

Returns:
    String containing the result of the deletion
invoke_firecrawl_crawlhtml

Start an asynchronous web crawl job using Firecrawl to retrieve HTML content.

Args:
    url: URL to crawl
    s3_uri: S3 URI where results will be uploaded
    limit: Maximum number of pages to crawl (default: 100)

Returns:
    Dictionary with crawl job information including the job ID
check_crawlhtml_status

Check the status of an existing Firecrawl HTML crawl job.

Args:
    crawl_id: ID of the crawl job to check

Returns:
    Dictionary containing the current status of the crawl job
invoke_firecrawl_llmtxt

Start an asynchronous llmfull.txt generation job using Firecrawl. This file is a
standardized markdown file containing information to help LLMs use a website at
inference time. The llmstxt endpoint leverages Firecrawl to crawl your website
and extract data using gpt-4o-mini.

Args:
    url: URL to crawl
    s3_uri: S3 URI where results will be uploaded
    max_urls: Maximum number of pages to crawl (1-100, default: 10)

Returns:
    Dictionary with job information including the job ID
check_llmtxt_status

Check the status of an existing llmfull.txt generation job.

Args:
    job_id: ID of the llmfull.txt generation job to check

Returns:
    Dictionary containing the current status of the job and text content if completed
cancel_crawlhtml_job

Cancel an in-progress Firecrawl HTML crawl job.

Args:
    crawl_id: ID of the crawl job to cancel

Returns:
    Dictionary containing the result of the cancellation
partition_local_file
Transform a local file into structured data using the Unstructured API.

Args:
    input_file_path: The absolute path to the file.
    output_file_dir: The absolute path to the directory where the output file should be saved.
    strategy: The strategy for transformation.
        Available strategies:
            VLM - most advanced transformation suitable for difficult PDFs and Images
            hi_res - high resolution transformation suitable for most document types
            fast - fast transformation suitable for PDFs with extractable text
            auto - automatically choose the best strategy based on the input file
    vlm_model: The VLM model to use for the transformation.
    vlm_model_provider: The VLM model provider to use for the transformation.
    output_type: The type of output to generate. Options: 'json' for json
                 or 'md' for markdown.

Returns:
    A string containing the structured data or a message indicating the output file
    path with the structured data.
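As a rough illustration of how the documented strategies trade off, a client could pick one from the file type. This is purely a heuristic sketch of my own (the auto strategy already performs this selection server-side):

```python
from pathlib import Path

# Heuristic sketch only: 'auto' already does this selection server-side.
def pick_strategy(input_file_path: str, scanned: bool = False) -> str:
    suffix = Path(input_file_path).suffix.lower()
    if suffix in {".png", ".jpg", ".jpeg", ".tiff"}:
        return "vlm"        # images need vision-based transformation
    if suffix == ".pdf":
        # scanned PDFs lack extractable text; born-digital ones have it
        return "hi_res" if scanned else "fast"
    return "auto"           # let the API decide for everything else

print(pick_strategy("report.pdf"))            # fast
print(pick_strategy("scan.pdf", scanned=True))  # hi_res
print(pick_strategy("diagram.png"))           # vlm
```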
list_sources
List available sources from the Unstructured API.

Args:
    source_type: Optional source connector type to filter by

Returns:
    String containing the list of sources
get_source_info

Get detailed information about a specific source connector.

Args:
    source_id: ID of the source connector to get information for; should be a valid UUID

Returns:
    String containing the source connector information
list_destinations

List available destinations from the Unstructured API.

Args:
    destination_type: Optional destination connector type to filter by

Returns:
    String containing the list of destinations
get_destination_info

Get detailed information about a specific destination connector.

Args:
    destination_id: ID of the destination connector to get information for

Returns:
    String containing the destination connector information
list_workflows
List workflows from the Unstructured API.

Args:
    destination_id: Optional destination connector ID to filter by
    source_id: Optional source connector ID to filter by
    status: Optional workflow status to filter by

Returns:
    String containing the list of workflows
get_workflow_info

Get detailed information about a specific workflow.

Args:
    workflow_id: ID of the workflow to get information for

Returns:
    String containing the workflow information
create_workflow

Create a new workflow.

Args:
    workflow_config: A Typed Dictionary containing required fields (destination_id - should
    be a valid UUID, name, source_id - should be a valid UUID, workflow_type) and non-required
    fields (schedule and workflow_nodes). Note that workflow_nodes is only enabled when
    workflow_type is `custom` and is a list of WorkflowNodeTypedDict: partition, prompter,
    chunk, embed.
    Below is an example of a partition workflow node:
        {
            "name": "vlm-partition",
            "type": "partition",
            "sub_type": "vlm",
            "settings": {
                        "provider": "your favorite provider",
                        "model": "your favorite model"
                        }
        }


Returns:
    String containing the created workflow information
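Putting the pieces together, a minimal custom workflow_config might look like the following sketch. The UUIDs here are random placeholders; in practice they come from the create_source_connector and create_destination_connector responses:

```python
import uuid

# Placeholder IDs; use the IDs returned by your connector-creation calls.
source_id = str(uuid.uuid4())
destination_id = str(uuid.uuid4())

workflow_config = {
    "name": "pdf-to-weaviate",
    "source_id": source_id,
    "destination_id": destination_id,
    "workflow_type": "custom",
    # workflow_nodes is only honored for "custom"; array order defines the DAG.
    "workflow_nodes": [
        {
            "name": "vlm-partition",
            "type": "partition",
            "sub_type": "vlm",
            "settings": {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
        },
    ],
}
```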

Custom workflow DAG nodes

  • If WorkflowType is set to custom, you must also specify the settings for the workflow’s directed acyclic graph (DAG) nodes. These nodes’ settings are specified in the workflow_nodes array.

  • A Source node is automatically created when you specify the source_id value outside of the workflow_nodes array.

  • A Destination node is automatically created when you specify the destination_id value outside of the workflow_nodes array.

  • You can specify Partitioner, Chunker, Prompter, and Embedder nodes.

  • The order of the nodes in the workflow_nodes array will be the same order that these nodes appear in the DAG, with the first node in the array added directly after the Source node. The Destination node follows the last node in the array.

  • Be sure to specify nodes in the allowed order. The following DAG placements are all allowed:

    • Source -> Partitioner -> Destination,

    • Source -> Partitioner -> Chunker -> Destination,

    • Source -> Partitioner -> Chunker -> Embedder -> Destination,

    • Source -> Partitioner -> Prompter -> Chunker -> Destination,

    • Source -> Partitioner -> Prompter -> Chunker -> Embedder -> Destination
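The allowed orderings above can be checked mechanically before submitting a workflow. A small client-side sketch, assuming each node dict carries the type values documented here (Source and Destination nodes are implicit):

```python
# The five allowed orderings of explicit workflow_nodes, per the DAG rules above.
ALLOWED_ORDERS = [
    ["partition"],
    ["partition", "chunk"],
    ["partition", "chunk", "embed"],
    ["partition", "prompter", "chunk"],
    ["partition", "prompter", "chunk", "embed"],
]

def is_valid_dag(workflow_nodes: list[dict]) -> bool:
    """Check that the node types appear in one of the allowed orders."""
    types = [node["type"] for node in workflow_nodes]
    return types in ALLOWED_ORDERS

print(is_valid_dag([{"type": "partition"}, {"type": "chunk"}]))   # True
print(is_valid_dag([{"type": "chunk"}, {"type": "partition"}]))   # False
```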

Partitioner node A Partitioner node has a type of partition and a subtype of auto, vlm, hi_res, or fast.

Examples:

  • auto strategy:

        {
            "name": "Partitioner",
            "type": "partition",
            "subtype": "vlm",
            "settings": {
                "provider": "anthropic",              (required)
                "model": "claude-sonnet-4-20250514",  (required)
                "output_format": "text/html",
                "user_prompt": null,
                "format_html": true,
                "unique_element_ids": true,
                "is_dynamic": true,
                "allow_fast": true
            }
        }

  • vlm strategy: Allowed values are provider and model. Examples:

        "provider": "anthropic", "model": "claude-sonnet-4-20250514"
        "provider": "openai", "model": "gpt-4o"

  • hi_res strategy:

        {
            "name": "Partitioner",
            "type": "partition",
            "subtype": "unstructured_api",
            "settings": {
                "strategy": "hi_res",
                "include_page_breaks": <true|false>,
                "pdf_infer_table_structure": <true|false>,
                "exclude_elements": [ "", "" ],
                "xml_keep_tags": <true|false>,
                "encoding": "",
                "ocr_languages": [ "", "" ],
                "extract_image_block_types": [ "image", "table" ],
                "infer_table_structure": <true|false>
            }
        }

  • fast strategy:

        {
            "name": "Partitioner",
            "type": "partition",
            "subtype": "unstructured_api",
            "settings": {
                "strategy": "fast",
                "include_page_breaks": <true|false>,
                "pdf_infer_table_structure": <true|false>,
                "exclude_elements": [ "", "" ],
                "xml_keep_tags": <true|false>,
                "encoding": "",
                "ocr_languages": [ "", "" ],
                "extract_image_block_types": [ "image", "table" ],
                "infer_table_structure": <true|false>
            }
        }

Chunker node A Chunker node has a type of chunk and subtype of chunk_by_character or chunk_by_title.

  • chunk_by_character:

        {
            "name": "Chunker",
            "type": "chunk",
            "subtype": "chunk_by_character",
            "settings": {
                "include_orig_elements": <true|false>,
                "new_after_n_chars": ,    (required; if not provided, set same as max_characters)
                "max_characters": ,       (required)
                "overlap": ,              (required; if not provided, default to 0)
                "overlap_all": <true|false>,
                "contextual_chunking_strategy": "v1"
            }
        }

  • chunk_by_title:

        {
            "name": "Chunker",
            "type": "chunk",
            "subtype": "chunk_by_title",
            "settings": {
                "multipage_sections": <true|false>,
                "combine_text_under_n_chars": ,
                "include_orig_elements": <true|false>,
                "new_after_n_chars": ,    (required; if not provided, set same as max_characters)
                "max_characters": ,       (required)
                "overlap": ,              (required; if not provided, default to 0)
                "overlap_all": <true|false>,
                "contextual_chunking_strategy": "v1"
            }
        }
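The defaulting rules noted in the chunker examples above (new_after_n_chars falls back to max_characters, overlap falls back to 0) can be expressed directly; the helper name below is my own:

```python
# Applies the defaulting rules from the chunker examples:
# new_after_n_chars defaults to max_characters, overlap defaults to 0.
def fill_chunker_settings(settings: dict) -> dict:
    filled = dict(settings)
    if "max_characters" not in filled:
        raise ValueError("max_characters is required")
    filled.setdefault("new_after_n_chars", filled["max_characters"])
    filled.setdefault("overlap", 0)
    return filled

print(fill_chunker_settings({"max_characters": 1500}))
# {'max_characters': 1500, 'new_after_n_chars': 1500, 'overlap': 0}
```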

Prompter node A Prompter node has a type of prompter and a subtype of:

  • openai_image_description,

  • anthropic_image_description,

  • bedrock_image_description,

  • vertexai_image_description,

  • openai_table_description,

  • anthropic_table_description,

  • bedrock_table_description,

  • vertexai_table_description,

  • openai_table2html,

  • openai_ner

Example: { "name": "Prompter", "type": "prompter", "subtype": "", "settings": {} }

Embedder node An Embedder node has a type of embed.

Allowed values for subtype and model_name include:

  • "subtype": "azure_openai"

    • "model_name": "text-embedding-3-small"

    • "model_name": "text-embedding-3-large"

    • "model_name": "text-embedding-ada-002"

  • "subtype": "bedrock"

    • "model_name": "amazon.titan-embed-text-v2:0"

    • "model_name": "amazon.titan-embed-text-v1"

    • "model_name": "amazon.titan-embed-image-v1"

    • "model_name": "cohere.embed-english-v3"

    • "model_name": "cohere.embed-multilingual-v3"

  • "subtype": "togetherai":

    • "model_name": "togethercomputer/m2-bert-80M-2k-retrieval"

    • "model_name": "togethercomputer/m2-bert-80M-8k-retrieval"

    • "model_name": "togethercomputer/m2-bert-80M-32k-retrieval"

Example: { "name": "Embedder", "type": "embed", "subtype": "", "settings": { "model_name": "" } }

run_workflow

Run a specific workflow.

Args:
    workflow_id: ID of the workflow to run

Returns:
    String containing the response from the workflow execution
update_workflow

Update an existing workflow.

Args:
    workflow_id: ID of the workflow to update
    workflow_config: A Typed Dictionary containing required fields (destination_id,
    name, source_id, workflow_type) and non-required fields (schedule, and workflow_nodes)

Returns:
    String containing the updated workflow information
delete_workflow

Delete a specific workflow.

Args:
    workflow_id: ID of the workflow to delete

Returns:
    String containing the response from the workflow deletion
list_jobs
List jobs via the Unstructured API.

Args:
    workflow_id: Optional workflow ID to filter by
    status: Optional job status to filter by

Returns:
    String containing the list of jobs
get_job_info

Get detailed information about a specific job.

Args:
    job_id: ID of the job to get information for

Returns:
    String containing the job information
cancel_job

Delete a specific job.

Args:
    job_id: ID of the job to cancel

Returns:
    String containing the response from the job cancellation
list_workflows_with_finished_jobs
List workflows with finished jobs via the Unstructured API.
Args:
    source_type: Optional source connector type to filter by
    destination_type: Optional destination connector type to filter by
Returns:
    String containing the list of workflows with finished jobs and source and destination
    details

Prompts

Interactive templates invoked by user choice


No prompts

Resources

Contextual data attached and managed by the client


No resources


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Unstructured-IO/UNS-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server