
Unstructured API MCP Server

Official

Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default
FIRECRAWL_API_KEY | No | API key for using Firecrawl web crawling API features | (none)
UNSTRUCTURED_API_KEY | Yes | Your Unstructured API key, fetched from https://platform.unstructured.io/app/account/api-keys | (none)
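
As a quick illustration, a minimal Python sketch of a pre-flight check for these variables; the exit-versus-warn behavior shown here is an assumption for the sketch, not how the server itself reacts to missing keys.

    import os
    import sys

    # UNSTRUCTURED_API_KEY is required; FIRECRAWL_API_KEY only gates the Firecrawl tools.
    # Exiting vs. warning is an assumed policy for this sketch, not server behavior.
    if not os.environ.get("UNSTRUCTURED_API_KEY"):
        sys.exit("UNSTRUCTURED_API_KEY is not set; create one at "
                 "https://platform.unstructured.io/app/account/api-keys")
    if not os.environ.get("FIRECRAWL_API_KEY"):
        print("FIRECRAWL_API_KEY is not set; Firecrawl crawl tools will be unavailable.")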

Tools

Functions exposed to the LLM to take actions

Name | Description
create_source_connector

Create a source connector based on type.

Args:
  ctx: Context object with the request and lifespan context
  name: A unique name for this connector
  source_type: The type of source being created (e.g., 'azure', 'onedrive', 'salesforce', 'gdrive', 's3', 'sharepoint')
  type_specific_config:
    azure:
      remote_url: The Azure Storage remote URL, in the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
      recursive: (Optional[bool]) Whether to access subfolders
    gdrive:
      drive_id: The Drive ID for the Google Drive source
      recursive: (Optional[bool]) Whether to access subfolders
      extensions: (Optional[list[str]]) File extensions to filter
    onedrive:
      path: The path to the target folder in the OneDrive account
      user_pname: The User Principal Name (UPN) for the OneDrive user account
      recursive: (Optional[bool]) Whether to access subfolders
      authority_url: (Optional[str]) The authentication token provider URL
    s3:
      remote_url: The S3 URI to the bucket or folder (e.g., s3://my-bucket/)
      recursive: (Optional[bool]) Whether to access subfolders
    salesforce:
      username: The Salesforce username
      categories: (Optional[list[str]]) The names of the Salesforce categories (objects) that you want to access, specified as a comma-separated list. Available categories include Account, Campaign, Case, EmailMessage, and Lead.
    sharepoint:
      site: The SharePoint site to connect to
      user_pname: The username for the SharePoint site
      path: (Optional) The path within the SharePoint site
      recursive: (Optional[bool]) Whether to access subfolders
      authority_url: (Optional[str]) The authority URL for authentication
Returns: String containing the created source connector information
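
For example, a hedged sketch of the arguments a client could pass to create_source_connector for an S3 source; the connector name and bucket URI below are placeholders, not real resources.

    # Hypothetical create_source_connector arguments for an S3 source.
    # "demo-s3-source" and the bucket URI are placeholders.
    create_source_args = {
        "name": "demo-s3-source",
        "source_type": "s3",
        "type_specific_config": {
            "remote_url": "s3://my-bucket/docs/",  # S3 URI to the bucket or folder
            "recursive": True,                     # optional: include subfolders
        },
    }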
update_source_connector

Update a source connector based on type.

Args:
  ctx: Context object with the request and lifespan context
  source_id: ID of the source connector to update
  source_type: The type of source being updated (e.g., 'azure', 'onedrive', 'salesforce', 'gdrive', 's3', 'sharepoint')
  type_specific_config:
    azure:
      remote_url: (Optional[str]) The Azure Storage remote URL, in the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
      recursive: (Optional[bool]) Whether to access subfolders
    gdrive:
      drive_id: (Optional[str]) The Drive ID for the Google Drive source
      recursive: (Optional[bool]) Whether to access subfolders
      extensions: (Optional[list[str]]) File extensions to filter
    onedrive:
      path: (Optional[str]) The path to the target folder in the OneDrive account
      user_pname: (Optional[str]) The User Principal Name (UPN) for the OneDrive user account
      recursive: (Optional[bool]) Whether to access subfolders
      authority_url: (Optional[str]) The authentication token provider URL
    s3:
      remote_url: (Optional[str]) The S3 URI to the bucket or folder (e.g., s3://my-bucket/)
      recursive: (Optional[bool]) Whether to access subfolders
    salesforce:
      username: (Optional[str]) The Salesforce username
      categories: (Optional[list[str]]) The names of the Salesforce categories (objects) that you want to access, specified as a comma-separated list. Available categories include Account, Campaign, Case, EmailMessage, and Lead.
    sharepoint:
      site: (Optional[str]) The SharePoint site to connect to
      user_pname: (Optional[str]) The username for the SharePoint site
      path: (Optional) The path within the SharePoint site
      recursive: (Optional[bool]) Whether to access subfolders
      authority_url: (Optional[str]) The authority URL for authentication
Returns: String containing the updated source connector information
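
Because every type-specific field is optional on update, a request only needs the fields that change. A sketch with a placeholder source_id and a hypothetical Google Drive filter:

    # Hypothetical update_source_connector arguments: only the fields being
    # changed are supplied. The source_id is a placeholder UUID.
    update_source_args = {
        "source_id": "00000000-0000-0000-0000-000000000000",
        "source_type": "gdrive",
        "type_specific_config": {
            "extensions": [".pdf", ".docx"],  # narrow the crawl to these file types
            "recursive": True,
        },
    }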
delete_source_connector

Delete a source connector.

Args:
  source_id: ID of the source connector to delete
Returns: String containing the result of the deletion
create_destination_connector

Create a destination connector based on type.

Args:
  ctx: Context object with the request and lifespan context
  name: A unique name for this connector
  destination_type: The type of destination being created
  type_specific_config:
    astradb:
      collection_name: The AstraDB collection name
      keyspace: The AstraDB keyspace
      batch_size: (Optional[int]) The batch size for inserting documents
    databricks_delta_table:
      catalog: Name of the catalog in Databricks Unity Catalog
      database: The database in Unity Catalog
      http_path: The cluster's or SQL warehouse's HTTP Path value
      server_hostname: The Databricks cluster's or SQL warehouse's Server Hostname value
      table_name: The name of the table in the schema
      volume: Name of the volume associated with the schema
      schema: (Optional[str]) Name of the schema associated with the volume
      volume_path: (Optional[str]) Any target folder path within the volume, starting from the root of the volume
    databricks_volumes:
      catalog: Name of the catalog in Databricks
      host: The Databricks host URL
      volume: Name of the volume associated with the schema
      schema: (Optional[str]) Name of the schema associated with the volume. The default value is "default".
      volume_path: (Optional[str]) Any target folder path within the volume, starting from the root of the volume
    mongodb:
      database: The name of the MongoDB database
      collection: The name of the MongoDB collection
    neo4j:
      database: The Neo4j database, e.g. "neo4j"
      uri: The Neo4j URI, e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
      batch_size: (Optional[int]) The batch size for the connector
    pinecone:
      index_name: The Pinecone index name
      namespace: (Optional[str]) The Pinecone namespace, a folder inside the Pinecone index
      batch_size: (Optional[int]) The batch size
    s3:
      remote_url: The S3 URI to the bucket or folder
    weaviate:
      cluster_url: URL of the Weaviate cluster
      collection: Name of the collection in the Weaviate cluster
      Note: A minimal schema is required for the collection, e.g. record_id: Text
Returns: String containing the created destination connector information
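
As an illustration, a sketch of create_destination_connector arguments for a Pinecone destination; the connector name, index name, and namespace are placeholders.

    # Hypothetical create_destination_connector arguments for Pinecone.
    # The name, index, and namespace values are placeholders.
    create_destination_args = {
        "name": "demo-pinecone-destination",
        "destination_type": "pinecone",
        "type_specific_config": {
            "index_name": "my-index",     # existing Pinecone index (assumed)
            "namespace": "unstructured",  # optional folder inside the index
            "batch_size": 50,             # optional insert batch size
        },
    }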
update_destination_connector

Update a destination connector based on type.

Args:
  ctx: Context object with the request and lifespan context
  destination_id: ID of the destination connector to update
  destination_type: The type of destination being updated
  type_specific_config:
    astradb:
      collection_name: (Optional[str]) The AstraDB collection name
      keyspace: (Optional[str]) The AstraDB keyspace
      batch_size: (Optional[int]) The batch size for inserting documents
    databricks_delta_table:
      catalog: (Optional[str]) Name of the catalog in Databricks Unity Catalog
      database: (Optional[str]) The database in Unity Catalog
      http_path: (Optional[str]) The cluster's or SQL warehouse's HTTP Path value
      server_hostname: (Optional[str]) The Databricks cluster's or SQL warehouse's Server Hostname value
      table_name: (Optional[str]) The name of the table in the schema
      volume: (Optional[str]) Name of the volume associated with the schema
      schema: (Optional[str]) Name of the schema associated with the volume
      volume_path: (Optional[str]) Any target folder path within the volume, starting from the root of the volume
    databricks_volumes:
      catalog: (Optional[str]) Name of the catalog in Databricks
      host: (Optional[str]) The Databricks host URL
      volume: (Optional[str]) Name of the volume associated with the schema
      schema: (Optional[str]) Name of the schema associated with the volume. The default value is "default".
      volume_path: (Optional[str]) Any target folder path within the volume, starting from the root of the volume
    mongodb:
      database: (Optional[str]) The name of the MongoDB database
      collection: (Optional[str]) The name of the MongoDB collection
    neo4j:
      database: (Optional[str]) The Neo4j database, e.g. "neo4j"
      uri: (Optional[str]) The Neo4j URI, e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
      batch_size: (Optional[int]) The batch size for the connector
    pinecone:
      index_name: (Optional[str]) The Pinecone index name
      namespace: (Optional[str]) The Pinecone namespace, a folder inside the Pinecone index
      batch_size: (Optional[int]) The batch size
    s3:
      remote_url: (Optional[str]) The S3 URI to the bucket or folder
    weaviate:
      cluster_url: (Optional[str]) URL of the Weaviate cluster
      collection: (Optional[str]) Name of the collection in the Weaviate cluster
      Note: A minimal schema is required for the collection, e.g. record_id: Text
Returns: String containing the updated destination connector information
delete_destination_connector

Delete a destination connector.

Args:
  destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion
invoke_firecrawl_crawlhtml

Start an asynchronous web crawl job using Firecrawl to retrieve HTML content.

Args:
  url: URL to crawl
  s3_uri: S3 URI where results will be uploaded
  limit: Maximum number of pages to crawl (default: 100)
Returns: Dictionary with crawl job information including the job ID
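
A hypothetical argument payload for invoke_firecrawl_crawlhtml; the site URL and S3 URI are placeholders.

    # Hypothetical invoke_firecrawl_crawlhtml arguments.
    # Both the site URL and the S3 destination are placeholders.
    crawl_args = {
        "url": "https://docs.example.com",   # site to crawl
        "s3_uri": "s3://my-bucket/crawls/",  # where the HTML results are uploaded
        "limit": 50,                         # optional; defaults to 100 pages
    }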
check_crawlhtml_status

Check the status of an existing Firecrawl HTML crawl job.

Args:
  crawl_id: ID of the crawl job to check
Returns: Dictionary containing the current status of the crawl job
invoke_firecrawl_llmtxt

Start an asynchronous llmfull.txt generation job using Firecrawl. This file is a standardized markdown file containing information to help LLMs use a website at inference time. The llmstxt endpoint leverages Firecrawl to crawl your website and extract data using gpt-4o-mini.

Args:
  url: URL to crawl
  s3_uri: S3 URI where results will be uploaded
  max_urls: Maximum number of pages to crawl (1-100, default: 10)
Returns: Dictionary with job information including the job ID
check_llmtxt_status

Check the status of an existing llmfull.txt generation job.

Args:
  job_id: ID of the llmfull.txt generation job to check
Returns: Dictionary containing the current status of the job and text content if completed
cancel_crawlhtml_job

Cancel an in-progress Firecrawl HTML crawl job.

Args:
  crawl_id: ID of the crawl job to cancel
Returns: Dictionary containing the result of the cancellation
partition_local_file

Transform a local file into structured data using the Unstructured API.

Args:
  input_file_path: The absolute path to the file.
  output_file_dir: The absolute path to the directory where the output file should be saved.
  strategy: The strategy for transformation. Available strategies:
    VLM - most advanced transformation, suitable for difficult PDFs and images
    hi_res - high-resolution transformation, suitable for most document types
    fast - fast transformation, suitable for PDFs with extractable text
    auto - automatically choose the best strategy based on the input file
  vlm_model: The VLM model to use for the transformation.
  vlm_model_provider: The VLM model provider to use for the transformation.
  output_type: The type of output to generate. Options: 'json' for JSON or 'md' for Markdown.
Returns: A string containing the structured data, or a message indicating the output file path with the structured data.
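
For illustration, a sketch of partition_local_file arguments; the paths are placeholders, the exact casing of the strategy value should be checked against the tool's schema, and the model/provider pair is just one plausible VLM choice.

    # Hypothetical partition_local_file arguments. Paths are placeholders, and
    # the strategy/model values are assumptions to check against the tool schema.
    partition_args = {
        "input_file_path": "/home/user/reports/q3.pdf",
        "output_file_dir": "/home/user/reports/out",
        "strategy": "vlm",                        # VLM, hi_res, fast, or auto
        "vlm_model": "claude-sonnet-4-20250514",  # used with the VLM strategy
        "vlm_model_provider": "anthropic",
        "output_type": "json",                    # 'json' or 'md'
    }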
list_sources

List available sources from the Unstructured API.

Args:
  source_type: Optional source connector type to filter by
Returns: String containing the list of sources
get_source_info

Get detailed information about a specific source connector.

Args:
  source_id: ID of the source connector to get information for; should be a valid UUID
Returns: String containing the source connector information
list_destinations

List available destinations from the Unstructured API.

Args:
  destination_type: Optional destination connector type to filter by
Returns: String containing the list of destinations
get_destination_info

Get detailed information about a specific destination connector.

Args:
  destination_id: ID of the destination connector to get information for
Returns: String containing the destination connector information
list_workflows

List workflows from the Unstructured API.

Args:
  destination_id: Optional destination connector ID to filter by
  source_id: Optional source connector ID to filter by
  status: Optional workflow status to filter by
Returns: String containing the list of workflows
get_workflow_info

Get detailed information about a specific workflow.

Args:
  workflow_id: ID of the workflow to get information for
Returns: String containing the workflow information
create_workflow

Create a new workflow.

Args:
  workflow_config: A Typed Dictionary containing required fields (destination_id - should be a valid UUID, name, source_id - should be a valid UUID, workflow_type) and non-required fields (schedule and workflow_nodes). Note that workflow_nodes is only enabled when workflow_type is `custom`, and is a list of WorkflowNodeTypedDict entries of type partition, prompter, chunk, or embed.

  Below is an example of a partition workflow node:

    {
      "name": "vlm-partition",
      "type": "partition",
      "sub_type": "vlm",
      "settings": {
        "provider": "your favorite provider",
        "model": "your favorite model"
      }
    }

  A complete custom workflow_config sketch follows the node reference below.

Returns: String containing the created workflow information

Custom workflow DAG nodes

  • If WorkflowType is set to custom, you must also specify the settings for the workflow’s directed acyclic graph (DAG) nodes. These nodes’ settings are specified in the workflow_nodes array.

  • A Source node is automatically created when you specify the source_id value outside of the workflow_nodes array.

  • A Destination node is automatically created when you specify the destination_id value outside of the workflow_nodes array.

  • You can specify Partitioner, Chunker, Prompter, and Embedder nodes.

  • The order of the nodes in the workflow_nodes array will be the same order that these nodes appear in the DAG, with the first node in the array added directly after the Source node. The Destination node follows the last node in the array.

  • Be sure to specify nodes in the allowed order. The following DAG placements are all allowed:

    • Source -> Partitioner -> Destination,

    • Source -> Partitioner -> Chunker -> Destination,

    • Source -> Partitioner -> Chunker -> Embedder -> Destination,

    • Source -> Partitioner -> Prompter -> Chunker -> Destination,

    • Source -> Partitioner -> Prompter -> Chunker -> Embedder -> Destination

Partitioner node

A Partitioner node has a type of partition and a subtype of auto, vlm, hi_res, or fast.

Examples:

  • auto strategy:
    {
      "name": "Partitioner",
      "type": "partition",
      "subtype": "vlm",
      "settings": {
        "provider": "anthropic", (required)
        "model": "claude-sonnet-4-20250514", (required)
        "output_format": "text/html",
        "user_prompt": null,
        "format_html": true,
        "unique_element_ids": true,
        "is_dynamic": true,
        "allow_fast": true
      }
    }

  • vlm strategy: Allowed values are provider and model. Below are examples:
    - "provider": "anthropic", "model": "claude-sonnet-4-20250514"
    - "provider": "openai", "model": "gpt-4o"

  • hi_res strategy:
    {
      "name": "Partitioner",
      "type": "partition",
      "subtype": "unstructured_api",
      "settings": {
        "strategy": "hi_res",
        "include_page_breaks": <true|false>,
        "pdf_infer_table_structure": <true|false>,
        "exclude_elements": [ "", "" ],
        "xml_keep_tags": <true|false>,
        "encoding": "",
        "ocr_languages": [ "", "" ],
        "extract_image_block_types": [ "image", "table" ],
        "infer_table_structure": <true|false>
      }
    }

  • fast strategy:
    {
      "name": "Partitioner",
      "type": "partition",
      "subtype": "unstructured_api",
      "settings": {
        "strategy": "fast",
        "include_page_breaks": <true|false>,
        "pdf_infer_table_structure": <true|false>,
        "exclude_elements": [ "", "" ],
        "xml_keep_tags": <true|false>,
        "encoding": "",
        "ocr_languages": [ "", "" ],
        "extract_image_block_types": [ "image", "table" ],
        "infer_table_structure": <true|false>
      }
    }

Chunker node

A Chunker node has a type of chunk and a subtype of chunk_by_character or chunk_by_title.

  • chunk_by_character:
    {
      "name": "Chunker",
      "type": "chunk",
      "subtype": "chunk_by_character",
      "settings": {
        "include_orig_elements": <true|false>,
        "new_after_n_chars": , (required; if not provided, set same as max_characters)
        "max_characters": , (required)
        "overlap": , (required; if not provided, set default to 0)
        "overlap_all": <true|false>,
        "contextual_chunking_strategy": "v1"
      }
    }

  • chunk_by_title:
    {
      "name": "Chunker",
      "type": "chunk",
      "subtype": "chunk_by_title",
      "settings": {
        "multipage_sections": <true|false>,
        "combine_text_under_n_chars": ,
        "include_orig_elements": <true|false>,
        "new_after_n_chars": , (required; if not provided, set same as max_characters)
        "max_characters": , (required)
        "overlap": , (required; if not provided, set default to 0)
        "overlap_all": <true|false>,
        "contextual_chunking_strategy": "v1"
      }
    }

Prompter node

A Prompter node has a type of prompter and a subtype of:

  • openai_image_description,

  • anthropic_image_description,

  • bedrock_image_description,

  • vertexai_image_description,

  • openai_table_description,

  • anthropic_table_description,

  • bedrock_table_description,

  • vertexai_table_description,

  • openai_table2html,

  • openai_ner

Example: { "name": "Prompter", "type": "prompter", "subtype": "", "settings": {} }

Embedder node

An Embedder node has a type of embed.

Allowed values for subtype and model_name include:

  • "subtype": "azure_openai"

    • "model_name": "text-embedding-3-small"

    • "model_name": "text-embedding-3-large"

    • "model_name": "text-embedding-ada-002"

  • "subtype": "bedrock"

    • "model_name": "amazon.titan-embed-text-v2:0"

    • "model_name": "amazon.titan-embed-text-v1"

    • "model_name": "amazon.titan-embed-image-v1"

    • "model_name": "cohere.embed-english-v3"

    • "model_name": "cohere.embed-multilingual-v3"

  • "subtype": "togetherai":

    • "model_name": "togethercomputer/m2-bert-80M-2k-retrieval"

    • "model_name": "togethercomputer/m2-bert-80M-8k-retrieval"

    • "model_name": "togethercomputer/m2-bert-80M-32k-retrieval"

Example: { "name": "Embedder", "type": "embed", "subtype": "", "settings": { "model_name": "" } }

run_workflow

Run a specific workflow.

Args:
  workflow_id: ID of the workflow to run
Returns: String containing the response from the workflow execution
update_workflow

Update an existing workflow.

Args:
  workflow_id: ID of the workflow to update
  workflow_config: A Typed Dictionary containing required fields (destination_id, name, source_id, workflow_type) and non-required fields (schedule and workflow_nodes)
Returns: String containing the updated workflow information
delete_workflow

Delete a specific workflow.

Args:
  workflow_id: ID of the workflow to delete
Returns: String containing the response from the workflow deletion
list_jobs

List jobs via the Unstructured API.

Args:
  workflow_id: Optional workflow ID to filter by
  status: Optional job status to filter by
Returns: String containing the list of jobs
get_job_info

Get detailed information about a specific job.

Args:
  job_id: ID of the job to get information for
Returns: String containing the job information
cancel_job

Cancel a specific job.

Args:
  job_id: ID of the job to cancel
Returns: String containing the response from the job cancellation
list_workflows_with_finished_jobs

List workflows with finished jobs via the Unstructured API.

Args:
  source_type: Optional source connector type to filter by
  destination_type: Optional destination connector type to filter by
Returns: String containing the list of workflows with finished jobs and source and destination details

Prompts

Interactive templates invoked by user choice


No prompts

Resources

Contextual data attached and managed by the client


No resources
