Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| FIRECRAWL_API_KEY | No | API key for Firecrawl web crawling features | |
| UNSTRUCTURED_API_KEY | Yes | Your Unstructured API key, obtained from https://platform.unstructured.io/app/account/api-keys | |
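These variables are typically supplied through the MCP client configuration. The snippet below is a minimal sketch of such an entry; the server name is arbitrary and the launch command is left as a placeholder because it depends on how you install and run the server:

```json
{
  "mcpServers": {
    "unstructured": {
      "command": "<command-to-launch-this-server>",
      "args": [],
      "env": {
        "UNSTRUCTURED_API_KEY": "<your-unstructured-api-key>",
        "FIRECRAWL_API_KEY": "<your-firecrawl-api-key>"
      }
    }
  }
}
```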
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| create_source_connector | Create a source connector based on type. Args: ctx: Context object with the request and lifespan context name: A unique name for this connector source_type: The type of source being created (e.g., 'azure', 'onedrive', 'salesforce', 'gdrive', 's3', 'sharepoint') type_specific_config:
azure:
remote_url: The Azure Storage remote URL with the format
az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
recursive: (Optional[bool]) Whether to access subfolders
gdrive:
drive_id: The Drive ID for the Google Drive source
recursive: (Optional[bool]) Whether to access subfolders
extensions: (Optional[list[str]]) File extensions to filter
onedrive:
path: The path to the target folder in the OneDrive account
user_pname: The User Principal Name (UPN) for the OneDrive user account
recursive: (Optional[bool]) Whether to access subfolders
authority_url: (Optional[str]) The authentication token provider URL
s3:
remote_url: The S3 URI to the bucket or folder (e.g., s3://my-bucket/)
recursive: (Optional[bool]) Whether to access subfolders
salesforce:
username: The Salesforce username
categories: (Optional[list[str]]) The names of the Salesforce categories
(objects) to access, specified as a comma-separated list. Available
categories include Account, Campaign, Case, EmailMessage, and Lead.
sharepoint:
site: The SharePoint site to connect to
user_pname: The username for the SharePoint site
path: (Optional) The path within the SharePoint site
recursive: (Optional[bool]) Whether to access subfolders
authority_url: (Optional[str]) The authority URL for authentication
Returns:
String containing the created source connector information (example connector arguments are shown after this table) |
| update_source_connector | Update a source connector based on type. Args:
ctx: Context object with the request and lifespan context
source_id: ID of the source connector to update
source_type: The type of source being updated (e.g., 'azure', 'onedrive',
'salesforce', 'gdrive', 's3', 'sharepoint')
type_specific_config:
azure:
remote_url: (Optional[str]) The Azure Storage remote URL with the format
az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
recursive: (Optional[bool]) Whether to access subfolders
gdrive:
drive_id: (Optional[str]) The Drive ID for the Google Drive source
recursive: (Optional[bool]) Whether to access subfolders
extensions: (Optional[list[str]]) File extensions to filter
onedrive:
path: (Optional[str]) The path to the target folder in the OneDrive account
user_pname: (Optional[str]) The User Principal Name (UPN) for the OneDrive
user account
recursive: (Optional[bool]) Whether to access subfolders
authority_url: (Optional[str]) The authentication token provider URL
s3:
remote_url: (Optional[str]) The S3 URI to the bucket or folder
(e.g., s3://my-bucket/)
recursive: (Optional[bool]) Whether to access subfolders
salesforce:
username: (Optional[str]) The Salesforce username
categories: (Optional[list[str]]) The names of the Salesforce categories
(objects) to access, specified as a comma-separated list. Available
categories include Account, Campaign, Case, EmailMessage, and Lead.
sharepoint:
site: (Optional[str]) The SharePoint site to connect to
user_pname: (Optional[str]) The username for the SharePoint site
path: (Optional[str]) The path within the SharePoint site
recursive: (Optional[bool]) Whether to access subfolders
authority_url: (Optional[str]) The authority URL for authentication
Returns:
String containing the updated source connector information |
| delete_source_connector | Delete a source connector. Args:
source_id: ID of the source connector to delete
Returns:
String containing the result of the deletion |
| create_destination_connector | Create a destination connector based on type. Args:
ctx: Context object with the request and lifespan context
name: A unique name for this connector
destination_type: The type of destination being created
type_specific_config:
astradb:
collection_name: The AstraDB collection name
keyspace: The AstraDB keyspace
batch_size: (Optional[int]) The batch size for inserting documents
databricks_delta_table:
catalog: Name of the catalog in Databricks Unity Catalog
database: The database in Unity Catalog
http_path: The cluster’s or SQL warehouse’s HTTP Path value
server_hostname: The Databricks cluster’s or SQL warehouse’s Server Hostname value
table_name: The name of the table in the schema
volume: Name of the volume associated with the schema.
schema: (Optional[str]) Name of the schema associated with the volume
volume_path: (Optional[str]) Any target folder path within the volume, starting
from the root of the volume.
databricks_volumes:
catalog: Name of the catalog in Databricks
host: The Databricks host URL
volume: Name of the volume associated with the schema
schema: (Optional[str]) Name of the schema associated with the volume. The default
value is "default".
volume_path: (Optional[str]) Any target folder path within the volume,
starting from the root of the volume.
mongodb:
database: The name of the MongoDB database
collection: The name of the MongoDB collection
neo4j:
database: The Neo4j database, e.g. "neo4j"
uri: The Neo4j URI e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
batch_size: (Optional[int]) The batch size for the connector
pinecone:
index_name: The Pinecone index name
namespace: (Optional[str]) The pinecone namespace, a folder inside the
pinecone index
batch_size: (Optional[int]) The batch size
s3:
remote_url: The S3 URI to the bucket or folder
weaviate:
cluster_url: URL of the Weaviate cluster
collection: Name of the collection in the Weaviate cluster
Note: Minimal schema is required for the collection, e.g. record_id: Text
Returns:
String containing the created destination connector information |
| update_destination_connector | Update a destination connector based on type. Args:
ctx: Context object with the request and lifespan context
destination_id: ID of the destination connector to update
destination_type: The type of destination being updated
type_specific_config:
astradb:
collection_name: (Optional[str]): The AstraDB collection name
keyspace: (Optional[str]): The AstraDB keyspace
batch_size: (Optional[int]) The batch size for inserting documents
databricks_delta_table:
catalog: (Optional[str]): Name of the catalog in Databricks Unity Catalog
database: (Optional[str]): The database in Unity Catalog
http_path: (Optional[str]): The cluster’s or SQL warehouse’s HTTP Path value
server_hostname: (Optional[str]): The Databricks cluster’s or SQL warehouse’s
Server Hostname value
table_name: (Optional[str]): The name of the table in the schema
volume: (Optional[str]): Name of the volume associated with the schema.
schema: (Optional[str]) Name of the schema associated with the volume
volume_path: (Optional[str]) Any target folder path within the volume, starting
from the root of the volume.
databricks_volumes:
catalog: (Optional[str]): Name of the catalog in Databricks
host: (Optional[str]): The Databricks host URL
volume: (Optional[str]): Name of the volume associated with the schema
schema: (Optional[str]) Name of the schema associated with the volume. The default
value is "default".
volume_path: (Optional[str]) Any target folder path within the volume,
starting from the root of the volume.
mongodb:
database: (Optional[str]): The name of the MongoDB database
collection: (Optional[str]): The name of the MongoDB collection
neo4j:
database: (Optional[str]): The Neo4j database, e.g. "neo4j"
uri: (Optional[str]): The Neo4j URI
e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
batch_size: (Optional[int]) The batch size for the connector
pinecone:
index_name: (Optional[str]): The Pinecone index name
namespace: (Optional[str]) The pinecone namespace, a folder inside the
pinecone index
batch_size: (Optional[int]) The batch size
s3:
remote_url: (Optional[str]): The S3 URI to the bucket or folder
weaviate:
cluster_url: (Optional[str]): URL of the Weaviate cluster
collection: (Optional[str]): Name of the collection in the Weaviate cluster
Note: Minimal schema is required for the collection, e.g. record_id: Text
Returns:
String containing the updated destination connector information |
| delete_destination_connector | Delete a destination connector. Args:
destination_id: ID of the destination connector to delete
Returns:
String containing the result of the deletion |
| invoke_firecrawl_crawlhtml | Start an asynchronous web crawl job using Firecrawl to retrieve HTML content. Args:
url: URL to crawl
s3_uri: S3 URI where results will be uploaded
limit: Maximum number of pages to crawl (default: 100)
Returns:
Dictionary with crawl job information including the job ID |
| check_crawlhtml_status | Check the status of an existing Firecrawl HTML crawl job. Args:
crawl_id: ID of the crawl job to check
Returns:
Dictionary containing the current status of the crawl job |
| invoke_firecrawl_llmtxt | Start an asynchronous llmfull.txt generation job using Firecrawl. This file is a standardized markdown file containing information to help LLMs use a website at inference time. The llmstxt endpoint leverages Firecrawl to crawl your website and extracts data using gpt-4o-mini. Args:
url: URL to crawl
s3_uri: S3 URI where results will be uploaded
max_urls: Maximum number of pages to crawl (1-100, default: 10)
Returns:
Dictionary with job information including the job ID |
| check_llmtxt_status | Check the status of an existing llmfull.txt generation job. Args:
job_id: ID of the llmfull.txt generation job to check
Returns:
Dictionary containing the current status of the job and text content if completed |
| cancel_crawlhtml_job | Cancel an in-progress Firecrawl HTML crawl job. Args:
crawl_id: ID of the crawl job to cancel
Returns:
Dictionary containing the result of the cancellation |
| partition_local_file | Transform a local file into structured data using the Unstructured API.
Args:
input_file_path: The absolute path to the file.
output_file_dir: The absolute path to the directory where the output file should be saved.
strategy: The strategy for transformation.
Available strategies:
VLM - most advanced transformation suitable for difficult PDFs and Images
hi_res - high resolution transformation suitable for most document types
fast - fast transformation suitable for PDFs with extractable text
auto - automatically choose the best strategy based on the input file
vlm_model: The VLM model to use for the transformation.
vlm_model_provider: The VLM model provider to use for the transformation.
output_type: The type of output to generate. Options: 'json' for json
or 'md' for markdown.
Returns:
A string containing the structured data or a message indicating the output file
path with the structured data (example arguments are shown after this table). |
| list_sources | List available sources from the Unstructured API.
Args:
source_type: Optional source connector type to filter by
Returns:
String containing the list of sources |
| get_source_info | Get detailed information about a specific source connector. Args:
source_id: ID of the source connector to get information for, should be valid UUID
Returns:
String containing the source connector information |
| list_destinations | List available destinations from the Unstructured API. Args:
destination_type: Optional destination connector type to filter by
Returns:
String containing the list of destinations |
| get_destination_info | Get detailed information about a specific destination connector. Args:
destination_id: ID of the destination connector to get information for
Returns:
String containing the destination connector information |
| list_workflows | List workflows from the Unstructured API.
Args:
destination_id: Optional destination connector ID to filter by
source_id: Optional source connector ID to filter by
status: Optional workflow status to filter by
Returns:
String containing the list of workflows |
| get_workflow_info | Get detailed information about a specific workflow. Args:
workflow_id: ID of the workflow to get information for
Returns:
String containing the workflow information |
| create_workflow | Create a new workflow. Args:
workflow_config: A Typed Dictionary containing required fields (destination_id - should be a
valid UUID, name, source_id - should be a valid UUID, workflow_type) and optional fields
(schedule and workflow_nodes). Note: workflow_nodes is only enabled when workflow_type
is `custom` and is a list of WorkflowNodeTypedDict: partition, prompter, chunk, embed
Below is an example of a partition workflow node:
{
"name": "vlm-partition",
"type": "partition",
"sub_type": "vlm",
"settings": {
"provider": "your favorite provider",
"model": "your favorite model"
}
}
Returns:
String containing the created workflow information (a complete example workflow_config is shown after this table).
Custom workflow DAG nodes:
Partitioner node: has a type of partition and a subtype of auto, vlm, hi_res, or fast.
Chunker node: has a type of chunk and a subtype of chunk_by_character or chunk_by_title.
Prompter node: has a type of prompter. Example: { "name": "Prompter", "type": "prompter", "subtype": "", "settings": {} }
Embedder node: has a type of embed. Example: { "name": "Embedder", "type": "embed", "subtype": "", "settings": { "model_name": "" } } |
| run_workflow | Run a specific workflow. Args:
workflow_id: ID of the workflow to run
Returns:
String containing the response from the workflow execution |
| update_workflow | Update an existing workflow. Args:
workflow_id: ID of the workflow to update
workflow_config: A Typed Dictionary containing required fields (destination_id,
name, source_id, workflow_type) and optional fields (schedule and workflow_nodes)
Returns:
String containing the updated workflow information |
| delete_workflow | Delete a specific workflow. Args:
workflow_id: ID of the workflow to delete
Returns:
String containing the response from the workflow deletion |
| list_jobs | List jobs via the Unstructured API.
Args:
workflow_id: Optional workflow ID to filter by
status: Optional job status to filter by
Returns:
String containing the list of jobs |
| get_job_info | Get detailed information about a specific job. Args:
job_id: ID of the job to get information for
Returns:
String containing the job information |
| cancel_job | Cancel a specific job. Args:
job_id: ID of the job to cancel
Returns:
String containing the response from the job cancellation |
| list_workflows_with_finished_jobs | List workflows with finished jobs via the Unstructured API.
Args:
source_type: Optional source connector type to filter by
destination_type: Optional destination connector type to filter by
Returns:
String containing the list of workflows with finished jobs and source and destination
details |
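To illustrate how the connector tools consume `type_specific_config`, the sketch below shows plausible argument payloads for create_source_connector with an S3 source and create_destination_connector with a Pinecone destination. The top-level keys are just labels for the two tool calls, and every value (connector names, bucket URI, index name, namespace, batch size) is a placeholder rather than a value from this project:

```json
{
  "create_source_connector": {
    "name": "docs-s3-source",
    "source_type": "s3",
    "type_specific_config": {
      "remote_url": "s3://my-bucket/docs/",
      "recursive": true
    }
  },
  "create_destination_connector": {
    "name": "docs-pinecone-destination",
    "destination_type": "pinecone",
    "type_specific_config": {
      "index_name": "docs-index",
      "namespace": "docs",
      "batch_size": 50
    }
  }
}
```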
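The workflow_config passed to create_workflow follows the shape described above. Below is a sketch of a custom workflow that reuses the partition node example from the tool description; the workflow name and the UUIDs are placeholders, and the provider and model values are stand-ins you would replace with real ones:

```json
{
  "name": "docs-ingest-workflow",
  "source_id": "<source-connector-uuid>",
  "destination_id": "<destination-connector-uuid>",
  "workflow_type": "custom",
  "workflow_nodes": [
    {
      "name": "vlm-partition",
      "type": "partition",
      "sub_type": "vlm",
      "settings": {
        "provider": "your favorite provider",
        "model": "your favorite model"
      }
    }
  ]
}
```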
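Finally, partition_local_file takes plain file-system paths and a strategy. The sketch below uses the fast strategy with Markdown output and omits the VLM-specific parameters; both paths are placeholders:

```json
{
  "input_file_path": "/absolute/path/to/report.pdf",
  "output_file_dir": "/absolute/path/to/output",
  "strategy": "fast",
  "output_type": "md"
}
```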
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |