## Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default |
---|---|---|---|
FIRECRAWL_API_KEY | No | API key for using the Firecrawl web-crawling features | |
UNSTRUCTURED_API_KEY | Yes | Your Unstructured API key, obtained from https://platform.unstructured.io/app/account/api-keys | |
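As a quick illustration, a launcher script might verify these variables before starting the server. The check below is a minimal sketch (not part of the server itself), using the variable names from the table:

```python
# Minimal sketch: fail fast if a required environment variable from the
# table above is missing or empty. In practice, pass os.environ.
REQUIRED = ["UNSTRUCTURED_API_KEY"]   # mandatory
OPTIONAL = ["FIRECRAWL_API_KEY"]      # only needed for the Firecrawl tools

def missing_required(env):
    """Return the names of required variables that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]

assert missing_required({"FIRECRAWL_API_KEY": "fc-123"}) == ["UNSTRUCTURED_API_KEY"]
assert missing_required({"UNSTRUCTURED_API_KEY": "key"}) == []
```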
## Schema

### Prompts

Interactive templates invoked by user choice
Name | Description |
---|---|
No prompts |
### Resources

Contextual data attached and managed by the client
Name | Description |
---|---|
No resources |
### Tools

Functions exposed to the LLM to take actions
Name | Description |
---|---|
create_s3_source | Create an S3 source connector. Args:
name: A unique name for this connector
remote_url: The S3 URI to the bucket or folder (e.g., s3://my-bucket/)
recursive: Whether to access subfolders within the bucket
Returns:
String containing the created source connector information |
update_s3_source | Update an S3 source connector. Args:
source_id: ID of the source connector to update
remote_url: The S3 URI to the bucket or folder
recursive: Whether to access subfolders within the bucket
Returns:
String containing the updated source connector information |
delete_s3_source | Delete an S3 source connector. Args:
source_id: ID of the source connector to delete
Returns:
String containing the result of the deletion |
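For illustration, the arguments the S3 source tools expect can be assembled and sanity-checked before the tool call. This is a sketch only; the actual call goes through your MCP client, and the names below are placeholders:

```python
# Build the argument dict for create_s3_source, validating the URI scheme
# described above. The connector name and bucket are hypothetical.
def s3_source_args(name, remote_url, recursive=False):
    """Return the create_s3_source argument dict, or raise on a bad URI."""
    if not remote_url.startswith("s3://"):
        raise ValueError("remote_url must be an S3 URI, e.g. s3://my-bucket/")
    return {"name": name, "remote_url": remote_url, "recursive": recursive}

args = s3_source_args("docs-bucket", "s3://my-bucket/", recursive=True)
assert args == {"name": "docs-bucket", "remote_url": "s3://my-bucket/", "recursive": True}
```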
create_azure_source | Create an Azure source connector. Args:
name: A unique name for this connector
remote_url: The Azure Storage remote URL,
with the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
recursive: Whether to access subfolders within the bucket
Returns:
String containing the created source connector information |
update_azure_source | Update an Azure source connector. Args:
source_id: ID of the source connector to update
remote_url: The Azure Storage remote URL, with the format
az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
recursive: Whether to access subfolders within the bucket
Returns:
String containing the updated source connector information |
delete_azure_source | Delete an Azure source connector. Args:
source_id: ID of the source connector to delete
Returns:
String containing the result of the deletion |
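The `az://` remote URL format described above can be composed from a container name and an optional path; a small illustrative helper (names are placeholders):

```python
# Compose an az://<container-name>/<path> URL for the Azure source connector.
def azure_remote_url(container, path=""):
    """Return the az:// remote URL for the given container and optional path."""
    url = f"az://{container}"
    if path:
        url += "/" + path.strip("/")
    return url

assert azure_remote_url("my-container") == "az://my-container"
assert azure_remote_url("my-container", "/docs/reports/") == "az://my-container/docs/reports"
```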
create_gdrive_source | Create a Google Drive (gdrive) source connector. Args:
name: A unique name for this connector
remote_url: The gdrive URI to the bucket or folder (e.g., gdrive://my-bucket/)
recursive: Whether to access subfolders within the bucket
Returns:
String containing the created source connector information |
update_gdrive_source | Update a Google Drive (gdrive) source connector. Args:
source_id: ID of the source connector to update
remote_url: The gdrive URI to the bucket or folder
recursive: Whether to access subfolders within the bucket
Returns:
String containing the updated source connector information |
delete_gdrive_source | Delete a Google Drive (gdrive) source connector. Args:
source_id: ID of the source connector to delete
Returns:
String containing the result of the deletion |
create_s3_destination | Create an S3 destination connector. Args:
name: A unique name for this connector
remote_url: The S3 URI to the bucket or folder
key: The AWS access key ID
secret: The AWS secret access key
token: The AWS STS session token for temporary access (optional)
endpoint_url: Custom URL if connecting to a non-AWS S3 bucket
Returns:
String containing the created destination connector information |
update_s3_destination | Update an S3 destination connector. Args:
destination_id: ID of the destination connector to update
remote_url: The S3 URI to the bucket or folder
Returns:
String containing the updated destination connector information |
delete_s3_destination | Delete an S3 destination connector. Args:
destination_id: ID of the destination connector to delete
Returns:
String containing the result of the deletion |
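The S3 destination takes credentials as well as the URI; a sketch of assembling its arguments, with the optional fields included only when supplied (all values here are placeholders):

```python
# Assemble create_s3_destination arguments per the description above.
# token and endpoint_url are optional; omit them unless needed.
def s3_destination_args(name, remote_url, key, secret, token=None, endpoint_url=None):
    args = {"name": name, "remote_url": remote_url, "key": key, "secret": secret}
    if token:
        args["token"] = token                 # AWS STS session token (temporary access)
    if endpoint_url:
        args["endpoint_url"] = endpoint_url   # non-AWS S3-compatible endpoint
    return args

args = s3_destination_args("results", "s3://my-results/", "AKIAEXAMPLE", "secretkey")
assert "token" not in args and "endpoint_url" not in args
```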
create_weaviate_destination | Create a Weaviate vector database destination connector. Args:
cluster_url: URL of the Weaviate cluster
collection: Name of the collection to use in the Weaviate cluster
Note: A collection is the equivalent of a table in the Weaviate cluster.
The Unstructured Platform has dedicated code to generate a collection for users;
for simplicity, this server does not, so create the collection yourself beforehand.
Returns:
String containing the created destination connector information |
update_weaviate_destination | Update a Weaviate destination connector. Args:
destination_id: ID of the destination connector to update
cluster_url (optional): URL of the Weaviate cluster
collection (optional): Name of the collection (like a table) to use in the Weaviate cluster
Returns:
String containing the updated destination connector information |
delete_weaviate_destination | Delete a Weaviate destination connector. Args:
destination_id: ID of the destination connector to delete
Returns:
String containing the result of the deletion |
create_astradb_destination | Create an AstraDB destination connector. Args:
name: A unique name for this connector
collection_name: The name of the collection to use
keyspace: The AstraDB keyspace
batch_size: The batch size for inserting documents, must be positive (default: 20)
Note: A collection in AstraDB is a schemaless document store optimized for NoSQL workloads,
equivalent to a table in traditional databases.
A keyspace is the top-level namespace in AstraDB that groups multiple collections.
Users must create their own collection and keyspace before creating the connector.
Returns:
String containing the created destination connector information |
update_astradb_destination | Update an AstraDB destination connector. Args:
destination_id: ID of the destination connector to update
collection_name: The name of the collection to use (optional)
keyspace: The AstraDB keyspace (optional)
batch_size: The batch size for inserting documents (optional)
Note: Users must create their own collection and keyspace before creating the connector.
Returns:
String containing the updated destination connector information |
delete_astradb_destination | Delete an AstraDB destination connector. Args:
destination_id: ID of the destination connector to delete
Returns:
String containing the result of the deletion |
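A sketch of assembling the create_astradb_destination arguments, enforcing the positive `batch_size` constraint stated above (connector, collection, and keyspace names are placeholders):

```python
# Build create_astradb_destination arguments; batch_size must be a
# positive integer and defaults to 20, per the description above.
def astradb_destination_args(name, collection_name, keyspace, batch_size=20):
    if not (isinstance(batch_size, int) and batch_size > 0):
        raise ValueError("batch_size must be a positive integer")
    return {"name": name, "collection_name": collection_name,
            "keyspace": keyspace, "batch_size": batch_size}

args = astradb_destination_args("vectors", "docs", "main_keyspace")
assert args["batch_size"] == 20
```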
create_neo4j_destination | Create a Neo4j destination connector. Args:
name: A unique name for this connector
database: The Neo4j database name, e.g. "neo4j"
uri: The Neo4j URI, e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
username: The Neo4j username
Returns:
String containing the created destination connector information |
update_neo4j_destination | Update a Neo4j destination connector. Args:
destination_id: ID of the destination connector to update
database: The Neo4j database name, e.g. "neo4j"
uri: The Neo4j URI, e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
username: The Neo4j username
Returns:
String containing the updated destination connector information |
delete_neo4j_destination | Delete a Neo4j destination connector. Args:
destination_id: ID of the destination connector to delete
Returns:
String containing the result of the deletion |
invoke_firecrawl_crawlhtml | Start an asynchronous web crawl job using Firecrawl to retrieve HTML content. Args:
url: URL to crawl
s3_uri: S3 URI where results will be uploaded
limit: Maximum number of pages to crawl (default: 100)
Returns:
Dictionary with crawl job information including the job ID |
check_crawlhtml_status | Check the status of an existing Firecrawl HTML crawl job. Args:
crawl_id: ID of the crawl job to check
Returns:
Dictionary containing the current status of the crawl job |
invoke_firecrawl_llmtxt | Start an asynchronous llmfull.txt generation job using Firecrawl. This file is a standardized markdown file containing information to help LLMs use a website at inference time. The llmstxt endpoint leverages Firecrawl to crawl your website and extract data using gpt-4o-mini. Args:
url: URL to crawl
s3_uri: S3 URI where results will be uploaded
max_urls: Maximum number of pages to crawl (1-100, default: 10)
Returns:
Dictionary with job information including the job ID |
check_llmtxt_status | Check the status of an existing llmfull.txt generation job. Args:
job_id: ID of the llmfull.txt generation job to check
Returns:
Dictionary containing the current status of the job and text content if completed |
cancel_crawlhtml_job | Cancel an in-progress Firecrawl HTML crawl job. Args:
crawl_id: ID of the crawl job to cancel
Returns:
Dictionary containing the result of the cancellation |
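The crawl tools follow an invoke → poll → (optionally) cancel lifecycle. A generic polling loop might look like the sketch below; `check_status` stands in for the check_crawlhtml_status tool, which you would call through your MCP client, and the job ID and statuses are illustrative:

```python
import time

def wait_for_crawl(check_status, crawl_id, interval=0.0, max_polls=10):
    """Poll until the job reaches a terminal status, or give up."""
    for _ in range(max_polls):
        status = check_status(crawl_id)
        if status.get("status") in ("completed", "failed", "cancelled"):
            return status
        time.sleep(interval)
    return {"status": "timeout", "crawl_id": crawl_id}

# Stub that reports "completed" on the third poll, for demonstration.
calls = {"n": 0}
def fake_check(crawl_id):
    calls["n"] += 1
    return {"status": "completed" if calls["n"] >= 3 else "scraping"}

assert wait_for_crawl(fake_check, "job-123")["status"] == "completed"
```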
list_sources | List available sources from the Unstructured API.
Args:
source_type: Optional source connector type to filter by
Returns:
String containing the list of sources |
get_source_info | Get detailed information about a specific source connector. Args:
source_id: ID of the source connector to get information for, should be valid UUID
Returns:
String containing the source connector information |
list_destinations | List available destinations from the Unstructured API. Args:
destination_type: Optional destination connector type to filter by
Returns:
String containing the list of destinations |
get_destination_info | Get detailed information about a specific destination connector. Args:
destination_id: ID of the destination connector to get information for
Returns:
String containing the destination connector information |
list_workflows | List workflows from the Unstructured API.
Args:
destination_id: Optional destination connector ID to filter by
source_id: Optional source connector ID to filter by
status: Optional workflow status to filter by
Returns:
String containing the list of workflows |
get_workflow_info | Get detailed information about a specific workflow. Args:
workflow_id: ID of the workflow to get information for
Returns:
String containing the workflow information |
create_workflow | Create a new workflow. Args:
workflow_config: A Typed Dictionary containing required fields (destination_id - should be a
valid UUID, name, source_id - should be a valid UUID, workflow_type) and optional fields
(schedule and workflow_nodes). Note that workflow_nodes is only enabled when workflow_type
is `custom` and is a list of WorkflowNodeTypedDict: partition, prompter, chunk, embed
Below is an example of a partition workflow node:
{
"name": "vlm-partition",
"type": "partition",
"sub_type": "vlm",
"settings": {
"provider": "your favorite provider",
"model": "your favorite model"
}
}
Returns:
String containing the created workflow information Custom workflow DAG nodes
Partitioner node A Partitioner node has a type of partition and a subtype of auto, vlm, hi_res, or fast. Examples:
Chunker node A Chunker node has a type of chunk and subtype of chunk_by_character or chunk_by_title.
Prompter node A Prompter node has a type of prompter and subtype of:
Example: { "name": "Prompter", "type": "prompter", "subtype": "", "settings": {} } Embedder node An Embedder node has a type of embed Allowed values for subtype and model_name include:
Example: { "name": "Embedder", "type": "embed", "subtype": "", "settings": { "model_name": "" } } |
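Putting the pieces together, a `workflow_config` for create_workflow might look like the sketch below. The UUIDs, workflow name, and node settings are placeholders; the node mirrors the partition example above:

```python
# Hypothetical workflow_config for create_workflow. workflow_nodes is
# permitted here only because workflow_type is "custom".
workflow_config = {
    "name": "html-to-vectors",
    "source_id": "11111111-1111-1111-1111-111111111111",       # valid source UUID
    "destination_id": "22222222-2222-2222-2222-222222222222",  # valid destination UUID
    "workflow_type": "custom",
    "workflow_nodes": [
        {
            "name": "vlm-partition",
            "type": "partition",
            "sub_type": "vlm",
            "settings": {"provider": "your favorite provider",
                         "model": "your favorite model"},
        },
    ],
}

required = {"name", "source_id", "destination_id", "workflow_type"}
assert required <= workflow_config.keys()
```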
run_workflow | Run a specific workflow. Args:
workflow_id: ID of the workflow to run
Returns:
String containing the response from the workflow execution |
update_workflow | Update an existing workflow. Args:
workflow_id: ID of the workflow to update
workflow_config: A Typed Dictionary containing required fields (destination_id,
name, source_id, workflow_type) and non-required fields (schedule, and workflow_nodes)
Returns:
String containing the updated workflow information Custom workflow DAG nodes
Partitioner node A Partitioner node has a type of partition and a subtype of auto, vlm, hi_res, or fast. Examples:
Chunker node A Chunker node has a type of chunk and subtype of chunk_by_character or chunk_by_title.
Prompter node A Prompter node has a type of prompter and subtype of:
Example: { "name": "Prompter", "type": "prompter", "subtype": "", "settings": {} } Embedder node An Embedder node has a type of embed Allowed values for subtype and model_name include:
Example: { "name": "Embedder", "type": "embed", "subtype": "", "settings": { "model_name": "" } } |
delete_workflow | Delete a specific workflow. Args:
workflow_id: ID of the workflow to delete
Returns:
String containing the response from the workflow deletion |
list_jobs | List jobs via the Unstructured API.
Args:
workflow_id: Optional workflow ID to filter by
status: Optional job status to filter by
Returns:
String containing the list of jobs |
get_job_info | Get detailed information about a specific job. Args:
job_id: ID of the job to get information for
Returns:
String containing the job information |
cancel_job | Cancel a specific job. Args:
job_id: ID of the job to cancel
Returns:
String containing the response from the job cancellation |