Skip to main content
Glama
IBM

IBM watsonx.data MCP Server

Official
by IBM

Server Configuration

Describes the environment variables required to run the server.

NameRequiredDescriptionDefault
WATSONX_DATA_API_KEYYesIBM Cloud API key for authentication
WATSONX_DATA_BASE_URLYesThe base URL of the watsonx.data instance, e.g., https://us-south.lakehouse.cloud.ibm.com/lakehouse/api
WATSONX_DATA_INSTANCE_IDYesThe CRN (Cloud Resource Name) of the watsonx.data instance

Capabilities

Features and capabilities supported by this server

CapabilityDetails
tools
{
  "listChanged": true
}
logging
{}
prompts
{
  "listChanged": false
}
resources
{
  "subscribe": false,
  "listChanged": false
}
extensions
{
  "io.modelcontextprotocol/ui": {}
}
experimental
{}

Tools

Functions exposed to the LLM to take actions

NameDescription
add_columnsA

Add one or more columns to a table in a watsonx.data schema.

Note: Data types are typically specified in lowercase (e.g., "varchar", "int", "decimal"). While the API pattern allows both cases, lowercase is recommended for compatibility.

Args: catalog_name: Catalog containing the table (e.g., "iceberg_data") schema_name: Schema containing the table table_name: Table to add columns to columns: List of column definitions, each with: name (required), type (required, lowercase recommended), comment (optional), extra (optional), precision (optional), scale (optional) engine_id: Engine ID to use for the operation (from list_engines)

Returns: Dict with: - columns: List of added column details - total_count: Number of columns added

create_schemaA

Create a new schema in a watsonx.data catalog.

IMPORTANT RESTRICTIONS: - Engine must be a Presto or Prestissimo engine (Spark engines are not supported) - Catalog must be an object storage catalog (Iceberg or Hive format) - Datasource catalogs (e.g., from external databases) are NOT supported for schema creation

Args: catalog_id: Catalog identifier - must be an object storage catalog (e.g., "iceberg_data", "hive_data") schema_name: Name for the new schema engine_id: Engine ID to use for the operation - must be a Presto or Prestissimo engine (from list_engines) custom_path: Path within bucket where schema will be created (REQUIRED, must be at least 1 character). If unsure, use the schema_name as the custom_path. storage_name: Storage/bucket name for the schema (REQUIRED for object storage catalogs)

Returns: Dict with: - name: Name of the created schema - catalog_name: Parent catalog name - custom_path: Custom path used - storage_name: Storage name if specified

Raises: Error if: - Engine is not Presto or Prestissimo type - Catalog is a datasource catalog (not object storage) - custom_path is empty or storage_name is not provided

describe_tableA

Get detailed schema and metadata for a watsonx.data table.

Args: catalog_name: Catalog containing the table (e.g., "iceberg_data", "tpch") schema_name: Schema containing the table (from list_schemas) table_name: Table to describe (from list_tables) engine_id: Engine ID for metadata retrieval (from list_engines)

Returns: Dict with: - name: Table name - catalog_name, schema_name: Echo of inputs - columns: List of column objects with: - name: Column name (required) - type: SQL data type (required) - comment: Optional comment/description - extra: Optional extra attributes (e.g., AUTO_INCREMENT) - length: Optional length for VARCHAR/CHAR types - precision: Optional precision for DECIMAL types - scale: Optional scale for DECIMAL types - column_count: Total number of columns - engine_id: Echo of input

list_schemasA

List database schemas in a watsonx.data catalog.

Args: catalog_name: Catalog to list schemas from (e.g., "iceberg_data", "hive_data", "tpch") engine_id: Engine ID for metadata queries (from list_engines)

Returns: Dict with: - schemas: List of schema objects with schema_name, catalog_name - total_count: Number of schemas found - catalog_name, engine_id: Echo of inputs

list_tablesA

List tables in a watsonx.data schema.

Args: catalog_name: Catalog containing the schema (e.g., "iceberg_data", "hive_data") schema_name: Schema/database containing tables (from list_schemas) engine_id: Engine ID for metadata queries (from list_engines)

Returns: Dict with: - tables: List of table names (strings) - total_count: Number of tables in schema - catalog_name, schema_name, engine_id: Echo of inputs

rename_columnA

Rename a column in a table in a watsonx.data schema.

Args: catalog_name: Catalog containing the table (e.g., "iceberg_data") schema_name: Schema containing the table table_name: Table containing the column column_name: Current column name new_column_name: New name for the column engine_id: Engine ID to use for the operation (from list_engines)

Returns: Dict with column details including the new name

rename_tableA

Rename a table in a watsonx.data schema.

Args: catalog_name: Catalog containing the table (e.g., "iceberg_data") schema_name: Schema containing the table table_name: Current table name new_table_name: New name for the table engine_id: Engine ID to use for the operation (from list_engines)

Returns: Dict with table details including the new name

create_presto_engineA

Create a new Presto engine in watsonx.data.

EXAMPLE PAYLOAD: { "origin": "native", "display_name": "My-Presto-Engine", "description": "Presto engine with autoscaling", "tags": [], "associated_catalogs": [], "configuration": { "size_config": "custom", "coordinator": { "node_type": "starter", "quantity": 1 }, "worker": { "node_type": "starter", "quantity": 1 }, "autoscaling_enabled": true, "autoscaling_config": { "type": "cpu", "target": 40, "min_worker_quantity": 1, "max_worker_quantity": 18, "query_termination_grace_period_min": 1, "scale_in_stabilization_window_min": 5, "scaling_step_size": 1 } } }

Args: origin: (required) "native" display_name: (required) Display name for the engine configuration: (required) Engine configuration with required fields: - size_config: (required) "custom" (recommended) or predefined options (may be supported) - coordinator: (required) {"node_type": typically "starter" or "cache_optimized", "quantity": 1} - worker: (required) {"node_type": typically "starter" or "cache_optimized", "quantity": 1-18 recommended} - autoscaling_enabled: (optional) boolean to enable autoscaling - autoscaling_config: (required if autoscaling_enabled is true) autoscaling configuration object (see AUTOSCALING section) associated_catalogs: (optional) List of catalog names to associate description: (optional) Engine description 50 characters max engine_id: (optional) Custom engine ID (must match pattern: presto-0 through presto-1000) tags: (optional) Tags for the engine

AUTOSCALING (OPTIONAL): To enable autoscaling, include these fields in the configuration:

  • autoscaling_enabled: true (boolean)

  • autoscaling_config: { "type": "cpu" or "memory", "target": 1-100 (target utilization percentage, e.g., 40), "min_worker_quantity": 1-18 (minimum workers), "max_worker_quantity": 1-18 (maximum workers), "query_termination_grace_period_min": 1-120 (grace period before terminating queries), "scale_in_stabilization_window_min": 5-60 (stabilization window for scale-in), "scaling_step_size": 1-18 (nodes to add/remove per scaling action) }

PREDEFINED SIZE CONFIGS: If using predefined configs, exact node types and quantities must match:

  • starter: 1 coordinator + 1 worker (both bx2.48x192)

  • small: 1 coordinator + 3 workers (both ox2.16x128)

  • medium: 1 coordinator + 6 workers (both ox2.16x128)

  • large: 1 coordinator + 12 workers (both ox2.16x128)

Returns: Dict with created engine details including engine_id

create_prestissimo_engineA

Create a new Prestissimo engine in watsonx.data.

EXAMPLE PAYLOAD: { "origin": "native", "display_name": "My-Prestissimo-Engine", "description": "Prestissimo engine for fast queries", "tags": [], "associated_catalogs": [], "configuration": { "size_config": "custom", "coordinator": { "node_type": "starter", "quantity": 1 }, "worker": { "node_type": "starter", "quantity": 1 } } }

Args: origin: "native" display_name: (required) Display name for the engine configuration: (required) Engine configuration with required fields: - size_config: (required) "custom" (recommended) or predefined options (may be supported) - coordinator: (required) {"node_type": typically "starter" or "cache_optimized", "quantity": 1} - worker: (required) {"node_type": typically "starter" or "cache_optimized", "quantity": 1-18 recommended} associated_catalogs: (optional) List of catalog names to associate description: (optional) Engine description 50 characters max engine_id: (optional) Custom engine ID (must match pattern: prestissimo-0 to prestissimo-1000) tags: (optional) Tags for the engine

CUSTOM SIZE CONFIG (RECOMMENDED):

  • Coordinator: 1 node (always), node_type: "starter" or "cache_optimized"

  • Worker: 1-18 nodes (recommended), node_type: "starter" or "cache_optimized"

  • Node types do NOT need to match (e.g., starter coordinator + cache_optimized worker is allowed)

PREDEFINED SIZE CONFIGS: If using predefined configs, exact node types and quantities must match:

  • starter: 1 coordinator + 1 worker (both bx2.48x192)

  • small: 1 coordinator + 3 workers (both ox2.16x128)

  • medium: 1 coordinator + 6 workers (both ox2.16x128)

  • large: 1 coordinator + 12 workers (both ox2.16x128)

Returns: Dict with created engine details including engine_id

create_spark_engineB

Create a new Spark engine in watsonx.data.

Args: origin: Engine origin - "native", "external", or "discover" display_name: Display name for the engine storage_name: Storage/bucket name for engine_home (REQUIRED) associated_catalogs: List of catalog IDs to associate description: Engine description default_version: Spark version (default: "3.5") default_config: Additional engine configuration tags: Tags for the engine

Returns: Dict with created engine details including engine_id

list_enginesA

List available Presto, Prestissimo, and Spark compute engines in watsonx.data.

Args: engine_type: Optional filter - "presto", "prestissimo", "spark", or None for all engines

Returns: Dict with: - engines: List of engine objects with engine_id, display_name, type, status, size (if applicable), created_on, created_by, associated_catalogs - summary: Counts by type and status (total_count, presto_count, prestissimo_count, spark_count, by_status)

pause_presto_engineB

Pause a running Presto engine in watsonx.data.

Args: engine_id: Engine identifier

Returns: Dict with pause operation status and engine state transition

pause_prestissimo_engineB

Pause a running Prestissimo engine in watsonx.data.

Args: engine_id: Engine identifier

Returns: Dict with pause operation status and engine state transition

pause_spark_engineA

Pause a running Spark engine in watsonx.data (SAAS only).

Args: engine_id: Engine identifier force: Force pause even if applications are running (default: False)

Returns: Dict with pause operation status

restart_presto_engineC

Restart a Presto engine in watsonx.data.

Args: engine_id: Engine identifier

Returns: Dict with engine_id, status, message, response

restart_prestissimo_engineC

Restart a Prestissimo engine in watsonx.data.

Args: engine_id: Engine identifier

Returns: Dict with engine_id, status, message, response

resume_presto_engineC

Resume a paused Presto engine in watsonx.data.

Args: engine_id: Engine identifier

Returns: Dict with resume operation status and engine state transition

resume_prestissimo_engineB

Resume a paused Prestissimo engine in watsonx.data.

Args: engine_id: Engine identifier

Returns: Dict with resume operation status and engine state transition

resume_spark_engineB

Resume a paused Spark engine in watsonx.data (SAAS only).

Args: engine_id: Engine identifier

Returns: Dict with resume operation status

scale_presto_engineA

Scale a Presto engine by adjusting coordinator and worker node counts in watsonx.data.

RECOMMENDED NODE TYPES: "starter" or "cache_optimized" (other types may be available)

SCALING CAPABILITIES:

  • Coordinator quantity: Always 1 (cannot be changed)

  • Worker quantity: 1-18 (recommended), up to 50 may be supported

  • Node types CAN be changed during scaling (e.g., from "starter" to "cache_optimized")

  • Coordinator and worker do NOT need to match node types

API REQUIREMENT: Must provide BOTH coordinator AND worker configurations together.

Args: engine_id: Engine identifier coordinator_node_type: Typically "starter" or "cache_optimized". Can be different from current type. coordinator_quantity: Number of coordinator nodes (must be 1 for Presto) worker_node_type: Typically "starter" or "cache_optimized". Can be different from coordinator_node_type. worker_quantity: Number of worker nodes (1-18 recommended, up to 50 may be supported)

Returns: Dict with scaling operation status and new node configuration

scale_prestissimo_engineA

Scale a Prestissimo engine by adjusting coordinator and worker node counts in watsonx.data.

RECOMMENDED NODE TYPES: "starter" or "cache_optimized" (other types may be available)

SCALING CAPABILITIES:

  • Coordinator quantity: Always 1 (cannot be changed)

  • Worker quantity: 1-18 (recommended), up to 50 may be supported

  • Node types CAN be changed during scaling (e.g., from "starter" to "cache_optimized")

  • Coordinator and worker do NOT need to match node types

API REQUIREMENT: Must provide BOTH coordinator AND worker configurations together.

Args: engine_id: Engine identifier coordinator_node_type: Typically "starter" or "cache_optimized". Can be different from current type. coordinator_quantity: Number of coordinator nodes (must be 1 for Prestissimo) worker_node_type: Typically "starter" or "cache_optimized". Can be different from coordinator_node_type. worker_quantity: Number of worker nodes (1-18 recommended, up to 50 may be supported)

Returns: Dict with scaling operation status and new node configuration

scale_spark_engineA

Scale a Spark engine by adjusting node count in watsonx.data (SAAS only).

Args: engine_id: Engine identifier number_of_nodes: Target node count (1-1000)

Returns: Dict with scaling operation status (202 Accepted - asynchronous operation)

update_presto_engineB

Update Presto engine configuration in watsonx.data.

Args: engine_id: Engine identifier description: Updated description display_name: Updated display name engine_properties: Engine configuration properties engine_restart: Set to "force" to trigger restart after update remove_engine_properties: Properties to remove tags: Updated tags

Returns: Dict with updated engine configuration

update_prestissimo_engineA

Update Prestissimo engine configuration in watsonx.data.

IMPORTANT: Due to a known API limitation, the description field cannot be updated on its own. If you want to update the description, you must include at least one additional field in the same call (e.g., display_name, tags, or properties). Calls that modify only description will fail.

Args: engine_id: Engine identifier. description: Updated description (1-50 characters). Must be accompanied by at least one other updatable field in the same request. display_name: Updated display name. properties: Engine configuration properties. restart_type: Set to "force" to trigger a restart after the update. remove_engine_properties: Properties to remove. tags: Updated tags.

Returns: Dict with updated engine configuration

update_spark_engineA

Update Spark engine configuration in watsonx.data.

Args: engine_id: Engine identifier description: Updated description display_name: Updated display name configuration: Engine configuration (default_config, default_version, engine_home) tags: Updated tags

Returns: Dict with updated engine configuration

Note: Spark engines do NOT support engine_restart parameter. Configuration changes may require manual restart.

cancel_ingestion_jobC

Cancel a data ingestion job.

Args: job_id: Job identifier

Returns: Dict with cancellation status

create_ingestion_jobA

Create a data ingestion job to load data into watsonx.data.

Args:
    job_id: Unique job identifier (e.g., "ingestion-1234567890")
    catalog: Target catalog name
    schema: Target schema name
    table: Target table name
    file_paths: Source file path (e.g., "s3://bucket-name/file.csv")
    file_type: Source file type - "csv", "parquet", "json", "orc", "avro" (default: "csv")
    bucket_name: S3 bucket name (extracted from file_paths if not provided)
    bucket_type: Bucket type - "amazon_s3", "aws_s3", "minio", "ibm_cos", "ibm_ceph",
                 "adls_gen1", "adls_gen2", "google_cs", "ibm_storage_scale", "ozone" (default: "ibm_cos")
    write_mode: Write mode - "append", "overwrite" (default: "append")
    engine_id: Spark engine ID to use for ingestion
    field_delimiter: CSV field delimiter (default: ",")
    line_delimiter: CSV line delimiter (default: "

") escape_character: CSV escape character (default: "") header: Whether CSV has header row (default: true) encoding: File encoding (default: "UTF-8") driver_memory: Spark driver memory (default: "2G") driver_cores: Spark driver cores (default: 1) executor_memory: Spark executor memory (default: "2G") executor_cores: Spark executor cores (default: 1) num_executors: Number of Spark executors (default: 1)

Returns:
    Dict with job_id, status, and creation details
get_ingestion_jobB

Get detailed status of a data ingestion job.

Args: job_id: Job identifier

Returns: Dict with detailed job status, configuration, and execution details

list_ingestion_jobsA

List data ingestion jobs on watsonx.data.

Args: start: Offset for pagination (default: 0, 0-based) limit: Number of jobs per page (default: 10, max: 100, -1 for all)

Returns: Dict with ingestion_jobs list containing job details

get_instance_detailsA

Get watsonx.data instance information including status, version, region, and enabled features.

Returns: Dict with instance details: - instance_id: Instance CRN identifier - region: IBM Cloud region (e.g., "us-south") - status: Instance status ("active", "provisioning", "inactive", "failed") - version: watsonx.data version - account_type: Account type ("TRIAL", "ENTERPRISE", "LITE", "STANDARD") - serverless_spark_enabled: Whether Spark is available - public_endpoints_enabled: Public internet access enabled - console_url: Web console URL

execute_insertA

Execute INSERT queries against watsonx.data.

Args: sql: SQL INSERT query to execute (must start with INSERT) catalog_name: Target catalog (e.g., "iceberg_data", "hive_data") schema_name: Default schema for unqualified table names engine_id: Engine to run query on (from list_engines, must be running)

Returns: Dict with: - query_id: Unique query identifier - rows_inserted: Number of rows inserted (if available) - execution_time_ms: Query duration in milliseconds - status: Query execution status - catalog_name, schema_name: Echo of inputs

execute_selectA

Execute read-only SELECT queries against watsonx.data.

Args: sql: SQL SELECT query to execute (must start with SELECT) catalog_name: Target catalog (e.g., "iceberg_data", "tpch") schema_name: Default schema for unqualified table names engine_id: Engine to run query on (from list_engines, must be running) limit: Max rows to return (default: 500 if no LIMIT in query). Note: Using high limits will consume more tokens.

Returns: Dict with: - query_id: Unique query identifier - columns: List of {name, type} objects - rows: List of row data (list of lists) - row_count: Number of rows returned - execution_time_ms: Query duration in milliseconds - catalog_name, schema_name: Echo of inputs

execute_updateA

Execute UPDATE queries against watsonx.data.

Args: sql: SQL UPDATE query to execute (must start with UPDATE) catalog_name: Target catalog (e.g., "iceberg_data", "hive_data") schema_name: Default schema for unqualified table names engine_id: Engine to run query on (from list_engines, must be running)

Returns: Dict with: - query_id: Unique query identifier - rows_updated: Number of rows updated (if available) - execution_time_ms: Query duration in milliseconds - status: Query execution status - catalog_name, schema_name: Echo of inputs Note: Iceberg table updates require at least format version 2 and update mode must be merge-on-read. Use ALTER TABLE to configure: SET PROPERTIES 'format-version' = '2', 'write.update.mode' = 'merge-on-read'

explain_analyze_queryA

Get detailed query analysis with execution statistics in watsonx.data.

Args: engine_id: Presto or Prestissimo engine identifier statement: SQL query to analyze. If query fails, consider using fully qualified table names (catalog.schema.table) engine_type: Engine type - "presto" or "prestissimo" (default: "presto") verbose: Include detailed statistics

Returns: Dict with engine_id, engine_type, statement, analysis, and full response

explain_queryA

Get query execution plan without running the query in watsonx.data.

Args: engine_id: Presto or Prestissimo engine identifier statement: SQL query to explain. If query fails, consider using fully qualified table names (catalog.schema.table) engine_type: Engine type - "presto" or "prestissimo" (default: "presto") format: Output format - "json" or "text" type: Explain type - "logical", "distributed", "validate", or "io"

Returns: Dict with engine_id, engine_type, statement, plan, and full response

get_spark_application_statusA

Get detailed status of a Spark application.

Args: engine_id: Spark engine identifier application_id: Application identifier

Returns: Dict with detailed application status including: - application_id: Application identifier - state: Current state (e.g., "running", "finished", "failed") - start_time: Application start timestamp - end_time: Application end timestamp (if completed) - spark_version: Spark version used - application_details: Configuration and runtime details

list_spark_applicationsB

List Spark applications on a Spark engine.

Args: engine_id: (required) Spark engine identifier state: (optional) Filter by application state (e.g., ["running", "finished", "failed"]) limit: (optional) Maximum number of applications to return (1-1000). Recommended to use smaller values (e.g., 10-50) to avoid exhausting tokens.

Returns: Dict with applications list containing application details

stop_spark_applicationA

Stop and remove a Spark application.

This will terminate a running application and remove it from the engine's history.

Args: engine_id: Spark engine identifier application_id: Application identifier to stop

Returns: Dict with operation status

submit_spark_applicationA

Submit a Spark application for execution on a Spark engine.

Args: engine_id: Spark engine identifier application: Application file path (JAR, Python, R file) arguments: Application arguments array conf: Spark configuration properties (e.g., {"spark.executor.memory": "2g"}) env: Environment variables name: Application name (will be added to conf as spark.app.name) job_endpoint: External job endpoint service_instance_id: Service instance ID - "iae" or "emr" type: Engine type - "spark" or "gluten" context_type: Context type - "project", "git_project", or "space" volumes: Volume mounts (watsonx.data software only). List of dicts with: - name: volume name - mount_path: path in spark cluster (e.g., "/mount/path") - source_sub_path: path in volume to mount (e.g., "/source/path") - read_only: boolean flag

Returns: Dict with application_id, state, and submission details

Examples: Minimal configuration for IBM Cloud Object Storage using cos:// protocol:

{
  "engine_id": "spark398",
  "application": "cos://bucket.instance/app.py",
  "arguments": ["cos://bucket.instance/data.csv"],
  "conf": {
    "spark.hadoop.fs.cos.instance.endpoint": "s3.direct.us-east.cloud-object-storage.appdomain.cloud",
    "spark.hadoop.fs.cos.instance.access.key": "your-access-key",
    "spark.hadoop.fs.cos.instance.secret.key": "your-secret-key"
  }
}

Minimal configuration for IBM Cloud Object Storage using s3a:// protocol:

{
  "engine_id": "spark398",
  "application": "s3a://bucket/app.py",
  "arguments": ["s3a://bucket/data.csv"],
  "conf": {
    "spark.hadoop.fs.s3a.bucket.bucket.access.key": "your-access-key",
    "spark.hadoop.fs.s3a.bucket.bucket.secret.key": "your-secret-key",
    "spark.hadoop.fs.s3a.bucket.bucket.aws.credentials.provider": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
  }
}

Optional conf parameters (uses engine defaults if not specified):
- spark.app.name: Custom application name
- ae.spark.driver.log.level / ae.spark.executor.log.level: Log levels
- spark.driver.cores / spark.driver.memory: Driver resources
- spark.executor.cores / spark.executor.memory: Executor resources

Prompts

Interactive templates invoked by user choice

NameDescription

No prompts

Resources

Contextual data attached and managed by the client

NameDescription

No resources

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/IBM/ibm-watsonxdata-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server