| get_dag_details | Get detailed information about a specific Apache Airflow DAG. Use this tool when the user asks about:
"Show me details for DAG X" or "What are the details of DAG Y?"
"Tell me about DAG Z" or "Get information for this specific DAG"
"What's the schedule for DAG X?" or "When does this DAG run?"
"Is DAG Y paused?" or "Show me the configuration of DAG Z"
"Who owns this DAG?" or "What are the tags for this workflow?"
Returns complete DAG information including:
dag_id: Unique identifier for the DAG
is_paused: Whether the DAG is currently paused
is_active: Whether the DAG is active
is_subdag: Whether this is a SubDAG
fileloc: File path where the DAG is defined
file_token: Unique token for the DAG file
owners: List of DAG owners
description: Human-readable description of what the DAG does
schedule_interval: Cron expression or timedelta for scheduling
tags: List of tags/labels for categorization
max_active_runs: Maximum number of concurrent runs
max_active_tasks: Maximum number of concurrent tasks
has_task_concurrency_limits: Whether task concurrency limits are set
has_import_errors: Whether the DAG has import errors
next_dagrun: When the next DAG run is scheduled
next_dagrun_create_after: Earliest time for next DAG run creation
Args:
dag_id: The ID of the DAG to get details for
Returns:
JSON with complete details about the specified DAG |
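For orientation, here is a minimal sketch of where these fields originate in a DAG file, written against the Airflow 2.x API (the dag_id, schedule, and tags are illustrative):

```python
# Sketch (illustrative names, Airflow 2.x API): DAG arguments that map to
# the metadata fields returned by get_dag_details.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_dag",                    # -> dag_id
    description="Nightly example pipeline",  # -> description
    schedule_interval="0 2 * * *",           # -> schedule_interval (cron)
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["example", "nightly"],             # -> tags
    max_active_runs=1,                       # -> max_active_runs
    default_args={
        "owner": "data-team",                # -> owners
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
    },
):
    BashOperator(task_id="noop", bash_command="echo ok")
```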
| list_dags | Get information about all Apache Airflow DAGs (Directed Acyclic Graphs). Use this tool when the user asks about:
"What DAGs are available?" or "List all DAGs"
"Show me the workflows" or "What pipelines exist?"
"Which DAGs are paused/active?"
DAG schedules, descriptions, or tags
Finding a specific DAG by name
Returns comprehensive DAG metadata including:
dag_id: Unique identifier for the DAG
is_paused: Whether the DAG is currently paused
is_active: Whether the DAG is active
schedule_interval: How often the DAG runs
description: Human-readable description
tags: Labels/categories for the DAG
owners: Who maintains the DAG
file_token: Unique token for the DAG file
Returns:
JSON with list of all DAGs and their complete metadata |
| get_dag_source | Get the source code for a specific Apache Airflow DAG. Use this tool when the user asks about:
"Show me the code for DAG X" or "What's the source of DAG Y?"
"How is DAG Z implemented?" or "What does the DAG file look like?"
"Can I see the Python code for this workflow?"
"What tasks are defined in the DAG code?"
Returns the DAG source file contents.
Args:
dag_id: The ID of the DAG to get source code for
Returns:
JSON with DAG source code and metadata |
| get_dag_stats | Get statistics about DAG runs (success/failure counts by state). Use this tool when the user asks about:
"What's the overall health of my DAGs?" or "Show me DAG statistics"
"How many DAG runs succeeded/failed?" or "What's the success rate?"
"Give me a summary of DAG run states"
"How many runs are currently running/queued?"
"Show me stats for specific DAGs"
Returns statistics showing counts of DAG runs grouped by state:
success: Number of successful runs
failed: Number of failed runs
running: Number of currently running runs
queued: Number of queued runs
...and other possible states
Args:
dag_ids: Optional list of DAG IDs to filter by. If not provided, returns stats for all DAGs.
Returns:
JSON with DAG run statistics organized by DAG and state |
| list_dag_warnings | Get warnings and issues detected in DAG definitions. Use this tool when the user asks about:
"Are there any DAG warnings?" or "Show me DAG issues"
"What problems exist with my DAGs?" or "Any DAG errors?"
"Check DAG health" or "Show me DAG validation warnings"
"What's wrong with my workflows?"
Returns warnings about DAG configuration issues including:
dag_id: Which DAG has the warning
warning_type: Type of warning (e.g., deprecation, configuration issue)
message: Description of the warning
timestamp: When the warning was detected
Returns:
JSON with list of DAG warnings and their details |
| list_import_errors | Get import errors from DAG files that failed to parse or load. Use this tool when the user asks about:
"Are there any import errors?" or "Show me import errors"
"Why isn't my DAG showing up?" or "DAG not appearing in Airflow"
"What DAG files have errors?" or "Show me broken DAGs"
"Check for syntax errors" or "Are there any parsing errors?"
"Why is my DAG file failing to load?"
Import errors occur when DAG files have problems that prevent Airflow from parsing them, such as syntax errors or other parsing failures.
Returns import error details including:
import_error_id: Unique identifier for the error
timestamp: When the error was detected
filename: Path to the DAG file with the error
stack_trace: Complete error message and traceback
Returns:
JSON with list of import errors and their stack traces |
| get_task | Get detailed information about a specific task definition in a DAG. Use this tool when the user asks about:
"Show me details for task X in DAG Y" or "What does task Z do?"
"What operator does task A use?" or "What's the configuration of task B?"
"Tell me about task C" or "Get task definition for D"
"What are the dependencies of task E?" or "Which tasks does F depend on?"
Returns task definition information including:
task_id: Unique identifier for the task
task_display_name: Human-readable display name
owner: Who owns this task
start_date: When this task becomes active
end_date: When this task becomes inactive (if set)
trigger_rule: When this task should run (all_success, one_failed, etc.)
depends_on_past: Whether task depends on previous run's success
wait_for_downstream: Whether to wait for downstream tasks
retries: Number of retry attempts
retry_delay: Time between retries
execution_timeout: Maximum execution time
operator_name: Type of operator (PythonOperator, BashOperator, etc.)
pool: Resource pool assignment
queue: Queue for executor
downstream_task_ids: List of tasks that depend on this task
upstream_task_ids: List of tasks this task depends on
Args:
dag_id: The ID of the DAG containing the task
task_id: The ID of the task to get details for
Returns:
JSON with complete task definition details |
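As a sketch (illustrative names, Airflow 2.x API), these fields map onto operator arguments in the DAG file:

```python
# Sketch: operator arguments that populate the task-definition fields above.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator

with DAG("task_fields_demo", start_date=datetime(2024, 1, 1), schedule_interval=None):
    extract = EmptyOperator(task_id="extract")
    load = EmptyOperator(task_id="load")
    transform = PythonOperator(
        task_id="transform",                   # -> task_id
        python_callable=lambda: None,
        trigger_rule="all_success",            # -> trigger_rule
        depends_on_past=False,                 # -> depends_on_past
        retries=3,                             # -> retries
        retry_delay=timedelta(minutes=2),      # -> retry_delay
        execution_timeout=timedelta(hours=1),  # -> execution_timeout
        pool="default_pool",                   # -> pool
    )
    extract >> transform >> load               # -> upstream/downstream_task_ids
```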
| list_tasks | Get all tasks defined in a specific DAG. Use this tool when the user asks about:
"What tasks are in DAG X?" or "List all tasks for DAG Y"
"Show me the tasks in this workflow" or "What's in the DAG?"
"What are the steps in DAG Z?" or "Show me the task structure"
"What does this DAG do?" or "Explain the workflow steps"
Returns information about all tasks in the DAG including:
task_id: Unique identifier for the task
task_display_name: Human-readable display name
owner: Who owns this task
operator_name: Type of operator (PythonOperator, BashOperator, etc.)
start_date: When this task becomes active
end_date: When this task becomes inactive (if set)
trigger_rule: When this task should run
retries: Number of retry attempts
pool: Resource pool assignment
downstream_task_ids: List of tasks that depend on this task
upstream_task_ids: List of tasks this task depends on
Args:
dag_id: The ID of the DAG to list tasks for
Returns:
JSON with list of all tasks in the DAG and their configurations |
| get_task_instance | Get detailed information about a specific task instance execution. Use this tool when the user asks about:
"Show me details for task X in DAG run Y" or "What's the status of task Z?"
"Why did task A fail?" or "When did task B start/finish?"
"What's the duration of task C?" or "Show me task execution details"
"Get logs for task D" or "What operator does task E use?"
Returns detailed task instance information including:
task_id: Name of the task
state: Current state (success, failed, running, queued, etc.)
start_date: When the task started
end_date: When the task finished
duration: How long the task ran
try_number: Which attempt this is
max_tries: Maximum retry attempts
operator: What operator type (PythonOperator, BashOperator, etc.)
executor_config: Executor configuration
pool: Resource pool assignment
Args:
dag_id: The ID of the DAG
dag_run_id: The ID of the DAG run (e.g., "manual__2024-01-01T00:00:00+00:00")
task_id: The ID of the task within the DAG
Returns:
JSON with complete task instance details |
| get_task_logs | Get logs for a specific task instance execution. Use this tool when the user asks about:
"Show me the logs for task X" or "Get logs for task Y"
"What did task Z output?" or "Show me task execution logs"
"Why did task A fail?" (to see error messages in logs)
"What happened during task B execution?"
"Show me the stdout/stderr for task C"
"Debug task D" or "Troubleshoot task E"
Returns the actual log output from the task execution, which includes:
Task execution output (stdout/stderr)
Error messages and stack traces (if the task failed)
Timing information
Any logged messages from the task code
This is essential for debugging failed tasks or understanding what happened during task execution.
Args:
dag_id: The ID of the DAG (e.g., "example_dag")
dag_run_id: The ID of the DAG run (e.g., "manual__2024-01-01T00:00:00+00:00")
task_id: The ID of the task within the DAG (e.g., "extract_data")
try_number: The task try/attempt number, 1-indexed (default: 1).
Use higher numbers to get logs from retry attempts.
map_index: For mapped tasks, which map index to get logs for.
Use -1 for non-mapped tasks (default: -1).
Returns:
JSON with the task logs content |
| list_dag_runs | Get execution history and status of DAG runs (workflow executions). Use this tool when the user asks about:
"What DAG runs have executed?" or "Show me recent runs"
"Which runs failed/succeeded?" or "What's the status of my workflows?"
"When did DAG X last run?"
Execution times, durations, or states
Finding runs by date or status
Returns execution metadata including:
dag_run_id: Unique identifier for this execution
dag_id: Which DAG this run belongs to
state: Current state (running, success, failed, queued)
execution_date: When this run was scheduled to execute
start_date: When execution actually started
end_date: When execution completed (if finished)
run_type: manual, scheduled, or backfill
conf: Configuration passed to this run
Returns:
JSON with list of DAG runs across all DAGs, sorted by most recent |
| get_dag_run | Get detailed information about a specific DAG run execution. Use this tool when the user asks about:
"Show me details for DAG run X" or "What's the status of run Y?"
"When did this run start/finish?" or "How long did run Z take?"
"Why did this run fail?" or "Get execution details for run X"
"What was the configuration for this run?" or "Show me run metadata"
"What's the state of DAG run X?" or "Did run Y succeed?"
Returns detailed information about a specific DAG run execution including:
dag_run_id: Unique identifier for this execution
dag_id: Which DAG this run belongs to
state: Current state (running, success, failed, queued, etc.)
execution_date: When this run was scheduled to execute
start_date: When execution actually started
end_date: When execution completed (if finished)
duration: How long the run took (in seconds)
run_type: Type of run (manual, scheduled, backfill, etc.)
conf: Configuration parameters passed to this run
external_trigger: Whether this was triggered externally
data_interval_start: Start of the data interval
data_interval_end: End of the data interval
last_scheduling_decision: Last scheduling decision timestamp
note: Optional note attached to the run
Args:
dag_id: The ID of the DAG (e.g., "example_dag")
dag_run_id: The ID of the DAG run (e.g., "manual__2024-01-01T00:00:00+00:00")
Returns:
JSON with complete details about the specified DAG run |
| trigger_dag | Trigger a new DAG run (start a workflow execution manually). Use this tool when the user asks to:
"Run DAG X" or "Start DAG Y" or "Execute DAG Z"
"Trigger a run of DAG X" or "Kick off DAG Y"
"Run this workflow" or "Start this pipeline"
"Execute DAG X with config Y" or "Trigger DAG with parameters"
"Start a manual run" or "Manually execute this DAG"
This creates a new DAG run that will be picked up by the scheduler and executed.
You can optionally pass configuration parameters that will be available to the
DAG during execution via the conf context variable.
IMPORTANT: This is a write operation that modifies Airflow state by creating a new DAG run. Use with caution.
Returns information about the newly triggered DAG run including:
dag_run_id: Unique identifier for the new execution
dag_id: Which DAG was triggered
state: Initial state (typically 'queued')
execution_date: When this run is scheduled to execute
start_date: When execution started (may be null if queued)
run_type: Type of run (will be 'manual')
conf: Configuration passed to the run
external_trigger: Set to true for manual triggers
Args:
dag_id: The ID of the DAG to trigger (e.g., "example_dag")
conf: Optional configuration dictionary to pass to the DAG run.
This will be available in the DAG via context['dag_run'].conf
Returns:
JSON with details about the newly triggered DAG run |
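A short sketch of how task code typically reads the conf passed here (the dag_id and the report_date key are illustrative):

```python
# Sketch: reading conf passed by trigger_dag inside a task.
# The "report_date" key is a hypothetical example.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def use_conf(**context):
    conf = context["dag_run"].conf or {}  # dict passed at trigger time
    print("report_date =", conf.get("report_date"))

with DAG("conf_demo", start_date=datetime(2024, 1, 1), schedule_interval=None):
    PythonOperator(task_id="read_conf", python_callable=use_conf)
```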
| trigger_dag_and_wait | Trigger a DAG run and wait for it to complete before returning. Use this tool when the user asks to:
"Run DAG X and wait for it to finish" or "Execute DAG Y and tell me when it's done"
"Trigger DAG Z and wait for completion" or "Run this pipeline synchronously"
"Start DAG X and let me know the result" or "Execute and monitor DAG Y"
"Run DAG X and show me if it succeeds or fails"
This is a BLOCKING operation that will:
Trigger the specified DAG
Poll for status automatically (interval scales with timeout)
Return once the DAG run reaches a terminal state (success, failed, upstream_failed)
Include details about any failed tasks if the run was not successful
IMPORTANT: This tool blocks until the DAG completes or times out. For long-running
DAGs, consider using trigger_dag instead and checking status separately with
get_dag_run. Default timeout is 60 minutes; adjust the timeout parameter for longer DAGs.
Returns information about the completed DAG run including:
dag_id: Which DAG was run
dag_run_id: Unique identifier for this execution
state: Final state (success, failed, upstream_failed)
start_date: When execution started
end_date: When execution completed
elapsed_seconds: How long we waited
timed_out: Whether we hit the timeout before completion
failed_tasks: List of failed task details (only if state != success)
Args:
dag_id: The ID of the DAG to trigger (e.g., "example_dag")
conf: Optional configuration dictionary to pass to the DAG run.
This will be available in the DAG via context['dag_run'].conf
timeout: Maximum time to wait in seconds (default: 3600.0, i.e. 60 minutes)
Returns:
JSON with final DAG run status and any failed task details |
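For reference, a rough sketch of the trigger-then-poll pattern this tool automates, written against Airflow 2.x's stable REST API (the base URL, credentials, and dag_id are assumptions):

```python
# Sketch: trigger a DAG run, then poll until a terminal state or timeout.
import time

import requests

BASE = "http://localhost:8080/api/v1"  # assumption: local Airflow 2.x webserver
AUTH = ("admin", "admin")              # assumption: basic-auth credentials

# Trigger a new run (roughly what trigger_dag does)
run = requests.post(f"{BASE}/dags/example_dag/dagRuns",
                    json={"conf": {}}, auth=AUTH).json()
run_id = run["dag_run_id"]

# Poll for completion (roughly what trigger_dag_and_wait automates)
deadline = time.time() + 3600          # mirror the 60-minute default timeout
state = run["state"]
while time.time() < deadline:
    state = requests.get(f"{BASE}/dags/example_dag/dagRuns/{run_id}",
                         auth=AUTH).json()["state"]
    if state in ("success", "failed"):
        break
    time.sleep(10)
print(state)
```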
| pause_dag | Pause a DAG to prevent new scheduled runs from starting. Use this tool when the user asks to:
"Pause DAG X" or "Stop DAG Y from running"
"Disable DAG Z" or "Prevent new runs of DAG X"
"Turn off DAG scheduling" or "Suspend DAG execution"
When a DAG is paused:
No new scheduled runs will be created
Currently running tasks will complete
Manual triggers are still possible
The DAG remains visible in the UI with a paused indicator
IMPORTANT: This is a write operation that modifies Airflow state.
The DAG will remain paused until explicitly unpaused.
Args:
dag_id: The ID of the DAG to pause (e.g., "example_dag") Returns:
JSON with updated DAG details showing is_paused=True |
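In Airflow 2.x's stable REST API this corresponds to a PATCH on the DAG resource; a minimal sketch (base URL and credentials are assumptions, and sending is_paused=False on the same endpoint unpauses):

```python
# Sketch: pausing a DAG via the Airflow 2.x stable REST API.
import requests

requests.patch(
    "http://localhost:8080/api/v1/dags/example_dag",  # assumption: local webserver
    json={"is_paused": True},                         # False resumes scheduling
    auth=("admin", "admin"),                          # assumption: basic auth
)
```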
| unpause_dag | Unpause a DAG to allow scheduled runs to resume. Use this tool when the user asks to:
"Unpause DAG X" or "Resume DAG Y"
"Enable DAG Z" or "Start DAG scheduling again"
"Turn on DAG X" or "Activate DAG Y"
When a DAG is unpaused:
The scheduler will create new runs based on the schedule
Any missed runs (depending on catchup setting) may be created
The DAG will appear active in the UI
IMPORTANT: This is a write operation that modifies Airflow state.
New DAG runs will be scheduled according to the DAG's schedule_interval.
Args:
dag_id: The ID of the DAG to unpause (e.g., "example_dag") Returns:
JSON with updated DAG details showing is_paused=False |
| list_assets | Get data assets and datasets tracked by Airflow (data lineage). Use this tool when the user asks about:
"What datasets exist?" or "List all assets"
"What data does this DAG produce/consume?"
"Show me data dependencies" or "What's the data lineage?"
"Which DAGs use dataset X?"
Data freshness or update events
Assets represent datasets or files that DAGs produce or consume.
This enables data-driven scheduling where DAGs wait for data availability.
Returns asset information including:
uri: Unique identifier for the asset (e.g., s3://bucket/path)
id: Internal asset ID
created_at: When this asset was first registered
updated_at: When this asset was last updated
consuming_dags: Which DAGs depend on this asset
producing_tasks: Which tasks create/update this asset
Returns:
JSON with list of all assets and their producing/consuming relationships |
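A sketch of how these producing/consuming relationships are declared in DAG code, using the Airflow 2.4+ Dataset API (Airflow 3 renames datasets to assets; the URI and dag_ids are illustrative). The producer's outlets also show where the asset events listed by list_asset_events come from:

```python
# Sketch: data-aware scheduling with the Airflow 2.4+ Dataset API.
from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.empty import EmptyOperator

orders = Dataset("s3://bucket/orders.parquet")  # illustrative URI

with DAG("producer", start_date=datetime(2024, 1, 1), schedule="@daily"):
    # Completing this task emits an asset/dataset event for `orders`
    EmptyOperator(task_id="write_orders", outlets=[orders])

with DAG("consumer", start_date=datetime(2024, 1, 1), schedule=[orders]):
    # This DAG runs when the upstream asset event arrives
    EmptyOperator(task_id="read_orders")
```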
| list_asset_events | List asset/dataset events with optional filtering. Use this tool when the user asks about:
"What asset events were produced by DAG X?"
"Show me dataset events from run Y"
"Debug why downstream DAG wasn't triggered"
"What assets did this pipeline produce?"
"List recent asset update events"
Asset events are produced when a task updates an asset/dataset.
These events can trigger downstream DAGs that depend on those assets
(data-aware scheduling).
Returns event information including:
uri: The asset that was updated
source_dag_id: The DAG that produced this event
source_run_id: The DAG run that produced this event
source_task_id: The task that produced this event
timestamp: When the event was created
Args:
source_dag_id: Filter events by the DAG that produced them
source_run_id: Filter events by the DAG run that produced them
source_task_id: Filter events by the task that produced them
limit: Maximum number of events to return (default: 100)
Returns:
JSON with list of asset events |
| get_upstream_asset_events | Get asset events that triggered a specific DAG run. Use this tool when the user asks about:
"What triggered this DAG run?" or "Which asset events caused this run to start?"
"Why did DAG X start running?"
"Show me the upstream triggers for this run"
"What data changes triggered this pipeline run?"
This is useful for understanding causation in data-aware scheduling.
When a DAG is scheduled based on asset updates, this tool shows which
specific asset events triggered the run.
Returns information including:
dag_id: The DAG that was triggered
dag_run_id: The specific run
triggered_by_events: List of asset events that caused this run
event_count: Number of triggering events
Each event includes:
asset_uri or dataset_uri: The asset that was updated
source_dag_id: The DAG that produced the event
source_run_id: The run that produced the event
timestamp: When the event occurred
Args:
dag_id: The ID of the DAG
dag_run_id: The ID of the DAG run (e.g., "scheduled__2024-01-01T00:00:00+00:00")
Returns:
JSON with the asset events that triggered this DAG run |
| list_connections | Get connection configurations for external systems (databases, APIs, services). Use this tool when the user asks about:
"What connections are configured?" or "List all connections"
"How do I connect to database X?" or "What's the connection string for Y?"
"Which databases/services are available?"
Finding connection details by name or type
Connections store credentials and connection info for external systems
that DAGs interact with (databases, S3, APIs, etc.).
Returns connection metadata including:
connection_id: Unique name for this connection
conn_type: Type (postgres, mysql, s3, http, etc.)
description: Human-readable description
host: Server hostname or IP
port: Port number
schema: Database schema or path
login: Username (passwords excluded for security)
extra: Additional connection parameters as JSON
IMPORTANT: Passwords are NEVER returned for security reasons.
Returns:
JSON with list of all connections (credentials excluded) |
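A sketch of how task code typically consumes a listed connection (the connection id "my_postgres" is an assumption):

```python
# Sketch: resolving a connection by id inside task code.
from airflow.hooks.base import BaseHook

conn = BaseHook.get_connection("my_postgres")  # hypothetical connection id
print(conn.conn_type, conn.host, conn.port, conn.schema, conn.login)
# conn.password is available to task code, though this tool never returns it
```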
| get_pool | Get detailed information about a specific resource pool. Use this tool when the user asks about:
"Show me details for pool X" or "What's the status of pool Y?"
"How many slots are available in pool Z?" or "Is pool X full?"
"What's using pool Y?" or "How many tasks are running in pool X?"
"Get information about the default_pool" or "Show me pool details"
Pools are used to limit parallelism for specific sets of tasks. This returns
detailed real-time information about a specific pool's capacity and utilization.
Returns detailed pool information including:
name: Name of the pool
slots: Total number of available slots in the pool
occupied_slots: Number of currently occupied slots (running + queued)
running_slots: Number of slots with currently running tasks
queued_slots: Number of slots with queued tasks waiting to run
open_slots: Number of available slots (slots - occupied_slots)
description: Human-readable description of the pool's purpose
Args:
pool_name: The name of the pool to get details for (e.g., "default_pool")
Returns:
JSON with complete details about the specified pool |
| list_pools | Get resource pools for managing task concurrency and resource allocation. Use this tool when the user asks about:
"What pools are configured?" or "List all pools"
"Show me the resource pools" or "What pools exist?"
"How many slots does pool X have?" or "What's the pool capacity?"
"Which pools are available?" or "What's the pool configuration?"
Pools are used to limit parallelism for specific sets of tasks. Each pool
has a certain number of slots, and tasks assigned to a pool will only run
if there are available slots. This is useful for limiting concurrent access
to resources like databases or external APIs.
Returns pool information including:
name: Name of the pool
slots: Total number of available slots in the pool
occupied_slots: Number of currently occupied slots
running_slots: Number of slots with running tasks
queued_slots: Number of slots with queued tasks
open_slots: Number of available slots (slots - occupied_slots)
description: Human-readable description of the pool's purpose
Returns:
JSON with list of all pools and their current utilization |
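A sketch of assigning a task to a pool (the pool name "db_pool" is an assumption; the pool itself must already exist, e.g. created in the UI or with `airflow pools set`):

```python
# Sketch: throttling a task through a pool so only `slots` such tasks
# run concurrently.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("pool_demo", start_date=datetime(2024, 1, 1), schedule_interval=None):
    BashOperator(
        task_id="heavy_query",
        bash_command="echo querying",
        pool="db_pool",   # hypothetical pool; task waits unless open_slots > 0
        pool_slots=1,     # slots this task occupies while running
    )
```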
| list_plugins | Get information about installed Airflow plugins. Use this tool when the user asks about:
"What plugins are installed?" or "List all plugins"
"Show me the plugins" or "Which plugins are enabled?"
"Is plugin X installed?" or "Do we have any custom plugins?"
"What's in the plugins directory?"
Plugins extend Airflow functionality by adding custom operators, hooks,
views, menu items, or other components. This returns information about
all plugins discovered by Airflow's plugin system.
Returns information about installed plugins including:
name: Name of the plugin
hooks: Custom hooks provided by the plugin
executors: Custom executors provided by the plugin
macros: Custom macros provided by the plugin
flask_blueprints: Flask blueprints for custom UI pages
appbuilder_views: Flask-AppBuilder views for the admin interface
appbuilder_menu_items: Custom menu items in the UI
Returns:
JSON with list of all installed plugins and their components |
| list_providers | Get information about installed Airflow provider packages. Use this tool when the user asks about:
"What providers are installed?" or "List all providers"
"What integrations are available?" or "Show me installed packages"
"Do we have the AWS provider?" or "Is the Snowflake provider installed?"
"What version of provider X is installed?"
Returns information about installed provider packages including:
package_name: Name of the provider package (e.g., "apache-airflow-providers-amazon")
version: Version of the provider package
description: What the provider does
provider_info: Details about operators, hooks, and sensors included
Returns:
JSON with list of all installed provider packages and their details |
| get_variable | Get a specific Airflow variable by key. Use this tool when the user asks about:
"What's the value of variable X?" or "Show me variable Y"
"Get variable Z" or "What does variable A contain?"
"What's stored in variable B?" or "Look up variable C"
Variables are key-value pairs stored in Airflow's metadata database that
can be accessed by DAGs at runtime. They're commonly used for configuration
values, API keys, or other settings that need to be shared across DAGs.
Returns variable information including:
key: The variable's key/name
value: The variable's value (may be masked if marked as sensitive)
description: Optional description of the variable's purpose
Args:
variable_key: The key/name of the variable to retrieve
Returns:
JSON with the variable's key, value, and metadata |
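A sketch of how DAG code reads variables at runtime (the keys here are assumptions):

```python
# Sketch: reading variables in task code; keys are hypothetical examples.
from airflow.models import Variable

base_url = Variable.get("api_base_url", default_var="https://example.com")
settings = Variable.get("pipeline_settings", deserialize_json=True, default_var={})
```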
| list_variables | Get all Airflow variables (key-value configuration pairs). Use this tool when the user asks about:
"What variables are configured?" or "List all variables"
"Show me the variables" or "What variables exist?"
"What configuration variables are available?"
"Show me all variable keys"
Variables are key-value pairs stored in Airflow's metadata database that
can be accessed by DAGs at runtime. They're commonly used for configuration
values, environment-specific settings, or other data that needs to be
shared across DAGs without hardcoding in the DAG files.
Returns variable information including:
key: The variable's key/name
value: The variable's value (may be masked if marked as sensitive)
description: Optional description of the variable's purpose
IMPORTANT: Sensitive variables (like passwords, API keys) may have their
values masked in the response for security reasons.
Returns:
JSON with list of all variables and their values |
| get_airflow_version | Get version information for the Airflow instance. Use this tool when the user asks about:
"What version of Airflow is running?" or "Show me the Airflow version"
"What's the Airflow version?" or "Which Airflow release is this?"
"What version is installed?" or "Check Airflow version"
"Is this Airflow 2 or 3?" or "What's the version number?"
Returns version information including:
version: The Airflow version string (e.g., "2.8.0", "3.0.0")
git_version: Git commit hash if available
This is useful for:
Determining API compatibility
Checking if features are available in this version
Troubleshooting version-specific issues
Verifying upgrade success
Returns:
JSON with Airflow version information |
| get_airflow_config | Get Airflow instance configuration and settings. Use this tool when the user asks about:
"What's the Airflow configuration?" or "Show me Airflow settings"
"What's the executor type?" or "How is Airflow configured?"
"What's the parallelism setting?"
Database connection, logging, or scheduler settings
Finding specific configuration values
Returns all Airflow configuration organized by sections:
[core]: Basic Airflow settings (executor, dags_folder, parallelism)
[database]: Database connection and settings
[webserver]: Web UI configuration (port, workers, auth)
[scheduler]: Scheduler behavior and intervals
[logging]: Log locations and formatting
[api]: REST API configuration
[operators]: Default operator settings
...and many more sections
Each setting includes:
key: Configuration parameter name
value: Current value
source: Where the value came from (default, env var, config file)
Returns:
JSON with complete Airflow configuration organized by sections |
| explore_dag | Comprehensive investigation of a DAG - get all relevant info in one call. USE THIS TOOL WHEN you need to understand a DAG completely. Instead of making multiple calls, this returns everything about a DAG in a single response.
This is the preferred first tool when:
User asks "Tell me about DAG X" or "What is this DAG?"
You need to understand a DAG's structure before diagnosing issues
You want to know the schedule, tasks, and source code together
Returns combined data:
DAG metadata (schedule, owners, tags, paused status)
All tasks with their operators and dependencies
DAG source code
Any import errors or warnings for this DAG
Args:
dag_id: The ID of the DAG to explore
Returns:
JSON with comprehensive DAG information |
| diagnose_dag_run | Diagnose issues with a specific DAG run - get run details and failed tasks. USE THIS TOOL WHEN troubleshooting a failed or problematic DAG run. Returns all the information you need to understand what went wrong.
This is the preferred tool when:
User asks "Why did this DAG run fail?"
User asks "What's wrong with run X?"
You need to investigate task failures in a specific run
Returns combined data:
DAG run metadata (state, start/end times, trigger type)
All task instances for this run with their states
Highlighted failed/upstream_failed tasks with details
Summary of task states
Args:
dag_id: The ID of the DAG
dag_run_id: The ID of the DAG run (e.g., "manual__2024-01-01T00:00:00+00:00")
Returns:
JSON with diagnostic information about the DAG run |
| get_system_health | Get overall Airflow system health - import errors, warnings, and DAG stats. USE THIS TOOL WHEN you need a quick health check of the Airflow system. Returns a consolidated view of potential issues across the entire system.
This is the preferred tool when:
User asks "Are there any problems with Airflow?"
User asks "Show me the system health" or "Any errors?"
You want to do a morning health check
You're starting an investigation and want to see the big picture
Returns combined data:
Import errors (DAG files that failed to parse)
DAG warnings (deprecations, configuration issues)
DAG statistics (run counts by state) if available
Version information
Returns:
JSON with system health overview |