
Apache Airflow MCP Server

by madamak

airflow_get_task_instance_logs

Read-only · Idempotent

Retrieve and filter Apache Airflow task execution logs by error level, context lines, or tail length, with automatic handling for large files and multi-host output.

Instructions

Fetch task instance logs with optional filtering and truncation.

Large log handling: logs larger than 100MB are automatically tailed to the last 10,000 lines (this sets auto_tailed=true). Host-segmented responses are flattened into a single string using headers of the form --- [worker] ---, so agents can reason about multi-host output. The tool requires an explicit try_number; callers should first retrieve it via airflow_get_task_instance.
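The host-header flattening described above can be sketched as follows. This is an illustrative sketch, not the server's implementation: flatten_host_logs and its segments argument are hypothetical names.

```python
def flatten_host_logs(segments: dict) -> str:
    """Flatten host-segmented log output into a single string.

    `segments` maps hostname -> that host's log text (hypothetical shape).
    Each host's log is prefixed with a "--- [host] ---" header, matching
    the header style described in this doc.
    """
    parts = []
    for host, text in segments.items():
        parts.append(f"--- [{host}] ---")
        parts.append(text.rstrip("\n"))
    return "\n".join(parts)


flat = flatten_host_logs({
    "worker-1": "INFO start\n",
    "worker-2": "ERROR boom\n",
})
```

An agent receiving the flattened string can then split on the headers to recover per-host sections if needed.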

Filter order of operations:

  1. Auto-tail: If log >100MB, take last 10,000 lines

  2. tail_lines: Extract last N lines from log

  3. filter_level: Find matching lines by level (content filter)

  4. context_lines: Add surrounding lines around matches (symmetric: N before + N after)

  5. max_bytes: Hard cap on total output (UTF-8 safe truncation)
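The order of operations above can be sketched in a few lines. This is a minimal model of steps 2 through 5, assuming level patterns that mirror the filter_level definitions in this doc; the server's actual regexes and internals may differ.

```python
import re

# Assumed pattern sets mirroring the filter_level definitions in this doc
# (each broader level also matches the narrower levels' patterns).
LEVEL_PATTERNS = {
    "error": r"ERROR|CRITICAL|FATAL|Exception|Traceback",
    "warning": r"WARN(ING)?|ERROR|CRITICAL|FATAL|Exception|Traceback",
    "info": r"INFO|WARN(ING)?|ERROR|CRITICAL|FATAL|Exception|Traceback",
}


def apply_filters(log: str, *, tail_lines=None, filter_level=None,
                  context_lines=0, max_bytes=100_000) -> str:
    lines = log.splitlines()
    if tail_lines:                                  # step 2: tail_lines
        lines = lines[-tail_lines:]
    if filter_level:                                # steps 3-4: filter + context
        rx = re.compile(LEVEL_PATTERNS[filter_level])
        keep = set()
        for i, line in enumerate(lines):
            if rx.search(line):
                keep.update(range(max(0, i - context_lines),
                                  min(len(lines), i + context_lines + 1)))
        lines = [lines[i] for i in sorted(keep)]
    raw = "\n".join(lines).encode("utf-8")
    # step 5: hard byte cap; dropping any trailing partial multibyte
    # sequence keeps the truncation UTF-8 safe.
    return raw[:max_bytes].decode("utf-8", "ignore")
```

For example, filtering a four-line log for "error" with context_lines=1 keeps the matching line plus one line on each side.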

Parameters

  • instance: Instance key (optional, mutually exclusive with ui_url)

  • ui_url: Airflow UI URL to resolve identifiers (optional)

  • dag_id, dag_run_id, task_id, try_number: Task instance identifiers (required)

  • filter_level: "error" | "warning" | "info" (optional) - Show only lines matching level

    • "error": ERROR, CRITICAL, FATAL, Exception, Traceback

    • "warning": WARN, WARNING + error patterns

    • "info": INFO + warning + error patterns

  • context_lines: N lines before/after each match (optional, clamped to [0, 1000]; accepts int/float/str, coerced to non-negative int, fractional values truncated)

  • tail_lines: Extract last N lines before filtering (optional, clamped to [0, 100000]; accepts int/float/str, coerced to non-negative int, fractional values truncated)

  • max_bytes: Maximum response size in bytes (default: 100KB ≈ 25K tokens; clamped to a reasonable upper limit)
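The coercion and clamping rules documented for context_lines and tail_lines could look like the following sketch; coerce_clamp is a hypothetical helper, not part of the tool's API.

```python
def coerce_clamp(value, lo: int, hi: int) -> int:
    """Coerce an int/float/str value to a non-negative int within [lo, hi].

    Mirrors the documented rules: fractional values are truncated
    ("3.9" -> 3), then the result is clamped to the allowed range.
    """
    n = int(float(value))        # accepts int, float, or numeric string
    return max(lo, min(hi, n))   # clamp into [lo, hi]
```

Under these rules a caller passing context_lines="3.9" would get 3, a negative value would clamp to 0, and an oversized tail_lines would clamp to 100000.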

Returns

  • Response dict with fields:

    • log: Normalized/filtered log text (host headers inserted when needed)

    • truncated: true if output exceeded max_bytes

    • auto_tailed: true if original log >100MB triggered auto-tail

    • bytes_returned: Actual byte size of returned log

    • original_lines: Line count before any filtering

    • returned_lines: Line count after all filtering/truncation

    • match_count: Number of lines matching filter_level (before context expansion)

    • meta.try_number: Attempt number for this task instance

    • meta.filters: Echo of effective filters applied (shows clamped values)

    • ui_url: Direct link to log view in Airflow UI

    • request_id: Correlates with server logs

  • Raises: ToolError with compact JSON payload (code, message, request_id, optional context)
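One way an agent might consume these response fields is to check truncated/auto_tailed and retry with a narrower request rather than paging through a huge log. This is a hypothetical sketch: call_tool stands in for whatever MCP client invocation is in use, and fetch_logs is not part of the tool.

```python
def fetch_logs(call_tool, ids: dict) -> str:
    """Fetch logs; if the first response was truncated or auto-tailed,
    retry with an error-level filter and a shorter tail.

    `call_tool(name, args)` is assumed to return the response dict
    described above; `ids` carries the required task instance identifiers
    (dag_id, dag_run_id, task_id, try_number).
    """
    resp = call_tool("airflow_get_task_instance_logs", dict(ids))
    if resp["truncated"] or resp.get("auto_tailed"):
        # Narrow the request: only error lines, last 2,000 lines.
        resp = call_tool("airflow_get_task_instance_logs",
                         {**ids, "filter_level": "error", "tail_lines": 2000})
    return resp["log"]
```

The request_id field in each response can be logged alongside the agent's own trace to correlate with server logs.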

Input Schema

Name           Required  Description  Default
instance       No
ui_url         No
dag_id         No
dag_run_id     No
task_id        No
try_number     No
filter_level   No
context_lines  No
tail_lines     No
max_bytes      No

Output Schema


No arguments

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering safety and idempotency. The description adds valuable behavioral context beyond annotations: it explains large log handling (auto-tail for >100MB logs), host-segmented response flattening, filter order of operations, and error handling with ToolError details. This enriches the agent's understanding without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections for large log handling, host-segmented responses, filter order, parameters, and returns. Each sentence adds value, but it is moderately long due to the tool's complexity. It remains front-loaded with core purpose and key guidelines, avoiding unnecessary repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (10 parameters, 0% schema coverage, no nested objects) and the presence of an output schema, the description is highly complete. It explains parameter semantics, behavioral traits, usage prerequisites, and return structure details, ensuring the agent has sufficient context without needing to rely solely on structured fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage and 10 parameters, the description fully compensates by detailing each parameter's semantics, including optionality, mutual exclusivity (instance vs. ui_url), required identifiers, filter_level definitions with pattern examples, and clamping rules for numeric parameters. It adds significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description begins with a clear verb+resource statement: 'Fetch task instance logs with optional filtering and truncation.' It specifically distinguishes this tool from its sibling 'airflow_get_task_instance' by noting that callers should first retrieve try_number via that tool, making the purpose specific and differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'The tool requires an explicit `try_number`; callers should first retrieve it via `airflow_get_task_instance`.' It also distinguishes it from siblings by specifying this prerequisite, offering clear alternatives and context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/madamak/apache-airflow-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.