read_spark_driver_log

Retrieve Spark driver logs from S3 for EMR Serverless job runs to diagnose application errors, inspect Python output, or investigate Spark framework issues.

Instructions

Read the Spark driver log from S3 for an EMR Serverless job run.

DEFAULT: Reads stdout.gz — this is the PRIMARY log containing Python print statements, row counts, file paths, and application errors. This is what you want 90% of the time.

Use log_type='stderr' only when you need Spark framework logs (executor allocation, memory warnings, shuffle errors).

Use read_both=True to get BOTH logs in one call (stdout first, then stderr filtered to ERROR lines only).

How to find application_id and job_run_id:

  • application_id: from the 'initialise' Airflow task log → 'EMR serverless application created: 00gXXX'

  • job_run_id: from the processing Airflow task log → 'EMR serverless job started: 00gXXX'

  • Or use list_emr_applications() then list_job_runs()
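
For example, an agent could chain those discovery tools before reading the log. The sketch below uses the MCP Python SDK's call_tool; the argument and result shapes of list_emr_applications and list_job_runs are assumptions, since this page does not document them.

```python
# Hypothetical sketch (MCP Python SDK). The sibling tools' argument
# names and result shapes are assumptions, not documented on this page.
from mcp import ClientSession

async def find_ids(session: ClientSession) -> None:
    # List EMR Serverless applications to find the application_id.
    apps = await session.call_tool("list_emr_applications", arguments={})
    print(apps.content)  # e.g. pick '00g16i3marao0c0t'

    # Then list the runs for that application to find the job_run_id.
    runs = await session.call_tool(
        "list_job_runs",
        arguments={"application_id": "00g16i3marao0c0t"},  # assumed parameter name
    )
    print(runs.content)  # e.g. pick '00g16i5g2pm56o0v'
```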

Args:

  • application_id: The EMR Serverless application ID (e.g. '00g16i3marao0c0t').

  • job_run_id: The job run ID (e.g. '00g16i5g2pm56o0v').

  • log_type: 'stdout' (default, Python app output) or 'stderr' (Spark framework logs).

  • s3_log_uri: Optional full S3 URI to read directly (e.g. 's3://bucket/path/stdout.gz').

  • process_name: Optional folder name under spark-logs/ (e.g. 'stackadapt_main'). Speeds up log discovery.

  • tail_lines: Number of lines from the end (default 300). Use -1 for all lines.

  • search_text: Optional text to filter log lines (e.g. 'ERROR', 'Exception').

  • bucket: S3 bucket override (default from config).

  • read_both: If True, read BOTH stdout and stderr in one call. stdout shown first, stderr filtered to ERROR lines.
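
With the IDs in hand, typical calls map directly onto the arguments above. The same hypothetical MCP client session is assumed; the parameter names and defaults come from this page.

```python
# Default: read the tail of stdout.gz (Python app output).
await session.call_tool("read_spark_driver_log", arguments={
    "application_id": "00g16i3marao0c0t",
    "job_run_id": "00g16i5g2pm56o0v",
})

# Spark framework logs only, filtered to error lines.
await session.call_tool("read_spark_driver_log", arguments={
    "application_id": "00g16i3marao0c0t",
    "job_run_id": "00g16i5g2pm56o0v",
    "log_type": "stderr",
    "search_text": "ERROR",
})

# Both logs in one call: full stdout, then ERROR lines from stderr.
await session.call_tool("read_spark_driver_log", arguments={
    "application_id": "00g16i3marao0c0t",
    "job_run_id": "00g16i5g2pm56o0v",
    "read_both": True,
    "tail_lines": -1,
})
```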

Returns the log content, optionally filtered and tailed.
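
To make the filtering semantics concrete, the tail/search behavior described above can be sketched as follows. This is a minimal illustration using boto3 and gzip, under the assumption that the log is a single gzipped text object in S3; it is not the server's actual implementation.

```python
import gzip
import boto3

def read_tailed_log(bucket: str, key: str, tail_lines: int = 300,
                    search_text: str | None = None) -> str:
    """Fetch a gzipped log from S3, optionally filter, then tail it."""
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    lines = gzip.decompress(body).decode("utf-8", errors="replace").splitlines()
    if search_text:
        lines = [ln for ln in lines if search_text in ln]
    if tail_lines != -1:  # -1 means "all lines", per the Args above
        lines = lines[-tail_lines:]
    return "\n".join(lines)
```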

Input Schema

Name            Required  Description  Default
application_id  Yes
job_run_id      Yes
log_type        No                     stdout
s3_log_uri      No
process_name    No
tail_lines      No
search_text     No
bucket          No
read_both       No
env             No

Output Schema

Name    Required  Description  Default
result  Yes
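
The output schema declares a single required result field with no description; presumably the call returns the log text wrapped in that field, along the lines of the shape below.

```python
# Assumed response shape; only the required "result" key is documented.
{"result": "<tail of stdout.gz, optionally filtered to matching lines>"}
```
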
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by explaining the behavioral aspects: the default behavior (reads stdout.gz), what each log type contains, how read_both works (stdout first, stderr filtered to ERROR lines), and that it returns filtered/tailed content. It doesn't mention rate limits or authentication needs, but it covers the core functionality thoroughly.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well structured, with clear sections (purpose, defaults, usage guidance, parameter explanations, return statement). A few sentences could be tighter, but every section earns its place by adding information an agent needs for tool selection and usage.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (10 parameters, 0% schema description coverage, no annotations) and its bare output schema, the description is remarkably complete. It covers purpose, usage scenarios, parameter semantics, and behavioral details, providing everything an agent needs to understand when and how to use this tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates excellently by explaining all 10 parameters in detail. It clarifies the purpose of each parameter, provides examples, explains defaults, and describes interactions (like how s3_log_uri overrides discovery). This adds substantial value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reads Spark driver logs from S3 for EMR Serverless job runs. It specifies the resource (Spark driver log), source (S3), and context (EMR Serverless job run), distinguishing it from sibling tools like read_s3_file or get_task_log that handle different resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use different log_type options: 'stdout' for primary logs (90% of cases), 'stderr' for Spark framework logs, and read_both=True for combined logs. It also explains how to find required IDs using other tools like list_emr_applications() and list_job_runs().

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
