Get Traces

get_traces
Read-only · Idempotent

Query stored agent execution traces with filters and pagination to analyze past failures, compute trends, or populate dashboards.

Instructions

Query stored agent-execution traces with filters, pagination, and optional dashboard summary.

Sibling tools: log_trace creates traces; delete_trace removes a single trace; evaluate_output, evaluate_with_llm_judge, and verify_citations score them; list_rules, deploy_rule, and delete_rule manage the custom-rule lifecycle. get_traces is the READ path for historical agent executions and never mutates anything.

Behavior. Read-only: never mutates storage, never calls external services. Idempotent: repeated calls with the same args return consistent results (new traces logged after the call obviously show up on subsequent calls). Tenant-scoped: queries only the caller's tenant rows (LOCAL_TENANT in OSS). Paginates results (default limit 50, max 1000). Rate-limited to 20 req/min on HTTP MCP, unlimited on stdio.

Output shape. Returns JSON: { "traces": [{...traceRow}], "total": number, "limit": number, "offset": number, "summary"?: { total_traces, avg_latency_ms, total_cost_usd, error_rate, eval_pass_rate, traces_per_hour, top_agents } }. Each trace row includes trace_id, agent_name, framework, input, output, tool_calls, latency_ms, token_usage, cost_usd, metadata, timestamp. summary only included when include_summary: true.
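
For concreteness, here is a minimal TypeScript sketch of that response shape, reconstructed from the field list above. Field types the docs don't pin down (tool_calls, token_usage, metadata, top_agents) are assumptions, not confirmed by any schema on this page.

```typescript
// Sketch of the get_traces response, inferred from the documented field list.
interface TraceRow {
  trace_id: string;
  agent_name: string;
  framework: string;
  input: string;
  output: string;
  tool_calls: unknown[];         // structure not documented; assumed to be an array
  latency_ms: number;
  token_usage: unknown;          // shape not documented
  cost_usd: number;
  metadata: Record<string, unknown>;
  timestamp: string;             // ISO timestamp
}

interface GetTracesResponse {
  traces: TraceRow[];
  total: number;
  limit: number;
  offset: number;
  summary?: {                    // present only when include_summary: true
    total_traces: number;
    avg_latency_ms: number;
    total_cost_usd: number;
    error_rate: number;
    eval_pass_rate: number;
    traces_per_hour: number;
    top_agents: string[];        // assumed string list; could be richer objects
  };
}
```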

Use when you need historical data: investigating a past failure, computing quality trends, comparing agents, or feeding an analytics job. Set agent_name / framework / since / until to narrow the query. Set min_score / max_score to surface outliers. Set sort_by: "cost_usd" + sort_order: "desc" to find the most expensive traces. Set include_summary: true when you want dashboard-style aggregates in one round-trip.
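
As an illustration, a hedged sketch of the "most expensive traces" query issued through the MCP TypeScript SDK. The launch command and package name below are hypothetical placeholders, not taken from this page.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Hypothetical launch command; substitute however you actually start the server.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["iris-eval-mcp-server"], // assumed package name, not from this page
});

const client = new Client({ name: "trace-analyzer", version: "1.0.0" });
await client.connect(transport);

// "Most expensive recent traces": cost_usd descending, narrowed by since.
const expensive = await client.callTool({
  name: "get_traces",
  arguments: {
    since: "2025-01-01T00:00:00Z", // illustrative lower bound
    sort_by: "cost_usd",
    sort_order: "desc",
    limit: 20,
  },
});
```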

Don't use it to score a trace (use evaluate_output). Don't use it to create a trace (use log_trace). Don't use it as a live event stream: it's a query, not a subscription. Poll with exponential backoff (see the sketch below) or use the dashboard's SSE endpoint for real-time updates.
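
A minimal polling sketch under those constraints, reusing the client from the previous example. The intervals and cursor strategy are illustrative, and since is inclusive (>=), so the newest trace may repeat once across polls.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Poll get_traces with exponential backoff: a query loop, not a subscription.
async function pollTraces(client: Client, agentName: string) {
  let delayMs = 5_000;           // illustrative starting interval
  const maxDelayMs = 5 * 60_000; // illustrative cap
  let since = new Date().toISOString();

  while (true) {
    const result = await client.callTool({
      name: "get_traces",
      arguments: { agent_name: agentName, since, sort_order: "asc" },
    });
    // Assumption: the JSON body arrives as a single text content block.
    const body = JSON.parse((result.content as any)[0].text);

    if (body.traces.length > 0) {
      // Advance the cursor; since is >=, so the newest trace may repeat once.
      since = body.traces[body.traces.length - 1].timestamp;
      delayMs = 5_000; // reset backoff on activity
      // ...hand body.traces to your processing step...
    } else {
      delayMs = Math.min(delayMs * 2, maxDelayMs); // back off while idle
    }
    await new Promise((r) => setTimeout(r, delayMs));
  }
}
```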

Parameters. limit defaults to 50, max 1000 (anything higher returns 400). offset is zero-based pagination. min_score / max_score filter on the LATEST eval per trace, not all evals (so a trace with one failing + one passing eval may or may not match depending on which landed last). Combining since + sort_by="latency_ms" + sort_order="desc" is the canonical "find slow recent traces" query. include_summary returns dashboard-style aggregates in the SAME response (saves a round-trip; use true for dashboard ingest, false for analytics queries that don't need them). agent_name and framework are exact-match (no wildcards in v0.4). Defaults: limit=50, offset=0, sort_by="timestamp", sort_order="desc", include_summary=false.
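
A sketch of paging through a full result set with limit/offset, using the canonical slow-recent-traces filter as the example query. The helper name and the parsing of the text content block are assumptions.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Page through every match of the canonical "slow recent traces" query.
// limit is pinned at the documented max (1000); total drives the loop exit.
async function fetchSlowRecentTraces(client: Client, sinceIso: string) {
  const limit = 1000;
  const all: unknown[] = [];
  for (let offset = 0; ; offset += limit) {
    const result = await client.callTool({
      name: "get_traces",
      arguments: {
        since: sinceIso,
        sort_by: "latency_ms",
        sort_order: "desc",
        limit,
        offset,
      },
    });
    // Assumption: the tool returns one text content block holding the JSON body.
    const body = JSON.parse((result.content as any)[0].text);
    all.push(...body.traces);
    if (offset + limit >= body.total) break; // last page reached
  }
  return all;
}
```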

Error modes. Returns 400 on invalid sort_by / sort_order (Zod enum). Returns 400 if limit > 1000. Returns 429 when HTTP rate limit exceeded. Storage failures propagate as 500. Empty result with total: 0 on no matches (not an error).
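
And a hedged retry wrapper for the 429 path. How the status surfaces on a thrown error is transport-dependent, so the err.status / err.code probing below is an assumption, not documented behavior.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Retry get_traces on rate limiting (HTTP MCP: 20 req/min). 400s signal a
// caller bug (bad sort_by, limit > 1000) and are not retried.
async function callWithRetry(
  client: Client,
  args: Record<string, unknown>,
  attempts = 3,
) {
  for (let i = 0; ; i++) {
    try {
      return await client.callTool({ name: "get_traces", arguments: args });
    } catch (err: any) {
      const status = err?.status ?? err?.code; // assumption: error shape varies by transport
      if (status !== 429 || i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 2 ** i * 1000)); // 1s, 2s, 4s backoff
    }
  }
}
```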

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| agent_name | No | Filter by agent name; exact match (no wildcards in v0.4) | |
| framework | No | Filter by agent framework; exact match (e.g., langchain, autogen) | |
| since | No | ISO timestamp lower bound; returns traces with timestamp >= this | |
| until | No | ISO timestamp upper bound; returns traces with timestamp < this | |
| min_score | No | Minimum eval score filter (0..1); applied to the LATEST eval per trace, not all evals | |
| max_score | No | Maximum eval score filter (0..1); applied to the LATEST eval per trace | |
| limit | No | Results per page (max 1000; values > 1000 return 400) | 50 |
| offset | No | Zero-based pagination offset; skips the first N results | 0 |
| sort_by | No | Sort by timestamp, latency_ms, or cost_usd | timestamp |
| sort_order | No | Sort order: asc or desc (desc = most recent / highest first) | desc |
| include_summary | No | Include dashboard summary stats in the same response; saves a round-trip when ingesting for dashboards | false |
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint, destructiveHint, idempotentHint, openWorldHint. The description adds rich behavioral details: read-only, never mutates, idempotent, tenant-scoped, pagination (default 50, max 1000), rate limits (20 req/min HTTP, unlimited stdio), output shape, error modes (400, 429, 500, empty result).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-structured: summary, sibling context, behavior, output shape, usage scenarios, parameter details, error modes. Every section adds value; it's comprehensive without being verbose. Could be slightly tighter but earns a 4.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 11 parameters and no output schema, the description covers all essential aspects: behavior, output shape (with field list), error modes, usage examples with parameter combinations, and default values. There are no apparent gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds meaningful context beyond schema: explains min_score/max_score apply to latest eval, combining since+sort_by for slow traces, include_summary saves round-trip, limit >1000 returns 400. This justifies an above-baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it queries stored agent-execution traces with filters, pagination, and optional summary. It explicitly identifies the tool as the READ path for historical data, distinguishing it from siblings like log_trace, delete_trace, and evaluate_output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use (investigating failures, trends, analytics) and when not to use (for scoring, creating, or live streaming). Includes concrete parameter combinations for specific use cases (e.g., finding slow recent traces, most expensive traces).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

