Glama

Get Traces

get_traces
Read-only · Idempotent

Retrieve historical agent-execution traces with filters for agent name, framework, time range, and scores. Supports pagination and optional dashboard summary for analysis of past failures or performance trends.

Instructions

Query stored agent-execution traces with filters, pagination, and optional dashboard summary.

Sibling tools — log_trace creates traces, delete_trace removes a single trace, evaluate_output / evaluate_with_llm_judge / verify_citations score them, list_rules / deploy_rule / delete_rule manage the custom-rule lifecycle. get_traces is the READ path for historical agent executions — never mutates anything.

Behavior. Read-only: never mutates storage, never calls external services. Idempotent: repeated calls with the same args return consistent results (traces logged between calls appear in later results). Tenant-scoped: queries only the caller's tenant rows (LOCAL_TENANT in OSS). Paginates results (default limit 50, max 1000). Rate-limited to 20 req/min on HTTP MCP, unlimited on stdio.
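
A client that talks to the HTTP transport may want to stay under the 20 req/min limit on its own rather than wait for a 429. The sketch below is one way to do that with a sliding window. The class name and the injectable clock are illustrative, and the description does not say whether the server counts with a sliding or a fixed window, so treat the window model as an assumption.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_calls` recorded calls per `window_s` seconds.

    Hypothetical helper; the server's own accounting may differ.
    """

    def __init__(self, max_calls: int = 20, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_calls = max_calls
        self._window_s = window_s
        self._clock = clock
        self._stamps: deque = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call fits in the window (0.0 if it fits now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window_s:
            self._stamps.popleft()
        if len(self._stamps) < self._max_calls:
            return 0.0
        return self._window_s - (now - self._stamps[0])

    def record(self) -> None:
        """Note that a call was just sent."""
        self._stamps.append(self._clock())
```

Before each request, sleep for `wait_time()` seconds and then call `record()`. On stdio, where the description says there is no limit, the throttle can be skipped.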

Output shape. Returns JSON: { "traces": [{...traceRow}], "total": number, "limit": number, "offset": number, "summary"?: { total_traces, avg_latency_ms, total_cost_usd, error_rate, eval_pass_rate, traces_per_hour, top_agents } }. Each trace row includes trace_id, agent_name, framework, input, output, tool_calls, latency_ms, token_usage, cost_usd, metadata, timestamp. summary only included when include_summary: true.
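
Because the tool publishes no output schema, a caller may want to check a response against the documented key names before using it. The following is a minimal sketch under that assumption: `check_response` is a hypothetical helper, and it verifies key presence only, since the description does not state the value types.

```python
from typing import Any, Optional

# Key names copied from the output-shape description above.
ENVELOPE_KEYS = ("traces", "total", "limit", "offset")
TRACE_ROW_KEYS = (
    "trace_id", "agent_name", "framework", "input", "output", "tool_calls",
    "latency_ms", "token_usage", "cost_usd", "metadata", "timestamp",
)


def check_response(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Raise ValueError if a documented key is missing.

    Returns the `summary` object, or None when the response has none
    (the description says it is only present with include_summary: true).
    """
    missing = [k for k in ENVELOPE_KEYS if k not in payload]
    if missing:
        raise ValueError(f"missing envelope keys: {missing}")
    for i, row in enumerate(payload["traces"]):
        absent = [k for k in TRACE_ROW_KEYS if k not in row]
        if absent:
            raise ValueError(f"trace {i} is missing keys: {absent}")
    return payload.get("summary")
```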

Use when you need historical data: investigating a past failure, computing quality trends, comparing agents, or feeding an analytics job. Set agent_name / framework / since / until to narrow the query. Set min_score / max_score to surface outliers. Set sort_by: "cost_usd" + sort_order: "desc" to find the most expensive traces. Set include_summary: true when you want dashboard-style aggregates in one round-trip.
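
The query patterns above translate into argument objects like the ones below. This is a sketch, not the server's client code: the agent name and dates are made up, and `tools_call_request` is a hypothetical helper that wraps the arguments in the standard MCP `tools/call` JSON-RPC envelope.

```python
import json
from typing import Any


def tools_call_request(arguments: dict[str, Any], request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` JSON-RPC request for get_traces."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_traces", "arguments": arguments},
    })


# Most expensive traces first.
most_expensive = {"sort_by": "cost_usd", "sort_order": "desc", "limit": 10}

# Low-scoring outliers for one agent in a time window.
# "support-bot" and the dates are invented for illustration.
low_scoring = {
    "agent_name": "support-bot",
    "since": "2025-01-01T00:00:00Z",
    "until": "2025-02-01T00:00:00Z",
    "max_score": 0.5,
}

# Dashboard ingest: trace rows plus aggregates in one round-trip.
dashboard = {"include_summary": True}
```

Omitted parameters fall back to the documented defaults, so `dashboard` returns the 50 most recent traces along with the summary.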

Don't use to score a trace (use evaluate_output). Don't use to create a trace (use log_trace). Don't use as a live event stream — it's a query, not a subscription; poll with exponential backoff or use the dashboard's SSE endpoint for real-time.
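
If polling is the only option, the backoff advice could look roughly like this. It is a sketch under assumptions: `fetch` stands for whatever function sends the arguments to get_traces and returns the parsed response, and the delay values are arbitrary. Because `since` is an inclusive bound, the loop deduplicates by `trace_id`.

```python
import time
from typing import Any, Callable


def poll_traces(fetch: Callable[[dict[str, Any]], dict[str, Any]],
                since: str, polls: int,
                base_delay: float = 1.0, max_delay: float = 60.0,
                sleep: Callable[[float], None] = time.sleep) -> list:
    """Poll get_traces `polls` times, doubling the delay while nothing new arrives."""
    seen = set()
    new_rows = []
    delay = base_delay
    for _ in range(polls):
        page = fetch({"since": since, "sort_by": "timestamp", "sort_order": "asc"})
        fresh = [t for t in page["traces"] if t["trace_id"] not in seen]
        if fresh:
            seen.update(t["trace_id"] for t in fresh)
            new_rows.extend(fresh)
            # Advance the lower bound; it is inclusive, hence the dedupe above.
            since = fresh[-1]["timestamp"]
            delay = base_delay          # activity seen: reset the backoff
        else:
            delay = min(delay * 2, max_delay)  # quiet: back off
        sleep(delay)
    return new_rows
```

The sketch ignores pagination, so a burst of more than one page of new traces between polls would need the offset handling as well.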

Parameters. limit defaults to 50, max 1000 (anything higher returns 400). offset is zero-based pagination. min_score / max_score filter on the LATEST eval per trace, not all evals (so a trace with one failing + one passing eval may or may not match depending on which landed last). Combining since + sort_by="latency_ms" + sort_order="desc" is the canonical "find slow recent traces" query. include_summary returns dashboard-style aggregates in the SAME response (saves a round-trip; use true for dashboard ingest, false for analytics queries that don't need them). agent_name and framework are exact-match (no wildcards in v0.4). Defaults: limit=50, offset=0, sort_by="timestamp", sort_order="desc", include_summary=false.
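
The limit and offset rules suggest a simple paging loop for jobs that need every matching trace. The sketch below assumes a `fetch` callable (hypothetical) that sends the arguments and returns the parsed response; the lower bound of 1 on the page size is my assumption, since the description only documents the upper bound.

```python
from typing import Any, Callable, Iterator, Optional

Fetch = Callable[[dict[str, Any]], dict[str, Any]]


def iter_traces(fetch: Fetch, filters: Optional[dict[str, Any]] = None,
                page_size: int = 1000) -> Iterator[dict[str, Any]]:
    """Yield every trace matching `filters`, one page at a time."""
    # Upper bound is documented (values above 1000 return 400);
    # the lower bound is an assumption.
    if not 1 <= page_size <= 1000:
        raise ValueError("page_size must be between 1 and 1000")
    offset = 0
    while True:
        page = fetch({**(filters or {}), "limit": page_size, "offset": offset})
        rows = page["traces"]
        yield from rows
        offset += len(rows)
        # Stop on an empty page or once `total` rows have been read.
        if not rows or offset >= page["total"]:
            return
```

For the "find slow recent traces" query, `filters` could be `{"since": "<ISO timestamp>", "sort_by": "latency_ms", "sort_order": "desc"}`. On the HTTP transport each page counts against the rate limit, so larger pages mean fewer requests.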

Error modes. Returns 400 on invalid sort_by / sort_order (Zod enum). Returns 400 if limit > 1000. Returns 429 when HTTP rate limit exceeded. Storage failures propagate as 500. Empty result with total: 0 on no matches (not an error).
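
A caller might map these error modes onto exceptions and a small retry, along the lines of the sketch below. Several things here are assumptions: `send` is a hypothetical transport function returning a `(status, body)` pair, success is taken to be status 200, only 429 is retried, and the 3-second delay is simply 60 s divided by the 20 req/min limit.

```python
import time
from typing import Any, Callable

Send = Callable[[dict[str, Any]], tuple[int, dict[str, Any]]]


def call_get_traces(send: Send, args: dict[str, Any], max_attempts: int = 3,
                    retry_delay_s: float = 3.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Send one get_traces call, retrying only when rate-limited."""
    for attempt in range(1, max_attempts + 1):
        status, body = send(args)
        if status == 200:
            # An empty result (total == 0) is a normal response, not an error.
            return body
        if status == 400:
            # Bad sort_by / sort_order or limit > 1000: retrying will not help.
            raise ValueError(f"invalid arguments: {body}")
        if status == 429 and attempt < max_attempts:
            sleep(retry_delay_s)
            continue
        # 500 (storage failure), or 429 on the final attempt.
        raise RuntimeError(f"get_traces failed with status {status}")
    raise RuntimeError("max_attempts must be at least 1")
```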

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| agent_name | No | Filter by agent name; exact match (no wildcards in v0.4) | |
| framework | No | Filter by agent framework; exact match (e.g., langchain, autogen) | |
| since | No | ISO timestamp lower bound; return traces with timestamp >= this | |
| until | No | ISO timestamp upper bound; return traces with timestamp < this | |
| min_score | No | Minimum eval score filter (0..1); applied to LATEST eval per trace, not all evals | |
| max_score | No | Maximum eval score filter (0..1); applied to LATEST eval per trace | |
| limit | No | Results per page (default 50, max 1000; values >1000 return 400) | |
| offset | No | Zero-based pagination offset; skip first N results | |
| sort_by | No | Sort by timestamp, latency_ms, or cost_usd (default timestamp) | timestamp |
| sort_order | No | Sort order: asc or desc (default desc: most recent / highest first) | desc |
| include_summary | No | Include dashboard summary stats in same response; saves a round-trip when ingesting for dashboards | |
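
The schema's constraints can be checked on the client before a request is sent, which avoids spending a rate-limited call on a guaranteed 400. The sketch below mirrors only the constraints stated on this page; `build_args` is a hypothetical helper, and the server's Zod schema may enforce more than is shown here.

```python
from typing import Any

# Defaults as listed in the tool description.
DEFAULTS = {"limit": 50, "offset": 0, "sort_by": "timestamp",
            "sort_order": "desc", "include_summary": False}
ALLOWED = {"agent_name", "framework", "since", "until",
           "min_score", "max_score", *DEFAULTS}
SORT_BY = {"timestamp", "latency_ms", "cost_usd"}
SORT_ORDER = {"asc", "desc"}


def build_args(**overrides: Any) -> dict[str, Any]:
    """Merge overrides into the documented defaults and check documented limits."""
    unknown = set(overrides) - ALLOWED
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    args = {**DEFAULTS, **overrides}
    if args["limit"] > 1000:
        raise ValueError("limit must not exceed 1000")
    if args["sort_by"] not in SORT_BY:
        raise ValueError(f"sort_by must be one of {sorted(SORT_BY)}")
    if args["sort_order"] not in SORT_ORDER:
        raise ValueError(f"sort_order must be one of {sorted(SORT_ORDER)}")
    for key in ("min_score", "max_score"):
        if key in args and not 0 <= args[key] <= 1:
            raise ValueError(f"{key} must be within 0..1")
    return args
```
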
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark readOnlyHint, idempotentHint, destructiveHint. The description adds: 'Read-only: never mutates storage, never calls external services. Idempotent... Tenant-scoped... Paginates... Rate-limited.' No contradiction. It provides helpful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, siblings, behavior, output shape, usage, parameters, error modes). Every sentence adds value without redundancy. It is concise given the complexity, and front-loads the key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 11 parameters, no output schema, and no nested objects, the description is remarkably complete. It covers behavior, output shape, usage patterns, error modes, and parameter nuances (like min_score/max_score applying to latest eval). It essentially provides a full mini-documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds substantial meaning: explains limit max and error, offset zero-based, min_score/max_score applied to latest eval, combining since+sort_by, include_summary saves round-trip, agent_name exact-match. It also details defaults and error modes per parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with 'Query stored agent-execution traces with filters, pagination, and optional dashboard summary.' It uses a specific verb (Query) and resource (traces), and distinguishes from sibling tools like log_trace (create), delete_trace (delete), and evaluation tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says when to use the tool (e.g., investigating failures, computing quality trends) and when not to use it (e.g., to score a trace, to create a trace, as a live event stream). It provides alternative sibling tools for those cases and suggests specific query patterns like combining filters or sorting.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/iris-eval/mcp-server'
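
The same request can be issued from Python's standard library. This is a minimal sketch: the URL is the one from the curl command, while the function names are illustrative and the JSON decoding assumes the endpoint returns a JSON body, which this page does not state.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/iris-eval/mcp-server"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build the GET request that the curl command sends."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = SERVER_URL, timeout_s: float = 10.0):
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```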

If you have feedback or need assistance with the MCP directory API, please join our Discord server.