
LangSmith MCP Server

Official
by langchain-ai

fetch_runs

Retrieve LangSmith runs for analytics and export using flexible filters, query language, and trace-level constraints to explore traces, tools, and chains.

Instructions

Fetch LangSmith runs (traces, tools, chains, etc.) from one or more projects using flexible filters, query language expressions, and trace-level constraints.


🧩 PURPOSE

This is a general-purpose LangSmith run fetcher designed for analytics, trace export, and automated exploration.

It wraps client.list_runs() with complete support for:

  • Multiple project names or IDs

  • The Filter Query Language (FQL) for precise queries

  • Hierarchical filtering across trace trees

  • Sorting and result limiting

It returns raw dict objects suitable for further analysis or export.
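Conceptually, the wrapping described above might look like the following sketch. This is illustrative only: `fetch_runs_sketch` and the client interface shown are hypothetical stand-ins, not the server's actual implementation.

```python
# Hypothetical sketch of the wrapper's shape; fetch_runs_sketch and the
# client interface here are illustrative, not the server's actual code.
import json

def fetch_runs_sketch(client, project_name, limit=50, **filters):
    """Fetch runs from one or more projects and return raw dicts."""
    # Accept either a single project name or a JSON array string of names.
    try:
        projects = json.loads(project_name)
        if not isinstance(projects, list):
            projects = [project_name]
    except (json.JSONDecodeError, TypeError):
        projects = [project_name]

    runs = client.list_runs(project_name=projects, limit=limit, **filters)
    # Return plain dicts suitable for further analysis or export.
    return {"runs": [dict(r) for r in runs]}
```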


โš™๏ธ PARAMETERS

project_name : str The project name to fetch runs from. For multiple projects, use JSON array string (e.g., '["project1", "project2"]').

trace_id : str, optional Return only runs that belong to a specific trace tree. It is a UUID string, e.g. "123e4567-e89b-12d3-a456-426614174000".

run_type : str, optional Filter runs by type (e.g. "llm", "chain", "tool", "retriever").

error : str, optional Filter by error status: "true" for errored runs, "false" for successful runs.

is_root : str, optional Filter root traces: "true" for only top-level traces, "false" to exclude roots. If not provided, returns all runs.

filter : str, optional A Filter Query Language (FQL) expression that filters runs by fields, metadata, tags, feedback, latency, or time.

─── Common field names ───
- `id`, `name`, `run_type`
- `start_time`, `end_time`
- `latency`
- `total_tokens`
- `error`
- `tags`
- `feedback_key`, `feedback_score`
- `metadata_key`, `metadata_value`
- `execution_order`

─── Supported comparators ───
- `eq`, `neq` → equal / not equal
- `gt`, `gte`, `lt`, `lte` → numeric or time comparisons
- `has` → tag or metadata contains value
- `search` → substring or full-text match
- `and`, `or`, `not` → logical operators

─── Examples ───
```python
'gt(latency, "5s")'                                 # took longer than 5 seconds
'neq(error, null)'                                  # errored runs
'has(tags, "beta")'                                 # runs tagged "beta"
'and(eq(name,"ChatOpenAI"), eq(run_type,"llm"))'    # named & typed runs
'search("image classification")'                    # full-text search
```
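Since FQL strings are meant to be composed programmatically, a few helper functions can make that less error-prone. These helpers (`eq`, `gt`, `has`, `and_`) are hypothetical conveniences, not part of the LangSmith client; FQL itself only requires the plain strings shown above.

```python
# Illustrative helpers for composing FQL strings programmatically.
# These names (eq, gt, has, and_) are hypothetical, not a LangSmith API.

def _quote(v):
    # String values are double-quoted in FQL; numbers pass through as-is.
    return f'"{v}"' if isinstance(v, str) else str(v)

def eq(field, value):
    return f'eq({field},{_quote(value)})'

def gt(field, value):
    return f'gt({field},{_quote(value)})'

def has(field, value):
    return f'has({field},{_quote(value)})'

def and_(*exprs):
    return f'and({", ".join(exprs)})'

fql = and_(gt("latency", "5s"), has("tags", "experimental"))
# fql == 'and(gt(latency,"5s"), has(tags,"experimental"))'
```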

trace_filter : str, optional Filter applied to the root run in each trace tree. Lets you select child runs based on root attributes or feedback.

Example:
```python
'and(eq(feedback_key,"user_score"), eq(feedback_score,1))'
```
→ returns runs whose root trace has a user_score of 1.

tree_filter : str, optional Filter applied to any run in the trace tree (including siblings or children).

Example:
```python
'eq(name,"ExpandQuery")'
```
→ returns runs if any run in their trace tree had that name.

order_by : str, default "-start_time" Sort field; prefix with "-" for descending order.

limit : int, default 50 Maximum number of runs to return.

reference_example_id : str, optional Filter runs by reference example ID. Returns only runs associated with the specified dataset example ID.

format_type : str, default "pretty" Output format for extracted messages. Options:
- "pretty" (default): Human-readable formatted text focusing on human/AI/tool message exchanges
- "json": Pretty-printed JSON format
- "raw": Compact single-line JSON format

When format_type is set, the tool extracts messages from runs and formats them,
making it ideal for conversational AI agents that care about message exchanges
rather than full trace details. The response returns only the formatted output:
- `formatted`: Formatted string representation of messages (when format_type is provided)

When format_type is not set, the response returns:
- `runs`: Full run data
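The branching described above can be sketched as a small dispatcher. This is hypothetical; `extract_messages` is a stand-in for the server's real message-extraction step.

```python
import json

# Hypothetical sketch of the response shapes described above;
# extract_messages stands in for the server's real extraction logic.
def shape_response(runs, format_type=None, extract_messages=lambda rs: rs):
    if format_type is None:
        return {"runs": runs}                       # full run data
    messages = extract_messages(runs)
    if format_type == "json":
        formatted = json.dumps(messages, indent=2)  # pretty-printed JSON
    elif format_type == "raw":
        formatted = json.dumps(messages)            # compact single-line JSON
    else:  # "pretty": human-readable text
        formatted = "\n".join(str(m) for m in messages)
    return {"formatted": formatted}
```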

📤 RETURNS

Dict[str, Any] Dictionary containing:
- If format_type is set: {"formatted": str}, a formatted string representation of messages
- If format_type is not set: {"runs": List[Dict]}, a list of LangSmith run dictionaries


🧪 EXAMPLES

1๏ธโƒฃ Get latest 10 root runs

runs = fetch_runs("alpha-project", is_root="true", limit=10)

2๏ธโƒฃ Get all tool runs that errored

runs = fetch_runs("alpha-project", run_type="tool", error="true")

3๏ธโƒฃ Get all runs that took >5s and have tag "experimental"

runs = fetch_runs("alpha-project", filter='and(gt(latency,"5s"), has(tags,"experimental"))')

4๏ธโƒฃ Get all runs in a specific conversation thread

thread_id = "abc-123"
fql = f'and(in(metadata_key, ["session_id","conversation_id","thread_id"]), eq(metadata_value, "{thread_id}"))'
runs = fetch_runs("alpha-project", is_root="true", filter=fql)

5๏ธโƒฃ List all runs called "extractor" whose root trace has feedback user_score=1

runs = fetch_runs(
    "alpha-project",
    filter='eq(name,"extractor")',
    trace_filter='and(eq(feedback_key,"user_score"), eq(feedback_score,1))'
)

6๏ธโƒฃ List all runs that started after a timestamp and either errored or got low feedback

fql = 'and(gt(start_time,"2023-07-15T12:34:56Z"), or(neq(error,null), and(eq(feedback_key,"Correctness"), eq(feedback_score,0.0))))'
runs = fetch_runs("alpha-project", filter=fql)

7๏ธโƒฃ Get formatted messages for conversational AI (default: pretty format)

# Returns formatted messages focusing on human/AI/tool exchanges
result = fetch_runs("alpha-project", limit=10, format_type="pretty")
# result["formatted"] contains human-readable formatted messages
# result["messages"] contains the raw message list
# result["runs"] contains full run data

8๏ธโƒฃ Get messages in JSON format

result = fetch_runs("alpha-project", limit=10, format_type="json")
# result["messages"] contains messages as JSON array
# result["formatted"] contains pretty-printed JSON string

🧠 NOTES FOR AGENTS

  • Use this to query LangSmith data sources dynamically.

  • Compose FQL strings programmatically based on your intent.

  • Combine filter, trace_filter, and tree_filter for hierarchical logic.

  • Always verify that project_name matches an existing LangSmith project.

  • Returned dict objects have fields like id, name, run_type, inputs, outputs, error, start_time, end_time, latency, metadata, feedback, etc.

  • If the trace is big, save it to a file (if you have this ability) and analyze it locally.

  • For conversational AI agents: Use format_type="pretty" (default) to get human-readable message exchanges focusing on human/AI/tool messages rather than full trace details.
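For the note about large traces, a minimal sketch of dumping fetched runs to disk for local analysis. This assumes the run dicts are JSON-serializable apart from values like datetimes or UUIDs, which `default=str` stringifies; `save_runs` is a hypothetical helper, not part of the tool.

```python
import json

# Minimal sketch: persist fetched runs for local analysis.
# default=str stringifies non-JSON values such as datetimes or UUIDs.
def save_runs(runs, path="trace_dump.json"):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(runs, f, indent=2, default=str)
    return path
```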

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| project_name | Yes | | |
| trace_id | No | | |
| run_type | No | | |
| error | No | | |
| is_root | No | | |
| filter | No | | |
| trace_filter | No | | |
| tree_filter | No | | |
| order_by | No | | -start_time |
| limit | No | | |
| reference_example_id | No | | |
| format_type | No | | pretty |

Output Schema

No arguments

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does an excellent job of disclosing behavioral traits. It explains that the tool wraps client.list_runs() with complete support for various filtering methods, describes return formats (raw dict objects), mentions that the output structure changes based on the format_type parameter, and provides practical notes about verifying project_name and handling large traces. The only minor gap is the lack of explicit rate-limit or authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

While well-structured with clear sections (PURPOSE, PARAMETERS, RETURNS, EXAMPLES, NOTES), the description is excessively long with redundant information. Some examples could be condensed, and the FQL syntax details might be overly verbose for a tool description. However, the front-loaded summary is effective, and the structure helps navigation despite the length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (12 parameters, no annotations, 0% schema coverage, but has output schema), the description is remarkably complete. It covers purpose, usage, all parameters with semantics, return formats, extensive examples, and agent-specific notes. The output schema existence means the description doesn't need to detail return structures, but it still explains the conditional returns based on format_type parameter.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage and 12 parameters, the description compensates fully by providing extensive parameter documentation. Each parameter gets clear explanations with examples, especially for complex ones like filter (with FQL syntax details), trace_filter, tree_filter, and format_type. The description adds significant value beyond the bare schema by explaining parameter interactions and practical usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool 'fetches LangSmith runs' with specific resources (traces, tools, chains, etc.) and methods (flexible filters, query language). It explicitly distinguishes this as a 'general-purpose LangSmith run fetcher' for analytics and trace export, differentiating it from sibling tools like list_datasets or list_projects that handle different resource types.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool versus alternatives. It states it's for 'analytics, trace export, and automated exploration' and specifically advises 'For conversational AI agents: Use format_type="pretty" to get human-readable message exchanges.' It also distinguishes from sibling tools by focusing on runs rather than datasets, prompts, or experiments.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
