Glama

discovery_analyze

Destructive

Analyze tabular data to find statistically validated patterns, feature interactions, and subgroup effects that go beyond obvious relationships, with results checked against academic literature.

Instructions

Run Disco on tabular data to find novel, statistically validated patterns.

This is NOT another data analyst — it's a discovery pipeline that systematically
searches for feature interactions, subgroup effects, and conditional relationships
nobody thought to look for, then validates each on hold-out data with FDR-corrected
p-values and checks novelty against academic literature.

This is a long-running operation. Returns a run_id immediately.
Use discovery_status to poll and discovery_get_results to fetch completed results.

Use this when you need to go beyond answering questions about data and start
finding things nobody thought to ask. Do NOT use this for summary statistics,
visualization, or SQL queries.

Public runs are free but results are published. Private runs cost credits.
Call discovery_estimate first to check cost. Private report URLs require
sign-in — tell the user to sign in at the dashboard with the same email
address used to create the account (email code, no password needed).

Call discovery_upload first to upload your file, then pass the returned file_ref here.
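The upload, analyze, poll, fetch sequence described above can be sketched as follows. This is an illustrative sketch only: `call_tool` stands in for whatever MCP client invocation you use, and the `state` field and its "running"/"completed" values are assumptions not documented by this server — only the tool names, `file_ref`, and `run_id` come from the description above.

```python
import time

def run_discovery(call_tool, path, target_column, poll_interval=5.0):
    """Drive the discovery workflow end to end (hypothetical helper)."""
    # 1. Upload the file and keep the returned file_ref.
    file_ref = call_tool("discovery_upload", {"path": path})["file_ref"]
    # 2. Start the long-running analysis; a run_id is returned immediately.
    run = call_tool("discovery_analyze", {
        "target_column": target_column,
        "file_ref": file_ref,
    })
    run_id = run["run_id"]
    # 3. Poll discovery_status until the run reaches a terminal state
    #    ("completed"/"failed" are assumed state names, not documented).
    while True:
        status = call_tool("discovery_status", {"run_id": run_id})
        if status["state"] in ("completed", "failed"):
            break
        time.sleep(poll_interval)
    # 4. Fetch the finished results.
    return call_tool("discovery_get_results", {"run_id": run_id})
```

The same loop works for any polling-style MCP tool pair; only the tool names and the terminal-state check would change.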

Args:
    target_column: The column to analyze — what drives it, beyond what's obvious.
    file_ref: The file reference returned by discovery_upload.
    analysis_depth: Search depth (1=fast, higher=deeper). Default 1.
    visibility: "public" (free) or "private" (costs credits). Default "public".
    title: Optional title for the analysis.
    description: Optional description of the dataset.
    excluded_columns: Optional JSON array of column names to exclude from analysis.
    column_descriptions: Optional JSON object mapping column names to descriptions. Significantly improves pattern explanations — always provide if column names are non-obvious (e.g. {"col_7": "patient age", "feat_a": "blood pressure"}).
    author: Optional author name for the report.
    source_url: Optional source URL for the dataset.
    use_llms: Enable LLM-assisted processing. Slower and more expensive, but adds smarter pre-processing, a summary page, literature context, and pattern novelty assessment. Only applies to private runs — public runs always use LLMs. Default false.
    api_key: Disco API key (disco_...). Optional if DISCOVERY_API_KEY env var is set.
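Putting the parameters above together, a hypothetical arguments payload might look like the sketch below. Every value here (the file_ref, column names, descriptions) is invented for illustration; only the parameter names come from the docs. Since excluded_columns is documented as a JSON array and column_descriptions as a JSON object, this sketch encodes both as JSON strings — if your client accepts native arrays/objects, pass those instead.

```python
import json

# Hypothetical arguments for a discovery_analyze call.
arguments = {
    "target_column": "outcome",
    "file_ref": "file_abc123",   # placeholder; use the ref from discovery_upload
    "analysis_depth": 1,         # 1 = fast; higher = deeper search
    "visibility": "public",      # free, but results are published
    "excluded_columns": json.dumps(["patient_id", "record_date"]),
    "column_descriptions": json.dumps({
        "col_7": "patient age",
        "feat_a": "blood pressure",
    }),
}
```

Encoding the two structured fields with `json.dumps` keeps them valid regardless of whether the transport treats them as strings, which is the safer reading of "Optional JSON array/object" above.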

Input Schema

Name                 Required  Default
target_column        Yes
file_ref             No
analysis_depth       No
visibility           No        public
title                No
description          No
excluded_columns     No
column_descriptions  No
author               No
source_url           No
use_llms             No
api_key              No

Output Schema

Name    Required
result  Yes
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate destructiveHint=true and idempotentHint=false, but the description adds valuable behavioral context beyond this: it explains this is a 'long-running operation' with immediate run_id return, mentions cost implications (public vs private runs), authentication requirements for private reports, and workflow dependencies (upload first, then poll). While it doesn't explicitly mention destructive behavior, it provides operational context that complements the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections: purpose statement, behavioral context, usage guidelines, prerequisites, and parameter explanations. While comprehensive, some sentences could be more concise (e.g., the LLM explanation is verbose). The information is front-loaded with the core purpose first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (12 parameters, destructive operation, long-running nature) and the presence of an output schema (which handles return values), the description provides excellent contextual completeness. It covers workflow dependencies, cost implications, authentication requirements, operational characteristics, and parameter semantics, making it sufficiently complete for an agent to use effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description carries the full burden of parameter documentation. It provides meaningful context for most parameters: explains target_column purpose ('what drives it, beyond what's obvious'), file_ref dependency, analysis_depth meaning, visibility cost implications, and gives specific guidance for column_descriptions. However, it doesn't cover all 12 parameters equally well (e.g., author, source_url get minimal explanation).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Run Disco on tabular data to find novel, statistically validated patterns') and distinguishes it from alternatives by explicitly stating what it is NOT ('NOT another data analyst', 'Do NOT use this for summary statistics, visualization, or SQL queries'). It differentiates from siblings by explaining its unique discovery pipeline approach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use ('Use this when you need to go beyond answering questions about data and start finding things nobody thought to ask') and when not to use ('Do NOT use this for summary statistics, visualization, or SQL queries'). It also mentions prerequisites ('Call discovery_upload first') and alternatives ('Use discovery_status to poll and discovery_get_results to fetch completed results').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/leap-laboratories/discovery-engine'