discovery_analyze
Analyze tabular data to find statistically validated patterns, feature interactions, and subgroup effects that go beyond obvious relationships, with results checked against academic literature.
Instructions
Run Disco on tabular data to find novel, statistically validated patterns.
This is NOT another data analyst — it's a discovery pipeline that systematically
searches for feature interactions, subgroup effects, and conditional relationships
nobody thought to look for, then validates each on hold-out data with FDR-corrected
p-values and checks novelty against academic literature.
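
The description mentions FDR-corrected p-values without naming the procedure. For reference, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate. Whether Disco uses this particular procedure is an assumption, and the function name and `alpha` default are illustrative only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the hypothesis is rejected.

    Illustrative only: the source does not say which FDR procedure Disco uses.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Reject every hypothesis whose rank is at or below the cutoff,
    # even if its own p-value missed its individual threshold.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


flags = benjamini_hochberg([0.001, 0.03, 0.035, 0.5])
```

In this example the second p-value (0.03) misses its own threshold of 0.025, but it is still rejected because the third one (0.035) passes its threshold of 0.0375. That is the "step-up" behaviour.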
This is a long-running operation. Returns a run_id immediately.
Use discovery_status to poll and discovery_get_results to fetch completed results.
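
Because the call returns a run_id right away, the caller has to poll. The loop below is a minimal sketch of that pattern. It assumes a caller-supplied `get_status` function that wraps discovery_status and returns a status string. The status names `"completed"` and `"failed"`, the interval, and the timeout are all hypothetical and should be adjusted to whatever discovery_status actually returns.

```python
import time


def wait_for_run(get_status, run_id, interval_s=10.0, timeout_s=3600.0,
                 sleep=time.sleep):
    """Poll until the run reaches a terminal status, then return that status.

    get_status: hypothetical callable wrapping the discovery_status tool;
                takes a run_id and returns a status string.
    """
    # Assumed terminal status names; not confirmed by the tool description.
    terminal = {"completed", "failed"}
    waited = 0.0
    while True:
        status = get_status(run_id)
        if status in terminal:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"run {run_id} still '{status}' after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Once the returned status indicates success, discovery_get_results can be called with the same run_id. The `sleep` parameter is injectable so the loop can be exercised without real delays.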
Use this when you need to go beyond answering questions about data and start
finding things nobody thought to ask. Do NOT use this for summary statistics,
visualization, or SQL queries.
Public runs are free but results are published. Private runs cost credits.
Call discovery_estimate first to check cost. Private report URLs require
sign-in — tell the user to sign in at the dashboard with the same email
address used to create the account (email code, no password needed).
Call discovery_upload first to upload your file, then pass the returned file_ref here.
Args:
target_column: The column to analyze — what drives it, beyond what's obvious.
file_ref: The file reference returned by discovery_upload.
analysis_depth: Search depth (1=fast, higher=deeper). Default 1.
visibility: "public" (free) or "private" (costs credits). Default "public".
title: Optional title for the analysis.
description: Optional description of the dataset.
excluded_columns: Optional JSON array of column names to exclude from analysis.
column_descriptions: Optional JSON object mapping column names to descriptions. Significantly improves pattern explanations — always provide if column names are non-obvious (e.g. {"col_7": "patient age", "feat_a": "blood pressure"}).
author: Optional author name for the report.
source_url: Optional source URL for the dataset.
use_llms: Enables LLM-assisted processing. Slower and more expensive, but adds smarter pre-processing, a summary page, literature context, and pattern novelty assessment. Only applies to private runs; public runs always use LLMs. Default false.
api_key: Disco API key (disco_...). Optional if DISCOVERY_API_KEY env var is set.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| target_column | Yes | | |
| file_ref | No | | |
| analysis_depth | No | | |
| visibility | No | | public |
| title | No | | |
| description | No | | |
| excluded_columns | No | | |
| column_descriptions | No | | |
| author | No | | |
| source_url | No | | |
| use_llms | No | | |
| api_key | No | | |
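
The arguments listed above can be assembled into a single payload before calling the tool. This is a minimal sketch: the helper name `build_analyze_args` is hypothetical, and it assumes that `excluded_columns` and `column_descriptions` are passed as JSON-encoded strings. The description only says "JSON array" and "JSON object", so native lists and dicts may be what the server expects.

```python
import json


def build_analyze_args(target_column, file_ref, *, analysis_depth=1,
                       visibility="public", excluded_columns=None,
                       column_descriptions=None, use_llms=False):
    """Build an argument dict for discovery_analyze (hypothetical helper)."""
    if visibility not in ("public", "private"):
        raise ValueError("visibility must be 'public' or 'private'")

    args = {
        "target_column": target_column,
        "file_ref": file_ref,  # value returned by discovery_upload
        "analysis_depth": analysis_depth,
        "visibility": visibility,
        "use_llms": use_llms,  # per the description, only matters for private runs
    }
    # Assumption: JSON-typed arguments are sent as JSON-encoded strings.
    if excluded_columns is not None:
        args["excluded_columns"] = json.dumps(list(excluded_columns))
    if column_descriptions is not None:
        args["column_descriptions"] = json.dumps(column_descriptions)
    return args


example_args = build_analyze_args(
    "outcome",            # hypothetical column name
    "file-ref-from-upload",  # placeholder, not a real file_ref
    excluded_columns=["row_id"],
    column_descriptions={"col_7": "patient age", "feat_a": "blood pressure"},
)
```

Optional fields such as `title`, `author`, and `source_url` are left out here to keep the sketch short. They could be added to the dict in the same way.
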
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |