Disco
The Disco server enables automated discovery of statistically validated patterns in tabular data through an MCP interface, covering everything from account creation to running analyses and retrieving results.
Account & Authentication
Sign up / verify: Create a new account via email and 6-digit verification code — no password or credit card required.
Log in / verify: Retrieve a new API key for an existing account via email verification.
Check account status: View your current plan, available credits, and payment method status.
Pricing, Plans & Payments
List plans: Browse available subscription tiers (Free, Researcher, Team) with pricing and credit allowances.
Subscribe: Switch to or enroll in a plan.
Add payment method: Attach a Stripe-tokenized payment method.
Purchase credits: Buy credit packs ($10/pack, 100 credits each) for private analyses.
Data Analysis Workflow
Estimate costs: Before running, get a cost estimate (credits, duration, sufficiency) for a given file size, column count, depth, and visibility setting.
Upload data: Upload datasets via URL, local file path, or base64-encoded content — supports CSV, TSV, Excel, JSON, Parquet, ARFF, and Feather (up to 5 GB).
Run analysis: Launch a discovery pipeline on uploaded data to find feature interactions, subgroup effects, and conditional relationships — with FDR-corrected p-values and optional academic literature novelty checks. Choose between public runs (free, but results are published) or private runs (credit cost, results kept confidential). Optionally leverage LLMs for smarter pre-processing, richer summaries, and more accurate novelty assessments.
Check status: Poll a running analysis (typically 3–15 minutes) for its status, queue position, active pipeline step, and time estimates.
Get results: Retrieve discovered patterns (conditions, effect sizes, p-values, novelty classifications, citations), feature importances, summary insights, and interactive dashboard links.
Provides integration with Jupyter notebooks through the discovery-engine-api[jupyter] package, enabling interactive pattern discovery and analysis within notebook environments.
Provides integration with pandas: users prepare tabular data with pandas before pattern discovery, while pandas-style operations such as summary statistics, visualization, and filtering stay outside Disco's scope.
Provides integration with PyPI for installing the discovery-engine-api Python package, enabling users to access the Disco pattern discovery service through the Python SDK.
Provides integration with Python through the discovery-engine-api SDK, enabling programmatic access to Disco's pattern discovery capabilities including data analysis, account management, and result retrieval.
Disco
Find novel, statistically validated patterns in tabular data — feature interactions, subgroup effects, and conditional relationships that humans and agents miss.
Made by Leap Laboratories.
What it actually does
Most data analysis starts with a question. Disco starts with the data.
Without biases or assumptions, it finds combinations of feature conditions that significantly shift your target column — things like "patients aged 45–65 with low HDL and high CRP have 3× the readmission rate" — without you needing to hypothesise that interaction first.
Each pattern is:
Validated on a hold-out set — increases the chance of generalisation
FDR-corrected — p-values included, adjusted for multiple testing
Checked against academic literature — to help you understand what you've found, and identify if it is novel.
The output is structured: conditions, effect sizes, p-values, citations, and a novelty classification for every pattern found.
Use it when: "which variables are most important with respect to X", "are there patterns we're missing?", "I don't know where to start with this data", "I need to understand how A and B affect C".
Not for: summary statistics, visualisation, filtering, SQL queries — use pandas for those
Quickstart
pip install discovery-engine-api
Get an API key:
# Step 1: request verification code (no password, no card)
curl -X POST https://disco.leap-labs.com/api/signup \
-H "Content-Type: application/json" \
-d '{"email": "you@example.com"}'
# Step 2: submit code from email → get key
curl -X POST https://disco.leap-labs.com/api/signup/verify \
-H "Content-Type: application/json" \
-d '{"email": "you@example.com", "code": "123456"}'
# → {"key": "disco_...", "credits": 10, "tier": "free_tier"}
Or create a key at disco.leap-labs.com/developers.
Run your first analysis:
import asyncio
from discovery import Engine

async def main():
    engine = Engine(api_key="disco_...")
    # discover() is awaited, so it has to run inside an event loop
    result = await engine.discover(
        file="data.csv",
        target_column="outcome",
    )
    for pattern in result.patterns:
        if pattern.p_value < 0.05 and pattern.novelty_type == "novel":
            print(f"{pattern.description} (p={pattern.p_value:.4f})")
    print(f"Explore: {result.report_url}")

asyncio.run(main())  # in a notebook, use `await main()` instead
Runs take a few minutes. discover() polls automatically and logs progress — queue position, estimated wait, current pipeline step, and ETA. For background runs, see Running asynchronously.
→ Full Python SDK reference · Example notebook
What you get back
Each Pattern in result.patterns looks like this (real output from a crop yield dataset):
Pattern(
    description="When humidity is between 72–89% AND wind speed is below 12 km/h, "
                "crop yield increases by 34% above the dataset average",
    conditions=[
        {"type": "continuous", "feature": "humidity_pct",
         "min_value": 72.0, "max_value": 89.0},
        {"type": "continuous", "feature": "wind_speed_kmh",
         "min_value": 0.0, "max_value": 12.0},
    ],
    p_value=0.003,                   # FDR-corrected
    novelty_type="novel",
    novelty_explanation="Published studies examine humidity and wind speed as independent "
                        "predictors, but this interaction effect — where low wind amplifies "
                        "the benefit of high humidity within a specific range — has not been "
                        "reported in the literature.",
    citations=[
        {"title": "Effects of relative humidity on cereal crop productivity",
         "authors": ["Zhang, L.", "Wang, H."], "year": "2021",
         "journal": "Journal of Agricultural Science"},
    ],
    target_change_direction="max",
    abs_target_change=0.34,          # 34% increase
    support_count=847,               # rows matching this pattern
    support_percentage=16.9,
)
Key things to notice:
Patterns are combinations of conditions — humidity AND wind speed together, not just "more humidity is better"
Specific thresholds — 72–89%, not a vague correlation
Novel vs confirmatory — every pattern is classified; confirmatory ones validate known science, novel ones are what you came for
Citations — shows what IS known, so you can see what's genuinely new
report_url links to an interactive web report with all patterns visualised
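To make the structure of `conditions` concrete, here is a minimal sketch that applies the two continuous conditions from the example pattern to a few hand-made rows and counts the matches. The `row_matches` and `support` helpers and the sample rows are hypothetical and not part of the SDK. The sketch also assumes that `min_value` and `max_value` are inclusive bounds, which the example output does not state.

```python
from typing import Any

# Conditions copied from the example pattern above.
CONDITIONS = [
    {"type": "continuous", "feature": "humidity_pct", "min_value": 72.0, "max_value": 89.0},
    {"type": "continuous", "feature": "wind_speed_kmh", "min_value": 0.0, "max_value": 12.0},
]

# Hypothetical rows, invented purely for illustration.
ROWS = [
    {"humidity_pct": 80.0, "wind_speed_kmh": 5.0},
    {"humidity_pct": 80.0, "wind_speed_kmh": 20.0},
    {"humidity_pct": 60.0, "wind_speed_kmh": 5.0},
    {"humidity_pct": 89.0, "wind_speed_kmh": 12.0},
]


def row_matches(row: dict[str, Any], conditions: list[dict[str, Any]]) -> bool:
    """Return True if the row satisfies every condition (logical AND)."""
    for cond in conditions:
        if cond["type"] != "continuous":
            raise ValueError(f"unsupported condition type: {cond['type']}")
        value = row.get(cond["feature"])
        if value is None:
            return False  # a missing value cannot satisfy a range condition
        # Assumption: both bounds are inclusive.
        if not (cond["min_value"] <= value <= cond["max_value"]):
            return False
    return True


def support(rows: list[dict[str, Any]], conditions: list[dict[str, Any]]) -> tuple[int, float]:
    """Return (matching row count, percentage of all rows)."""
    count = sum(row_matches(row, conditions) for row in rows)
    return count, 100.0 * count / len(rows)


if __name__ == "__main__":
    print(support(ROWS, CONDITIONS))
```

The same idea explains `support_count` and `support_percentage` in the output: they describe how many rows fall inside all of the pattern's ranges at once.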
The result.summary gives an LLM-generated narrative overview:
result.summary.overview
# "Disco identified 14 statistically significant patterns. 5 are novel.
# The strongest driver is a previously unreported interaction between humidity
# and wind speed at specific thresholds."
result.summary.key_insights
# ["Humidity × low wind speed at 72–89% humidity produces a 34% yield increase — novel.",
# "Soil nitrogen above 45 mg/kg shows diminishing returns when phosphorus is below 12 mg/kg.",
#  ...]
How it works
Disco is a pipeline, not prompt engineering over data. It:
Trains machine learning models on a subset of your data
Uses interpretability techniques to extract learned patterns
Validates every pattern on the held-out data with FDR correction (Benjamini-Hochberg)
Checks surviving patterns against academic literature via semantic search
You cannot replicate this by writing pandas code or asking an LLM to look at a CSV. It finds structure that hypothesis-driven analysis misses because it doesn't start with hypotheses.
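For readers unfamiliar with the FDR correction named in the validation step, the sketch below shows the textbook Benjamini-Hochberg adjustment in plain Python. It illustrates the general procedure only; it is not Disco's implementation, and the function name and sample p-values are made up for the example.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw={p:.2f}  adjusted={q:.4f}  significant={q < 0.05}")
```

With these four hypothetical tests, only the smallest raw p-value stays below 0.05 after adjustment, which is why patterns that look significant in isolation can drop out once many candidates are tested together.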
Preparing your data
Before running, exclude columns that would produce meaningless findings. Disco finds statistically real patterns — but if the input includes columns that are definitionally related to the target, the patterns will be tautological.
Exclude:
Identifiers — row IDs, UUIDs, patient IDs, sample codes
Data leakage — the target renamed or reformatted (e.g., diagnosis_text when the target is diagnosis_code)
Tautological columns — alternative encodings of the same construct as the target. If target is serious, then serious_outcome, not_serious, death are all part of the same classification. If target is profit, then revenue and cost together compose it. If target is a survey index, the sub-items are tautological.
Full guidance with examples: SKILL.md
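A rough pre-flight check can catch the most obvious offenders before a run. The sketch below is a hypothetical helper, not part of the SDK: it flags non-float columns where every value is unique (identifier-like) and columns that map one-to-one onto the target (possible leakage). Both rules are heuristics chosen for this illustration, so treat the output as candidates for `excluded_columns` to review by hand. Composite tautologies such as revenue and cost for a profit target will not be detected.

```python
def suggest_exclusions(rows: list[dict], target: str) -> dict[str, str]:
    """Flag columns that look like identifiers or re-encodings of the target."""
    if not rows:
        return {}
    flagged: dict[str, str] = {}
    n_rows = len(rows)
    n_target_values = len({row[target] for row in rows})
    for col in rows[0]:
        if col == target:
            continue
        values = [row[col] for row in rows]
        distinct = set(values)
        # Heuristic 1: all-unique, non-float values usually mean an ID column.
        # Floats are skipped because continuous measurements are often unique too.
        if len(distinct) == n_rows and not any(isinstance(v, float) for v in values):
            flagged[col] = "identifier-like: every value is unique"
            continue
        # Heuristic 2: each column value pairs with exactly one target value
        # and vice versa, so the column may just re-encode the target.
        pairs = {(row[col], row[target]) for row in rows}
        if len(pairs) == len(distinct) == n_target_values:
            flagged[col] = "possible leakage: maps one-to-one onto the target"
    return flagged


if __name__ == "__main__":
    # Hypothetical rows, invented for illustration.
    sample = [
        {"patient_id": "001", "age": 52, "diagnosis_text": "positive", "diagnosis_code": 1},
        {"patient_id": "002", "age": 34, "diagnosis_text": "negative", "diagnosis_code": 0},
        {"patient_id": "003", "age": 52, "diagnosis_text": "positive", "diagnosis_code": 1},
        {"patient_id": "004", "age": 41, "diagnosis_text": "negative", "diagnosis_code": 0},
    ]
    for column, reason in suggest_exclusions(sample, "diagnosis_code").items():
        print(f"{column}: {reason}")
```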
Parameters
await engine.discover(
    file="data.csv",                       # path, Path, or pd.DataFrame
    target_column="outcome",               # column to predict/explain
    analysis_depth=2,                      # 2 = default; higher = deeper analysis; lower = faster and cheaper
    visibility="public",                   # "public" (always free, data and report are published) or "private" (costs credits)
    column_descriptions={                  # improves pattern explanations and literature context
        "bmi": "Body mass index",
        "hdl": "HDL cholesterol in mg/dL",
    },
    excluded_columns=["id", "timestamp"],  # see "Preparing your data" above
    use_llms=False,                        # Defaults to False. If True, runs are slower and more expensive, but you get smarter pre-processing, summary page, literature context and novelty assessment. Public runs always use LLMs.
    title="My dataset",
    description="...",                     # improves pattern explanations and literature context
)
Public runs are free but results are published. Set visibility="private" for private data — this costs credits.
Running asynchronously
Runs take a few minutes. For agent workflows or scripts that do other work in parallel:
# Submit without waiting
run = await engine.run_async(file="data.csv", target_column="outcome", wait=False)
print(f"Submitted {run.run_id}, continuing...")
# ... do other things ...
result = await engine.wait_for_completion(run.run_id, timeout=1800)
For synchronous scripts and Jupyter notebooks:
result = engine.run(file="data.csv", target_column="outcome", wait=True)
# or: pip install discovery-engine-api[jupyter] for notebook compatibility
MCP server
Disco is available as an MCP server — no local install required.
{
  "mcpServers": {
    "discovery-engine": {
      "url": "https://disco.leap-labs.com/mcp",
      "env": { "DISCOVERY_API_KEY": "disco_..." }
    }
  }
}
Tools: discovery_list_plans, discovery_estimate, discovery_upload, discovery_analyze, discovery_status, discovery_get_results, discovery_account, discovery_signup, discovery_signup_verify, discovery_login, discovery_login_verify, discovery_add_payment_method, discovery_subscribe, discovery_purchase_credits.
Pricing
|              | Cost                                                |
|--------------|-----------------------------------------------------|
| Public runs  | Free — results and data are published               |
| Private runs | Credits vary by file size and configuration — use   |
| Free tier    | 10 credits/month, no card required                  |
| Researcher   | $49/month — 500 credits                             |
| Team         | $199/month — 2000 credits                           |
| Credits      | $0.10 per credit                                    |
Estimate before running:
estimate = await engine.estimate(file_size_mb=10.5, num_columns=25, analysis_depth=2, visibility="private")
# estimate["cost"]["credits"] → 55
# estimate["account"]["sufficient"] → True/False
Account management is fully programmatic — attach payment methods, subscribe to plans, and purchase credits via the SDK or REST API. See Python SDK reference or SKILL.md.
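The credit arithmetic is simple enough to check by hand. As a minimal sketch, the helper below works out how many credit packs would cover a shortfall, using the pack size and price quoted on this page (100 credits for $10). The function names are hypothetical, and the sketch assumes credits are sold only in whole packs.

```python
import math

PACK_CREDITS = 100    # credits per pack, as quoted on this page
PACK_PRICE_USD = 10   # price per pack in USD, i.e. $0.10 per credit


def packs_needed(required_credits: int, available_credits: int) -> int:
    """Number of whole packs needed to cover the shortfall (0 if none)."""
    shortfall = max(0, required_credits - available_credits)
    # Assumption: partial packs cannot be bought, so round up.
    return math.ceil(shortfall / PACK_CREDITS)


def top_up_cost_usd(required_credits: int, available_credits: int) -> int:
    """Cost in USD of the packs needed to cover the shortfall."""
    return packs_needed(required_credits, available_credits) * PACK_PRICE_USD


if __name__ == "__main__":
    # A 55-credit run on an account holding the 10 free-tier credits.
    print(packs_needed(55, 10), top_up_cost_usd(55, 10))
```

For the 55-credit estimate shown above and a free-tier balance of 10 credits, the shortfall is 45 credits, so one pack is enough.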
Expected data format
Disco expects a flat table — columns for features, rows for samples.
| patient_id | age | bmi | smoker | outcome |
|------------|-----|------|--------|---------|
| 001 | 52 | 28.3 | yes | 1 |
| 002 | 34 | 22.1 | no | 0 |
| ... | ... | ... | ... | ... |
One row per observation — a patient, a sample, a transaction, a measurement, etc.
One column per feature — numeric, categorical, datetime, or free text are all fine
One target column — the outcome you want to understand. Must have at least 2 distinct values.
Missing values are OK — Disco handles them automatically. Don't drop rows or impute beforehand.
No pivoting needed — if your data is already in a flat table, it's ready to go
Supported formats: CSV, TSV, Excel (.xlsx), JSON, Parquet, ARFF, Feather. Max 5 GB.
Not supported: images, raw text documents, nested/hierarchical JSON, multi-sheet Excel (use the first sheet or export to CSV)
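Two of these requirements are easy to verify locally before uploading. The sketch below is a hypothetical pre-check using only the standard library: it tests the file extension against the supported formats and confirms that the target column of a CSV exists and has at least two distinct values. Only `.xlsx` is named explicitly above; the other extensions are conventional guesses for each format and should be treated as assumptions.

```python
import csv
import io
from pathlib import Path

# Assumed extensions for the supported formats; only .xlsx is stated in the text.
SUPPORTED_SUFFIXES = {".csv", ".tsv", ".xlsx", ".json", ".parquet", ".arff", ".feather"}


def has_supported_suffix(filename: str) -> bool:
    """Return True if the file extension matches a supported format."""
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES


def check_target(csv_text: str, target: str) -> list[str]:
    """Return a list of problems with the target column (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or target not in reader.fieldnames:
        return [f"target column {target!r} not found"]
    # Missing values are allowed, so ignore empty cells when counting.
    distinct = {row[target] for row in reader if row[target] not in ("", None)}
    if len(distinct) < 2:
        return [f"target column {target!r} needs at least 2 distinct values"]
    return []


if __name__ == "__main__":
    sample = (
        "patient_id,age,bmi,smoker,outcome\n"
        "001,52,28.3,yes,1\n"
        "002,34,22.1,no,0\n"
    )
    print(has_supported_suffix("data.csv"), check_target(sample, "outcome"))
```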
Compared to other tools
| Goal                                   | Tool                             |
|----------------------------------------|----------------------------------|
| Summary statistics, data quality       | ydata-profiling, sweetviz        |
| Predictive model                       | AutoML (auto-sklearn, TPOT, H2O) |
| Quick correlations                     | pandas, seaborn                  |
| Answer a specific question about data  | ChatGPT, Claude                  |
| Find what you don't know to look for   | Disco                            |
Disco isn't a replacement for EDA or AutoML — it finds the patterns those tools miss. We tested 18 data analysis tools on a dataset with known ground-truth patterns. Most confidently reported wrong results. Disco was the only one that found every pattern.