query_dataapi
Run SQL queries to discover columns, filter data, compute counts, and retrieve raw results for analysis.
Instructions
STEP 3 & 5 of the workflow — explore columns, answer filtered questions, get raw data.
This is the MOST IMPORTANT tool for data exploration and filtered analysis. It runs arbitrary SQL against the database and returns structured results.
Returns: {"columns": [...], "rows": [{col: val}, ...]}
══ USE THIS TOOL FOR ══════════════════════════════════════════════════════
COLUMN DISCOVERY (Step 3) — always do this before creating any visual: query_dataapi(dataconnection_id=10, query="SELECT * FROM schema.table_name LIMIT 3") → reveals exact column names, data types, and sample values. → column names are case-sensitive; use EXACTLY as returned here.
FILTERED QUESTIONS — when the user asks about a specific subset of data: "Show shipping codes for Mock Vendor X" "What is Turbine Oil's price trend?" "Which priority-1 orders are overdue?" → create_smart_visual() CANNOT apply filters (CDV API limitation). → Use this tool with a WHERE clause instead, then present results as a table.
COUNT / FREQUENCY questions — when the user wants counts: "What are the most common shipping codes?" → create_smart_visual() does NOT support COUNT aggregation. → Use this tool: query="SELECT col, COUNT(*) as cnt FROM ... GROUP BY col ORDER BY cnt DESC"
TIME-SERIES queries — price trends, monthly patterns, etc.: → Time-based CDV visuals are blocked via the API. → Use this tool to fetch the trend data, then describe it or format it for Plotly.
HEATMAPS / CROSS-TABS — e.g. "spend by destination and shipping code": → CDV has no heatmap type via the API. → Use this tool to get the pivot data, format it for plotly.graph_objects.Heatmap.
══ HOW TO USE ═════════════════════════════════════════════════════════════
Connection-based SQL (RECOMMENDED — most flexible): query_dataapi(dataconnection_id=<id_from_list_connections>, query="SELECT col1, SUM(col2) FROM schema.table WHERE col3='val' GROUP BY col1 ORDER BY 2 DESC LIMIT 20")
Important SQL notes:
• Table names use schema.table format (e.g. logistics.procurement_transactions)
• SQL reserved words (date, time, year, etc.) must be backtick-quoted:
✓ SELECT date, time FROM ... NOT: SELECT date, time FROM ...
• Use standard Impala/Hive SQL syntax
Dataset-based query (simpler, but less flexible): query_dataapi(dataset=<id_from_list_datasets>, dimensions="col1,col2", aggregates="SUM(col3) as total", limit=20)
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset | No | ||
| dataconnection_id | No | ||
| query | No | ||
| limit | No | ||
| dimensions | No | ||
| aggregates | No | ||
| filters | No |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes |