Run SELECT Query
run_select_queryRun SELECT queries on Hydrolix time-series databases using ClickHouse SQL, with automatic result truncation and support for summary and regular tables.
Instructions
Run a SELECT query in a Hydrolix time-series database using the Clickhouse SQL dialect. Queries run using this tool will timeout after 30 seconds.
RESULT TRUNCATION:
Query results are automatically truncated when the total cell count (rows * columns) exceeds the configured limit.
Response shape: - Always present: columns, rows, truncated (bool), row_count - Only when truncated=true: total_row_count, message Note: total_row_count is the number of rows fetched from the server, which is capped at 100,000. The actual table may contain more rows than this value suggests.
Note: if the cell limit is smaller than the number of columns, row_count will be 0 — in that case you must either refine the query (fewer columns, stricter filters) or increase max_cells.
MANDATORY PRE-QUERY CHECK:
Before running ANY query, call get_table_info(database, table_name) if you haven't already. Check is_summary_table and read column metadata (column_category, merge_function per column). If is_summary_table=True: follow summary_table_info from get_table_info response and rules below. If is_summary_table=False: use standard SQL (count, sum, avg, etc.).
The primary key on tables queried this way is always a timestamp. Queries should include either a LIMIT clause or a filter based on the primary key as a performance guard to ensure they return in a reasonable amount of time. Queries should select specific fields and avoid the use of SELECT * to avoid performance issues. The performance guard used for the query should be clearly communicated with the user, and the user should be informed that the query may take a long time to run if the performance guard is not used. When choosing a performance guard, the user's preference should be requested and used if available. When using aggregations, the performance guard should take form of a primary key filter, or else the LIMIT should be applied in a subquery before applying the aggregations.
When matching columns based on substrings, prefix or suffix matches should be used instead of
full-text search whenever possible. When searching for substrings, the syntax column LIKE
'%suffix' or column LIKE 'prefix%' should be used.
SUMMARY TABLE RULES (if is_summary_table=True):
Use column_category from get_table_info to determine column usage — do NOT infer from names.
column_category='AggregateColumn': MUST be wrapped in its merge_function
Stores binary AggregateFunction state — direct SELECT causes deserialization errors
Use exact merge_function from column metadata (do NOT infer from column name)
count(vendor_id) → countMerge(
count(vendor_id)), countIf(c) → countIfMerge(countIf(c))Always use backticks for column names with special characters
column_category='SummaryColumn': select directly, no wrapping
ALIAS that wraps -Merge internally — NEVER wrap in sum()/count()/avg() (ILLEGAL_AGGREGATION)
Per-row value — for grand totals use the corresponding AggregateColumn + merge_function
column_category='Column'/'AliasColumn': dimension columns, use as-is
Many have function-like names (e.g.,
toStartOfMinute(dt)) — LITERAL names, not expressionsWRONG: SELECT toStartOfMinute(dt) RIGHT: SELECT
toStartOfMinute(dt)For time filters: use '2022-06-01' or '2022-06-01 00:00:00' — NOT partial '2022-06-01 00:00'
Use >= and < for ranges: WHERE col >= '2022-06-01' AND col < '2022-06-02'
GROUP BY: required when SELECT mixes dimension columns with aggregates
Only Column/AliasColumn go in GROUP BY — never AggregateColumn or SummaryColumn
NEVER use SELECT * on summary tables (causes deserialization errors)
Summary table query patterns (after calling get_table_info first):
Pattern 1: Aggregate entire table
-- First: get_table_info('database', 'summary_table')
-- Read column.merge_function for count(column_name) = "countMerge"
SELECT countMerge(count(column_name)) as total FROM database.summary_table
Pattern 2: Aggregate with grouping by dimension and optional time range filter
-- First: get_table_info('database', 'summary_table')
-- Read merge_function for each aggregate column
SELECT toStartOfMinute(datetime_field) as time_bucket,
countMerge(count(column_name)) as total,
avgMerge(avg(other_column)) as avg_value
FROM database.summary_table
WHERE toStartOfMinute(datetime_field) >= '2022-06-01'
AND toStartOfMinute(datetime_field) < '2022-06-02'
GROUP BY toStartOfMinute(datetime_field)
ORDER BY time_bucket DESC
Pattern 3: Multiple aggregates (no dimensions, no GROUP BY)
-- First: get_table_info('database', 'summary_table')
SELECT countMerge(count(column_name)) as count_result,
sumMerge(sum(other_column)) as sum_result
FROM database.summary_table
Pattern 4: Using column_category='SummaryColumn'
-- First: get_table_info('database', 'summary_table')
-- SummaryColumns are per-row values — use with GROUP BY to break down by dimension
SELECT cdn, cnt_all, sum_bytes FROM database.summary_table GROUP BY cdn
-- No -Merge needed, these are pre-defined aliases
-- For a grand total across all rows, use AggregateColumn + merge_function instead:
SELECT countMerge(count()) AS grand_total FROM database.summary_table
Pattern 5: Using dimensions with function-like names (common pattern)
-- First: get_table_info('database', 'summary_table')
-- Dimension column named: toStartOfMinute(primary_datetime) — LITERAL name, not an expression
-- WRONG: SELECT toStartOfMinute(primary_datetime) ... (tries to call function)
-- RIGHT: Use the literal column name with backticks
SELECT toStartOfMinute(primary_datetime) as time_bucket,
countMerge(count()) as cnt,
maxMerge(max(value)) as max_val
FROM database.summary_table
GROUP BY toStartOfMinute(primary_datetime)
ORDER BY time_bucket DESC
LIMIT 10
Regular table examples (non-summary):
Example query. Purpose: get logs from the application.logs table. Primary key: timestamp.
Performance guard: 10 minute recency filter.
SELECT message, timestamp FROM application.logs WHERE timestamp > now() - INTERVAL 10 MINUTES
Example query. Purpose: get the median humidity from the weather.measurements table. Primary
key: date. Performance guard: 1000 row limit, applied before aggregation.
SELECT median(humidity) FROM (SELECT humidity FROM weather.measurements LIMIT 1000)
Example query. Purpose: get the lowest temperature from the weather.measurements table over
the last 10 years. Primary key: date. Performance guard: date range filter.
SELECT min(temperature) FROM weather.measurements WHERE date > now() - INTERVAL 10 YEARS
Example query. Purpose: get the app name with the most log messages from the application.logs
table in the window between new year and valentine's day of 2024. Primary key: timestamp.
Performance guard: date range filter.
SELECT app, count(*) FROM application.logs WHERE timestamp > '2024-01-01' AND timestamp < '2024-02-14' GROUP BY app ORDER BY count(*) DESC LIMIT 1
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | ||
| max_cells | No |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
No arguments | |||