Skip to main content
Glama
hydrolix

mcp-hydrolix

Official

Run SELECT Query

run_select_query
Read-onlyIdempotent

Run SELECT queries on Hydrolix time-series databases using ClickHouse SQL, with automatic result truncation and support for summary and regular tables.

Instructions

Run a SELECT query in a Hydrolix time-series database using the Clickhouse SQL dialect. Queries run using this tool will timeout after 30 seconds.

RESULT TRUNCATION:

Query results are automatically truncated when the total cell count (rows * columns) exceeds the configured limit.

Response shape: - Always present: columns, rows, truncated (bool), row_count - Only when truncated=true: total_row_count, message Note: total_row_count is the number of rows fetched from the server, which is capped at 100,000. The actual table may contain more rows than this value suggests.

Note: if the cell limit is smaller than the number of columns, row_count will be 0 — in that case you must either refine the query (fewer columns, stricter filters) or increase max_cells.

MANDATORY PRE-QUERY CHECK:

Before running ANY query, call get_table_info(database, table_name) if you haven't already. Check is_summary_table and read column metadata (column_category, merge_function per column). If is_summary_table=True: follow summary_table_info from get_table_info response and rules below. If is_summary_table=False: use standard SQL (count, sum, avg, etc.).

The primary key on tables queried this way is always a timestamp. Queries should include either a LIMIT clause or a filter based on the primary key as a performance guard to ensure they return in a reasonable amount of time. Queries should select specific fields and avoid the use of SELECT * to avoid performance issues. The performance guard used for the query should be clearly communicated with the user, and the user should be informed that the query may take a long time to run if the performance guard is not used. When choosing a performance guard, the user's preference should be requested and used if available. When using aggregations, the performance guard should take form of a primary key filter, or else the LIMIT should be applied in a subquery before applying the aggregations.

When matching columns based on substrings, prefix or suffix matches should be used instead of full-text search whenever possible. When searching for substrings, the syntax column LIKE '%suffix' or column LIKE 'prefix%' should be used.

SUMMARY TABLE RULES (if is_summary_table=True):

Use column_category from get_table_info to determine column usage — do NOT infer from names.

  1. column_category='AggregateColumn': MUST be wrapped in its merge_function

    • Stores binary AggregateFunction state — direct SELECT causes deserialization errors

    • Use exact merge_function from column metadata (do NOT infer from column name)

    • count(vendor_id) → countMerge(count(vendor_id)), countIf(c) → countIfMerge(countIf(c))

    • Always use backticks for column names with special characters

  2. column_category='SummaryColumn': select directly, no wrapping

    • ALIAS that wraps -Merge internally — NEVER wrap in sum()/count()/avg() (ILLEGAL_AGGREGATION)

    • Per-row value — for grand totals use the corresponding AggregateColumn + merge_function

  3. column_category='Column'/'AliasColumn': dimension columns, use as-is

    • Many have function-like names (e.g., toStartOfMinute(dt)) — LITERAL names, not expressions

    • WRONG: SELECT toStartOfMinute(dt) RIGHT: SELECT toStartOfMinute(dt)

    • For time filters: use '2022-06-01' or '2022-06-01 00:00:00' — NOT partial '2022-06-01 00:00'

    • Use >= and < for ranges: WHERE col >= '2022-06-01' AND col < '2022-06-02'

  4. GROUP BY: required when SELECT mixes dimension columns with aggregates

    • Only Column/AliasColumn go in GROUP BY — never AggregateColumn or SummaryColumn

  5. NEVER use SELECT * on summary tables (causes deserialization errors)

Summary table query patterns (after calling get_table_info first):

Pattern 1: Aggregate entire table -- First: get_table_info('database', 'summary_table') -- Read column.merge_function for count(column_name) = "countMerge" SELECT countMerge(count(column_name)) as total FROM database.summary_table

Pattern 2: Aggregate with grouping by dimension and optional time range filter -- First: get_table_info('database', 'summary_table') -- Read merge_function for each aggregate column SELECT toStartOfMinute(datetime_field) as time_bucket, countMerge(count(column_name)) as total, avgMerge(avg(other_column)) as avg_value FROM database.summary_table WHERE toStartOfMinute(datetime_field) >= '2022-06-01' AND toStartOfMinute(datetime_field) < '2022-06-02' GROUP BY toStartOfMinute(datetime_field) ORDER BY time_bucket DESC

Pattern 3: Multiple aggregates (no dimensions, no GROUP BY) -- First: get_table_info('database', 'summary_table') SELECT countMerge(count(column_name)) as count_result, sumMerge(sum(other_column)) as sum_result FROM database.summary_table

Pattern 4: Using column_category='SummaryColumn' -- First: get_table_info('database', 'summary_table') -- SummaryColumns are per-row values — use with GROUP BY to break down by dimension SELECT cdn, cnt_all, sum_bytes FROM database.summary_table GROUP BY cdn -- No -Merge needed, these are pre-defined aliases -- For a grand total across all rows, use AggregateColumn + merge_function instead: SELECT countMerge(count()) AS grand_total FROM database.summary_table

Pattern 5: Using dimensions with function-like names (common pattern) -- First: get_table_info('database', 'summary_table') -- Dimension column named: toStartOfMinute(primary_datetime) — LITERAL name, not an expression -- WRONG: SELECT toStartOfMinute(primary_datetime) ... (tries to call function) -- RIGHT: Use the literal column name with backticks SELECT toStartOfMinute(primary_datetime) as time_bucket, countMerge(count()) as cnt, maxMerge(max(value)) as max_val FROM database.summary_table GROUP BY toStartOfMinute(primary_datetime) ORDER BY time_bucket DESC LIMIT 10

Regular table examples (non-summary):

Example query. Purpose: get logs from the application.logs table. Primary key: timestamp. Performance guard: 10 minute recency filter.

SELECT message, timestamp FROM application.logs WHERE timestamp > now() - INTERVAL 10 MINUTES

Example query. Purpose: get the median humidity from the weather.measurements table. Primary key: date. Performance guard: 1000 row limit, applied before aggregation.

SELECT median(humidity) FROM (SELECT humidity FROM weather.measurements LIMIT 1000)

Example query. Purpose: get the lowest temperature from the weather.measurements table over the last 10 years. Primary key: date. Performance guard: date range filter.

SELECT min(temperature) FROM weather.measurements WHERE date > now() - INTERVAL 10 YEARS

Example query. Purpose: get the app name with the most log messages from the application.logs table in the window between new year and valentine's day of 2024. Primary key: timestamp. Performance guard: date range filter. SELECT app, count(*) FROM application.logs WHERE timestamp > '2024-01-01' AND timestamp < '2024-02-14' GROUP BY app ORDER BY count(*) DESC LIMIT 1

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
queryYes
max_cellsNo

Output Schema

TableJSON Schema
NameRequiredDescriptionDefault

No arguments

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description reveals timeout (30 sec), result truncation logic with cell limits, row count caps, and handling of edge cases (row_count=0). These details go well beyond annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) and fully disclose behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but justified by the tool's complexity (summary tables, multiple patterns). It is front-loaded with the core purpose and structured with sections (RESULT TRUNCATION, MANDATORY PRE-QUERY CHECK, etc.). However, it could be more concise by reducing example repetition, so a 4 is appropriate.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, low schema coverage, and presence of an output schema, the description covers everything: purpose, prerequisites, behavior, error cases, performance guards, and comprehensive examples for summary and regular tables. It is complete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, so the description fully compensates. It explains the 'query' parameter as a SELECT statement, and 'max_cells' as a truncation threshold with detailed behavior (cell limit, row_count=0 scenario). Examples demonstrate usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool runs SELECT queries in a Hydrolix time-series database using Clickhouse SQL dialect. It specifies the verb 'run', the resource 'SELECT query', and the context, distinguishing it from siblings like get_table_info, list_databases, and list_tables.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance: a mandatory pre-query check calling get_table_info, rules for summary vs regular tables, performance guard requirements, and when to apply truncation. It implicitly tells when not to use (e.g., without prior info) and offers clear alternatives through the mandatory check.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/hydrolix/mcp-hydrolix'

If you have feedback or need assistance with the MCP directory API, please join our Discord server