Skip to main content
Glama
Aguantar

io.github.Aguantar/clickhouse-dataops-mcp

by Aguantar

Server Configuration

Describes the environment variables required to run the server.

NameRequiredDescriptionDefault
CLICKHOUSE_HOSTNoClickHouse HTTP hostlocalhost
CLICKHOUSE_PORTNoClickHouse HTTP port8123
CLICKHOUSE_USERNoClickHouse usernamedefault
CLICKHOUSE_DATABASENoDefault databasecdc_pipeline
CLICKHOUSE_PASSWORDNoClickHouse password
CLICKHOUSE_QUERY_TIMEOUTNoQuery timeout in seconds30

Capabilities

Features and capabilities supported by this server

CapabilityDetails
tools
{
  "listChanged": false
}
prompts
{
  "listChanged": false
}
resources
{
  "subscribe": false,
  "listChanged": false
}
experimental
{}

Tools

Functions exposed to the LLM to take actions

NameDescription
ch_queryA

Execute a read-only SQL query against ClickHouse.

Runs SELECT queries with automatic safety validation (DDL/DML blocked), LIMIT enforcement, and partition pruning analysis. Returns results with a warning if the query doesn't leverage partitioning efficiently.

Args: sql: SELECT query to execute database: Target database (default: cdc_pipeline) max_rows: Maximum rows to return (default: 1000, max: 10000)

ch_explain_queryA

Analyze a query's execution plan and suggest optimizations.

Runs EXPLAIN PLAN and EXPLAIN PIPELINE, then provides structured analysis:

  • Whether partition pruning is active

  • Whether sorting keys are utilized

  • Specific optimization suggestions (add time filters, use pre-aggregated tables, etc.)

This is the key differentiator from generic ClickHouse tools — it doesn't just execute queries, it advises on how to write better ones.

Args: sql: The SELECT query to analyze database: Target database (default: cdc_pipeline)

ch_table_schemaA

Get comprehensive table metadata: columns, engine, keys, partitions, and sample data.

Returns column types, partition/sorting/primary keys, TTL settings, row counts, disk usage, partition breakdown, and 5 sample rows. Essential for understanding table structure before writing queries.

Args: table: Table name database: Database name (default: cdc_pipeline)

ch_pipeline_latencyA

Analyze CDC pipeline latency by segment.

Measures latency at each stage of the pipeline:

  • source_to_cdc: Upbit → Debezium/Kafka

  • cdc_to_flink: Kafka → Flink processing

  • flink_to_insert: Flink → ClickHouse write

  • end_to_end: total source_ts → inserted_at

Each segment reports p50, p95, p99, and max latency in milliseconds. Also includes data freshness (seconds behind real-time) and per-market breakdown.

Args: market: Filter by market (e.g., "KRW-BTC"). Empty = all markets period: Time window — "10m", "1h", "6h", or "24h" (default: "1h")

ch_data_qualityA

Run data quality checks: nulls, duplicates, gaps, and market coverage.

Checks for a specific date:

  • Null/empty values per column

  • Duplicate rows by primary key

  • Hourly data gaps (missing time windows)

  • Market coverage (are all 5 coins present?)

  • Data freshness

Args: table: Table to check (default: crypto_trades) database: Database name (default: cdc_pipeline) check_date: Date to check in YYYY-MM-DD format (default: today)

ch_slow_queriesA

Find slow queries with root cause diagnosis.

Scans system.query_log for queries exceeding the duration threshold, then generates a diagnosis for each:

  • Full scan detection (high read_rows without partition pruning)

  • Memory-intensive query detection

  • Aggregation optimization suggestions (use pre-aggregated tables)

  • Error detection

Args: hours: Look back period in hours (default: 24) min_duration_ms: Minimum query duration to report (default: 1000ms) limit: Maximum number of slow queries to return (default: 20)

ch_disk_usageA

Analyze disk usage by table and partition with recommendations.

Returns:

  • Total disk usage for the database

  • Per-table breakdown (rows, size, parts, percentage)

  • Per-partition breakdown (top 20 by size)

  • TTL information

  • Recommendations (excessive parts, missing TTL on large tables)

Args: database: Database to analyze (default: cdc_pipeline)

ch_list_tablesA

List all tables with metadata and built-in descriptions.

Returns table name, engine type, partition/sorting keys, row count, disk size, and a human-readable description of each table's purpose. Use this as the starting point to understand what data is available.

Args: database: Database to list (default: cdc_pipeline)

Prompts

Interactive templates invoked by user choice

NameDescription

No prompts

Resources

Contextual data attached and managed by the client

NameDescription

No resources

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Aguantar/clickhouse-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server