Cloudera Data Visualization MCP Server

by kevintalbert

query_dataapi

Run SQL queries to discover columns, filter data, compute counts, and retrieve raw results for analysis.

Instructions

STEP 3 & 5 of the workflow — explore columns, answer filtered questions, get raw data.

This is the MOST IMPORTANT tool for data exploration and filtered analysis. It runs arbitrary SQL against the database and returns structured results.

Returns: {"columns": [...], "rows": [{col: val}, ...]}
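As a rough illustration of the return shape above, the sketch below renders a result as a plain-text table. It assumes "columns" is a list of column-name strings; the sample payload and the format_result helper are hypothetical and not part of the server.

```python
def format_result(result):
    """Render a {"columns": [...], "rows": [{col: val}, ...]} result as text."""
    columns = result["columns"]
    rows = result["rows"]
    # Stringify every cell; a key missing from a row becomes an empty string.
    table = [[str(row.get(col, "")) for col in columns] for row in rows]
    # Each column is as wide as its longest cell or its header.
    widths = [
        max([len(col)] + [len(line[i]) for line in table])
        for i, col in enumerate(columns)
    ]
    header = " | ".join(col.ljust(w) for col, w in zip(columns, widths))
    separator = "-+-".join("-" * w for w in widths)
    body = [
        " | ".join(cell.ljust(w) for cell, w in zip(line, widths))
        for line in table
    ]
    return "\n".join([header, separator] + body)


# Hypothetical sample payload in the documented shape.
sample = {
    "columns": ["shipping_code", "cnt"],
    "rows": [
        {"shipping_code": "AIR", "cnt": 42},
        {"shipping_code": "GROUND", "cnt": 7},
    ],
}
print(format_result(sample))
```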

══ USE THIS TOOL FOR ══════════════════════════════════════════════════════

  1. COLUMN DISCOVERY (Step 3): always do this before creating any visual.
     query_dataapi(dataconnection_id=10, query="SELECT * FROM schema.table_name LIMIT 3")
     → reveals exact column names, data types, and sample values.
     → column names are case-sensitive; use them EXACTLY as returned here.

  2. FILTERED QUESTIONS: when the user asks about a specific subset of data:
     "Show shipping codes for Mock Vendor X"
     "What is Turbine Oil's price trend?"
     "Which priority-1 orders are overdue?"
     → create_smart_visual() CANNOT apply filters (CDV API limitation).
     → Use this tool with a WHERE clause instead, then present results as a table.

  3. COUNT / FREQUENCY questions: when the user wants counts:
     "What are the most common shipping codes?"
     → create_smart_visual() does NOT support COUNT aggregation.
     → Use this tool: query="SELECT col, COUNT(*) as cnt FROM ... GROUP BY col ORDER BY cnt DESC"

  4. TIME-SERIES queries: price trends, monthly patterns, etc.
     → Time-based CDV visuals are blocked via the API.
     → Use this tool to fetch the trend data, then describe it or format it for Plotly.

  5. HEATMAPS / CROSS-TABS, e.g. "spend by destination and shipping code":
     → CDV has no heatmap type via the API.
     → Use this tool to get the pivot data, then format it for plotly.graph_objects.Heatmap.
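
For the heatmap case in item 5, one possible way to reshape the returned rows is sketched below. It assumes the query returns one row per (destination, shipping code) pair; the column names, sample values, and the pivot_for_heatmap helper are all hypothetical.

```python
def pivot_for_heatmap(rows, x_col, y_col, value_col):
    """Turn long-format rows into x labels, y labels, and a 2-D z matrix."""
    x_labels = sorted({row[x_col] for row in rows})
    y_labels = sorted({row[y_col] for row in rows})
    # Index each value by its (y, x) position for quick lookup.
    cell = {(row[y_col], row[x_col]): row[value_col] for row in rows}
    # Combinations absent from the query result are left as None.
    z = [[cell.get((y, x)) for x in x_labels] for y in y_labels]
    return x_labels, y_labels, z


# Hypothetical rows, shaped like the "rows" list the tool returns.
rows = [
    {"destination": "Norfolk", "shipping_code": "AIR", "spend": 1200.0},
    {"destination": "Norfolk", "shipping_code": "GROUND", "spend": 300.0},
    {"destination": "San Diego", "shipping_code": "AIR", "spend": 450.0},
]
x, y, z = pivot_for_heatmap(rows, "shipping_code", "destination", "spend")
print(x, y, z)
```

If Plotly is installed, the three lists can then be passed to plotly.graph_objects.Heatmap(x=x, y=y, z=z).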

══ HOW TO USE ═════════════════════════════════════════════════════════════

Connection-based SQL (RECOMMENDED, most flexible):
  query_dataapi(dataconnection_id=<id_from_list_connections>, query="SELECT col1, SUM(col2) FROM schema.table WHERE col3='val' GROUP BY col1 ORDER BY 2 DESC LIMIT 20")

Important SQL notes:
  • Table names use schema.table format (e.g. logistics.procurement_transactions)
  • SQL reserved words (date, time, year, etc.) must be backtick-quoted:
      ✓ SELECT `date`, `time` FROM ...
      NOT: SELECT date, time FROM ...
  • Use standard Impala/Hive SQL syntax
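
The backtick rule can be applied mechanically when a query is assembled in code. The sketch below is one way to do it; the reserved-word set is only the three words named in the notes (the full list depends on the SQL engine), and quote_identifier, build_select, and the vendor_name column are hypothetical.

```python
# Illustrative subset only: just the words named in the SQL notes.
RESERVED_WORDS = {"date", "time", "year"}


def quote_identifier(name):
    """Backtick-quote a column name if it collides with a reserved word."""
    if name.lower() in RESERVED_WORDS:
        return "`" + name + "`"
    return name


def build_select(columns, table, limit=None):
    """Build a simple SELECT using schema.table naming and an optional LIMIT."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    sql = f"SELECT {cols} FROM {table}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


query = build_select(
    ["date", "time", "vendor_name"],
    "logistics.procurement_transactions",
    limit=3,
)
print(query)
```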

Dataset-based query (simpler, but less flexible):
  query_dataapi(dataset=<id_from_list_datasets>, dimensions="col1,col2", aggregates="SUM(col3) as total", limit=20)
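
To make the difference between the two modes concrete, the sketch below builds the argument dictionary for each. The parameter names come from the examples above; the helper functions and the ID values 10 and 7 are hypothetical, and how a client actually sends these arguments to the server is not shown.

```python
def connection_args(dataconnection_id, query):
    """Arguments for the connection-based mode (raw SQL)."""
    return {"dataconnection_id": dataconnection_id, "query": query}


def dataset_args(dataset, dimensions, aggregates, limit=20):
    """Arguments for the dataset-based mode.

    dimensions is a list of column names, joined into the comma-separated
    string used in the example above; aggregates is passed through as-is.
    """
    return {
        "dataset": dataset,
        "dimensions": ",".join(dimensions),
        "aggregates": aggregates,
        "limit": limit,
    }


# Hypothetical IDs; real ones come from list_connections / list_datasets.
sql_call = connection_args(10, "SELECT * FROM schema.table_name LIMIT 3")
dataset_call = dataset_args(7, ["col1", "col2"], "SUM(col3) as total")
print(sql_call)
print(dataset_call)
```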

Input Schema

Name               Required  Description  Default
dataset            No
dataconnection_id  No
query              No
limit              No
dimensions         No
aggregates         No
filters            No

Output Schema

Name    Required  Description  Default
result  Yes

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses that it runs arbitrary SQL (implying read operation), returns structured results, requires case-sensitive column names, and uses Impala/Hive SQL syntax with backtick quoting. It does not explicitly state read-only or non-destructive nature, but the context implies it. Overall, it is transparent enough for safe agent invocation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear headings, bullet points, and examples. It is longer than necessary but the organization helps readability. The front-loading with the return format and importance statement is effective. A minor redundancy exists in the SQL notes, but overall it is concise enough for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, no annotations, no schema descriptions), the description covers usage scenarios, SQL syntax, and return format. It provides examples and distinguishes from sibling tools. The missing 'filters' parameter description is a notable gap, but otherwise the description is thorough for an agent to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains the two modes (connection-based and dataset-based) and covers parameters like dataconnection_id, dataset, query, limit, dimensions, aggregates with examples. However, the 'filters' parameter is omitted from the description, leaving its purpose unclear. This gap prevents a higher score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it runs arbitrary SQL for data exploration and filtered analysis. It distinguishes from siblings by explaining limitations of create_smart_visual (no filters, no COUNT, no time-series, no heatmaps) and provides specific use cases. The verb 'query' and resource 'dataapi' are explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists five use cases (column discovery, filtered questions, count/frequency, time-series, heatmaps) with concrete examples and explanations of why alternative tools are unsuitable. It provides detailed 'HOW TO USE' instructions and SQL syntax notes. Although it does not explicitly say when NOT to use, the use cases effectively define the scope.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kevintalbert/CDV-MCP-Server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.