
get_sql_table_schema

Retrieve the schema, sample data, and row count of a parquet file to analyze its structure before querying. Ideal for understanding columns, data types, and preparing SQL queries efficiently.

Instructions

Get the schema and sample data for a specific parquet file

WORKFLOW NOTE: Use this function to explore the structure of parquet files
before writing SQL queries against them. This will show you:
1. All available columns and their data types
2. Sample data from the file
3. Total row count

Usage example:
1. Get list of files: files = list_available_sql_tables()
2. For a specific file: schema = get_sql_table_schema(files[0]['path'])
3. Use columns in your SQL: query_sql("SELECT column1, column2 FROM read_parquet('/path/to/file.parquet')")

Args:
    file_path: Path to the parquet file (from list_available_sql_tables or query_dataset)
    
Returns:
    Table schema information including columns, data types, and sample data
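The returned structure can drive query construction directly. A hypothetical sketch of consuming it (the file path, column names, and sample values below are placeholders; the key names follow the Returns description above):

```python
# Hypothetical sketch: consuming the dictionary this tool returns to build a
# query. The path and columns are placeholders, not real data.
schema = {
    "success": True,
    "file_path": "/data/blocks.parquet",
    "columns": [
        {"column_name": "block_number", "data_type": "BIGINT"},
        {"column_name": "gas_used", "data_type": "BIGINT"},
    ],
    "sample_data": [{"block_number": 1, "gas_used": 21000}],
    "row_count": 1000,
}

if schema["success"]:
    # Select only the columns the schema reports, rather than SELECT *.
    cols = ", ".join(c["column_name"] for c in schema["columns"])
    query = f"SELECT {cols} FROM read_parquet('{schema['file_path']}')"
    print(query)
```

Checking `success` first matters because the error shape (see the implementation below) carries no `columns` key.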

Input Schema

Name: file_path
Required: Yes
Description: (none)
Default: (none)

Implementation Reference

  • MCP tool registration of get_sql_table_schema with @mcp.tool() decorator. Thin wrapper that delegates to the actual implementation in sql.py.
    @mcp.tool()
    def get_sql_table_schema(file_path: str) -> Dict[str, Any]:
        """
        Get the schema and sample data for a specific parquet file
        
        WORKFLOW NOTE: Use this function to explore the structure of parquet files
        before writing SQL queries against them. This will show you:
        1. All available columns and their data types
        2. Sample data from the file
        3. Total row count
        
        Usage example:
        1. Get list of files: files = list_available_sql_tables()
        2. For a specific file: schema = get_sql_table_schema(files[0]['path'])
        3. Use columns in your SQL: query_sql("SELECT column1, column2 FROM read_parquet('/path/to/file.parquet')")
        
        Args:
            file_path: Path to the parquet file (from list_available_sql_tables or query_dataset)
            
        Returns:
            Table schema information including columns, data types, and sample data
        """
        from cryo_mcp.sql import get_table_schema
        return get_table_schema(file_path)
  • Core handler implementation: Uses DuckDB to load the parquet file, query information_schema for columns and types, fetch sample rows, count total rows, and return structured schema info.
    def get_table_schema(file_path: str) -> Dict[str, Any]:
        """
        Get schema information for a parquet file.
        
        Args:
            file_path: Path to the parquet file
            
        Returns:
            Dictionary with schema information
        """
        conn = create_connection()
        
        try:
            path = Path(file_path)
            if not path.exists() or path.suffix != '.parquet':
                return {
                    "success": False,
                    "error": f"File not found or not a parquet file: {file_path}"
                }
            
            # Register a view for the file; escape single quotes so a path
            # such as /data/o'brien.parquet cannot break (or inject into) the SQL
            safe_path = file_path.replace("'", "''")
            conn.execute(f"CREATE OR REPLACE VIEW temp_view AS SELECT * FROM '{safe_path}'")
            
            # Get schema info
            schema_result = conn.execute("SELECT column_name, data_type FROM information_schema.columns WHERE table_name='temp_view'").fetchdf()
            
            # Get sample data
            sample_data = conn.execute("SELECT * FROM temp_view LIMIT 5").fetchdf()
            
            # Get row count (might be expensive for large files)
            row_count = conn.execute("SELECT COUNT(*) as count FROM temp_view").fetchone()[0]
            
            return {
                "success": True,
                "file_path": file_path,
                "columns": schema_result.to_dict(orient="records"),
                "sample_data": sample_data.to_dict(orient="records"),
                "row_count": row_count
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }
        finally:
            conn.close()
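Interpolating a caller-supplied path into SQL requires care: a single quote in the path would otherwise terminate the string literal early. A minimal sketch of the standard SQL escape, doubling embedded quotes (the helper name is hypothetical):

```python
# Hypothetical helper: quote a file path as a SQL string literal by doubling
# any embedded single quotes (the standard SQL escape).
def quote_sql_literal(path: str) -> str:
    return "'" + path.replace("'", "''") + "'"

# An apostrophe in the path stays inside the literal instead of ending it.
print(quote_sql_literal("/data/o'brien.parquet"))  # → '/data/o''brien.parquet'
```

Parameter binding (passing the path as a bound argument rather than into the SQL text) is the more robust option where the statement supports it.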
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing key behaviors: it returns schema, sample data, and row count; it's for exploration (not modification); and it requires a file path from other tools. It doesn't mention performance characteristics or error handling, but covers the core functionality adequately.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, workflow note, usage example, args, returns) and every sentence adds value. It's slightly longer than minimal but justified by the comprehensive guidance. The front-loaded purpose statement is excellent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no annotations and no output schema, the description provides excellent context: clear purpose, usage guidelines, parameter explanation, and return value description. It doesn't detail the exact output structure, but given the tool's exploratory nature and the sibling context, this is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, so the description must compensate fully. It clearly explains the single parameter's purpose ('Path to the parquet file'), source ('from list_available_sql_tables or query_dataset'), and provides usage examples showing how to obtain and use it, adding significant value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Get the schema and sample data'), resource ('for a specific parquet file'), and distinguishes it from siblings like list_available_sql_tables (which lists files) and query_sql (which executes queries). The WORKFLOW NOTE further clarifies its exploratory purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('to explore the structure of parquet files before writing SQL queries') and provides a detailed workflow example showing how it integrates with sibling tools (list_available_sql_tables and query_sql). It clearly positions this as a preparatory step for querying.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
