get_sql_table_schema
Retrieve the schema, sample data, and row count of a parquet file to analyze its structure before querying. Ideal for understanding columns, data types, and preparing SQL queries efficiently.
Instructions
Get the schema and sample data for a specific parquet file
WORKFLOW NOTE: Use this function to explore the structure of parquet files
before writing SQL queries against them. This will show you:
1. All available columns and their data types
2. Sample data from the file
3. Total row count
Usage example:
1. Get list of files: files = list_available_sql_tables()
2. For a specific file: schema = get_sql_table_schema(files[0]['path'])
3. Use columns in your SQL: query_sql("SELECT column1, column2 FROM read_parquet('/path/to/file.parquet')")
Args:
file_path: Path to the parquet file (from list_available_sql_tables or query_dataset)
Returns:
Table schema information including columns, data types, and sample data
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes |
Implementation Reference
- cryo_mcp/server.py:711-735 (registration)MCP tool registration of get_sql_table_schema with @mcp.tool() decorator. Thin wrapper that delegates to the actual implementation in sql.py.@mcp.tool() def get_sql_table_schema(file_path: str) -> Dict[str, Any]: """ Get the schema and sample data for a specific parquet file WORKFLOW NOTE: Use this function to explore the structure of parquet files before writing SQL queries against them. This will show you: 1. All available columns and their data types 2. Sample data from the file 3. Total row count Usage example: 1. Get list of files: files = list_available_sql_tables() 2. For a specific file: schema = get_sql_table_schema(files[0]['path']) 3. Use columns in your SQL: query_sql("SELECT column1, column2 FROM read_parquet('/path/to/file.parquet')") Args: file_path: Path to the parquet file (from list_available_sql_tables or query_dataset) Returns: Table schema information including columns, data types, and sample data """ from cryo_mcp.sql import get_table_schema return get_table_schema(file_path)
- cryo_mcp/sql.py:268-313 (handler)Core handler implementation: Uses DuckDB to load the parquet file, query information_schema for columns and types, fetch sample rows, count total rows, and return structured schema info.def get_table_schema(file_path: str) -> Dict[str, Any]: """ Get schema information for a parquet file. Args: file_path: Path to the parquet file Returns: Dictionary with schema information """ conn = create_connection() try: path = Path(file_path) if not path.exists() or path.suffix != '.parquet': return { "success": False, "error": f"File not found or not a parquet file: {file_path}" } # Register a temporary view for the file conn.execute(f"CREATE VIEW temp_view AS SELECT * FROM '{file_path}'") # Get schema info schema_result = conn.execute("SELECT column_name, data_type FROM information_schema.columns WHERE table_name='temp_view'").fetchdf() # Get sample data sample_data = conn.execute("SELECT * FROM temp_view LIMIT 5").fetchdf() # Get row count (might be expensive for large files) row_count = conn.execute("SELECT COUNT(*) as count FROM temp_view").fetchone()[0] return { "success": True, "file_path": file_path, "columns": schema_result.to_dict(orient="records"), "sample_data": sample_data.to_dict(orient="records"), "row_count": row_count } except Exception as e: return { "success": False, "error": str(e) } finally: conn.close()