get_column_stats
Analyze a specific column in a data file to generate statistical insights in JSON format for data analysis and visualization.
Instructions
Get statistics for a specific column.
Args: file_path: Path to the data file column: Column name to analyze
Returns: Column statistics in JSON format
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | ||
| column | Yes |
Implementation Reference
- src/visidata_mcp/server.py:407-488 (handler)Main implementation of get_column_stats tool. Loads data from file using pandas, calculates comprehensive statistics for the specified column including type, null counts, unique values, sample data. For numeric columns: min, max, mean, median, std, and quartiles. For non-numeric: most common values with counts and percentages.
@mcp.tool() def get_column_stats(file_path: str, column: str) -> str: """ Get statistics for a specific column. Args: file_path: Path to the data file column: Column name to analyze Returns: Column statistics in JSON format """ try: import pandas as pd from pathlib import Path file_extension = Path(file_path).suffix.lower() # Load with pandas if file_extension == '.csv': df = pd.read_csv(file_path) elif file_extension == '.json': df = pd.read_json(file_path) elif file_extension in ['.xlsx', '.xls']: df = pd.read_excel(file_path) elif file_extension == '.tsv': df = pd.read_csv(file_path, sep='\t') else: df = pd.read_csv(file_path) if column not in df.columns: return f"Error: Column '{column}' not found. Available columns: {list(df.columns)}" col_data = df[column] stats = { "column": column, "type": str(col_data.dtype), "total_values": len(col_data), "non_null_values": int(col_data.notna().sum()), "null_count": int(col_data.isna().sum()), "unique_values": int(col_data.nunique()) } # Sample values sample_values = [] for value in col_data.dropna().head(10): if hasattr(value, 'item'): # numpy types sample_values.append(value.item()) else: sample_values.append(str(value)) stats["sample_values"] = sample_values # If numeric, calculate additional stats if pd.api.types.is_numeric_dtype(col_data): numeric_col = col_data.dropna() if not numeric_col.empty: stats["min"] = float(numeric_col.min()) stats["max"] = float(numeric_col.max()) stats["mean"] = float(numeric_col.mean()) stats["median"] = float(numeric_col.median()) stats["std"] = float(numeric_col.std()) stats["quartiles"] = { "25%": float(numeric_col.quantile(0.25)), "50%": float(numeric_col.quantile(0.50)), "75%": float(numeric_col.quantile(0.75)) } else: # For non-numeric, show value counts value_counts = col_data.value_counts().head(10) stats["most_common"] = [] for value, count in value_counts.items(): stats["most_common"].append({ "value": str(value), "count": int(count), "percentage": round(count / len(col_data) * 100, 2) }) return json.dumps(stats, indent=2) except Exception as e: return f"Error getting column stats: {str(e)}\n{traceback.format_exc()}" - src/visidata_mcp/server.py:407-418 (registration)Tool registration via @mcp.tool() decorator that registers get_column_stats as an MCP tool with parameters file_path and column.
@mcp.tool() def get_column_stats(file_path: str, column: str) -> str: """ Get statistics for a specific column. Args: file_path: Path to the data file column: Column name to analyze Returns: Column statistics in JSON format """ - examples/demo.py:23-23 (helper)Import of get_column_stats function in demo example.
get_column_stats, - examples/demo.py:96-105 (helper)Usage example showing how to call get_column_stats with file path and column name, then parse and display the returned statistics.
result = get_column_stats(str(sample_file), "salary") stats = json.loads(result) print(f"Statistics for '{stats['column']}' column:") print(f"- Type: {stats['type']}") print(f"- Values: {stats['total_values']}") print(f"- Null count: {stats['null_count']}") if 'min' in stats: print(f"- Range: ${stats['min']:,} - ${stats['max']:,}") print(f"- Average: ${stats['mean']:,.2f}") print()