analyze_data
Analyze datasets to extract statistics and identify data types, enabling data exploration and insight generation from files.
Instructions
Perform basic analysis on a dataset.
Args: file_path: Path to the data file
Returns: Analysis results including statistics and data types
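The handler listed under Implementation Reference returns a JSON string on success and a plain string beginning with "Error analyzing data:" on failure, so a caller has to tell the two apart before decoding. The following is a minimal client-side sketch of that check. The names `parse_analysis` and `AnalysisError` are hypothetical helpers and are not part of visidata_mcp.

```python
import json
from typing import Any

# Prefix emitted by the handler's except branch (taken from the listing).
ERROR_PREFIX = "Error analyzing data:"


class AnalysisError(RuntimeError):
    """Hypothetical exception for a failed analyze_data call."""


def parse_analysis(raw: str) -> dict[str, Any]:
    """Decode the tool's return value, raising if it is an error string."""
    if raw.startswith(ERROR_PREFIX):
        # The error text also carries a traceback after the first line.
        raise AnalysisError(raw)
    return json.loads(raw)
```

On success the decoded dictionary carries the keys built by the handler: `filename`, `total_rows`, `total_columns`, and a `columns` list with one entry per column.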
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Path to the data file | |
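The schema has a single required parameter, `file_path`, which the handler signature types as a string. As a rough sketch, an MCP client could validate that argument and wrap it in a JSON-RPC 2.0 `tools/call` request as shown below. The helper name `build_call_request` and the example path are assumptions made for illustration. A real client library would normally build this message for you.

```python
import json
from typing import Any


def build_call_request(arguments: dict[str, Any], request_id: int = 1) -> str:
    """Return a JSON-RPC 2.0 `tools/call` request for analyze_data."""
    file_path = arguments.get("file_path")
    # file_path is the only parameter and it is required.
    if not isinstance(file_path, str) or not file_path:
        raise ValueError("file_path is required and must be a non-empty string")
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "analyze_data",
            "arguments": {"file_path": file_path},
        },
    }
    return json.dumps(payload)


# Hypothetical path, used only to show the call shape.
request = build_call_request({"file_path": "data/sales.csv"})
```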
Implementation Reference
- src/visidata_mcp/server.py:184-257 (handler) The analyze_data tool handler that performs basic analysis on datasets. It loads data from a file (CSV, JSON, Excel, TSV) using pandas, analyzes each column for data type, null counts, sample values, and basic statistics (min/max/mean for numeric columns, most common values for text columns), and returns the analysis as JSON.

```python
@mcp.tool()
def analyze_data(file_path: str) -> str:
    """
    Perform basic analysis on a dataset.

    Args:
        file_path: Path to the data file

    Returns:
        Analysis results including statistics and data types
    """
    try:
        import pandas as pd
        from pathlib import Path

        file_extension = Path(file_path).suffix.lower()

        # Load with pandas
        if file_extension == '.csv':
            df = pd.read_csv(file_path)
        elif file_extension == '.json':
            df = pd.read_json(file_path)
        elif file_extension in ['.xlsx', '.xls']:
            df = pd.read_excel(file_path)
        elif file_extension == '.tsv':
            df = pd.read_csv(file_path, sep='\t')
        else:
            df = pd.read_csv(file_path)

        analysis = {
            "filename": Path(file_path).name,
            "total_rows": len(df),
            "total_columns": len(df.columns),
            "columns": []
        }

        # Analyze each column
        for col_name in df.columns:
            col_data = df[col_name]
            col_info = {
                "name": col_name,
                "type": str(col_data.dtype),
                "null_count": int(col_data.isna().sum()),
                "non_null_count": int(col_data.notna().sum()),
            }

            # Get some sample values
            sample_values = []
            valid_values = col_data.dropna().head(5)
            for value in valid_values:
                if hasattr(value, 'item'):  # numpy types
                    sample_values.append(value.item())
                else:
                    sample_values.append(str(value) if value is not None else None)
            col_info["sample_values"] = sample_values

            # Add basic statistics for numeric columns
            if pd.api.types.is_numeric_dtype(col_data):
                col_info["min"] = float(col_data.min()) if not col_data.empty else None
                col_info["max"] = float(col_data.max()) if not col_data.empty else None
                col_info["mean"] = float(col_data.mean()) if not col_data.empty else None
                col_info["unique_count"] = int(col_data.nunique())
            else:
                col_info["unique_count"] = int(col_data.nunique())
                col_info["most_common"] = list(col_data.value_counts().head(3).index)

            analysis["columns"].append(col_info)

        return json.dumps(analysis, indent=2)

    except Exception as e:
        return f"Error analyzing data: {str(e)}\n{traceback.format_exc()}"
```

- src/visidata_mcp/server.py:184-184 (registration) The @mcp.tool() decorator registers the analyze_data function as an MCP tool with the FastMCP server.

```python
@mcp.tool()
```
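To make the per-column step easier to follow in isolation, here is a standard-library approximation of it. This is not the pandas implementation from the listing: `profile_column` is a hypothetical name, `None` stands in for a missing value, and tie-breaking and dtype detection may differ from what pandas does.

```python
from collections import Counter
from typing import Any, Optional


def profile_column(name: str, values: list[Optional[Any]]) -> dict[str, Any]:
    """Summarize one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    info: dict[str, Any] = {
        "name": name,
        "null_count": len(values) - len(present),
        "non_null_count": len(present),
        "sample_values": present[:5],  # first five non-missing values
        "unique_count": len(set(present)),
    }
    # This sketch treats a column as numeric only if every present value
    # is an int or float (bool is excluded here by choice).
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        info["min"] = float(min(present))
        info["max"] = float(max(present))
        info["mean"] = sum(present) / len(present)
    else:
        # Three most frequent values, most frequent first.
        info["most_common"] = [v for v, _ in Counter(present).most_common(3)]
    return info


ages = profile_column("age", [30, None, 25, 35])
cities = profile_column("city", ["Oslo", "Rome", "Oslo", None])
```

Numeric columns get `min`, `max`, and `mean`, while other columns get `most_common`, mirroring the two branches of the handler.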