
MCP Tabular Data Analysis Server

by K02D

describe_dataset

Generate comprehensive statistics for tabular datasets, covering structure and column types, numeric metrics, missing values, categorical summaries, and a sample preview.

Instructions

Generate comprehensive statistics for a tabular dataset.

Args:
    file_path: Path to CSV or SQLite file
    include_all: If True, include statistics for all columns (not just numeric)

Returns:
    Dictionary containing:
    - shape: (rows, columns)
    - columns: List of column names with their types
    - numeric_stats: Descriptive statistics for numeric columns
    - missing_values: Count of missing values per column
    - sample: First 5 rows as preview
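
For orientation, a sketch of the shape of the returned dictionary for a hypothetical three-row CSV with an "age" and a "city" column; the column names and numbers below are illustrative only, not taken from any real dataset:

    # Illustrative result shape only; columns and values are hypothetical.
    {
        "shape": {"rows": 3, "columns": 2},
        "columns": {"age": "int64", "city": "object"},
        "missing_values": {"age": 0, "city": 0},
        "numeric_stats": {
            "age": {"count": 3.0, "mean": 30.0, "min": 25.0, "max": 35.0,
                    "median": 30.0, "skew": 0.0, "...": "..."},
        },
        "categorical_columns": {
            "city": {"unique_values": 2, "top_values": {"Paris": 2, "Oslo": 1}}
        },
        "sample": [{"age": 25, "city": "Paris"}, {"age": 30, "city": "Paris"},
                   {"age": 35, "city": "Oslo"}],
    }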

Input Schema

Name          Required   Description                                                      Default
file_path     Yes        Path to CSV or SQLite file
include_all   No         If True, include statistics for all columns (not just numeric)   False
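
A minimal sketch of arguments that satisfy this schema; the file path is hypothetical, and relative paths are resolved against the project root by the server (see _resolve_path below):

    # Hypothetical arguments for a describe_dataset tool call.
    arguments = {
        "file_path": "data/example.csv",  # illustrative path; relative paths resolve against the project root
        "include_all": False,             # optional; defaults to False
    }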

Implementation Reference

  • The primary handler for the describe_dataset tool, decorated with @mcp.tool() for registration in the FastMCP server. It loads the dataset, then computes the shape, dtypes, missing-value counts, enhanced numeric stats (describe plus median/skew/kurtosis), categorical summaries, and a head sample (see the usage sketch after this list).
    @mcp.tool()
    def describe_dataset(file_path: str, include_all: bool = False) -> dict[str, Any]:
        """
        Generate comprehensive statistics for a tabular dataset.

        Args:
            file_path: Path to CSV or SQLite file
            include_all: If True, include statistics for all columns (not just numeric)

        Returns:
            Dictionary containing:
            - shape: (rows, columns)
            - columns: List of column names with their types
            - numeric_stats: Descriptive statistics for numeric columns
            - missing_values: Count of missing values per column
            - sample: First 5 rows as preview
        """
        df = _load_data(file_path)

        # Basic info
        result = {
            "shape": {"rows": len(df), "columns": len(df.columns)},
            "columns": {col: str(df[col].dtype) for col in df.columns},
            "missing_values": df.isnull().sum().to_dict(),
        }

        # Numeric statistics
        numeric_cols = _get_numeric_columns(df)
        if numeric_cols:
            stats_df = df[numeric_cols].describe()
            # Add additional stats
            stats_df.loc["median"] = df[numeric_cols].median()
            stats_df.loc["skew"] = df[numeric_cols].skew()
            stats_df.loc["kurtosis"] = df[numeric_cols].kurtosis()
            result["numeric_stats"] = stats_df.to_dict()

        # Categorical columns info
        cat_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()
        if cat_cols:
            result["categorical_columns"] = {
                col: {
                    "unique_values": df[col].nunique(),
                    "top_values": df[col].value_counts().head(5).to_dict()
                }
                for col in cat_cols
            }

        # Sample data
        result["sample"] = df.head(5).to_dict(orient="records")

        return result
  • Key helper called by describe_dataset to load a CSV or SQLite file into a pandas DataFrame, handling path resolution and, for databases, loading of the first table.
    def _load_data(file_path: str) -> pd.DataFrame:
        """Load data from CSV or SQLite file."""
        path = _resolve_path(file_path)

        if not path.exists():
            raise FileNotFoundError(
                f"File not found: {file_path}\n"
                f"Resolved to: {path}\n"
                f"Project root: {_PROJECT_ROOT}\n"
                f"Current working directory: {Path.cwd()}"
            )

        suffix = path.suffix.lower()

        if suffix == ".csv":
            return pd.read_csv(str(path))
        elif suffix in (".db", ".sqlite", ".sqlite3"):
            # For SQLite, list tables or load first table
            conn = sqlite3.connect(str(path))
            tables = pd.read_sql_query(
                "SELECT name FROM sqlite_master WHERE type='table'", conn
            )
            if tables.empty:
                conn.close()
                raise ValueError(f"No tables found in SQLite database: {file_path}")
            first_table = tables.iloc[0]["name"]
            df = pd.read_sql_query(f"SELECT * FROM {first_table}", conn)
            conn.close()
            return df
        else:
            raise ValueError(f"Unsupported file format: {suffix}. Use .csv or .db/.sqlite")
  • Helper function used by describe_dataset to identify numeric columns for statistics computation.
    def _get_numeric_columns(df: pd.DataFrame) -> list[str]:
        """Get list of numeric column names."""
        return df.select_dtypes(include=[np.number]).columns.tolist()
  • Helper function used by _load_data to resolve relative file paths to absolute paths against the project root.
    def _resolve_path(file_path: str) -> Path:
        """
        Resolve file path relative to project root if it's a relative path.

        Args:
            file_path: Absolute or relative file path

        Returns:
            Resolved absolute Path
        """
        path = Path(file_path)

        # If absolute path, use as-is
        if path.is_absolute():
            return path

        # Otherwise, resolve relative to project root
        resolved = _PROJECT_ROOT / path
        return resolved.resolve()
  • The @mcp.tool() decorator registers the describe_dataset function as an MCP tool on the FastMCP server instance (see the registration sketch after this list).
    @mcp.tool()
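
For context, a minimal local smoke test of the helpers above, assuming the implementation lives in a module importable as server; the module name and the CSV contents are assumptions for illustration, not part of the project.

    # Minimal smoke-test sketch; `server` is a hypothetical module name.
    import tempfile
    from pathlib import Path

    from server import _get_numeric_columns, _load_data

    # Write a tiny CSV to a temporary file; the absolute path takes the
    # is_absolute() branch of _resolve_path and the .csv branch of _load_data.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
        f.write("age,city\n25,Paris\n30,Paris\n35,Oslo\n")
        csv_path = f.name

    df = _load_data(csv_path)
    print(df.shape)                   # (3, 2)
    print(_get_numeric_columns(df))   # ['age']

    # describe_dataset(csv_path) can typically be called the same way, since the
    # MCP Python SDK's @mcp.tool() decorator returns the original function.

    Path(csv_path).unlink()           # remove the temporary file

And a sketch of the registration side that the @mcp.tool() decorator relies on, following the MCP Python SDK's FastMCP conventions; the server name string is illustrative.

    # FastMCP wiring sketch; the server name is illustrative.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("tabular-data-analysis")

    @mcp.tool()
    def describe_dataset(file_path: str, include_all: bool = False) -> dict:
        ...

    if __name__ == "__main__":
        mcp.run()  # serves registered tools over stdio by default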

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/K02D/mcp-tabular'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.