
MCP Tabular Data Analysis Server

by K02D

describe_dataset

Generate comprehensive statistics for tabular datasets: structure, column types, numeric summaries, missing-value counts, and a data preview.

Instructions

Generate comprehensive statistics for a tabular dataset.

Args:
    file_path: Path to CSV or SQLite file
    include_all: If True, include statistics for all columns (not just numeric)

Returns:
    Dictionary containing:
    - shape: (rows, columns)
    - columns: List of column names with their types
    - numeric_stats: Descriptive statistics for numeric columns
    - missing_values: Count of missing values per column
    - categorical_columns: Unique counts and top values for text columns (when present)
    - sample: First 5 rows as preview
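
For orientation, here is a sketch of the returned dictionary for a hypothetical two-column CSV. The data and statistic values are illustrative placeholders, not output from a real run (e.g., with only three rows pandas would actually return NaN for kurtosis):

    example_result = {
        "shape": {"rows": 3, "columns": 2},
        "columns": {"region": "object", "sales": "float64"},
        "missing_values": {"region": 0, "sales": 0},
        "numeric_stats": {
            "sales": {"count": 3.0, "mean": 101.2, "std": 16.7, "min": 88.0,
                      "25%": 91.8, "50%": 95.5, "75%": 107.8, "max": 120.0,
                      "median": 95.5, "skew": 0.7, "kurtosis": -1.5},
        },
        "categorical_columns": {
            "region": {"unique_values": 2, "top_values": {"north": 2, "south": 1}},
        },
        "sample": [
            {"region": "north", "sales": 120.0},
            {"region": "south", "sales": 95.5},
            {"region": "north", "sales": 88.0},
        ],
    }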

Input Schema

Name         Required  Description                                                      Default
file_path    Yes       Path to a CSV or SQLite file                                     —
include_all  No        If True, include statistics for all columns (not just numeric)   False
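
A hypothetical tool-call payload (the path is illustrative; relative paths are resolved against the project root, per _resolve_path below):

    arguments = {
        "file_path": "data/sales.csv",
        "include_all": True,
    }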

Implementation Reference
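
The excerpts below reference several module-level names (pd, np, sqlite3, Path, Any, _PROJECT_ROOT, mcp). A plausible reconstruction of that setup, assuming the official MCP Python SDK; the server name string and the project-root derivation are guesses:

    import sqlite3
    from pathlib import Path
    from typing import Any

    import numpy as np
    import pandas as pd
    from mcp.server.fastmcp import FastMCP

    # Assumed: the project root is the directory containing this module.
    _PROJECT_ROOT = Path(__file__).resolve().parent

    mcp = FastMCP("mcp-tabular")  # server name is a guess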

  • Main handler for the 'describe_dataset' tool. Loads a dataset from CSV or SQLite and computes comprehensive statistics: shape, column types, missing values, numeric descriptive stats (with skew and kurtosis), categorical summaries, and a data sample.
    @mcp.tool()
    def describe_dataset(file_path: str, include_all: bool = False) -> dict[str, Any]:
        """
        Generate comprehensive statistics for a tabular dataset.
        
        Args:
            file_path: Path to CSV or SQLite file
            include_all: If True, include statistics for all columns (not just numeric)
        
        Returns:
            Dictionary containing:
            - shape: (rows, columns)
            - columns: List of column names with their types
            - numeric_stats: Descriptive statistics for numeric columns
            - missing_values: Count of missing values per column
            - categorical_columns: Unique counts and top values for text columns (when present)
            - sample: First 5 rows as preview
        """
        df = _load_data(file_path)
        
        # Basic info
        result = {
            "shape": {"rows": len(df), "columns": len(df.columns)},
            "columns": {
                col: str(df[col].dtype) for col in df.columns
            },
            "missing_values": df.isnull().sum().to_dict(),
        }
        
        # Numeric statistics
        numeric_cols = _get_numeric_columns(df)
        if numeric_cols:
            stats_df = df[numeric_cols].describe()
            # Add additional stats
            stats_df.loc["median"] = df[numeric_cols].median()
            stats_df.loc["skew"] = df[numeric_cols].skew()
            stats_df.loc["kurtosis"] = df[numeric_cols].kurtosis()
            result["numeric_stats"] = stats_df.to_dict()
        
        # Categorical column summaries (note: computed regardless of include_all)
        cat_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()
        if cat_cols:
            result["categorical_columns"] = {
                col: {
                    "unique_values": df[col].nunique(),
                    "top_values": df[col].value_counts().head(5).to_dict()
                }
                for col in cat_cols
            }
        
        # Sample data
        result["sample"] = df.head(5).to_dict(orient="records")
        
        return result
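  • Usage sketch (not part of the server; the fixture file name is hypothetical). FastMCP's @mcp.tool() returns the original function, so it stays directly callable; an absolute path sidesteps project-root resolution.
    csv_path = Path("toy.csv").resolve()
    pd.DataFrame(
        {"region": ["north", "south", "north"], "sales": [120.0, 95.5, 88.0]}
    ).to_csv(csv_path, index=False)

    report = describe_dataset(str(csv_path))
    print(report["shape"])  # {'rows': 3, 'columns': 2}
    print(sorted(report["numeric_stats"]["sales"]))  # stat names: '25%', ..., 'skew', 'std'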
  • Helper function to identify numeric columns, used in describe_dataset for statistics computation.
    def _get_numeric_columns(df: pd.DataFrame) -> list[str]:
        """Get list of numeric column names."""
        return df.select_dtypes(include=[np.number]).columns.tolist()
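  • Example (illustrative, not part of the server): boolean columns are excluded, since numpy's bool_ is not an np.number subtype.
    df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"], "c": [0.5, 1.5], "d": [True, False]})
    _get_numeric_columns(df)  # ['a', 'c']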
  • Core helper function to load datasets from CSV or SQLite files, handling path resolution and table selection for databases. Called by describe_dataset.
    def _load_data(file_path: str) -> pd.DataFrame:
        """Load data from CSV or SQLite file."""
        path = _resolve_path(file_path)
        
        if not path.exists():
            raise FileNotFoundError(
                f"File not found: {file_path}\n"
                f"Resolved to: {path}\n"
                f"Project root: {_PROJECT_ROOT}\n"
                f"Current working directory: {Path.cwd()}"
            )
        
        suffix = path.suffix.lower()
        
        if suffix == ".csv":
            return pd.read_csv(str(path))
        elif suffix in (".db", ".sqlite", ".sqlite3"):
            # For SQLite, list tables or load first table
            conn = sqlite3.connect(str(path))
            tables = pd.read_sql_query(
                "SELECT name FROM sqlite_master WHERE type='table'", conn
            )
            if tables.empty:
                conn.close()
                raise ValueError(f"No tables found in SQLite database: {file_path}")
            first_table = tables.iloc[0]["name"]
            # Quote the identifier: names from sqlite_master may contain
            # spaces or reserved words.
            df = pd.read_sql_query(f'SELECT * FROM "{first_table}"', conn)
            conn.close()
            return df
        else:
            raise ValueError(f"Unsupported file format: {suffix}. Use .csv or .db/.sqlite")
  • Helper to resolve relative file paths to absolute paths based on project root, used by _load_data.
    def _resolve_path(file_path: str) -> Path:
        """
        Resolve file path relative to project root if it's a relative path.
        
        Args:
            file_path: Absolute or relative file path
        
        Returns:
            Resolved absolute Path
        """
        path = Path(file_path)
        
        # If absolute path, use as-is
        if path.is_absolute():
            return path
        
        # Otherwise, resolve relative to project root
        resolved = _PROJECT_ROOT / path
        return resolved.resolve()
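  • Example (illustrative, assuming a hypothetical _PROJECT_ROOT of /srv/mcp-tabular):
    _resolve_path("/tmp/data.csv")   # absolute -> Path('/tmp/data.csv'), used as-is
    _resolve_path("data/sales.csv")  # relative -> Path('/srv/mcp-tabular/data/sales.csv')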


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/K02D/mcp-tabular'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.