
MCP Tabular Data Analysis Server

by K02D

describe_dataset

Generate comprehensive statistics for tabular datasets, covering structure and column types, numeric metrics, missing values, categorical summaries, and a sample preview.

Instructions

Generate comprehensive statistics for a tabular dataset.

Args:
    file_path: Path to CSV or SQLite file
    include_all: If True, include statistics for all columns (not just numeric)

Returns:
    Dictionary containing:
    - shape: (rows, columns)
    - columns: List of column names with their types
    - numeric_stats: Descriptive statistics for numeric columns
    - missing_values: Count of missing values per column
    - sample: First 5 rows as preview
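
For orientation, a sketch of the shape of the returned dictionary for a hypothetical three-row CSV with an "age" and a "city" column; the column names and numbers below are illustrative only, not taken from any real dataset:

    # Illustrative result shape only; columns and values are hypothetical.
    {
        "shape": {"rows": 3, "columns": 2},
        "columns": {"age": "int64", "city": "object"},
        "missing_values": {"age": 0, "city": 0},
        "numeric_stats": {
            "age": {"count": 3.0, "mean": 30.0, "min": 25.0, "max": 35.0,
                    "median": 30.0, "skew": 0.0, "...": "..."},
        },
        "categorical_columns": {
            "city": {"unique_values": 2, "top_values": {"Paris": 2, "Oslo": 1}}
        },
        "sample": [{"age": 25, "city": "Paris"}, {"age": 30, "city": "Paris"},
                   {"age": 35, "city": "Oslo"}],
    }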

Input Schema

Name          Required   Description                                                      Default
file_path     Yes        Path to CSV or SQLite file
include_all   No         If True, include statistics for all columns (not just numeric)   False
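
A minimal sketch of arguments that satisfy this schema; the file path is hypothetical, and relative paths are resolved against the project root by the server (see _resolve_path below):

    # Hypothetical arguments for a describe_dataset tool call.
    arguments = {
        "file_path": "data/example.csv",  # illustrative path; relative paths resolve against the project root
        "include_all": False,             # optional; defaults to False
    }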

Implementation Reference

  • The primary handler for the describe_dataset tool, decorated with @mcp.tool() for registration in the FastMCP server. It loads the dataset, then computes the shape, dtypes, missing-value counts, enhanced numeric stats (describe plus median/skew/kurtosis), categorical summaries, and a head sample (see the usage sketch after this list).
    @mcp.tool()
    def describe_dataset(file_path: str, include_all: bool = False) -> dict[str, Any]:
        """
        Generate comprehensive statistics for a tabular dataset.

        Args:
            file_path: Path to CSV or SQLite file
            include_all: If True, include statistics for all columns (not just numeric)

        Returns:
            Dictionary containing:
            - shape: (rows, columns)
            - columns: List of column names with their types
            - numeric_stats: Descriptive statistics for numeric columns
            - missing_values: Count of missing values per column
            - sample: First 5 rows as preview
        """
        df = _load_data(file_path)

        # Basic info
        result = {
            "shape": {"rows": len(df), "columns": len(df.columns)},
            "columns": {col: str(df[col].dtype) for col in df.columns},
            "missing_values": df.isnull().sum().to_dict(),
        }

        # Numeric statistics
        numeric_cols = _get_numeric_columns(df)
        if numeric_cols:
            stats_df = df[numeric_cols].describe()
            # Add additional stats
            stats_df.loc["median"] = df[numeric_cols].median()
            stats_df.loc["skew"] = df[numeric_cols].skew()
            stats_df.loc["kurtosis"] = df[numeric_cols].kurtosis()
            result["numeric_stats"] = stats_df.to_dict()

        # Categorical columns info
        cat_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()
        if cat_cols:
            result["categorical_columns"] = {
                col: {
                    "unique_values": df[col].nunique(),
                    "top_values": df[col].value_counts().head(5).to_dict()
                }
                for col in cat_cols
            }

        # Sample data
        result["sample"] = df.head(5).to_dict(orient="records")

        return result
  • Key helper called by describe_dataset to load a CSV or SQLite file into a pandas DataFrame, handling path resolution and, for databases, loading of the first table.
    def _load_data(file_path: str) -> pd.DataFrame:
        """Load data from CSV or SQLite file."""
        path = _resolve_path(file_path)

        if not path.exists():
            raise FileNotFoundError(
                f"File not found: {file_path}\n"
                f"Resolved to: {path}\n"
                f"Project root: {_PROJECT_ROOT}\n"
                f"Current working directory: {Path.cwd()}"
            )

        suffix = path.suffix.lower()

        if suffix == ".csv":
            return pd.read_csv(str(path))
        elif suffix in (".db", ".sqlite", ".sqlite3"):
            # For SQLite, list tables or load first table
            conn = sqlite3.connect(str(path))
            tables = pd.read_sql_query(
                "SELECT name FROM sqlite_master WHERE type='table'", conn
            )
            if tables.empty:
                conn.close()
                raise ValueError(f"No tables found in SQLite database: {file_path}")
            first_table = tables.iloc[0]["name"]
            df = pd.read_sql_query(f"SELECT * FROM {first_table}", conn)
            conn.close()
            return df
        else:
            raise ValueError(f"Unsupported file format: {suffix}. Use .csv or .db/.sqlite")
  • Helper function used by describe_dataset to identify numeric columns for statistics computation.
    def _get_numeric_columns(df: pd.DataFrame) -> list[str]:
        """Get list of numeric column names."""
        return df.select_dtypes(include=[np.number]).columns.tolist()
  • Helper function used by _load_data to resolve relative file paths to absolute paths against the project root.
    def _resolve_path(file_path: str) -> Path:
        """
        Resolve file path relative to project root if it's a relative path.

        Args:
            file_path: Absolute or relative file path

        Returns:
            Resolved absolute Path
        """
        path = Path(file_path)

        # If absolute path, use as-is
        if path.is_absolute():
            return path

        # Otherwise, resolve relative to project root
        resolved = _PROJECT_ROOT / path
        return resolved.resolve()
  • The @mcp.tool() decorator registers the describe_dataset function as an MCP tool on the FastMCP server instance (see the registration sketch after this list).
    @mcp.tool()
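
For context, a minimal local smoke test of the helpers above, assuming the implementation lives in a module importable as server; the module name and the CSV contents are assumptions for illustration, not part of the project.

    # Minimal smoke-test sketch; `server` is a hypothetical module name.
    import tempfile
    from pathlib import Path

    from server import _get_numeric_columns, _load_data

    # Write a tiny CSV to a temporary file; the absolute path takes the
    # is_absolute() branch of _resolve_path and the .csv branch of _load_data.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
        f.write("age,city\n25,Paris\n30,Paris\n35,Oslo\n")
        csv_path = f.name

    df = _load_data(csv_path)
    print(df.shape)                   # (3, 2)
    print(_get_numeric_columns(df))   # ['age']

    # describe_dataset(csv_path) can typically be called the same way, since the
    # MCP Python SDK's @mcp.tool() decorator returns the original function.

    Path(csv_path).unlink()           # remove the temporary file

And a sketch of the registration side that the @mcp.tool() decorator relies on, following the MCP Python SDK's FastMCP conventions; the server name string is illustrative.

    # FastMCP wiring sketch; the server name is illustrative.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("tabular-data-analysis")

    @mcp.tool()
    def describe_dataset(file_path: str, include_all: bool = False) -> dict:
        ...

    if __name__ == "__main__":
        mcp.run()  # serves registered tools over stdio by default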

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/K02D/mcp-tabular'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.