
get_data_summary

Analyze dataset structure, dimensions, data types, and memory usage to understand data characteristics for exploration and analysis planning.

Instructions

Get comprehensive data overview and structural summary.

Provides a high-level overview of dataset structure, dimensions, data types, and memory usage. An essential first step in data exploration and analysis planning workflows.

Returns: Comprehensive data overview with structural information

Summary Components:

📏 Dimensions: Rows, columns, shape information
🔢 Data Types: Column type distribution and analysis
💾 Memory Usage: Resource consumption breakdown
👀 Preview: Sample rows for quick data understanding (optional)
📊 Overview: High-level dataset characteristics

Examples:

    # Full data summary with preview
    summary = await get_data_summary(ctx)

    # Structure summary without preview data
    summary = await get_data_summary(ctx, include_preview=False)

AI Workflow Integration:

1. Initial data exploration and understanding
2. Planning analytical approaches based on data structure
3. Resource planning for large dataset processing
4. Initial data quality assessment
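The examples above call the handler directly; over MCP, a client invokes the tool by name with arguments matching the input schema below. A minimal client-side sketch using the FastMCP client, where the server URL is a placeholder rather than a real DataBeak endpoint:

    # Hypothetical invocation; the URL below is a placeholder, not a real endpoint.
    import asyncio

    from fastmcp import Client

    async def main() -> None:
        async with Client("http://localhost:8000/mcp") as client:
            result = await client.call_tool(
                "get_data_summary",
                {"include_preview": True, "max_preview_rows": 5},
            )
            print(result)

    asyncio.run(main())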

Input Schema

Name               Required  Description                                 Default
include_preview    No        Include sample data rows in summary         true
max_preview_rows   No        Maximum number of preview rows to include   10

Implementation Reference

  • The main handler function that executes the get_data_summary tool. It retrieves the session data, computes shape, column info, data types, missing data statistics, memory usage, and optional data preview.
    async def get_data_summary(
        ctx: Annotated[Context, Field(description="FastMCP context for session access")],
        *,
        include_preview: Annotated[
            bool,
            Field(description="Include sample data rows in summary"),
        ] = True,
        max_preview_rows: Annotated[
            int,
            Field(description="Maximum number of preview rows to include"),
        ] = 10,
    ) -> DataSummaryResult:
        """Get comprehensive data overview and structural summary.

        Provides high-level overview of dataset structure, dimensions, data types,
        and memory usage. Essential first step in data exploration and analysis
        planning workflows.

        Returns:
            Comprehensive data overview with structural information

        Summary Components:
            📏 Dimensions: Rows, columns, shape information
            🔢 Data Types: Column type distribution and analysis
            💾 Memory Usage: Resource consumption breakdown
            👀 Preview: Sample rows for quick data understanding (optional)
            📊 Overview: High-level dataset characteristics

        Examples:
            # Full data summary with preview
            summary = await get_data_summary(ctx)

            # Structure summary without preview data
            summary = await get_data_summary(ctx, include_preview=False)

        AI Workflow Integration:
            1. Initial data exploration and understanding
            2. Planning analytical approaches based on data structure
            3. Resource planning for large dataset processing
            4. Initial data quality assessment
        """
        # Get session_id from FastMCP context
        session_id = ctx.session_id
        _session, df = get_session_data(session_id)

        # Create coordinate system
        coordinate_system = {
            "row_indexing": f"0 to {len(df) - 1} (0-based)",
            "column_indexing": "Use column names or 0-based indices",
        }

        # Create shape info
        shape = {"rows": len(df), "columns": len(df.columns)}

        # Create DataTypeInfo objects for each column
        columns_info = {}
        for col in df.columns:
            col_dtype = str(df[col].dtype)
            # Map pandas dtypes to Pydantic model literals
            if "int" in col_dtype:
                mapped_dtype = "int64"
            elif "float" in col_dtype:
                mapped_dtype = "float64"
            elif "bool" in col_dtype:
                mapped_dtype = "bool"
            elif "datetime" in col_dtype:
                mapped_dtype = "datetime64"
            elif "category" in col_dtype:
                mapped_dtype = "category"
            else:
                mapped_dtype = "object"
            columns_info[str(col)] = DataTypeInfo(
                type=cast(
                    "Literal['int64', 'float64', 'object', 'bool', 'datetime64', 'category']",
                    mapped_dtype,
                ),
                nullable=bool(df[col].isna().any()),
                unique_count=int(df[col].nunique()),
                null_count=int(df[col].isna().sum()),
            )

        # Create data types categorization (convert column names to strings)
        data_types = {
            "numeric": [str(col) for col in df.select_dtypes(include=["number"]).columns],
            "text": [str(col) for col in df.select_dtypes(include=["object"]).columns],
            "datetime": [str(col) for col in df.select_dtypes(include=["datetime"]).columns],
            "boolean": [str(col) for col in df.select_dtypes(include=["bool"]).columns],
        }

        # Create missing data info
        total_missing = int(df.isna().sum().sum())
        missing_by_column = {str(col): int(df[col].isna().sum()) for col in df.columns}

        # Handle empty dataframe
        total_cells = len(df) * len(df.columns)
        missing_percentage = round(total_missing / total_cells * 100, 2) if total_cells > 0 else 0.0

        missing_data = MissingDataInfo(
            total_missing=total_missing,
            missing_by_column=missing_by_column,
            missing_percentage=missing_percentage,
        )

        # Create preview
        if include_preview:
            preview_data = create_data_preview_with_indices(df, num_rows=max_preview_rows)
            # Convert to DataPreview object
            preview = DataPreview(
                rows=preview_data.get("records", []),
                row_count=preview_data.get("total_rows", 0),
                column_count=preview_data.get("total_columns", 0),
                truncated=preview_data.get("preview_rows", 0) < preview_data.get("total_rows", 0),
            )
        else:
            preview = None

        # Calculate memory usage
        memory_usage_mb = round(df.memory_usage(deep=True).sum() / (1024 * 1024), 2)

        return DataSummaryResult(
            coordinate_system=coordinate_system,
            shape=shape,
            columns=columns_info,
            data_types=data_types,
            missing_data=missing_data,
            memory_usage_mb=memory_usage_mb,
            preview=preview,
        )
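    The summary math in the handler is plain pandas. The following standalone sketch reproduces the same dtype categorization and missing-data bookkeeping on a toy DataFrame (illustrative only, outside DataBeak's session machinery):

    # Illustrative only: reproduces the handler's summary math on a toy DataFrame.
    import pandas as pd

    df = pd.DataFrame({"age": [31, None, 45], "name": ["Ada", "Bo", None]})

    shape = {"rows": len(df), "columns": len(df.columns)}

    # Same categorization the handler performs via select_dtypes.
    numeric = [str(c) for c in df.select_dtypes(include=["number"]).columns]
    text = [str(c) for c in df.select_dtypes(include=["object"]).columns]

    # Missing-data stats, guarding against an empty frame as the handler does.
    total_cells = len(df) * len(df.columns)
    total_missing = int(df.isna().sum().sum())
    pct = round(total_missing / total_cells * 100, 2) if total_cells > 0 else 0.0

    memory_mb = round(df.memory_usage(deep=True).sum() / (1024 * 1024), 2)
    print(shape, numeric, text, total_missing, pct, memory_mb)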
  • Pydantic model defining the output schema/response structure for the get_data_summary tool, including shape, columns, data types, missing data, memory usage, and optional preview.
    class DataSummaryResult(BaseToolResponse):
        """Response model for data overview and summary."""

        coordinate_system: dict[str, str]
        shape: dict[str, int]
        columns: dict[str, DataTypeInfo]
        data_types: dict[str, list[str]]
        missing_data: MissingDataInfo
        memory_usage_mb: float
        preview: DataPreview | None = None
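    Since the result is a regular Pydantic model, it serializes straight to JSON for the MCP response. A simplified, hypothetical stand-in for illustration (the real DataTypeInfo, MissingDataInfo, and DataPreview submodels are defined elsewhere in DataBeak and omitted here):

    # Hypothetical, schema-compatible subset of DataSummaryResult for illustration.
    from pydantic import BaseModel

    class MiniSummary(BaseModel):
        shape: dict[str, int]
        memory_usage_mb: float
        preview: dict | None = None  # stands in for the DataPreview submodel

    s = MiniSummary(shape={"rows": 3, "columns": 2}, memory_usage_mb=0.01)
    print(s.model_dump_json())  # pydantic v2 serialization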
  • FastMCP tool registration decorator that registers the get_data_summary function as a tool named 'get_data_summary' on the discovery_server.
    discovery_server.tool(name="get_data_summary")(get_data_summary)
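    This applies FastMCP's tool decorator as a plain function call, equivalent to stacking @discovery_server.tool(name=...) above the function definition. A minimal sketch of the same pattern on a throwaway server (names here are illustrative, not DataBeak's actual layout):

    # Illustrative only; 'demo_server' and 'ping' are hypothetical names.
    from fastmcp import FastMCP

    demo_server = FastMCP("demo")

    async def ping() -> str:
        """Trivial tool body."""
        return "pong"

    # Equivalent to decorating ping with @demo_server.tool(name="ping").
    demo_server.tool(name="ping")(ping)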
