get_data_summary

Generate a comprehensive overview of dataset structure, dimensions, data types, and memory usage to support initial data exploration and analysis planning.

Instructions

Get comprehensive data overview and structural summary.

Provides high-level overview of dataset structure, dimensions, data types, and memory usage. Essential first step in data exploration and analysis planning workflows.

Returns: Comprehensive data overview with structural information

Summary Components:

- šŸ“ Dimensions: rows, columns, shape information
- šŸ”¢ Data Types: column type distribution and analysis
- šŸ’¾ Memory Usage: resource consumption breakdown
- šŸ‘€ Preview: sample rows for quick data understanding (optional)
- šŸ“Š Overview: high-level dataset characteristics
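Concretely, a serialized result carries the fields of the `DataSummaryResult` model shown under Implementation Reference below. A sketch of the shape, with illustrative values (the column name and numbers are hypothetical):

```python
# Illustrative shape of a serialized DataSummaryResult; field names come from
# the model definition below, sample values are made up.
summary = {
    "coordinate_system": {
        "row_indexing": "0 to 999 (0-based)",
        "column_indexing": "Use column names or 0-based indices",
    },
    "shape": {"rows": 1000, "columns": 3},
    "columns": {
        "price": {"type": "float64", "nullable": True, "unique_count": 847, "null_count": 12},
    },
    "data_types": {"numeric": ["price"], "text": [], "datetime": [], "boolean": []},
    "missing_data": {
        "total_missing": 12,
        "missing_by_column": {"price": 12},
        "missing_percentage": 0.4,
    },
    "memory_usage_mb": 0.02,
    "preview": None,  # populated only when include_preview=True
}
```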

Examples:

```python
# Full data summary with preview
summary = await get_data_summary(ctx)

# Structure summary without preview data
summary = await get_data_summary(ctx, include_preview=False)
```

AI Workflow Integration:

1. Initial data exploration and understanding
2. Planning analytical approaches based on data structure
3. Resource planning for large dataset processing (see the sketch below)
4. Initial data quality assessment
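As one example of step 3, a caller could gate expensive operations on the reported memory footprint. A minimal sketch, assuming `ctx` is available as in the examples above and that the `MEMORY_BUDGET_MB` threshold is a value the caller chooses (not part of DataBeak):

```python
MEMORY_BUDGET_MB = 500  # hypothetical threshold chosen by the caller

summary = await get_data_summary(ctx, include_preview=False)
if summary.memory_usage_mb > MEMORY_BUDGET_MB:
    # Large dataset: prefer sampled or chunked analysis strategies.
    strategy = "sampled"
else:
    strategy = "full"
```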

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| include_preview | No | Include sample data rows in summary | True |
| max_preview_rows | No | Maximum number of preview rows to include | 10 |
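A hedged sketch of invoking the tool with these arguments through the FastMCP 2.x client API; the server URL is a placeholder, and it assumes data has already been loaded into the session by another DataBeak tool:

```python
import asyncio

from fastmcp import Client


async def main() -> None:
    # Placeholder URL: point this at an actual running DataBeak server.
    async with Client("http://localhost:8000/mcp") as client:
        # Assumes a dataframe was previously loaded into this session.
        result = await client.call_tool(
            "get_data_summary",
            {"include_preview": True, "max_preview_rows": 5},
        )
        print(result)


asyncio.run(main())
```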

Implementation Reference

  • The core handler function for the 'get_data_summary' tool. Retrieves the dataframe from the session, computes dataset shape, column data types and stats, missing data information, memory usage, and an optional data preview using create_data_preview_with_indices.
```python
async def get_data_summary(
    ctx: Annotated[Context, Field(description="FastMCP context for session access")],
    *,
    include_preview: Annotated[
        bool,
        Field(description="Include sample data rows in summary"),
    ] = True,
    max_preview_rows: Annotated[
        int,
        Field(description="Maximum number of preview rows to include"),
    ] = 10,
) -> DataSummaryResult:
    """Get comprehensive data overview and structural summary.

    Provides high-level overview of dataset structure, dimensions, data types,
    and memory usage. Essential first step in data exploration and analysis
    planning workflows.

    Returns:
        Comprehensive data overview with structural information

    Summary Components:
        šŸ“ Dimensions: Rows, columns, shape information
        šŸ”¢ Data Types: Column type distribution and analysis
        šŸ’¾ Memory Usage: Resource consumption breakdown
        šŸ‘€ Preview: Sample rows for quick data understanding (optional)
        šŸ“Š Overview: High-level dataset characteristics

    Examples:
        # Full data summary with preview
        summary = await get_data_summary(ctx)

        # Structure summary without preview data
        summary = await get_data_summary(ctx, include_preview=False)

    AI Workflow Integration:
        1. Initial data exploration and understanding
        2. Planning analytical approaches based on data structure
        3. Resource planning for large dataset processing
        4. Initial data quality assessment
    """
    # Get session_id from FastMCP context
    session_id = ctx.session_id
    _session, df = get_session_data(session_id)

    # Create coordinate system
    coordinate_system = {
        "row_indexing": f"0 to {len(df) - 1} (0-based)",
        "column_indexing": "Use column names or 0-based indices",
    }

    # Create shape info
    shape = {"rows": len(df), "columns": len(df.columns)}

    # Create DataTypeInfo objects for each column
    columns_info = {}
    for col in df.columns:
        col_dtype = str(df[col].dtype)
        # Map pandas dtypes to Pydantic model literals
        if "int" in col_dtype:
            mapped_dtype = "int64"
        elif "float" in col_dtype:
            mapped_dtype = "float64"
        elif "bool" in col_dtype:
            mapped_dtype = "bool"
        elif "datetime" in col_dtype:
            mapped_dtype = "datetime64"
        elif "category" in col_dtype:
            mapped_dtype = "category"
        else:
            mapped_dtype = "object"
        columns_info[str(col)] = DataTypeInfo(
            type=cast(
                "Literal['int64', 'float64', 'object', 'bool', 'datetime64', 'category']",
                mapped_dtype,
            ),
            nullable=bool(df[col].isna().any()),
            unique_count=int(df[col].nunique()),
            null_count=int(df[col].isna().sum()),
        )

    # Create data types categorization (convert column names to strings)
    data_types = {
        "numeric": [str(col) for col in df.select_dtypes(include=["number"]).columns],
        "text": [str(col) for col in df.select_dtypes(include=["object"]).columns],
        "datetime": [str(col) for col in df.select_dtypes(include=["datetime"]).columns],
        "boolean": [str(col) for col in df.select_dtypes(include=["bool"]).columns],
    }

    # Create missing data info
    total_missing = int(df.isna().sum().sum())
    missing_by_column = {str(col): int(df[col].isna().sum()) for col in df.columns}
    # Handle empty dataframe
    total_cells = len(df) * len(df.columns)
    missing_percentage = round(total_missing / total_cells * 100, 2) if total_cells > 0 else 0.0
    missing_data = MissingDataInfo(
        total_missing=total_missing,
        missing_by_column=missing_by_column,
        missing_percentage=missing_percentage,
    )

    # Create preview
    if include_preview:
        preview_data = create_data_preview_with_indices(df, num_rows=max_preview_rows)
        # Convert to DataPreview object
        preview = DataPreview(
            rows=preview_data.get("records", []),
            row_count=preview_data.get("total_rows", 0),
            column_count=preview_data.get("total_columns", 0),
            truncated=preview_data.get("preview_rows", 0) < preview_data.get("total_rows", 0),
        )
    else:
        preview = None

    # Calculate memory usage
    memory_usage_mb = round(df.memory_usage(deep=True).sum() / (1024 * 1024), 2)

    return DataSummaryResult(
        coordinate_system=coordinate_system,
        shape=shape,
        columns=columns_info,
        data_types=data_types,
        missing_data=missing_data,
        memory_usage_mb=memory_usage_mb,
        preview=preview,
    )
```
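Two details worth noting in the handler: pandas dtypes are mapped onto the model's literal types with substring checks, so width variants such as int32 or float32 collapse onto the int64 and float64 literals; and the preview's truncated flag is derived by comparing the number of preview rows against the total row count.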
  • Pydantic model defining the output schema for the get_data_summary tool response, including dataset structure, types, missing data, and preview.
```python
class DataSummaryResult(BaseToolResponse):
    """Response model for data overview and summary."""

    coordinate_system: dict[str, str]
    shape: dict[str, int]
    columns: dict[str, DataTypeInfo]
    data_types: dict[str, list[str]]
    missing_data: MissingDataInfo
    memory_usage_mb: float
    preview: DataPreview | None = None
```
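Assuming BaseToolResponse ultimately derives from pydantic's BaseModel (its definition is not shown here), the result can be serialized with the standard Pydantic v2 methods:

```python
# Assumes BaseToolResponse derives from pydantic.BaseModel (not shown above).
result = await get_data_summary(ctx, include_preview=False)
payload = result.model_dump()  # plain-dict form, e.g. payload["shape"]["rows"]
schema = DataSummaryResult.model_json_schema()  # JSON Schema for the response
```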
  • Explicit registration of the get_data_summary handler as an MCP tool on the discovery_server FastMCP instance.
```python
discovery_server.tool(name="get_data_summary")(get_data_summary)
```
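This explicit call is equivalent to decorating the handler directly. A minimal sketch, assuming discovery_server is a FastMCP instance (its construction is not shown in this reference, and the server name here is hypothetical):

```python
from fastmcp import FastMCP

discovery_server = FastMCP("databeak-discovery")  # hypothetical server name

# Decorator form, equivalent to the explicit registration above:
# @discovery_server.tool(name="get_data_summary")
# async def get_data_summary(...) -> DataSummaryResult: ...
```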
