Glama

data_summary

Generate comprehensive summaries of Excel file data to analyze spreadsheet contents, identify patterns, and extract key insights for data-driven decision making.

Instructions

Generate a comprehensive summary of the data in an Excel file.

Args:
    file_path: Path to the Excel file
    sheet_name: Name of the sheet to summarize (for Excel files)
    
Returns:
    Comprehensive data summary as string

Input Schema

Name        Required  Description  Default
file_path   Yes
sheet_name  No

Implementation Reference

  • The @mcp.tool() decorator registers the data_summary tool with the MCP server.
    @mcp.tool()
  • The function signature defines the input schema (file_path: str, optional sheet_name: str) and the return type (str); the docstring describes the parameters.
    def data_summary(file_path: str, sheet_name: Optional[str] = None) -> str:
        """
        Generate a comprehensive summary of the data in an Excel file.
        
        Args:
            file_path: Path to the Excel file
            sheet_name: Name of the sheet to summarize (for Excel files)
            
        Returns:
            Comprehensive data summary as string
        """
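  • Full implementation of the data_summary tool. Despite the docstring's focus on Excel, it reads Excel (.xlsx, .xls, .xlsm), CSV, TSV, and JSON files. It collects file info, data structure, and data quality metrics, computes statistics for numeric, categorical, and datetime columns, and returns the result as an indented JSON string. Unsupported extensions and exceptions are reported as plain-text messages instead.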
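    # Imports this function relies on (assumed to sit at module level; not shown in the excerpt)
    import json
    import os
    from datetime import datetime
    from typing import Optional

    import pandas as pd

    @mcp.tool()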
    def data_summary(file_path: str, sheet_name: Optional[str] = None) -> str:
        """
        Generate a comprehensive summary of the data in an Excel file.
        
        Args:
            file_path: Path to the Excel file
            sheet_name: Name of the sheet to summarize (for Excel files)
            
        Returns:
            Comprehensive data summary as string
        """
        try:
            # Read file
            _, ext = os.path.splitext(file_path)
            ext = ext.lower()
            
            read_params = {}
            if ext in ['.xlsx', '.xls', '.xlsm'] and sheet_name is not None:
                read_params["sheet_name"] = sheet_name
                
            if ext in ['.xlsx', '.xls', '.xlsm']:
                df = pd.read_excel(file_path, **read_params)
            elif ext == '.csv':
                df = pd.read_csv(file_path)
            elif ext == '.tsv':
                df = pd.read_csv(file_path, sep='\t')
            elif ext == '.json':
                df = pd.read_json(file_path)
            else:
                return f"Unsupported file extension: {ext}"
            
            # Basic file info
            file_info = {
                "file_name": os.path.basename(file_path),
                "file_type": ext,
                "file_size": f"{os.path.getsize(file_path) / 1024:.2f} KB",
                "last_modified": datetime.fromtimestamp(os.path.getmtime(file_path)).strftime('%Y-%m-%d %H:%M:%S')
            }
            
            # Data structure
            data_structure = {
                "rows": df.shape[0],
                "columns": df.shape[1],
                "column_names": list(df.columns),
                "column_types": {col: str(dtype) for col, dtype in df.dtypes.items()},
                "memory_usage": f"{df.memory_usage(deep=True).sum() / 1024:.2f} KB"
            }
            
            # Data quality
            data_quality = {
                "missing_values": {col: int(count) for col, count in df.isnull().sum().items()},
                "missing_percentage": {col: f"{count/len(df)*100:.2f}%" for col, count in df.isnull().sum().items()},
                "duplicate_rows": int(df.duplicated().sum()),
                "unique_values": {col: int(df[col].nunique()) for col in df.columns}
            }
            
            # Statistical summary
            numeric_cols = df.select_dtypes(include=['number']).columns
            categorical_cols = df.select_dtypes(include=['object', 'category']).columns
            datetime_cols = df.select_dtypes(include=['datetime', 'datetime64']).columns
            
            statistics = {}
            if len(numeric_cols) > 0:
                statistics["numeric"] = df[numeric_cols].describe().to_dict()
            
            if len(categorical_cols) > 0:
                statistics["categorical"] = {
                    col: {
                        "unique_values": int(df[col].nunique()),
                        "top_values": df[col].value_counts().head(5).to_dict()
                    } for col in categorical_cols
                }
            
            if len(datetime_cols) > 0:
                statistics["datetime"] = {
                    col: {
                        "min": df[col].min().strftime('%Y-%m-%d') if pd.notna(df[col].min()) else None,
                        "max": df[col].max().strftime('%Y-%m-%d') if pd.notna(df[col].max()) else None,
                        "range_days": (df[col].max() - df[col].min()).days if pd.notna(df[col].min()) and pd.notna(df[col].max()) else None
                    } for col in datetime_cols
                }
            
            # Combine all info
            summary = {
                "file_info": file_info,
                "data_structure": data_structure,
                "data_quality": data_quality,
                "statistics": statistics
            }
            
            return json.dumps(summary, indent=2, default=str)
        except Exception as e:
            return f"Error generating summary: {str(e)}"
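The data quality step in the implementation above can be tried in isolation. The following is a minimal sketch, assuming pandas is installed; the helper name `data_quality` and the sample data are made up for illustration and are not part of the server.

```python
import pandas as pd


def data_quality(df: pd.DataFrame) -> dict:
    """Mirror the data_quality dictionary built by data_summary (illustrative helper)."""
    missing = df.isnull().sum()
    return {
        # Count of null cells per column
        "missing_values": {col: int(n) for col, n in missing.items()},
        # Same counts as a share of all rows, formatted like the tool does
        "missing_percentage": {col: f"{n / len(df) * 100:.2f}%" for col, n in missing.items()},
        # Rows that repeat an earlier row exactly
        "duplicate_rows": int(df.duplicated().sum()),
        # Distinct non-null values per column
        "unique_values": {col: int(df[col].nunique()) for col in df.columns},
    }


# Made-up sample: one missing city, one missing sales figure, one repeated row
sample = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", None, "Rome"],
        "sales": [10.0, 10.0, 5.0, None],
    }
)
quality = data_quality(sample)
```

Note that the percentages are strings rather than numbers, so a caller that wants to compare them has to strip the trailing "%" first.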
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions generating a 'comprehensive summary' but doesn't specify what that entails (e.g., statistical summaries, data types, missing values, or format details). It also lacks information on permissions, file size limits, error handling, or performance characteristics, which are critical for a tool that reads and processes Excel files.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
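On error handling specifically, the implementation shown earlier returns a JSON string on success and a plain-text message on failure, both through the same `str` return value. A caller therefore has to tell the two apart. The sketch below is one possible way to do that; `parse_summary` is a hypothetical helper, and the two prefixes are copied from the return statements in the implementation.

```python
import json
from typing import Any, Tuple

# Prefixes of the two plain-text return paths in data_summary
ERROR_PREFIXES = ("Unsupported file extension:", "Error generating summary:")


def parse_summary(result: str) -> Tuple[bool, Any]:
    """Return (ok, payload): a dict on success, the raw message on failure."""
    if result.startswith(ERROR_PREFIXES):
        return False, result
    try:
        return True, json.loads(result)
    except json.JSONDecodeError:
        # Anything that is neither a known error nor valid JSON is treated as a failure
        return False, result


ok, payload = parse_summary('{"file_info": {"file_type": ".csv"}}')
failed = parse_summary("Unsupported file extension: .txt")
```

On success the payload has the four top-level keys built by the tool: `file_info`, `data_structure`, `data_quality`, and `statistics`.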

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with the main purpose stated first. The Args and Returns sections are structured clearly, though the 'Returns' section is somewhat vague ('Comprehensive data summary as string'). There's minimal waste, but it could be more precise in defining the output format.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of processing Excel data, no annotations, no output schema, and low schema coverage, the description is incomplete. It doesn't explain what a 'comprehensive summary' includes, how errors are handled, or any limitations (e.g., file size, supported Excel versions). For a tool with 2 parameters and no structured safety hints, this leaves significant gaps for an AI agent to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds basic meaning by explaining that 'file_path' is the 'Path to the Excel file' and 'sheet_name' is the 'Name of the sheet to summarize (for Excel files)', which clarifies their roles beyond the schema's titles. However, it doesn't provide details on accepted file formats, path constraints, or sheet name handling when null, leaving gaps in parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
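The implementation shown earlier answers some of these parameter questions: the file extension is matched case-insensitively, `sheet_name` is forwarded only for Excel files, and when it is omitted `pd.read_excel` is called without it, so pandas falls back to its default of the first sheet. The sketch below restates that dispatch using only the standard library; `plan_read` is a hypothetical helper that names the pandas reader and keyword arguments without opening any file.

```python
import os
from typing import Any, Dict, Optional, Tuple

EXCEL_EXTENSIONS = {".xlsx", ".xls", ".xlsm"}


def plan_read(
    file_path: str, sheet_name: Optional[str] = None
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Name the pandas reader and kwargs data_summary would use (illustrative only)."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext in EXCEL_EXTENSIONS:
        # sheet_name is passed through only when the caller supplied one
        kwargs = {} if sheet_name is None else {"sheet_name": sheet_name}
        return "read_excel", kwargs
    if ext == ".csv":
        return "read_csv", {}
    if ext == ".tsv":
        return "read_csv", {"sep": "\t"}
    if ext == ".json":
        return "read_json", {}
    # The tool reports "Unsupported file extension: ..." for anything else
    return None, {}
```

For a CSV, TSV, or JSON path, a supplied `sheet_name` is silently ignored rather than rejected.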

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate a comprehensive summary of the data in an Excel file.' This specifies the verb ('generate'), resource ('data in an Excel file'), and output type ('comprehensive summary'). However, it doesn't explicitly differentiate from sibling tools like 'analyze_excel' or 'read_excel', which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With sibling tools like 'analyze_excel', 'filter_excel', 'pivot_table', and 'read_excel' available, there's no indication of what makes this tool unique or when it should be preferred over other data processing tools. Usage is implied only by the general purpose statement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/yzfly/mcp-excel-server'
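The same request can be made from Python with the standard library. This is a sketch under two assumptions that the page does not confirm: that the path follows an owner/name pattern, as the single URL above suggests, and that the endpoint answers a plain GET with a JSON body. The names `server_url` and `fetch_server` are made up for illustration.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory URL for one server (owner/name pattern is an assumption)."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0):
    """GET the server record; assumes the response body is JSON."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server("yzfly", "mcp-excel-server")` would issue the same GET as the curl command above.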

If you have feedback or need assistance with the MCP directory API, please join our Discord server.