MCP Tabular Data Analysis Server

by K02D

group_aggregate

Group tabular data by specified columns and compute aggregations like sum, mean, count, or max to analyze patterns and summarize datasets.

Instructions

Group data and compute aggregations.

Args:
    file_path: Path to CSV or SQLite file
    group_by: Columns to group by
    aggregations: Dictionary mapping column names to list of aggregation functions
                 (e.g., {"sales": ["sum", "mean"], "quantity": ["count", "max"]})
                 Supported: sum, mean, median, min, max, count, std, var

Returns:
    Dictionary containing grouped and aggregated data
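
To make the `aggregations` argument concrete, here is a minimal sketch of the same grouping done with pandas directly. The sample data and the `region` column are invented for illustration; only the `aggregations` dictionary is taken from the example above, and the flattening step mirrors what the handler in the Implementation Reference does.

```python
import pandas as pd

# Invented sample data (assumption); "sales" and "quantity" match the docstring example.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "sales": [100.0, 150.0, 200.0, 50.0, 50.0],
    "quantity": [1, 3, 2, 5, 4],
})

group_by = ["region"]
aggregations = {"sales": ["sum", "mean"], "quantity": ["count", "max"]}

# A dict of lists produces two-level column labels such as ("sales", "sum").
grouped = df.groupby(group_by).agg(aggregations)

# Join the two levels into flat names such as "sales_sum".
grouped.columns = ["_".join(col) for col in grouped.columns]

# One dictionary per group, with the group key restored as a regular column.
records = grouped.reset_index().to_dict(orient="records")
print(records)
```

Each record carries one key per group-by column plus one key per column and function pair, so the example above yields `sales_sum`, `sales_mean`, `quantity_count`, and `quantity_max` for every region.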

Input Schema

Name          Required  Description  Default
file_path     Yes
group_by      Yes
aggregations  Yes

Output Schema

No arguments

Implementation Reference

  • The main handler function for the 'group_aggregate' tool. It loads data using _load_data, validates inputs, performs pandas groupby with specified aggregations, flattens multi-level columns, and returns the results as a dictionary with metadata.
    @mcp.tool()
    def group_aggregate(
        file_path: str,
        group_by: list[str],
        aggregations: dict[str, list[str]],
    ) -> dict[str, Any]:
        """
        Group data and compute aggregations.
        
        Args:
            file_path: Path to CSV or SQLite file
            group_by: Columns to group by
            aggregations: Dictionary mapping column names to list of aggregation functions
                         (e.g., {"sales": ["sum", "mean"], "quantity": ["count", "max"]})
                         Supported: sum, mean, median, min, max, count, std, var
        
        Returns:
            Dictionary containing grouped and aggregated data
        """
        df = _load_data(file_path)
        
        # Validate group_by columns
        invalid = [c for c in group_by if c not in df.columns]
        if invalid:
            raise ValueError(f"Group-by columns not found: {invalid}")
        
        # Validate aggregation columns
        for col in aggregations:
            if col not in df.columns:
                raise ValueError(f"Aggregation column '{col}' not found")
        
        # Perform groupby
        grouped = df.groupby(group_by).agg(aggregations)
        
        # Flatten column names
        grouped.columns = ["_".join(col).strip() for col in grouped.columns]
        grouped = grouped.reset_index()
        
        return {
            "group_by": group_by,
            "aggregations": aggregations,
            "group_count": len(grouped),
            "result": grouped.to_dict(orient="records"),
        }

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. While it mentions the tool processes CSV or SQLite files and returns a dictionary, it lacks critical behavioral details: whether this operation modifies source files, memory/performance characteristics for large datasets, error handling for invalid inputs, or authentication requirements. The description covers basic functionality but misses important operational context.
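
On the error-handling point: the handler shown in the Implementation Reference raises `ValueError` for unknown group-by or aggregation columns, but it does not check function names against the documented supported list and leaves that to pandas. A minimal sketch of such a check follows. The helper name `validate_aggregations` and the constant `SUPPORTED_AGGREGATIONS` are hypothetical and not part of the server.

```python
# Function names taken from the tool's docstring ("Supported: ...").
SUPPORTED_AGGREGATIONS = frozenset(
    {"sum", "mean", "median", "min", "max", "count", "std", "var"}
)


def validate_aggregations(aggregations: dict[str, list[str]]) -> None:
    """Raise ValueError if any requested function is outside the documented set.

    Hypothetical helper; the actual server may behave differently.
    """
    unsupported = {}
    for column, functions in aggregations.items():
        bad = [name for name in functions if name not in SUPPORTED_AGGREGATIONS]
        if bad:
            unsupported[column] = bad
    if unsupported:
        raise ValueError(f"Unsupported aggregation functions: {unsupported}")


# Passes silently: every function name is in the documented list.
validate_aggregations({"sales": ["sum", "mean"], "quantity": ["count", "max"]})
```

Calling a check like this before `groupby` would give an agent one consistent error message for every kind of invalid input.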

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and concise. It begins with a clear purpose statement, then provides organized parameter documentation with helpful examples, and concludes with return value information. Every sentence earns its place, and the formatting with bullet-like sections makes it easily scannable without unnecessary verbiage.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (3 parameters with nested objects), no annotations, but with an output schema present, the description is reasonably complete. It covers all parameters thoroughly, specifies supported file formats and aggregation functions, and mentions the return type. The main gap is lack of behavioral context about file handling and performance, but the parameter documentation is comprehensive enough for basic usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description provides excellent parameter semantics despite 0% schema description coverage. It clearly explains each parameter's purpose: 'file_path' accepts CSV or SQLite files, 'group_by' takes columns for grouping, and 'aggregations' is a dictionary mapping columns to specific functions with enumerated examples. The list of supported aggregation functions ('sum, mean, median, min, max, count, std, var') adds crucial value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Group data and compute aggregations.' This specifies both the verb ('group and compute aggregations') and the resource ('data'), making it immediately understandable. However, it doesn't explicitly differentiate from sibling tools like 'create_pivot_table' or 'analyze_time_series' which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With multiple sibling tools for data analysis (e.g., 'create_pivot_table', 'analyze_time_series', 'compute_correlation'), there's no indication of when this specific aggregation approach is preferred or what distinguishes it from other data manipulation tools on the server.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/K02D/mcp-tabular'
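
The same request can be issued from Python with the standard library. This is a sketch under one assumption: that the endpoint returns a JSON body, which the page does not state. The function names `build_request` and `fetch_server_info` are illustrative only.

```python
import json
import urllib.request

# URL copied from the curl command above.
SERVER_URL = "https://glama.ai/api/mcp/v1/servers/K02D/mcp-tabular"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build the GET request without sending it."""
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)


# Example (performs a network call, so it is left commented out):
# info = fetch_server_info()
```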

If you have feedback or need assistance with the MCP directory API, please join our Discord server.