Glama
shibuiwilliam

MCP Data Wrangler

data_mean

Calculate the mean of each column in a dataset loaded from the given input data file path. It supports data analysis and preprocessing in the MCP Data Wrangler server by providing per-column descriptive statistics.
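"Per-column mean" means one value per column, computed down the rows. The implementation reference below delegates this to Polars `DataFrame.mean()`, which skips nulls and yields null for non-numeric columns. The following stdlib-only sketch shows those semantics. It assumes the data is already loaded as a list of row dictionaries, and `column_means` is a hypothetical helper, not part of the server:

```python
from statistics import fmean
from typing import Any, Optional


def column_means(rows: list[dict[str, Any]]) -> dict[str, Optional[float]]:
    """Mean of each column, computed down the rows (hypothetical helper).

    Missing or None values are skipped; a column holding any non-numeric,
    non-null value gets None instead of a number.
    """
    # Collect column names in first-seen order across all rows.
    columns: list[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    means: dict[str, Optional[float]] = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        non_null = [v for v in values if v is not None]
        numeric = [v for v in non_null if isinstance(v, (int, float))]
        # No numbers at all, or a mix with non-numeric values: no mean.
        if not numeric or len(numeric) != len(non_null):
            means[name] = None
        else:
            means[name] = fmean(numeric)
    return means


rows = [
    {"price": 10, "qty": 2, "sku": "a"},
    {"price": 20, "qty": None, "sku": "b"},
]
print(column_means(rows))  # {'price': 15.0, 'qty': 2.0, 'sku': None}
```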

Instructions

Mean values for each column

Input Schema

Name | Required | Description | Default
input_data_file_path | No | Path to the input data file | (none)
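A client passes the single argument in the `arguments` object of an MCP `tools/call` request. The sketch below builds the JSON-RPC 2.0 message a client would send. The file path `data/sales.csv` and the request id are placeholders, and `build_data_mean_call` is a hypothetical helper:

```python
import json


def build_data_mean_call(input_data_file_path: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for the data_mean tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "data_mean",
            # The only property defined by the tool's input schema.
            "arguments": {"input_data_file_path": input_data_file_path},
        },
    }
    return json.dumps(message)


print(build_data_mean_call("data/sales.csv"))
```

The table lists the field as not required. However, `from_args` in the implementation reference reads `arguments["input_data_file_path"]` directly, so omitting it fails with a `KeyError`.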

Implementation Reference

  • The handler function that implements the core logic of the 'data_mean' tool: loads data from file, computes column means using Polars DataFrame.mean(), formats as JSON, and returns as TextContent.
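    import json
    from typing import Any

    import mcp.types as types

    # DataMeanInputSchema is defined in the same server package (next item).


    async def handle_data_mean(
        arguments: dict[str, Any],
    ) -> list[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        # Validate the arguments and load the file into a Polars DataFrame.
        data_mean_input = DataMeanInputSchema.from_args(arguments)
        # DataFrame.mean() returns a one-row DataFrame; non-numeric columns yield null.
        mean_df = data_mean_input.df.mean()
    
        # Map each column name to its mean, serialized as a string (null stays None).
        mean_dict = {
            "description": "Mean values for each column",
            "mean_values": {
                col: str(val) if val is not None else None for col, val in zip(mean_df.columns, mean_df.row(0))
            },
        }
    
        # Return the JSON-encoded result as a single text content item.
        return [
            types.TextContent(
                type="text",
                text=json.dumps(mean_dict),
            )
        ]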
  • Pydantic model subclassing Data for input validation of the 'data_mean' tool. It provides input_schema() for MCP Tool registration, plus factory methods (from_args, from_schema) that load a DataFrame from the file path argument.
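    from typing import Any

    from pydantic import ConfigDict

    # `Data` is the server's base Pydantic model holding a Polars DataFrame in `df`;
    # its import path is not shown in this reference.


    class DataMeanInputSchema(Data):
        model_config = ConfigDict(
            validate_assignment=True,
            frozen=True,  # instances are immutable once validated
            extra="forbid",  # reject unexpected fields
            arbitrary_types_allowed=True,  # needed for the Polars DataFrame field
        )
    
        @staticmethod
        def input_schema() -> dict:
            # JSON Schema advertised to MCP clients via types.Tool(inputSchema=...).
            return {
                "type": "object",
                "properties": {
                    "input_data_file_path": {
                        "type": "string",
                        "description": "Path to the input data file",
                    },
                },
                # from_args() reads this key unconditionally, so mark it required.
                "required": ["input_data_file_path"],
            }
    
        @staticmethod
        def from_schema(input_data_file_path: str) -> "DataMeanInputSchema":
            # Load the file with the shared Data loader, then wrap its DataFrame.
            data = Data.from_file(input_data_file_path)
            return DataMeanInputSchema(df=data.df)
    
        @staticmethod
        def from_args(arguments: dict[str, Any]) -> "DataMeanInputSchema":
            # Raises KeyError if the client omits input_data_file_path.
            input_data_file_path = arguments["input_data_file_path"]
            return DataMeanInputSchema.from_schema(input_data_file_path=input_data_file_path)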
  • MCP Tool registration for 'data_mean': specifies name 'data_mean', description 'Mean values for each column', and references input schema from DataMeanInputSchema.input_schema().
    types.Tool(
        name=MCPServerDataWrangler.data_mean.value[0],
        description=MCPServerDataWrangler.data_mean.value[1],
        inputSchema=DataMeanInputSchema.input_schema(),
    ),
  • Handler mapping registration: associates the tool name 'data_mean' with the handle_data_mean function. The neighboring entry for 'data_mean_horizontal' is shown for context.
    MCPServerDataWrangler.data_mean.value[0]: handle_data_mean,
    MCPServerDataWrangler.data_mean_horizontal.value[0]: handle_data_mean_horizontal,
  • Enum definition providing the canonical name and description for the 'data_mean' tool.
    data_mean = ("data_mean", "Mean values for each column")
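The registration pieces above follow an enum-plus-dispatch-table pattern. Each enum member's value is a `(name, description)` tuple: `value[0]` keys the handler map and `value[1]` becomes the tool description. The self-contained sketch below reproduces that pattern. It takes an in-memory `rows` argument in place of the real file loading, and the `ToolName` enum, `handle_mean` stub, and `call_tool` function are hypothetical, as is the unknown-tool error path:

```python
import asyncio
import json
from enum import Enum
from statistics import fmean
from typing import Any, Awaitable, Callable


class ToolName(Enum):
    # Same shape as the server's enum: (canonical name, description).
    data_mean = ("data_mean", "Mean values for each column")


async def handle_mean(arguments: dict[str, Any]) -> list[dict[str, str]]:
    # Hypothetical stand-in: reads rows from `arguments` instead of a file.
    rows: list[dict[str, float]] = arguments["rows"]
    mean_values = {col: str(fmean(row[col] for row in rows)) for col in rows[0]}
    payload = {"description": ToolName.data_mean.value[1], "mean_values": mean_values}
    # Like the real handler: one text content item carrying a JSON string.
    return [{"type": "text", "text": json.dumps(payload)}]


Handler = Callable[[dict[str, Any]], Awaitable[list[dict[str, str]]]]

# Name -> handler table, keyed by value[0] like the server's mapping.
HANDLERS: dict[str, Handler] = {ToolName.data_mean.value[0]: handle_mean}


async def call_tool(name: str, arguments: dict[str, Any]) -> list[dict[str, str]]:
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    return await handler(arguments)


result = asyncio.run(call_tool("data_mean", {"rows": [{"a": 1, "b": 4}, {"a": 3, "b": 6}]}))
print(result[0]["text"])
```

Keying both the tool list and the dispatch table off one enum keeps the advertised names and the handlers in sync from a single source of truth.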

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool calculates mean values, implying a read-only operation, but doesn't disclose critical behaviors: whether it modifies the input file, requires specific data formats, handles errors (e.g., missing values), or returns results in a particular structure. For a tool with no annotation coverage, this leaves significant gaps in understanding its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
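One way to close the gap described above is to publish MCP tool annotations alongside a fuller description. The sketch below shows the tool definition as it would appear in a `tools/list` result. It assumes an MCP protocol revision that supports the `annotations` field (2025-03-26 or later), and the description wording is a suggestion, not the server's actual text:

```python
import json

data_mean_tool = {
    "name": "data_mean",
    "description": (
        "Compute the mean of each column in the data file at "
        "input_data_file_path. Read-only: the file is not modified. "
        "Returns JSON whose 'mean_values' object maps column names to "
        "means as strings, or null where no mean is defined "
        "(for example, non-numeric columns)."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "input_data_file_path": {
                "type": "string",
                "description": "Path to the input data file",
            },
        },
        "required": ["input_data_file_path"],
    },
    # Behavioral hints defined by the MCP spec; clients treat them as hints only.
    "annotations": {
        "readOnlyHint": True,
        "destructiveHint": False,
        "idempotentHint": True,
        "openWorldHint": False,
    },
}

print(json.dumps(data_mean_tool, indent=2))
```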

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise ('Mean values for each column') and front-loaded with the core purpose. However, it is under-specified rather than optimally concise: it could benefit from slightly more detail (e.g., 'Calculates the arithmetic mean for numerical columns in a data file') without becoming verbose. Every word earns its place, but more value could be added.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (a statistical calculation tool), lack of annotations, and no output schema, the description is incomplete. It doesn't explain what the tool returns (e.g., a dictionary of column means), error conditions, or dependencies on data types. For a tool with 1 parameter but significant behavioral implications, more context is needed to make it fully usable by an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 1 parameter with 100% description coverage ('Path to the input data file'), so the schema does the heavy lifting. The description adds no meaning beyond the schema: it doesn't explain what 'input data file' entails (e.g., CSV, JSON), constraints, or examples. With high schema coverage, the baseline is 3, as the description doesn't compensate but doesn't detract either.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Mean values for each column' states what the tool does (calculates means) but is vague about the resource (data from a file) and lacks specificity. It distinguishes from siblings like data_max or data_min by indicating it calculates means rather than other statistics, but doesn't clarify the scope (e.g., numerical columns only) or how it handles non-numerical data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description doesn't mention prerequisites (e.g., requires a data file), exclusions (e.g., not for categorical data), or comparisons to siblings like data_mean_horizontal (for row-wise means) or describe_data (which might include means). Usage is implied from the name and context, but not explicitly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
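The difference between `data_mean` and its sibling `data_mean_horizontal` is the axis of aggregation. The stdlib sketch below compares the two on a small numeric table. It assumes `data_mean_horizontal` computes one mean per row, as the review's "row-wise means" suggests, and both helpers are hypothetical:

```python
from statistics import fmean

# Two rows, three numeric columns.
table = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 8.0],
]


def column_wise_means(rows: list[list[float]]) -> list[float]:
    # One mean per column: aggregate down the rows (what data_mean reports).
    return [fmean(column) for column in zip(*rows)]


def row_wise_means(rows: list[list[float]]) -> list[float]:
    # One mean per row: aggregate across the columns ("horizontal").
    return [fmean(row) for row in rows]


print(column_wise_means(table))  # [2.0, 3.0, 5.5]
print(row_wise_means(table))     # [2.0, 5.0]
```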

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shibuiwilliam/mcp-server-data-wrangler'
