Glama
NovaAI-innovation

CSV MCP Server

get_statistics

Generate statistical summaries for numeric columns in CSV files to analyze data distribution and identify patterns.

Instructions

Get statistical summary of numeric columns in the CSV file.

Args:
    filename: Name of the CSV file

Returns:
    Dictionary with statistical analysis of numeric columns

Input Schema

Name      Required  Description  Default
filename  Yes

Output Schema

Name    Required  Description  Default
result  Yes

Implementation Reference

  • MCP tool handler and registration for 'get_statistics'. This decorated function handles the tool invocation and delegates to the CSVManager implementation.
    @mcp.tool()
    def get_statistics(filename: str) -> Dict[str, Any]:
        """
        Get statistical summary of numeric columns in the CSV file.
        
        Args:
            filename: Name of the CSV file
        
        Returns:
            Dictionary with statistical analysis of numeric columns
        """
        try:
            return csv_manager.get_statistics(filename)
        except Exception as e:
            return {"success": False, "error": str(e)}
  • Core implementation of the get_statistics method in the CSVManager class. Loads the CSV, selects numeric columns, computes descriptive statistics for each column with pandas' Series.describe(), and returns the results as a JSON-serializable dictionary.
    def get_statistics(self, filename: str) -> Dict[str, Any]:
        """Get statistical summary of numeric columns in the CSV file."""
        filepath = self._get_file_path(filename)
        
        if not filepath.exists():
            raise FileNotFoundError(f"CSV file '{filename}' not found")
        
        try:
            df = pd.read_csv(filepath)
            
            # Get numeric columns only
            numeric_df = df.select_dtypes(include=['number'])
            
            if numeric_df.empty:
                return {
                    "success": True,
                    "filename": filename,
                    "message": "No numeric columns found",
                    "statistics": {}
                }
            
            # Convert describe results to serializable format
            stats_dict = {}
            for col in numeric_df.columns:
                col_stats = numeric_df[col].describe()
                stats_dict[col] = {stat: float(value) if pd.notna(value) else None 
                                 for stat, value in col_stats.items()}
            
            return {
                "success": True,
                "filename": filename,
                "statistics": stats_dict,
                "numeric_columns": list(numeric_df.columns),
                "total_columns": len(df.columns),
                "null_counts": {col: int(count) for col, count in numeric_df.isnull().sum().items()}
            }
        except Exception as e:
            logger.error(f"Failed to get statistics: {e}")
            raise
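
For reference, `describe()` on a numeric pandas Series yields the keys count, mean, std, min, 25%, 50%, 75%, and max. The following is a minimal sketch of the conversion loop above, run on a small in-memory frame rather than a file; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; the "name" column is non-numeric and is dropped.
df = pd.DataFrame({
    "price": [10.0, 20.0, None, 40.0],
    "qty": [1, 2, 3, 4],
    "name": ["a", "b", "c", "d"],
})

numeric_df = df.select_dtypes(include=["number"])

# Same conversion as in the implementation: NaN becomes None, the rest floats.
stats = {
    col: {stat: float(v) if pd.notna(v) else None
          for stat, v in numeric_df[col].describe().items()}
    for col in numeric_df.columns
}
null_counts = {col: int(n) for col, n in numeric_df.isnull().sum().items()}

print(list(stats["qty"]))    # statistic names reported for each column
print(stats["qty"]["mean"])  # mean of 1, 2, 3, 4
print(null_counts)           # missing values per numeric column
```

Note that the median is reported under the key `50%` rather than `median`, and that `count` excludes missing values, which is why `null_counts` is returned separately.
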
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states what the tool does but lacks important behavioral details: it doesn't specify what happens if the file doesn't exist, if there are no numeric columns, what specific statistics are calculated (mean, median, etc.), whether this is a read-only operation, or any performance considerations. The description provides basic functionality but misses critical behavioral context for a tool with no annotation coverage.
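
Two of the cases listed here can be read off the implementation reference: a missing file raises `FileNotFoundError`, which the handler turns into an error payload, and a file without numeric columns returns a success payload with empty statistics. The sketch below is a simplified stand-in, not the server's code: `get_statistics_sketch` is a hypothetical function that takes a path directly (the real `_get_file_path` resolution is not shown in the source), and the sample file is invented.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict

import pandas as pd


def get_statistics_sketch(filepath: Path) -> Dict[str, Any]:
    """Simplified stand-in for the handler plus CSVManager.get_statistics."""
    try:
        if not filepath.exists():
            raise FileNotFoundError(f"CSV file '{filepath.name}' not found")
        numeric_df = pd.read_csv(filepath).select_dtypes(include=["number"])
        if numeric_df.empty:
            return {"success": True,
                    "message": "No numeric columns found",
                    "statistics": {}}
        return {"success": True, "statistics": numeric_df.describe().to_dict()}
    except Exception as e:  # the handler converts exceptions into a payload
        return {"success": False, "error": str(e)}


with tempfile.TemporaryDirectory() as tmp:
    # A CSV with text columns only, so no numeric columns are selected.
    text_only = Path(tmp) / "text_only.csv"
    text_only.write_text("name,city\nalice,Oslo\nbob,Lima\n")

    missing = get_statistics_sketch(Path(tmp) / "missing.csv")
    no_numeric = get_statistics_sketch(text_only)

print(missing)
print(no_numeric)
```

In both cases the caller receives a dictionary rather than an exception, so an agent should check the `success` field before reading `statistics`. The sketch only reads the file, as the implementation shown above appears to do.
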

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and well-structured with clear sections (purpose statement, Args, Returns). Each sentence earns its place by providing essential information. The front-loaded purpose statement is clear, though the formatting with separate sections could be slightly more concise. No wasted words or redundant information is present.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (statistical analysis), no annotations, and the presence of an output schema (which handles return value documentation), the description is minimally complete. It covers the basic purpose and parameters but lacks important context about error conditions, statistical methodology, and behavioral constraints. The output schema existence means the description doesn't need to detail return values, but other gaps remain for a tool performing data analysis.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explicitly documents the single parameter ('filename: Name of the CSV file') in the Args section, adding semantic meaning beyond the schema's 0% description coverage. However, it doesn't provide additional context like file path requirements, supported CSV formats, or encoding considerations. With only one parameter and the description compensating for the schema's lack of documentation, this meets the baseline for adequate parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Get') and resource ('statistical summary of numeric columns in the CSV file'). It distinguishes from siblings like 'read_csv' or 'filter_data' by focusing specifically on statistical analysis rather than general data reading or manipulation. However, it doesn't explicitly differentiate from potential statistical siblings (none exist in the list).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., file must exist), when not to use it (e.g., for non-CSV files or non-numeric analysis), or compare it to siblings like 'get_info' or 'validate_data' that might provide different types of file information. The usage context is implied but not explicitly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/NovaAI-innovation/csv-mcp-server'
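
The same request can be made from Python with the standard library. This is a sketch under assumptions: the helper names `server_url` and `fetch_server` are hypothetical, and the response is assumed to be a JSON object, which the page does not state.

```python
import json
import urllib.request

BASE_URL = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server."""
    return f"{BASE_URL}/{owner}/{name}"


def fetch_server(owner: str, name: str) -> dict:
    """GET the server record; assumes the endpoint returns a JSON object."""
    with urllib.request.urlopen(server_url(owner, name), timeout=10) as resp:
        return json.load(resp)


# Usage (performs a network request):
# info = fetch_server("NovaAI-innovation", "csv-mcp-server")
```
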

If you have feedback or need assistance with the MCP directory API, please join our Discord server.