vizro-mcp (Official, by mckinsey)

load_and_analyze_data

Loads and analyzes data files from local paths or URLs in formats such as CSV, JSON, Excel, and Parquet, extracting DataFrame information and metadata so an agent can understand the data.

Instructions

Use to understand local or remote data files. Must be called with absolute paths or URLs.

Supported formats:
- CSV (.csv)
- JSON (.json)
- HTML (.html, .htm)
- Excel (.xls, .xlsx)
- OpenDocument Spreadsheet (.ods)
- Parquet (.parquet)

Returns:
    DataAnalysisResults object containing DataFrame information and metadata
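
For orientation, here is a minimal client-side sketch of invoking this tool with the official MCP Python SDK. It assumes the server can be launched via `uvx vizro-mcp`; the file path is hypothetical and must be absolute, per the instructions above.

    import asyncio

    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def main() -> None:
        # Assumption: vizro-mcp is launchable via uvx; adjust to your install.
        server = StdioServerParameters(command="uvx", args=["vizro-mcp"])
        async with stdio_client(server) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                # Hypothetical path; replace with a real absolute path or URL.
                result = await session.call_tool(
                    "load_and_analyze_data",
                    {"path_or_url": "/absolute/path/to/data.csv"},
                )
                print(result.content)

    asyncio.run(main())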

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| path_or_url | Yes | Absolute (important!) local file path or URL to a data file | |

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| valid | Yes | Whether the data was loaded successfully (bool) | |
| df_info | Yes | DataFrame information (DFInfo, null on failure) | |
| message | Yes | Status or error message (str) | |
| df_metadata | Yes | File metadata (DFMetaData, null on failure) | |
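
To make the output schema concrete, here is a hedged illustration of a successful payload. The values are invented, and the inner structure of `df_info` is not documented on this page.

    # Illustrative only: values are hypothetical; shapes follow the schema above
    # and the DataAnalysisResults dataclass in the implementation reference.
    result = {
        "valid": True,
        "message": "Data loaded successfully",
        "df_info": ...,  # DFInfo payload (e.g., column names/types); structure not shown here
        "df_metadata": {
            "file_name": "data",
            "file_path_or_url": "/absolute/path/to/data.csv",
            "file_location_type": "local",
            "read_function_string": "pd.read_csv",  # hypothetical reader name
        },
    }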

Implementation Reference

  • The primary handler function for the 'load_and_analyze_data' tool. It is decorated with @mcp.tool() for registration in the FastMCP server. This function loads data from local or remote paths/URLs, analyzes it, and returns structured results including DataFrame info and metadata.
    @mcp.tool()
    def load_and_analyze_data(
        path_or_url: str = Field(description="Absolute (important!) local file path or URL to a data file"),
    ) -> DataAnalysisResults:
        """Use to understand local or remote data files. Must be called with absolute paths or URLs.
    
        Supported formats:
        - CSV (.csv)
        - JSON (.json)
        - HTML (.html, .htm)
        - Excel (.xls, .xlsx)
        - OpenDocument Spreadsheet (.ods)
        - Parquet (.parquet)
    
        Returns:
            DataAnalysisResults object containing DataFrame information and metadata
        """
        # Handle files and URLs
        path_or_url_type = path_or_url_check(path_or_url)
        mime_type, _ = mimetypes.guess_type(str(path_or_url))
        processed_path_or_url = path_or_url
    
        if path_or_url_type == "remote":
            processed_path_or_url = convert_github_url_to_raw(path_or_url)
        elif path_or_url_type == "local":
            processed_path_or_url = Path(path_or_url)
        else:
            return DataAnalysisResults(valid=False, message="Invalid path or URL", df_info=None, df_metadata=None)
    
        try:
            df, read_fn = load_dataframe_by_format(processed_path_or_url, mime_type)
    
        except Exception as e:
            return DataAnalysisResults(
                valid=False,
                message=f"""Failed to load data: {e!s}. Remember to use the ABSOLUTE path or URL!
    Alternatively, you can use any data analysis means available to you. Most important information are the column names and
    column types for passing along to the `validate_dashboard_config` or `validate_chart_code` tools.""",
                df_info=None,
                df_metadata=None,
            )
    
        df_info = get_dataframe_info(df)
        df_metadata = DFMetaData(
            file_name=Path(path_or_url).stem if isinstance(processed_path_or_url, Path) else Path(path_or_url).name,
            file_path_or_url=str(processed_path_or_url),
            file_location_type=path_or_url_type,
            read_function_string=read_fn,
        )
    
        return DataAnalysisResults(valid=True, message="Data loaded successfully", df_info=df_info, df_metadata=df_metadata)
  • Dataclass defining the output schema for the load_and_analyze_data tool.
    from dataclasses import dataclass

    @dataclass
    class DataAnalysisResults:
        """Results of the data analysis tool."""

        valid: bool
        message: str
        df_info: DFInfo | None
        df_metadata: DFMetaData | None
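
The handler and result type lean on helpers and companion classes not defined on this page (`path_or_url_check`, `convert_github_url_to_raw`, `load_dataframe_by_format`, `DFMetaData`, `DFInfo`). The sketch below is inferred purely from how the handler uses them; every body is an assumption, not vizro-mcp's actual implementation.

    from dataclasses import dataclass
    from pathlib import Path

    import pandas as pd

    def path_or_url_check(path_or_url: str) -> str:
        # Assumption: classifies input as "remote", "local", or "" (invalid),
        # matching the three branches in the handler above.
        if path_or_url.startswith(("http://", "https://")):
            return "remote"
        if Path(path_or_url).is_absolute() and Path(path_or_url).exists():
            return "local"
        return ""

    def convert_github_url_to_raw(url: str) -> str:
        # Assumption: rewrites github.com "blob" URLs so the raw file is fetched.
        return url.replace("github.com", "raw.githubusercontent.com").replace("/blob/", "/")

    def load_dataframe_by_format(path_or_url, mime_type):
        # Assumption: dispatches on file suffix to a pandas reader and returns the
        # DataFrame plus the reader name recorded in DFMetaData. The real helper
        # also receives the MIME type; HTML (pd.read_html returns a list) is omitted.
        readers = {
            ".csv": pd.read_csv,
            ".json": pd.read_json,
            ".xls": pd.read_excel,
            ".xlsx": pd.read_excel,
            ".ods": pd.read_excel,
            ".parquet": pd.read_parquet,
        }
        suffix = Path(str(path_or_url)).suffix.lower()
        if suffix not in readers:
            raise ValueError(f"Unsupported format: {suffix}")
        return readers[suffix](path_or_url), f"pd.{readers[suffix].__name__}"

    @dataclass
    class DFMetaData:
        # Fields inferred from the constructor call in the handler.
        file_name: str
        file_path_or_url: str
        file_location_type: str  # "local" or "remote"
        read_function_string: str

    @dataclass
    class DFInfo:
        # Stub: the actual fields (column names/types, sample rows, ...) are not shown here.
        ...
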
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses key behavioral traits: requirement for absolute paths/URLs, supported formats, and return type. However, it lacks details on error handling, performance characteristics (e.g., file size limits), or authentication needs for URLs. For a tool with no annotations, this is a moderate but incomplete disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: it starts with the core purpose, then provides essential details (path/URL requirement, supported formats, return value). Every sentence earns its place with no redundancy or fluff. The bulleted list enhances readability without wasting space.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 1 parameter with full schema coverage and an output schema (implied by 'Returns: DataAnalysisResults object'), the description is largely complete. It covers purpose, input constraints, formats, and output. However, for a data analysis tool with no annotations, it could benefit from more behavioral context (e.g., limitations or side effects). The output schema reduces the need to explain return values in detail.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the single parameter 'path_or_url' is already documented in the schema. The tool description adds value by emphasizing that the path must be absolute ('important!') and by listing supported formats, clarifying what the parameter accepts beyond being a plain string. With no parameters left undocumented by the schema, this exceeds the baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Use to understand local or remote data files' with a specific verb ('understand') and resource ('data files'). It distinguishes from siblings by focusing on data analysis rather than schema retrieval, validation, or visualization planning. However, it doesn't explicitly name or contrast with specific sibling tools like 'get_sample_data_info' which might overlap.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some usage context: 'Must be called with absolute paths or URLs' and lists supported file formats, which implies when to use it (for analyzing those file types). However, it doesn't explicitly state when not to use it or mention alternatives among the sibling tools (e.g., vs. 'get_sample_data_info' for metadata only). The guidance is implied rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
