
vizro-mcp

Official
by mckinsey

get_sample_data_info

Retrieve sample dataset information for data visualization tasks. Provides details on iris, tips, stocks, and gapminder datasets to help select appropriate data for charts and graphs.

Instructions

If user provides no data, use this tool to get sample data information.

Use the following data for the below purposes:
    - iris: mostly numerical with one categorical column, good for scatter, histogram, boxplot, etc.
    - tips: contains mix of numerical and categorical columns, good for bar, pie, etc.
    - stocks: stock prices, good for line, scatter, generally things that change over time
    - gapminder: demographic data, good for line, scatter, generally things with maps or many categories

Returns:
    Data info object containing information about the dataset.
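As a rough sketch of how an agent reaches this tool: MCP clients invoke tools with the JSON-RPC `tools/call` method defined by the MCP specification. The request body below is illustrative; the `id` and transport framing are client-specific.

```python
# Sketch of the JSON-RPC request an MCP client sends to invoke this tool.
# The "tools/call" method and the name/arguments params shape come from the
# MCP specification; everything else here is client-dependent.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_sample_data_info",
        "arguments": {"data_name": "tips"},
    },
}
```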

Input Schema

Name        Required   Description                                   Default
data_name   Yes        Name of the dataset to get sample data for

Output Schema

Name                   Required   Description   Default
file_name              Yes
file_path_or_url       Yes
column_names_types     No
file_location_type     Yes
read_function_string   Yes
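Concretely, a call with `data_name="iris"` returns metadata shaped like the dictionary below (values mirror the `IRIS` instance in the implementation reference that follows):

```python
# Metadata returned for data_name="iris", matching the server's IRIS instance.
iris_info = {
    "file_name": "iris_data",
    "file_path_or_url": "https://raw.githubusercontent.com/plotly/datasets/master/iris-id.csv",
    "file_location_type": "remote",
    "read_function_string": "pd.read_csv",
    "column_names_types": {
        "sepal_length": "float",
        "sepal_width": "float",
        "petal_length": "float",
        "petal_width": "float",
        "species": "str",
    },
}
```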

Implementation Reference

  • The primary handler function for the 'get_sample_data_info' MCP tool, which is also where it is registered using the @mcp.tool() decorator. It dispatches predefined sample dataset metadata based on the input data_name.
    @mcp.tool()
    def get_sample_data_info(
        data_name: Literal["iris", "tips", "stocks", "gapminder"] = Field(
            description="Name of the dataset to get sample data for"
        ),
    ) -> DFMetaData:
        """If user provides no data, use this tool to get sample data information.
    
        Use the following data for the below purposes:
            - iris: mostly numerical with one categorical column, good for scatter, histogram, boxplot, etc.
            - tips: contains mix of numerical and categorical columns, good for bar, pie, etc.
            - stocks: stock prices, good for line, scatter, generally things that change over time
            - gapminder: demographic data, good for line, scatter, generally things with maps or many categories
    
        Returns:
            Data info object containing information about the dataset.
        """
        if data_name == "iris":
            return IRIS
        elif data_name == "tips":
            return TIPS
        elif data_name == "stocks":
            return STOCKS
        elif data_name == "gapminder":
            return GAPMINDER
  • Pydantic-style dataclass defining the DFMetaData type, which serves as the schema for the tool's return value.
    @dataclass
    class DFMetaData:
        file_name: str
        file_path_or_url: str
        file_location_type: Literal["local", "remote"]
        read_function_string: Literal["pd.read_csv", "pd.read_json", "pd.read_html", "pd.read_parquet", "pd.read_excel"]
        column_names_types: dict[str, str] | None = None
  • Supporting dataclass instances providing metadata for the four sample datasets returned by the tool.
    IRIS = DFMetaData(
        file_name="iris_data",
        file_path_or_url="https://raw.githubusercontent.com/plotly/datasets/master/iris-id.csv",
        file_location_type="remote",
        read_function_string="pd.read_csv",
        column_names_types={
            "sepal_length": "float",
            "sepal_width": "float",
            "petal_length": "float",
            "petal_width": "float",
            "species": "str",
        },
    )
    
    TIPS = DFMetaData(
        file_name="tips_data",
        file_path_or_url="https://raw.githubusercontent.com/plotly/datasets/master/tips.csv",
        file_location_type="remote",
        read_function_string="pd.read_csv",
        column_names_types={
            "total_bill": "float",
            "tip": "float",
            "sex": "str",
            "smoker": "str",
            "day": "str",
            "time": "str",
            "size": "int",
        },
    )
    
    STOCKS = DFMetaData(
        file_name="stocks_data",
        file_path_or_url="https://raw.githubusercontent.com/plotly/datasets/master/stockdata.csv",
        file_location_type="remote",
        read_function_string="pd.read_csv",
        column_names_types={
            "Date": "str",
            "IBM": "float",
            "MSFT": "float",
            "SBUX": "float",
            "AAPL": "float",
            "GSPC": "float",
        },
    )
    
    GAPMINDER = DFMetaData(
        file_name="gapminder_data",
        file_path_or_url="https://raw.githubusercontent.com/plotly/datasets/master/gapminder_unfiltered.csv",
        file_location_type="remote",
        read_function_string="pd.read_csv",
        column_names_types={
            "country": "str",
            "continent": "str",
            "year": "int",
            "lifeExp": "float",
            "pop": "int",
            "gdpPercap": "float",
        },
    )
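One way a consumer of this tool might act on the returned metadata is to resolve `read_function_string` to the matching pandas reader. The helper below is a hypothetical sketch, not part of vizro-mcp; it reuses the `DFMetaData` shape above and keeps the actual network fetch as an optional last step.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class DFMetaData:
    # Mirrors the DFMetaData shape from the implementation reference.
    file_name: str
    file_path_or_url: str
    file_location_type: Literal["local", "remote"]
    read_function_string: str
    column_names_types: Optional[dict] = None


def resolve_reader_name(meta: DFMetaData) -> str:
    # "pd.read_csv" -> "read_csv"; the caller looks this up on pandas.
    module, func = meta.read_function_string.split(".")
    assert module == "pd", f"unexpected reader module: {module}"
    return func


IRIS = DFMetaData(
    file_name="iris_data",
    file_path_or_url="https://raw.githubusercontent.com/plotly/datasets/master/iris-id.csv",
    file_location_type="remote",
    read_function_string="pd.read_csv",
)

# With pandas installed, loading the dataframe is then:
#   import pandas as pd
#   df = getattr(pd, resolve_reader_name(IRIS))(IRIS.file_path_or_url)
```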
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It describes the tool's behavior (returns a data info object) and provides context about sample datasets, but doesn't cover important behavioral aspects like error handling, response format details, or performance characteristics that would be helpful for an agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections: usage condition, dataset details, and return information. It's appropriately sized for its purpose, though the dataset descriptions could be slightly more concise. Every sentence serves a clear purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity, 100% schema coverage, and presence of an output schema, the description provides good contextual completeness. It explains when to use the tool, details about available datasets, and what to expect in return. The main gap is lack of behavioral details that would be helpful despite the output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 100% description coverage for its single parameter, so the baseline is 3. The description adds significant value by explaining the semantics of each enum value (iris, tips, stocks, gapminder) with specific use cases and characteristics, which goes well beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'get sample data information' when no user data is provided. It specifies the action (get) and resource (sample data information), though it doesn't explicitly differentiate from sibling tools like 'load_and_analyze_data' which might handle actual data loading.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'If user provides no data, use this tool to get sample data information.' This clearly defines when to use this tool versus alternatives, establishing a specific context for its application.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
