get_data_info

Get descriptive statistics and a data preview for Stata, CSV, or Excel files. Understand variable details and optionally view head rows to explore a dataset without prior knowledge.

Instructions

Get descriptive statistics and a data preview for a data file (dta, csv, xlsx). Returns overview, variable details, and optional head rows filtered by requested variables. Use when you need to understand a dataset or have no prior knowledge of the data.

Input Schema

Name	Required	Default
`data_path`	Yes
`vars_list`	No	None
`encoding`	No	utf-8
`head`	No	0

Output Schema

Name	Required	Description	Default
`result`	Yes	JSON string with overview, variable details, and config; a plain error message on failure

Implementation Reference

src/stata_mcp/api/get_data_info.py:18-42 (handler)

Primary handler function for the get_data_info tool. Accepts data_path, optional vars_list, encoding, and config_file. Resolves the path, determines file extension, fetches the appropriate data handler class, instantiates it, and returns JSON-serialized dataset info.

```
def get_data_info(
    data_path: str,
    vars_list: List[str] | None = None,
    encoding: str = "utf-8",
    config_file: str | Path | None = None,
) -> str:
    """Return descriptive statistics for a supported dataset."""
    runtime = create_runtime_context(config_file=config_file)
    resolved_data_path = Path(data_path).expanduser().resolve()
    data_extension = resolved_data_path.suffix.lower().strip(".")

    data_info_cls = get_data_handler(data_extension)
    if not data_info_cls:
        return f"Unsupported file extension now: {data_extension}"

    data_info = data_info_cls(
        resolved_data_path,
        vars_list,
        encoding=encoding,
        cache_dir=runtime.tmp_base_path,
    )
    try:
        return json.dumps(data_info.info, ensure_ascii=False)
    except Exception as error:
        return f"Failed to generate data summary for {resolved_data_path}: {error}"
```
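The path resolution and extension lookup described above can be tried in isolation. The following is a minimal sketch, not the project's code: `_HANDLERS`, `normalize_extension`, and `lookup_handler` are hypothetical names standing in for the real `get_data_handler` registry, whose contents are not shown on this page.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical registry for illustration only; the real get_data_handler
# in stata_mcp may map extensions to different classes.
_HANDLERS = {"dta": "DtaHandler", "csv": "CsvHandler", "xlsx": "ExcelHandler"}


def normalize_extension(data_path: str) -> str:
    """Mirror the handler's suffix logic: resolve, lowercase, drop the dot."""
    resolved = Path(data_path).expanduser().resolve()
    return resolved.suffix.lower().strip(".")


def lookup_handler(data_path: str) -> str | None:
    """Return the registered handler name, or None for unsupported types."""
    return _HANDLERS.get(normalize_extension(data_path))
```

Only the final suffix is considered, so a name such as `data.tar.gz` normalizes to `gz`, and a file with no suffix yields an empty string, which the handler would report as unsupported.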

src/stata_mcp/mcp_servers.py:379-425 (handler)

Alternative (legacy MCP-server) handler for get_data_info. It shares the core logic of the api handler but caches under config.STATA_MCP_FOLDER.TMP and accepts a head parameter for a row preview. It also logs each outcome and, when the result is cached, the path the summary was saved to.

```
def get_data_info(
        data_path: str,
        vars_list: List[str] | None = None,
        encoding: str = "utf-8",
        head: int = 0,
) -> str:
    """
    Return descriptive statistics for a supported data file.

    Args:
        data_path (str): Absolute path to .dta, .csv, .xlsx, .xls, .sav file.
        vars_list (List[str] | None): Optional variable subset (default: all variables).
        encoding (str): File encoding (ignored for .dta).
        head (int): Number of preview rows (0 = disabled).

    Returns:
        str: JSON string with overview, variable details, and config.

    Examples:
        >>> get_data_info("/Applications/Stata/auto.dta")
        >>> get_data_info("/Applications/Stata/auto.dta", vars_list=["price", "mpg"], head=5)
    """
    data_path = Path(data_path).expanduser().resolve()
    data_extension = data_path.suffix.lower().strip(".")

    # Lazy import: pandas/numpy/requests are heavy, only load when needed
    from .data_info import get_data_handler

    # Get the appropriate data handler class from the registry
    data_info_cls = get_data_handler(data_extension)

    if not data_info_cls:
        logging.error(f"Unsupported file extension: {data_extension} for data file: {data_path}")
        return f"Unsupported file extension now: {data_extension}"

    data_info = data_info_cls(data_path, vars_list, encoding=encoding, cache_dir=config.STATA_MCP_FOLDER.TMP, head=head)
    try:
        info = data_info.info
        if data_info.is_cache:
            saved_path = info.get("saved_path", None)
            logging.info(f"Successfully generated data summary for {data_path}, saved to {saved_path}")
        else:
            logging.info(f"Successfully generated data summary for {data_path}")
        return json.dumps(info, ensure_ascii=False)
    except Exception as e:
        logging.error(f"Failed to generate data summary for {data_path}: {str(e)}")
        return f"Failed to generate data summary for {data_path}: {str(e)}"
```
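Both handlers return failures as plain strings instead of raising, so a caller has to tell a JSON payload apart from an error message. One way to do that is sketched below; `parse_tool_result` is a hypothetical helper, and the assumption that a successful result decodes to a JSON object rests only on the handler calling `info.get(...)`.

```python
import json


def parse_tool_result(result: str) -> dict:
    """Split a get_data_info return value into success and error cases."""
    try:
        payload = json.loads(result)
    except json.JSONDecodeError:
        # Error messages such as "Unsupported file extension now: ..."
        # are not valid JSON, so they end up here.
        return {"ok": False, "error": result}
    if not isinstance(payload, dict):
        # Assumption: a successful summary is always a JSON object.
        return {"ok": False, "error": result}
    return {"ok": True, "info": payload}
```

A client could then branch on the `ok` flag before reading fields such as the overview, rather than searching the string for known error prefixes.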

src/stata_mcp/mcp_servers.py:597-606 (registration)

Registration entry for get_data_info in the _TOOL_REGISTRY dict. Maps the tool name to its description, the handler function, and the profiles ('core', 'all') under which it is registered.

```
"get_data_info": {
    "description": (
        "Get descriptive statistics and a data preview for a data file "
        "(dta, csv, xlsx). Returns overview, variable details, "
        "and optional head rows filtered by requested variables. "
        "Use when you need to understand a dataset or have no prior knowledge of the data."
    ),
    "func": get_data_info,
    "profiles": {"core", "all"},
},
```
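A registry shaped like this entry lends itself to filtering by profile. The sketch below shows one plausible way to select tools for a profile; `select_tools`, `demo_registry`, and the `extra_tool` entry are hypothetical, and the real server may register tools differently.

```python
from typing import Callable, Dict


def select_tools(registry: Dict[str, dict], profile: str) -> Dict[str, Callable]:
    """Return name -> handler for every entry registered under `profile`."""
    return {
        name: entry["func"]
        for name, entry in registry.items()
        if profile in entry["profiles"]
    }


# Stand-in handlers so the sketch runs without stata_mcp installed.
def _get_data_info_stub(data_path: str) -> str:
    return "{}"


def _extra_tool_stub() -> str:
    return "extra"


demo_registry = {
    "get_data_info": {"func": _get_data_info_stub, "profiles": {"core", "all"}},
    "extra_tool": {"func": _extra_tool_stub, "profiles": {"all"}},  # hypothetical
}
```

With this layout, `select_tools(demo_registry, "core")` keeps only `get_data_info`, while the `"all"` profile keeps both entries.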

src/stata_mcp/api/__init__.py:12-27 (schema)

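Re-export of get_data_info from the api package, so that other modules can import it with from ..api import get_data_info. The excerpt shows only part of the import list; the imports for RuntimeContext, create_runtime_context, and ado_package_install are not shown.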

```
from .get_data_info import get_data_info
from .read_log import read_log
from .stata_do import stata_do
from .stata_help import stata_help
from .write_dofile import write_dofile

__all__ = [
    "RuntimeContext",
    "create_runtime_context",
    "ado_package_install",
    "get_data_info",
    "read_log",
    "stata_do",
    "stata_help",
    "write_dofile",
]
```

src/stata_mcp/cli/_parsers.py:106-106 (helper)

CLI parser definition flagging get_data_info as a 'core' tool in the --core argument's help text.

```
help="Register only core tools (stata_do, get_data_info, help)",
```
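For context, a help string like this usually belongs to an `argparse` flag. The sketch below is an assumption about how such a flag could be wired up, not the project's parser: the `store_true` action, the `stata-mcp-sketch` program name, and the mapping from flag to profile name are all hypothetical.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a --core switch (hypothetical wiring)."""
    parser = argparse.ArgumentParser(prog="stata-mcp-sketch")
    parser.add_argument(
        "--core",
        action="store_true",  # assumption: the flag takes no value
        help="Register only core tools (stata_do, get_data_info, help)",
    )
    return parser


def profile_from_args(argv: List[str]) -> str:
    """Map the parsed flag to a profile name; 'all' is the assumed default."""
    args = build_parser().parse_args(argv)
    return "core" if args.core else "all"
```

Under these assumptions, passing `--core` selects the `core` profile and omitting it falls back to `all`, matching the two profile names used in the registry entry.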
