get_data_info
Get descriptive statistics and a data preview for Stata, CSV, or Excel files. Understand variable details and optionally view head rows to explore a dataset without prior knowledge.
Instructions
Get descriptive statistics and a data preview for a data file (dta, csv, xlsx). Returns overview, variable details, and optional head rows filtered by requested variables. Use when you need to understand a dataset or have no prior knowledge of the data.
Input Schema
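| Name | Required | Description | Default |
|---|---|---|---|
| data_path | Yes | Absolute path to a .dta, .csv, .xlsx, .xls, or .sav file. | |
| vars_list | No | Optional subset of variables (all variables when omitted). | |
| encoding | No | File encoding (ignored for .dta). | utf-8 |
| head | No | Number of preview rows (0 = disabled). | 0 |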
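
As an illustration of how these inputs fit together, the sketch below builds a request for this tool. It assumes the server is reached through the standard MCP JSON-RPC tools/call method. The request id is arbitrary, and the file path is a placeholder borrowed from the handler's docstring example.

```python
import json

# Arguments mirror the input schema; only data_path is required.
arguments = {
    "data_path": "/Applications/Stata/auto.dta",  # placeholder path
    "vars_list": ["price", "mpg"],                # optional variable subset
    "encoding": "utf-8",                          # ignored for .dta files
    "head": 5,                                    # preview rows; 0 disables the preview
}

# JSON-RPC 2.0 envelope for an MCP tools/call request.
request = {
    "jsonrpc": "2.0",
    "id": 1,  # arbitrary request id
    "method": "tools/call",
    "params": {"name": "get_data_info", "arguments": arguments},
}

payload = json.dumps(request)
print(payload)
```

Omitting vars_list, encoding, and head leaves the server-side defaults in place.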
Output Schema
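| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | JSON string with overview, variable details, and config; on failure, a plain-text error message. | |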
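
Because the handlers return either a JSON string or a plain error message, a caller may want to tell the two apart before using the result. The following is a minimal sketch of one way to do that. The helper parse_result is hypothetical (it is not part of stata_mcp), the error prefixes are copied from the handler code in the Implementation Reference, and the sample JSON is a placeholder rather than real tool output.

```python
import json
from typing import Any, Tuple

# Prefixes of the two error strings returned by the handlers.
ERROR_PREFIXES = (
    "Unsupported file extension now:",
    "Failed to generate data summary for",
)


def parse_result(result: str) -> Tuple[bool, Any]:
    """Hypothetical helper: return (ok, payload) for a get_data_info result."""
    if result.startswith(ERROR_PREFIXES):
        return False, result
    try:
        return True, json.loads(result)
    except json.JSONDecodeError:
        # Anything that is neither a known error nor valid JSON is treated as a failure.
        return False, result


# Placeholder inputs; the real JSON structure depends on the data handler.
ok, data = parse_result('{"overview": {}}')
failed, message = parse_result("Unsupported file extension now: txt")
print(ok, failed)
```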
Implementation Reference
- src/stata_mcp/api/get_data_info.py:18-42 (handler) Primary handler function for the get_data_info tool. Accepts data_path, optional vars_list, encoding, and config_file. Resolves the path, determines the file extension, fetches the appropriate data handler class, instantiates it, and returns JSON-serialized dataset info.

```python
def get_data_info(
    data_path: str,
    vars_list: List[str] | None = None,
    encoding: str = "utf-8",
    config_file: str | Path | None = None,
) -> str:
    """Return descriptive statistics for a supported dataset."""
    runtime = create_runtime_context(config_file=config_file)
    resolved_data_path = Path(data_path).expanduser().resolve()
    data_extension = resolved_data_path.suffix.lower().strip(".")
    data_info_cls = get_data_handler(data_extension)
    if not data_info_cls:
        return f"Unsupported file extension now: {data_extension}"

    data_info = data_info_cls(
        resolved_data_path,
        vars_list,
        encoding=encoding,
        cache_dir=runtime.tmp_base_path,
    )
    try:
        return json.dumps(data_info.info, ensure_ascii=False)
    except Exception as error:
        return f"Failed to generate data summary for {resolved_data_path}: {error}"
```

- src/stata_mcp/mcp_servers.py:379-425 (handler) Alternative (legacy/mcp-server) handler for get_data_info. Same core logic, but it uses config.STATA_MCP_FOLDER.TMP for caching and supports a head parameter (row preview). It also adds logging and cache awareness.

```python
def get_data_info(
    data_path: str,
    vars_list: List[str] | None = None,
    encoding: str = "utf-8",
    head: int = 0,
) -> str:
    """
    Return descriptive statistics for a supported data file.

    Args:
        data_path (str): Absolute path to .dta, .csv, .xlsx, .xls, .sav file.
        vars_list (List[str] | None): Optional variable subset (default: all variables).
        encoding (str): File encoding (ignored for .dta).
        head (int): Number of preview rows (0 = disabled).

    Returns:
        str: JSON string with overview, variable details, and config.

    Examples:
        >>> get_data_info("/Applications/Stata/auto.dta")
        >>> get_data_info("/Applications/Stata/auto.dta", vars_list=["price", "mpg"], head=5)
    """
    data_path = Path(data_path).expanduser().resolve()
    data_extension = data_path.suffix.lower().strip(".")

    # Lazy import: pandas/numpy/requests are heavy, only load when needed
    from .data_info import get_data_handler

    # Get the appropriate data handler class from the registry
    data_info_cls = get_data_handler(data_extension)
    if not data_info_cls:
        logging.error(f"Unsupported file extension: {data_extension} for data file: {data_path}")
        return f"Unsupported file extension now: {data_extension}"

    data_info = data_info_cls(data_path, vars_list, encoding=encoding, cache_dir=config.STATA_MCP_FOLDER.TMP, head=head)
    try:
        info = data_info.info
        if data_info.is_cache:
            saved_path = info.get("saved_path", None)
            logging.info(f"Successfully generated data summary for {data_path}, saved to {saved_path}")
        else:
            logging.info(f"Successfully generated data summary for {data_path}")
        return json.dumps(info, ensure_ascii=False)
    except Exception as e:
        logging.error(f"Failed to generate data summary for {data_path}: {str(e)}")
        return f"Failed to generate data summary for {data_path}: {str(e)}"
```

- src/stata_mcp/mcp_servers.py:597-606 (registration) Registration entry for get_data_info in the _TOOL_REGISTRY dict. Maps the tool name to its description, the handler function, and the profiles ('core', 'all') under which it is registered.

```python
    "get_data_info": {
        "description": (
            "Get descriptive statistics and a data preview for a data file "
            "(dta, csv, xlsx). Returns overview, variable details, "
            "and optional head rows filtered by requested variables. "
            "Use when you need to understand a dataset or have no prior knowledge of the data."
        ),
        "func": get_data_info,
        "profiles": {"core", "all"},
    },
```

- src/stata_mcp/api/__init__.py:12-27 (schema) Re-export of get_data_info from the api package, making it available via from ..api import get_data_info.

```python
from .get_data_info import get_data_info
from .read_log import read_log
from .stata_do import stata_do
from .stata_help import stata_help
from .write_dofile import write_dofile

__all__ = [
    "RuntimeContext",
    "create_runtime_context",
    "ado_package_install",
    "get_data_info",
    "read_log",
    "stata_do",
    "stata_help",
    "write_dofile",
]
```

- CLI parser definition flagging get_data_info as a 'core' tool in the --core argument's help text.

```python
help="Register only core tools (stata_do, get_data_info, help)",
```
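
Both handlers derive the file extension the same way and then look up a handler class by that extension. The sketch below isolates that step so the edge cases are easier to see. The registry contents, the placeholder classes DtaInfo and CsvInfo, and the helper names extension_of and lookup_handler are hypothetical stand-ins; the real get_data_handler lives in the data_info module and may behave differently.

```python
from pathlib import Path
from typing import Dict, Optional


class DtaInfo:
    """Placeholder for a Stata handler class (hypothetical)."""


class CsvInfo:
    """Placeholder for a CSV handler class (hypothetical)."""


# Hypothetical registry keyed by lowercase extension without the leading dot.
_HANDLERS: Dict[str, type] = {"dta": DtaInfo, "csv": CsvInfo}


def extension_of(data_path: str) -> str:
    """Same expression the handlers use to derive the extension."""
    return Path(data_path).expanduser().resolve().suffix.lower().strip(".")


def lookup_handler(extension: str) -> Optional[type]:
    """Stand-in for get_data_handler: returns None for unknown extensions."""
    return _HANDLERS.get(extension)


print(extension_of("/tmp/Auto.DTA"))        # dta (case is normalized)
print(extension_of("/tmp/archive.tar.gz"))  # gz (only the last suffix counts)
print(repr(extension_of("/tmp/README")))    # '' (no suffix at all)
print(lookup_handler(extension_of("/tmp/README")))  # None
```

A path with no suffix yields an empty extension, so under these assumptions the handlers would fall through to the "Unsupported file extension now:" message with nothing after the colon.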