Glama
by petrmasa

load_dataset

Load a CSV file into memory and return column names, row count, and categorical distinct values to prepare for driver analysis.

Instructions

Load a CSV or ZIP-compressed CSV file into memory under a given name. Call this ONCE before find_drivers or explain_segment; you do not need to reload the same file within the same session.

Returns column names, row count, and for categorical columns their distinct values. Use this metadata to:

  1. Identify the target variable and target class for find_drivers.

  2. Spot columns that are direct encodings or duplicates of the target (e.g. a numeric "survived" column when the target is "alive"). Pass those in the attributes exclusion list so find_drivers does not pick them as trivial drivers.
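
The second step above can be approximated with a simple check. The following is a minimal sketch, not the server's own logic: `find_target_encodings` is a hypothetical helper, and the three rows are made-up toy data. It flags any column whose values map one-to-one onto the target's values, which is the pattern a direct encoding such as "survived" versus "alive" would show.

```python
def find_target_encodings(rows, target):
    """Return columns whose values map one-to-one onto the target column.

    Hypothetical helper for illustration; rows is a list of dicts.
    """
    suspects = []
    columns = [c for c in rows[0] if c != target] if rows else []
    for column in columns:
        forward = {}   # column value -> target value
        backward = {}  # target value -> column value
        consistent = True
        for row in rows:
            c, t = row[column], row[target]
            # setdefault stores the first pairing; a later mismatch breaks the bijection
            if forward.setdefault(c, t) != t or backward.setdefault(t, c) != c:
                consistent = False
                break
        if consistent:
            suspects.append(column)
    return suspects


# Toy rows (invented for this example)
rows = [
    {"alive": "yes", "survived": "1", "age": "30"},
    {"alive": "no", "survived": "0", "age": "41"},
    {"alive": "yes", "survived": "1", "age": "41"},
]
print(find_target_encodings(rows, "alive"))  # prints ['survived']
```

Columns returned by such a check are candidates for the exclusion list; a human or agent should still confirm that they really duplicate the target.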

Args:
  name: Short label to refer to this dataset in later calls (e.g. "accidents").
  path: Absolute or relative path to a CSV or ZIP-compressed CSV file.
  separator: Column delimiter; use "\t" for tab-separated files. Default is ",".
  encoding: File encoding, default "utf-8" (use "cp1250" for Windows Eastern European files).

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | | |
| path | Yes | | |
| separator | No | | , |
| encoding | No | | utf-8 |

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| result | Yes | | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses the return values (column names, row count, distinct values for categorical columns) and hints at in-memory loading. It does not fully detail side effects or error conditions, but for a load function this is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: first paragraph for purpose and usage, second for return and metadata application, third for parameters. Every sentence is informative and earns its place, with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (loading datasets), the description covers the essentials: input parameters, usage sequence, and output metadata. It integrates with sibling tools (find_drivers, explain_segment) and provides actionable guidance. An output schema exists, but the description still explains the return values, which makes it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description compensates by explaining each parameter: name as a short label, path as an absolute or relative path, separator with its default and a tab example, and encoding with its utf-8 default and cp1250 for Windows Eastern European files. This adds significant meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'load', resource 'CSV or ZIP-compressed CSV file', and the purpose 'into memory under a given name'. It also distinguishes itself from siblings by indicating it should be called ONCE before find_drivers or explain_segment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to call the tool ('ONCE before find_drivers or explain_segment') and that reloading the same file is unnecessary. It also explains how to use the returned metadata for find_drivers, including identifying the target variable and building the exclusion list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/petrmasa/key_drivers_mcp'
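
The same request can be made from Python with the standard library. This is a small sketch under assumptions: `server_url` and `fetch_server` are hypothetical helper names, and the response is assumed to be a JSON body, which the page does not confirm and whose fields are not documented here.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner, repo):
    """Build the server endpoint URL shown in the curl command."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_server(owner, repo, timeout=10):
    """Send the GET request and parse the body.

    Assumption: the endpoint returns JSON; the response fields are not
    documented on this page, so the parsed object is returned as-is.
    """
    request = urllib.request.Request(server_url(owner, repo), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server("petrmasa", "key_drivers_mcp")` would issue the same GET request as the curl command.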

If you have feedback or need assistance with the MCP directory API, please join our Discord server.