Glama

download_kaggle_dataset

Download files from a specific Kaggle dataset by providing the dataset reference and an optional download path. Simplifies data retrieval for analysis and projects.

Instructions

Downloads files for a specific Kaggle dataset.

Args:
    dataset_ref: The reference of the dataset (e.g., 'username/dataset-slug').
    download_path: Optional. The path to download the files to. Defaults to '<project_root>/datasets/<dataset_slug>'.

Input Schema

Name           Required  Description  Default
dataset_ref    Yes
download_path  No

Implementation Reference

  • The handler function decorated with @mcp.tool(), which registers and implements the 'download_kaggle_dataset' tool. It uses the Kaggle API to download the specified dataset and unzip it into a resolved local directory, handling path resolution, directory creation, and errors.
    import json
    import sys
    from pathlib import Path

    @mcp.tool()
    async def download_kaggle_dataset(dataset_ref: str, download_path: str | None = None) -> str:
        """Downloads files for a specific Kaggle dataset.
        Args:
            dataset_ref: The reference of the dataset (e.g., 'username/dataset-slug').
            download_path: Optional. The path to download the files to. Defaults to '<project_root>/datasets/<dataset_slug>'.
        """
        if not api:
            # Return an informative error if the API is not available
            return json.dumps({"error": "Kaggle API not authenticated or available."})

        # Log to stderr: under the stdio transport, stdout carries MCP protocol messages
        print(f"Attempting to download dataset: {dataset_ref}", file=sys.stderr)

        # Determine the project root from this file's location when possible
        try:
            project_root = Path(__file__).parent.parent.resolve()  # parent of src/, i.e., the project root
        except NameError:  # __file__ might not be defined when run via an entry point
            project_root = Path.cwd()  # assume the cwd is the project root

        if not download_path:
            try:
                dataset_slug = dataset_ref.split('/')[1]
            except IndexError:
                return f"Error: Invalid dataset_ref format '{dataset_ref}'. Expected 'username/dataset-slug'."
            # Default: <project_root>/datasets/<dataset_slug>
            download_path_obj = project_root / "datasets" / dataset_slug
        else:
            # A relative path is resolved against the project root; an absolute path
            # replaces project_root entirely when joined with the / operator
            download_path_obj = (project_root / Path(download_path)).resolve()

        # Ensure the download directory exists
        try:
            download_path_obj.mkdir(parents=True, exist_ok=True)
            print(f"Ensured download directory exists: {download_path_obj}", file=sys.stderr)
        except OSError as e:
            return f"Error creating download directory '{download_path_obj}': {e}"

        try:
            print(f"Calling api.dataset_download_files for {dataset_ref} to path {download_path_obj}", file=sys.stderr)
            # The Kaggle API expects the path as a string; quiet=True keeps its
            # progress messages off stdout
            api.dataset_download_files(dataset_ref, path=str(download_path_obj), unzip=True, quiet=True)
            return f"Successfully downloaded and unzipped dataset '{dataset_ref}' to '{download_path_obj}'."
        except Exception as e:
            print(f"Error downloading dataset '{dataset_ref}': {e}", file=sys.stderr)
            # Crude 404 detection based on the exception message
            if "404" in str(e):
                return f"Error: Dataset '{dataset_ref}' not found or access denied."
            return f"Error downloading dataset '{dataset_ref}': {e}"
  • src/server.py:63 (registration)
    The @mcp.tool() decorator registers the download_kaggle_dataset function as an MCP tool.
    @mcp.tool()
  • Type hints and docstring define the input schema: dataset_ref (str, required) and download_path (str, optional); the tool returns a str.
    async def download_kaggle_dataset(dataset_ref: str, download_path: str | None = None) -> str:
        """Downloads files for a specific Kaggle dataset.
        Args:
            dataset_ref: The reference of the dataset (e.g., 'username/dataset-slug').
            download_path: Optional. The path to download the files to. Defaults to '<project_root>/datasets/<dataset_slug>'.
        """
  • Initialization and authentication of the KaggleApi instance, which the tool reads as a module-level global.
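    import sys

    api = None  # stays None if the import or authentication fails
    try:
        # Import inside the try so that a failure at import time also leaves api as None
        from kaggle.api.kaggle_api_extended import KaggleApi
        api = KaggleApi()
        api.authenticate()
        print("Kaggle API Authenticated Successfully.", file=sys.stderr)
    except Exception as e:
        print(f"Error authenticating Kaggle API: {e}", file=sys.stderr)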

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It states the action but lacks critical details: whether authentication is required (Kaggle typically needs API credentials), what happens if files already exist at the path, error handling, or any rate limits. The description is minimal beyond the basic operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured, with a clear purpose statement followed by parameter explanations. It avoids unnecessary fluff, though the 'Args:' block could be worked into the prose more smoothly. Every sentence adds value, making it appropriately concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of downloading datasets (which often involves authentication, file management, and error cases), no annotations, and no output schema, the description is insufficient. It misses key contextual details like authentication requirements, response format, or handling of large downloads, leaving significant gaps for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaningful context for both parameters: it explains the format of 'dataset_ref' with an example and clarifies the default behavior and path structure for 'download_path'. With 0% schema description coverage, this compensates somewhat, but it doesn't fully detail constraints (e.g., path validity, dataset accessibility).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Downloads files') and resource ('for a specific Kaggle dataset'), making the purpose immediately understandable. It is distinguishable from the sibling tool 'search_kaggle_datasets' by its focus on downloading rather than searching, though it doesn't explicitly contrast the two.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. While it's implied this is for downloading after a dataset is identified (versus searching with the sibling tool), there's no explicit mention of prerequisites, dependencies, or when-not-to-use scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/arrismo/kaggle-mcp'
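The same lookup can be done from Python with only the standard library. This is a sketch: the helper names are hypothetical, and it assumes the endpoint returns a JSON object, whose shape is not documented here.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/arrismo/kaggle-mcp"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build the same plain GET request that the curl command sends."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = SERVER_URL) -> dict:
    """Perform the request and parse the body, assuming the endpoint returns JSON."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return json.load(response)
```

Calling fetch_server_info() performs a live network request; building the request separately keeps it easy to inspect or reuse.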
