download_kaggle_dataset

Download files from a specific Kaggle dataset by providing the dataset reference and optional download path. Simplifies data retrieval for analysis and projects.

Instructions

Downloads files for a specific Kaggle dataset.

Args:
    dataset_ref: The reference of the dataset (e.g., 'username/dataset-slug').
    download_path: Optional. The path to download the files to. Defaults to '<project_root>/datasets/<dataset_slug>'.
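
To make the default download location concrete, here is a minimal sketch of how '<project_root>/datasets/<dataset_slug>' can be derived from a dataset reference. The helper name `default_download_path` and the strict two-part check are assumptions made for illustration; they are not part of the server's code.

```python
from pathlib import Path


def default_download_path(dataset_ref: str, project_root: Path) -> Path:
    """Return <project_root>/datasets/<dataset_slug> for a 'username/dataset-slug' ref.

    Hypothetical helper: the name and the strict validation are illustrative.
    """
    parts = dataset_ref.split("/")
    # Require exactly two non-empty parts: the owner and the dataset slug.
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            f"Invalid dataset_ref {dataset_ref!r}; expected 'username/dataset-slug'."
        )
    return project_root / "datasets" / parts[1]


if __name__ == "__main__":
    # The project root here is an arbitrary example directory.
    print(default_download_path("username/dataset-slug", Path("project")))
```

Raising on malformed references, rather than silently taking the second path segment, keeps a typo such as 'username/' from resolving to the bare datasets directory.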

Input Schema

Name           Required  Description  Default
dataset_ref    Yes
download_path  No
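
As an illustration of the parameters listed above, the following sketch writes out one plausible JSON Schema for the tool's arguments and checks a call against it. The schema is an assumption inferred from the signature `(dataset_ref: str, download_path: str | None = None)`; the schema the server actually emits is not shown on this page and may differ. `check_arguments` is a hypothetical helper.

```python
from typing import Any

# Assumed schema, inferred from the function signature; not copied from the server.
INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "dataset_ref": {"type": "string"},
        "download_path": {
            "anyOf": [{"type": "string"}, {"type": "null"}],
            "default": None,
        },
    },
    "required": ["dataset_ref"],
}


def check_arguments(arguments: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    for name in INPUT_SCHEMA["required"]:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name in arguments:
        if name not in INPUT_SCHEMA["properties"]:
            problems.append(f"unknown argument: {name}")
    if "dataset_ref" in arguments and not isinstance(arguments["dataset_ref"], str):
        problems.append("dataset_ref must be a string")
    path = arguments.get("download_path")
    if path is not None and not isinstance(path, str):
        problems.append("download_path must be a string or null")
    return problems


if __name__ == "__main__":
    print(check_arguments({"dataset_ref": "username/dataset-slug"}))  # → []
    print(check_arguments({"download_path": "data/raw"}))
    # → ['missing required argument: dataset_ref']
```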

Implementation Reference

  • The handler function decorated with @mcp.tool(), which registers and implements the 'download_kaggle_dataset' tool. It uses the Kaggle API to download the specified dataset to a determined path, handling directory creation, path resolution, and errors.
    @mcp.tool()
    async def download_kaggle_dataset(dataset_ref: str, download_path: str | None = None) -> str:
        """Downloads files for a specific Kaggle dataset.
        Args:
            dataset_ref: The reference of the dataset (e.g., 'username/dataset-slug').
            download_path: Optional. The path to download the files to. Defaults to '<project_root>/datasets/<dataset_slug>'.
        """
        if not api:
            # Return an informative error if API is not available
            return json.dumps({"error": "Kaggle API not authenticated or available."})
    
        print(f"Attempting to download dataset: {dataset_ref}")
    
        # Determine the project root so that download paths are absolute.
        # Prefer the location of this file; fall back to the current working
        # directory when __file__ is undefined (e.g., when run via an entry point).
        try:
            project_root = Path(__file__).parent.parent.resolve()  # parent of src/, i.e., the project root
        except NameError:
            project_root = Path.cwd()  # assume cwd is the project root
    
    
        if not download_path:
            # Require exactly 'username/dataset-slug' with both parts non-empty;
            # a bare split('/')[1] would accept 'username/' and yield an empty slug.
            owner, sep, dataset_slug = dataset_ref.partition('/')
            if not (owner and sep and dataset_slug) or '/' in dataset_slug:
                return f"Error: Invalid dataset_ref format '{dataset_ref}'. Expected 'username/dataset-slug'."
            # Construct an absolute path under the project root
            download_path_obj = project_root / "datasets" / dataset_slug
        else:
            # A relative path is resolved against the project root;
            # an absolute path is used as-is
            download_path_obj = (project_root / Path(download_path)).resolve()
    
    
        # Ensure download directory exists (using the Path object)
        try:
            download_path_obj.mkdir(parents=True, exist_ok=True)
            print(f"Ensured download directory exists: {download_path_obj}") # Will print absolute path
        except OSError as e:
            return f"Error creating download directory '{download_path_obj}': {e}"
    
        try:
            print(f"Calling api.dataset_download_files for {dataset_ref} to path {str(download_path_obj)}")
            # Pass the path as a string to the Kaggle API
            api.dataset_download_files(dataset_ref, path=str(download_path_obj), unzip=True, quiet=False)
            return f"Successfully downloaded and unzipped dataset '{dataset_ref}' to '{str(download_path_obj)}'." # Show absolute path
        except Exception as e:
            # Log the error potentially
            print(f"Error downloading dataset '{dataset_ref}': {e}")
            # Check for 404 Not Found
            if "404" in str(e):
                return f"Error: Dataset '{dataset_ref}' not found or access denied."
            # Check for other specific Kaggle errors if needed
            return f"Error downloading dataset '{dataset_ref}': {str(e)}"
  • src/server.py:63-63 (registration)
    The @mcp.tool() decorator registers the download_kaggle_dataset function as an MCP tool.
    @mcp.tool()
  • Type hints and docstring define the input schema: dataset_ref (str, required) and download_path (str, optional). The tool returns a str.
    async def download_kaggle_dataset(dataset_ref: str, download_path: str | None = None) -> str:
        """Downloads files for a specific Kaggle dataset.
        Args:
            dataset_ref: The reference of the dataset (e.g., 'username/dataset-slug').
            download_path: Optional. The path to download the files to. Defaults to '<project_root>/datasets/<dataset_slug>'.
        """
  • Initialization and authentication of the KaggleApi instance used by the tool via closure.
    api = None # Initialize api as None first
    try:
        api = KaggleApi()
        api.authenticate()
        print("Kaggle API Authenticated Successfully.")
    except Exception as e:
        print(f"Error authenticating Kaggle API: {e}")
        # api remains None if authentication fails
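
The handler above is declared `async` but calls the blocking `api.dataset_download_files` directly, which can stall the event loop for the duration of a large download. One possible way around this, sketched here under stated assumptions, is to run the call in a worker thread with `asyncio.to_thread`. `FakeApi` and `download_in_thread` are hypothetical names used only for this illustration; `FakeApi` stands in for an authenticated `KaggleApi` instance so the example runs without credentials. The sketch also returns JSON on every path, whereas the original handler mixes JSON and plain-string results.

```python
import asyncio
import json


class FakeApi:
    """Hypothetical stand-in for an authenticated KaggleApi instance."""

    def __init__(self) -> None:
        self.calls = []

    def dataset_download_files(self, dataset, path=None, unzip=False, quiet=True):
        # Record the call instead of contacting Kaggle.
        self.calls.append((dataset, path, unzip, quiet))


async def download_in_thread(api, dataset_ref: str, target_dir: str) -> str:
    """Run the blocking download in a worker thread and report the result as JSON."""
    if api is None:
        # Mirrors the handler's guard for a failed authentication.
        return json.dumps({"error": "Kaggle API not authenticated or available."})
    try:
        await asyncio.to_thread(
            api.dataset_download_files,
            dataset_ref,
            path=target_dir,
            unzip=True,
            quiet=True,
        )
    except Exception as exc:
        return json.dumps(
            {"error": f"Error downloading dataset '{dataset_ref}': {exc}"}
        )
    return json.dumps({"status": "ok", "dataset_ref": dataset_ref, "path": target_dir})


if __name__ == "__main__":
    result = asyncio.run(
        download_in_thread(FakeApi(), "username/dataset-slug", "datasets/dataset-slug")
    )
    print(result)
```

Passing the API object in as a parameter, rather than reading a module-level `api`, is a choice made here only to keep the sketch testable.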
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/arrismo/kaggle-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.