
HF Dataset MCP

by cfahlgren1

list_parquet_files

Retrieve direct download URLs for Parquet files from Hugging Face datasets to enable data processing and analysis.
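The tool's handler (see the implementation reference) calls a `fetchDatasetViewer` helper with a `/parquet` path. A minimal sketch of fetching the same listing directly, assuming that helper wraps the public Hugging Face Dataset Viewer API at `https://datasets-server.huggingface.co` and that Node.js 18+ provides a global `fetch`. `buildDatasetViewerUrl`, `fetchParquetListing`, and the optional `token` argument are hypothetical names for illustration, not part of the server.

```typescript
// Assumed base URL of the public Hugging Face Dataset Viewer API.
const DATASET_VIEWER_BASE = "https://datasets-server.huggingface.co";

// One entry of the "/parquet" response (mirrors ParquetResponse["parquet_files"]).
interface ParquetFileEntry {
  dataset: string;
  config: string;
  split: string;
  url: string;
  filename: string;
  size: number;
}

// Builds e.g. https://datasets-server.huggingface.co/parquet?dataset=stanfordnlp%2Fimdb
// (URLSearchParams percent-encodes the "/" in the dataset ID).
function buildDatasetViewerUrl(path: string, params: Record<string, string>): string {
  const url = new URL(path, DATASET_VIEWER_BASE);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Hypothetical helper: fetches the Parquet file list for one dataset.
// If a Hugging Face token is given, it is sent as a Bearer header (e.g. for gated datasets).
async function fetchParquetListing(dataset: string, token?: string): Promise<ParquetFileEntry[]> {
  const headers: Record<string, string> = {};
  if (token) headers["Authorization"] = `Bearer ${token}`;
  const res = await fetch(buildDatasetViewerUrl("/parquet", { dataset }), { headers });
  if (!res.ok) {
    throw new Error(`Dataset Viewer request failed: ${res.status} ${res.statusText}`);
  }
  const body = (await res.json()) as { parquet_files: ParquetFileEntry[] };
  return body.parquet_files;
}
```

Under that assumption, `fetchParquetListing("stanfordnlp/imdb")` should resolve to the same array that the tool returns as JSON text.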

Instructions

Get URLs for the dataset's Parquet files for direct download or processing
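Each returned `url` points at an ordinary Parquet file, so any HTTP client can download it. A minimal sketch, assuming Node.js 18+ for the global `fetch` and a publicly accessible dataset; `downloadParquet` and `isParquetFile` are hypothetical helpers. The sanity check relies on the Parquet format itself: every Parquet file starts and ends with the 4-byte magic number `PAR1`.

```typescript
// Parquet files begin and end with the 4-byte magic number "PAR1".
const PARQUET_MAGIC = [0x50, 0x41, 0x52, 0x31]; // "P", "A", "R", "1"

// Returns true if the bytes start and end with the Parquet magic number.
function isParquetFile(bytes: Uint8Array): boolean {
  // Header magic (4) + footer length field (4) + footer magic (4) = 12-byte lower bound.
  if (bytes.length < 12) return false;
  const tail = bytes.length - 4;
  return PARQUET_MAGIC.every((b, i) => bytes[i] === b && bytes[tail + i] === b);
}

// Hypothetical helper: downloads one Parquet file and checks its magic bytes.
async function downloadParquet(url: string): Promise<Uint8Array> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Download failed: ${res.status} ${res.statusText}`);
  }
  const bytes = new Uint8Array(await res.arrayBuffer());
  if (!isParquetFile(bytes)) {
    throw new Error(`Not a Parquet file: ${url}`);
  }
  return bytes;
}
```

For large files, a streaming download or a Parquet reader that issues HTTP range requests avoids holding the whole file in memory; this sketch buffers it for simplicity.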

Input Schema

Name     Required  Description                            Default
dataset  Yes       Dataset ID (e.g., 'stanfordnlp/imdb')  (none)
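MCP clients invoke this tool with a JSON-RPC `tools/call` request whose `arguments` object carries the single required `dataset` field. A small sketch of building that request; `buildListParquetFilesCall` is a hypothetical helper, and transport details (stdio, HTTP) are left out.

```typescript
// Shape of an MCP JSON-RPC "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds the request a client would send to call list_parquet_files.
function buildListParquetFilesCall(id: number, dataset: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "list_parquet_files", arguments: { dataset } },
  };
}

// Example: serialize the request body for the IMDB dataset.
const request = buildListParquetFilesCall(1, "stanfordnlp/imdb");
console.log(JSON.stringify(request));
```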

Implementation Reference

  • The handler function, which fetches the dataset's Parquet file list via fetchDatasetViewer and returns the parquet_files array as pretty-printed JSON text.
      async ({ dataset }) => {
        // Query the "/parquet" endpoint for this dataset.
        const data = await fetchDatasetViewer<ParquetResponse>("/parquet", {
          dataset,
        });
    
        // Return the file list as a single MCP text content item.
        return {
          content: [
            {
              type: "text" as const,
              text: JSON.stringify(data.parquet_files, null, 2),
            },
          ],
        };
      }
  • Registration function that defines the "list_parquet_files" tool and its parameters.
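    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { z } from "zod";
    // fetchDatasetViewer is a helper defined elsewhere in this server (not shown);
    // ParquetResponse is the interface shown below.
    
    export function registerListParquetFiles(server: McpServer) {
      // Register the tool with its name, description, and zod input schema.
      server.tool(
        "list_parquet_files",
        "Get URLs for the dataset's Parquet files for direct download or processing",
        {
          dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
        },
        // Fetch the Parquet listing and return it as pretty-printed JSON text.
        async ({ dataset }) => {
          const data = await fetchDatasetViewer<ParquetResponse>("/parquet", {
            dataset,
          });
    
          return {
            content: [
              {
                type: "text" as const,
                text: JSON.stringify(data.parquet_files, null, 2),
              },
            ],
          };
        }
      );
    }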
  • Type definition for the JSON response returned by the "/parquet" endpoint that the handler queries.
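    interface ParquetResponse {
      parquet_files: Array<{
        dataset: string;  // dataset ID, e.g. "stanfordnlp/imdb"
        config: string;   // dataset configuration (subset) name
        split: string;    // split name, e.g. "train" or "test"
        url: string;      // direct download URL for this Parquet file
        filename: string; // name of the Parquet file
        size: number;     // file size in bytes
      }>;
      pending: unknown[]; // entries still being processed (left untyped here)
      failed: unknown[];  // entries that failed to process (left untyped here)
      partial: boolean;   // true if only part of the dataset was converted to Parquet
    }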
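The tool's text output is `JSON.stringify(data.parquet_files, null, 2)`, so a client can parse it back into entries of the type above. A minimal client-side sketch that groups download URLs by config and split and totals their sizes; `groupBySplit` is a hypothetical helper, and treating `size` as bytes is an assumption based on the Dataset Viewer API documentation.

```typescript
// Entry shape of ParquetResponse["parquet_files"], returned by the tool as JSON text.
interface ParquetFile {
  dataset: string;
  config: string;
  split: string;
  url: string;
  filename: string;
  size: number;
}

interface SplitGroup {
  urls: string[]; // download URLs for every shard of this config/split
  bytes: number;  // sum of the shards' `size` fields (assumed to be bytes)
}

// Hypothetical helper: parses the tool's text output and groups URLs by "config/split".
function groupBySplit(text: string): Map<string, SplitGroup> {
  const files = JSON.parse(text) as ParquetFile[];
  const groups = new Map<string, SplitGroup>();
  for (const f of files) {
    const key = `${f.config}/${f.split}`;
    const group = groups.get(key) ?? { urls: [], bytes: 0 };
    group.urls.push(f.url);
    group.bytes += f.size;
    groups.set(key, group);
  }
  return groups;
}
```

Grouping by config and split matches how large datasets are usually sharded: one split can span several Parquet files, and a consumer typically wants all shards of, say, the `train` split together.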


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cfahlgren1/hf-dataset-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server