Hugging Face Hub MCP Server

by michaelwaves

hf_get_dataset_parquet

Retrieve the auto-converted Parquet files for a dataset on the Hugging Face Hub, optionally narrowed to a specific subset (config), split, or shard.

Instructions

Get the list of auto-converted parquet files for a dataset. Can specify subset (config) and split to get specific files.

Input Schema

Name     Required  Description                                             Default
n        No        Optional shard number to get the nth parquet file
repo_id  Yes       Dataset repository ID
split    No        Optional dataset split (train, test, validation, etc.)
subset   No        Optional dataset subset/config name
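
As an illustration of how these inputs fit together, the sketch below builds a JSON-RPC `tools/call` request for this tool. The interface names, the helper `buildToolCall`, and the values `user/my-dataset`, `default`, and `train` are placeholders chosen for the example, not part of the server.

```typescript
// Argument shape taken from the input schema above; the interface name is illustrative.
interface DatasetParquetArgs {
  repo_id: string; // required
  subset?: string; // optional subset/config name
  split?: string;  // optional split (train, test, validation, etc.)
  n?: number;      // optional shard number
}

// Minimal shape of a JSON-RPC 2.0 request carrying an MCP tool call.
interface ToolCallRequest {
  jsonrpc: string;
  id: number;
  method: string;
  params: { name: string; arguments: DatasetParquetArgs };
}

// Hypothetical helper: wraps the arguments in a "tools/call" request body.
function buildToolCall(id: number, args: DatasetParquetArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "hf_get_dataset_parquet", arguments: args },
  };
}

// Placeholder values; only repo_id is required.
const request = buildToolCall(1, {
  repo_id: "user/my-dataset",
  subset: "default",
  split: "train",
});

console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would normally construct and send this request; the sketch only shows where each schema field ends up.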

Implementation Reference

  • The handler function that performs argument validation using isDatasetParquetArgs and delegates to HuggingFaceClient.getDatasetParquet to fetch parquet file information, formatting the result as MCP CallToolResult.
    export async function handleGetDatasetParquet(
      client: HuggingFaceClient,
      args: unknown
    ): Promise<CallToolResult> {
      try {
        if (!isDatasetParquetArgs(args)) {
          throw new Error("Invalid arguments for hf_get_dataset_parquet");
        }
        const { repo_id, subset, split, n } = args;
        const results = await client.getDatasetParquet(repo_id, subset, split, n);
        return {
          content: [{ type: "text", text: results }],
          isError: false,
        };
      } catch (error) {
        return {
          content: [
            {
              type: "text",
              text: `Error: ${error instanceof Error ? error.message : String(error)}`,
            },
          ],
          isError: true,
        };
      }
    }
  • The Tool definition including name, description, and inputSchema for validating tool arguments.
    export const getDatasetParquetToolDefinition: Tool = {
      name: "hf_get_dataset_parquet",
      description: "Get the list of auto-converted parquet files for a dataset. Can specify subset (config) and split to get specific files.",
      inputSchema: {
        type: "object",
        properties: {
          repo_id: { type: "string", description: "Dataset repository ID" },
          subset: { type: "string", description: "Optional dataset subset/config name" },
          split: { type: "string", description: "Optional dataset split (train, test, validation, etc.)" },
          n: { type: "number", description: "Optional shard number to get the nth parquet file" }
        },
        required: ["repo_id"]
      }
    };
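  • Core utility method in HuggingFaceClient that constructs the HF API endpoint for parquet files and performs the HTTP GET request using axios. The split is appended to the path only when a subset is given, and n only when a split is also given; the response body is returned as pretty-printed JSON.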
    async getDatasetParquet(repoId: string, subset?: string, split?: string, n?: number): Promise<string> {
      try {
        let endpoint = `/api/datasets/${repoId}/parquet`;
        if (subset) {
          endpoint += `/${subset}`;
          if (split) {
            endpoint += `/${split}`;
            if (n !== undefined) {
              endpoint += `/${n}.parquet`;
            }
          }
        }
        const response: AxiosResponse = await this.httpClient.get(endpoint);
        return JSON.stringify(response.data, null, 2);
      } catch (error) {
        throw new Error(`Failed to fetch dataset parquet: ${error instanceof Error ? error.message : String(error)}`);
      }
    }
  • src/server.ts:87-88 (registration)
    Registration of the tool handler in the MCP server's CallToolRequestSchema switch statement.
    case 'hf_get_dataset_parquet':
      return handleGetDatasetParquet(this.client, args);
  • Type guard helper function for validating input arguments match DatasetParquetArgs.
    function isDatasetParquetArgs(args: unknown): args is DatasetParquetArgs {
      return (
        typeof args === "object" &&
        args !== null &&
        "repo_id" in args &&
        typeof (args as { repo_id: string }).repo_id === "string"
      );
    }
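
The type guard above checks only repo_id, so a non-string subset or a non-numeric n would pass validation. A stricter variant might look like the sketch below. It is not the server's code: the name `isDatasetParquetArgsStrict` is hypothetical, the `DatasetParquetArgs` shape is inferred from the input schema, and treating n as a non-negative integer is an assumption about what a shard number should be.

```typescript
// Shape inferred from the input schema; the real DatasetParquetArgs type is not shown on this page.
interface DatasetParquetArgs {
  repo_id: string;
  subset?: string;
  split?: string;
  n?: number;
}

// Hypothetical stricter guard: also validates the optional fields when they are present.
function isDatasetParquetArgsStrict(args: unknown): args is DatasetParquetArgs {
  if (typeof args !== "object" || args === null) {
    return false;
  }
  const { repo_id, subset, split, n } = args as Record<string, unknown>;

  const subsetOk = subset === undefined || typeof subset === "string";
  const splitOk = split === undefined || typeof split === "string";
  // Assumption: a shard number is a non-negative integer.
  const nOk =
    n === undefined || (typeof n === "number" && Number.isInteger(n) && n >= 0);

  return typeof repo_id === "string" && subsetOk && splitOk && nOk;
}

console.log(isDatasetParquetArgsStrict({ repo_id: "user/my-dataset" }));         // true
console.log(isDatasetParquetArgsStrict({ repo_id: "user/my-dataset", n: "0" })); // false
```

Whether the extra checks are worth it depends on how much the server relies on the MCP client having already validated arguments against the input schema.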

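To make the path construction in getDatasetParquet easier to follow, the sketch below extracts the same branching into a standalone function and prints the endpoint for each argument combination. The function name `buildParquetEndpoint` and the repository ID `user/my-dataset` are illustrative; the HTTP request itself is left out.

```typescript
// Mirrors the endpoint-building branches of getDatasetParquet, without the HTTP call.
function buildParquetEndpoint(
  repoId: string,
  subset?: string,
  split?: string,
  n?: number
): string {
  let endpoint = `/api/datasets/${repoId}/parquet`;
  if (subset) {
    endpoint += `/${subset}`;
    if (split) {
      endpoint += `/${split}`;
      if (n !== undefined) {
        endpoint += `/${n}.parquet`;
      }
    }
  }
  return endpoint;
}

const repo = "user/my-dataset"; // placeholder repository ID

console.log(buildParquetEndpoint(repo));
// /api/datasets/user/my-dataset/parquet
console.log(buildParquetEndpoint(repo, "default"));
// /api/datasets/user/my-dataset/parquet/default
console.log(buildParquetEndpoint(repo, "default", "train"));
// /api/datasets/user/my-dataset/parquet/default/train
console.log(buildParquetEndpoint(repo, "default", "train", 0));
// /api/datasets/user/my-dataset/parquet/default/train/0.parquet
console.log(buildParquetEndpoint(repo, undefined, "train", 0));
// /api/datasets/user/my-dataset/parquet  (split and n are ignored without a subset)
```

The last call shows a consequence of the nesting: passing split or n without subset silently falls back to the listing for the whole dataset.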
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/michaelwaves/hf-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.