hf_get_dataset_parquet
Retrieve the list of auto-converted Parquet files for a dataset on the Hugging Face Hub, optionally narrowed to a specific subset (config), split, or shard. This is useful for locating a dataset's structured data files for machine learning workflows.
Instructions
Get the list of auto-converted parquet files for a dataset. Can specify subset (config) and split to get specific files.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| n | No | Optional shard number to get the nth parquet file; only used when subset and split are also given | |
| repo_id | Yes | Dataset repository ID | |
| split | No | Optional dataset split (train, test, validation, etc.); only used when subset is also given | |
| subset | No | Optional dataset subset/config name | |
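An MCP client invokes this tool through a JSON-RPC `tools/call` request whose `params.arguments` object follows the input schema. The sketch below builds such a request. The dataset `username/my-dataset`, the `default` subset, the request `id`, and the helper `buildParquetCall` are all hypothetical placeholders, and shard `0` assumes zero-based shard numbering.

```typescript
// Argument shape taken from the input schema; only repo_id is required.
interface ParquetToolArgs {
  repo_id: string;
  subset?: string;
  split?: string;
  n?: number;
}

// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ParquetToolArgs };
}

// Hypothetical helper: wraps tool arguments in a tools/call request.
function buildParquetCall(id: number, args: ParquetToolArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "hf_get_dataset_parquet", arguments: args },
  };
}

// Request shard 0 of the train split of a hypothetical dataset
// (assuming shards are numbered from 0).
const request = buildParquetCall(1, {
  repo_id: "username/my-dataset",
  subset: "default",
  split: "train",
  n: 0,
});

console.log(JSON.stringify(request, null, 2));
```

If you drop `n`, or both `split` and `n`, the request widens to a whole split or subset. Sending only `repo_id` lists every auto-converted Parquet file for the dataset.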
Implementation Reference
- src/tools/datasets.ts:225-249 (handler): The handler function that performs argument validation using isDatasetParquetArgs and delegates to HuggingFaceClient.getDatasetParquet to fetch parquet file information, formatting the result as an MCP CallToolResult.

  ```typescript
  export async function handleGetDatasetParquet(client: HuggingFaceClient, args: unknown): Promise<CallToolResult> {
    try {
      if (!isDatasetParquetArgs(args)) {
        throw new Error("Invalid arguments for hf_get_dataset_parquet");
      }
      const { repo_id, subset, split, n } = args;
      const results = await client.getDatasetParquet(repo_id, subset, split, n);
      return {
        content: [{ type: "text", text: results }],
        isError: false,
      };
    } catch (error) {
      return {
        content: [
          {
            type: "text",
            text: `Error: ${error instanceof Error ? error.message : String(error)}`,
          },
        ],
        isError: true,
      };
    }
  }
  ```
- src/tools/datasets.ts:83-109 (schema): The Tool definition including name, description, and inputSchema for validating tool arguments.

  ```typescript
  export const getDatasetParquetToolDefinition: Tool = {
    name: "hf_get_dataset_parquet",
    description: "Get the list of auto-converted parquet files for a dataset. Can specify subset (config) and split to get specific files.",
    inputSchema: {
      type: "object",
      properties: {
        repo_id: { type: "string", description: "Dataset repository ID" },
        subset: { type: "string", description: "Optional dataset subset/config name" },
        split: { type: "string", description: "Optional dataset split (train, test, validation, etc.)" },
        n: { type: "number", description: "Optional shard number to get the nth parquet file" }
      },
      required: ["repo_id"]
    }
  };
  ```
- src/client.ts:97-114 (helper): Core utility method in HuggingFaceClient that constructs the HF API endpoint for parquet files and performs the HTTP GET request using axios.

  ```typescript
  async getDatasetParquet(repoId: string, subset?: string, split?: string, n?: number): Promise<string> {
    try {
      let endpoint = `/api/datasets/${repoId}/parquet`;
      if (subset) {
        endpoint += `/${subset}`;
        if (split) {
          endpoint += `/${split}`;
          if (n !== undefined) {
            endpoint += `/${n}.parquet`;
          }
        }
      }
      const response: AxiosResponse = await this.httpClient.get(endpoint);
      return JSON.stringify(response.data, null, 2);
    } catch (error) {
      throw new Error(`Failed to fetch dataset parquet: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  ```
- src/server.ts:87-88 (registration): Registration of the tool handler in the MCP server's CallToolRequestSchema switch statement.

  ```typescript
  case 'hf_get_dataset_parquet':
    return handleGetDatasetParquet(this.client, args);
  ```
- src/tools/datasets.ts:155-162 (helper): Type guard helper function that validates that the input arguments match DatasetParquetArgs. It checks only that repo_id is present and is a string; subset, split, and n are not type-checked.

  ```typescript
  function isDatasetParquetArgs(args: unknown): args is DatasetParquetArgs {
    return (
      typeof args === "object" &&
      args !== null &&
      "repo_id" in args &&
      typeof (args as { repo_id: string }).repo_id === "string"
    );
  }
  ```
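The handler in `src/tools/datasets.ts` never throws to its caller. Both the validation failure and any error raised by the client are caught and returned as a text result with `isError: true`. The following sketch reproduces that try/catch pattern so it runs without the MCP SDK or network access. `TextResult`, `ParquetClient`, `stubClient`, `isArgs`, and `handle` are hypothetical stand-ins, not the project's or the SDK's actual types.

```typescript
// Simplified stand-in for the MCP CallToolResult shape used by the handler.
interface TextResult {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

// Hypothetical client interface mirroring getDatasetParquet's signature.
interface ParquetClient {
  getDatasetParquet(repoId: string, subset?: string, split?: string, n?: number): Promise<string>;
}

// Stub client: returns canned JSON instead of calling the Hub,
// and fails for one sentinel repo ID to simulate a client error.
const stubClient: ParquetClient = {
  async getDatasetParquet(repoId) {
    if (repoId === "missing/dataset") throw new Error("Failed to fetch dataset parquet: 404");
    return JSON.stringify({ repo: repoId, files: [] }, null, 2);
  },
};

// Simplified guard: like isDatasetParquetArgs, it only checks repo_id.
function isArgs(args: unknown): args is { repo_id: string; subset?: string; split?: string; n?: number } {
  return (
    typeof args === "object" &&
    args !== null &&
    "repo_id" in args &&
    typeof (args as { repo_id: unknown }).repo_id === "string"
  );
}

// Same try/catch structure as handleGetDatasetParquet.
async function handle(client: ParquetClient, args: unknown): Promise<TextResult> {
  try {
    if (!isArgs(args)) throw new Error("Invalid arguments for hf_get_dataset_parquet");
    const { repo_id, subset, split, n } = args;
    const text = await client.getDatasetParquet(repo_id, subset, split, n);
    return { content: [{ type: "text", text }], isError: false };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { content: [{ type: "text", text: `Error: ${message}` }], isError: true };
  }
}
```

Reporting failures inside the result, rather than throwing, follows the MCP convention for tool execution errors. The calling model can then read the error message and react to it.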
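In `getDatasetParquet`, the endpoint is built hierarchically. `split` is appended only when `subset` is set, and `n` only when both are set. As a result, a shard number passed without a subset and split is silently ignored, and the broader listing is requested instead. A minimal pure-function sketch of the same path logic follows. `buildParquetPath` is a hypothetical name. Unlike the original, it URL-encodes `subset` and `split`, an assumption added to handle names with spaces or reserved characters. It leaves `repoId` unencoded so that the slash in a namespaced ID stays a path separator.

```typescript
// Hypothetical pure version of the path logic in getDatasetParquet.
// subset and split are URL-encoded here; the original helper does not encode them.
function buildParquetPath(repoId: string, subset?: string, split?: string, n?: number): string {
  // repoId is left as-is so a "namespace/name" slash stays a path separator.
  let path = `/api/datasets/${repoId}/parquet`;
  if (subset) {
    path += `/${encodeURIComponent(subset)}`;
    if (split) {
      path += `/${encodeURIComponent(split)}`;
      // A shard number is only appended once subset and split are known;
      // checking !== undefined keeps shard 0 from being dropped.
      if (n !== undefined) {
        path += `/${n}.parquet`;
      }
    }
  }
  return path;
}

console.log(buildParquetPath("username/my-dataset"));
// → /api/datasets/username/my-dataset/parquet
console.log(buildParquetPath("username/my-dataset", "default", "train", 0));
// → /api/datasets/username/my-dataset/parquet/default/train/0.parquet
```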
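The server's `CallToolRequestSchema` handler routes each call by tool name with a `switch`. An equivalent lookup-table sketch uses a `Map` keyed by tool name. `ToolResult`, `ToolHandler`, `parquetStub`, `handlers`, and `dispatch` are hypothetical, and the stub stands in for `handleGetDatasetParquet`. Returning an error result for unknown names is an assumption, because the excerpt does not show the `switch`'s `default` branch.

```typescript
// Simplified result shape, standing in for the MCP CallToolResult type.
type ToolResult = { content: { type: "text"; text: string }[]; isError: boolean };
type ToolHandler = (args: unknown) => Promise<ToolResult>;

// Stub in place of handleGetDatasetParquet; it just echoes its arguments.
const parquetStub: ToolHandler = async (args) => ({
  content: [{ type: "text", text: `stub result for ${JSON.stringify(args)}` }],
  isError: false,
});

// Registry keyed by tool name.
const handlers = new Map<string, ToolHandler>([["hf_get_dataset_parquet", parquetStub]]);

// Route a call by name, returning an error result for unknown tools (assumed behavior).
async function dispatch(name: string, args: unknown): Promise<ToolResult> {
  const handler = handlers.get(name);
  if (handler === undefined) {
    return { content: [{ type: "text", text: `Error: Unknown tool ${name}` }], isError: true };
  }
  return handler(args);
}
```

A `Map` avoids accidentally matching inherited object properties such as `toString`, which a plain object lookup would find when the tool name comes from untrusted input.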
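`isDatasetParquetArgs` checks only `repo_id`. Arguments such as `{ repo_id: "u/d", n: "3" }` therefore pass validation even though `n` is not a number. A stricter sketch follows. It assumes `DatasetParquetArgs` has exactly the four fields in the input schema, since the interface itself is not shown. The rules that `repo_id` must be non-empty and `n` must be a non-negative integer are also assumptions. `isStrictDatasetParquetArgs` and `isOptionalString` are hypothetical names.

```typescript
// Assumed shape, inferred from the tool's input schema.
interface DatasetParquetArgs {
  repo_id: string;
  subset?: string;
  split?: string;
  n?: number;
}

// True when a value is either absent or a string.
function isOptionalString(value: unknown): boolean {
  return value === undefined || typeof value === "string";
}

// Stricter guard: validates every field, not just repo_id.
function isStrictDatasetParquetArgs(args: unknown): args is DatasetParquetArgs {
  if (typeof args !== "object" || args === null) return false;
  const { repo_id, subset, split, n } = args as Record<string, unknown>;
  // Assumption: an empty repo_id is rejected.
  if (typeof repo_id !== "string" || repo_id.length === 0) return false;
  if (!isOptionalString(subset) || !isOptionalString(split)) return false;
  // Assumption: shard numbers are non-negative integers.
  return n === undefined || (typeof n === "number" && Number.isInteger(n) && n >= 0);
}
```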