Hugging Face Hub MCP Server

by michaelwaves

hf_get_dataset_parquet

Retrieve auto-converted parquet files for a specific dataset, subset, or split from the Hugging Face Hub. Access structured data files efficiently for machine learning workflows.

Instructions

Get the list of auto-converted parquet files for a dataset. Can specify subset (config) and split to get specific files.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| n | No | Optional shard number to get the nth parquet file | |
| repo_id | Yes | Dataset repository ID | |
| split | No | Optional dataset split (train, test, validation, etc.) | |
| subset | No | Optional dataset subset/config name | |
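
To make the schema concrete, the following is a minimal sketch of how a client might assemble a call to this tool. The `DatasetParquetArgs` interface here is a local stand-in that mirrors the input schema, `buildToolCall` is a hypothetical helper, and the dataset name, subset, split, and request ids are placeholders rather than values taken from this server. The envelope follows the JSON-RPC `tools/call` shape used by MCP.

```typescript
// Hypothetical local type mirroring the input schema above.
interface DatasetParquetArgs {
    repo_id: string;   // required
    subset?: string;   // optional config name
    split?: string;    // optional split name
    n?: number;        // optional shard number
}

// Hypothetical helper: wrap the arguments in a JSON-RPC "tools/call" request.
// The id values and dataset names used below are placeholders.
function buildToolCall(id: number, args: DatasetParquetArgs) {
    return {
        jsonrpc: "2.0",
        id,
        method: "tools/call",
        params: { name: "hf_get_dataset_parquet", arguments: args },
    };
}

// Only the required field: list every auto-converted parquet file.
const listAll = buildToolCall(1, { repo_id: "username/my-dataset" });

// All fields: ask for shard 0 of the "train" split in the "default" subset.
const oneShard = buildToolCall(2, {
    repo_id: "username/my-dataset",
    subset: "default",
    split: "train",
    n: 0,
});

console.log(JSON.stringify(listAll, null, 2));
console.log(JSON.stringify(oneShard, null, 2));
```

Only `repo_id` is mandatory, so the first request is the smallest valid call; the second shows every optional field filled in.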

Implementation Reference

  • The handler function validates the arguments with isDatasetParquetArgs, delegates to HuggingFaceClient.getDatasetParquet to fetch the parquet file information, and returns the result as an MCP CallToolResult. Any error is caught and returned as a result with isError set to true.
    export async function handleGetDatasetParquet(client: HuggingFaceClient, args: unknown): Promise<CallToolResult> {
        try {
            if (!isDatasetParquetArgs(args)) {
                throw new Error("Invalid arguments for hf_get_dataset_parquet");
            }
    
            const { repo_id, subset, split, n } = args;
            const results = await client.getDatasetParquet(repo_id, subset, split, n);
            
            return {
                content: [{ type: "text", text: results }],
                isError: false,
            };
        } catch (error) {
            return {
                content: [
                    {
                        type: "text",
                        text: `Error: ${error instanceof Error ? error.message : String(error)}`,
                    },
                ],
                isError: true,
            };
        }
    }
  • The Tool definition including name, description, and inputSchema for validating tool arguments.
    export const getDatasetParquetToolDefinition: Tool = {
        name: "hf_get_dataset_parquet", 
        description:
            "Get the list of auto-converted parquet files for a dataset. Can specify subset (config) and split to get specific files.",
        inputSchema: {
            type: "object",
            properties: {
                repo_id: {
                    type: "string",
                    description: "Dataset repository ID"
                },
                subset: {
                    type: "string",
                    description: "Optional dataset subset/config name"
                },
                split: {
                    type: "string",
                    description: "Optional dataset split (train, test, validation, etc.)"
                },
                n: {
                    type: "number",
                    description: "Optional shard number to get the nth parquet file"
                }
            },
            required: ["repo_id"]
        }
    };
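  • Core utility method in HuggingFaceClient that builds the HF API endpoint for parquet files and performs the HTTP GET request using axios. The path segments are nested: split is appended only when subset is provided, and n only when both subset and split are provided.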
    async getDatasetParquet(repoId: string, subset?: string, split?: string, n?: number): Promise<string> {
        try {
            let endpoint = `/api/datasets/${repoId}/parquet`;
            if (subset) {
                endpoint += `/${subset}`;
                if (split) {
                    endpoint += `/${split}`;
                    if (n !== undefined) {
                        endpoint += `/${n}.parquet`;
                    }
                }
            }
            const response: AxiosResponse = await this.httpClient.get(endpoint);
            return JSON.stringify(response.data, null, 2);
        } catch (error) {
            throw new Error(`Failed to fetch dataset parquet: ${error instanceof Error ? error.message : String(error)}`);
        }
    }
  • src/server.ts:87-88 (registration)
    Registration of the tool handler in the MCP server's CallToolRequestSchema switch statement.
    case 'hf_get_dataset_parquet':
        return handleGetDatasetParquet(this.client, args);
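  • Type guard helper that checks whether the input arguments match DatasetParquetArgs: repo_id must be a string, and the optional fields must have the expected types when present.
    function isDatasetParquetArgs(args: unknown): args is DatasetParquetArgs {
        if (typeof args !== "object" || args === null) {
            return false;
        }
        const a = args as Record<string, unknown>;
        // repo_id is required; subset, split, and n are optional but type-checked when present.
        return (
            typeof a.repo_id === "string" &&
            (a.subset === undefined || typeof a.subset === "string") &&
            (a.split === undefined || typeof a.split === "string") &&
            (a.n === undefined || typeof a.n === "number")
        );
    }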
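
The endpoint construction in getDatasetParquet can be exercised on its own to see which path each argument combination produces. The sketch below is a standalone copy of that path logic for illustration only; `buildParquetEndpoint` is a hypothetical name, it makes no HTTP request, and the dataset, subset, and split names are placeholders.

```typescript
// Standalone copy of the path-building logic, for illustration only.
function buildParquetEndpoint(repoId: string, subset?: string, split?: string, n?: number): string {
    let endpoint = `/api/datasets/${repoId}/parquet`;
    if (subset) {
        endpoint += `/${subset}`;
        if (split) {
            endpoint += `/${split}`;
            if (n !== undefined) {
                endpoint += `/${n}.parquet`;
            }
        }
    }
    return endpoint;
}

const examples: string[] = [
    // No optional arguments: the base parquet listing.
    buildParquetEndpoint("username/my-dataset"),
    // → /api/datasets/username/my-dataset/parquet
    buildParquetEndpoint("username/my-dataset", "default"),
    // → /api/datasets/username/my-dataset/parquet/default
    buildParquetEndpoint("username/my-dataset", "default", "train"),
    // → /api/datasets/username/my-dataset/parquet/default/train
    buildParquetEndpoint("username/my-dataset", "default", "train", 0),
    // → /api/datasets/username/my-dataset/parquet/default/train/0.parquet
    // A split without a subset is ignored by the nesting.
    buildParquetEndpoint("username/my-dataset", undefined, "train"),
    // → /api/datasets/username/my-dataset/parquet
];

for (const path of examples) {
    console.log(path);
}
```

The last case is worth noting when calling the tool: passing split without subset, or n without both, yields the same path as passing neither.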
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/michaelwaves/hf-mcp'
