Hugging Face Hub MCP Server

by michaelwaves

hf_get_dataset_parquet

List the auto-converted parquet files for a dataset on the Hugging Face Hub, optionally narrowed to a subset (config), a split, or a single shard.

Instructions

Get the list of auto-converted parquet files for a dataset. Can specify subset (config) and split to get specific files.

Input Schema

Name     Required  Description                                             Default
n        No        Optional shard number to get the nth parquet file
repo_id  Yes       Dataset repository ID
split    No        Optional dataset split (train, test, validation, etc.)
subset   No        Optional dataset subset/config name
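
To make the schema concrete, here is a minimal sketch of argument objects that satisfy it, each wrapped in an MCP `tools/call` JSON-RPC request. The `DatasetParquetArgs` interface is inferred from the table above rather than copied from the server's source, `buildToolCall` is a hypothetical helper, and the repository ID, subset, and split values are placeholders.

```typescript
// Argument shape inferred from the input schema above (an assumption,
// not copied from the server's source).
interface DatasetParquetArgs {
    repo_id: string;   // required
    subset?: string;   // optional config name
    split?: string;    // optional split, e.g. "train"
    n?: number;        // optional shard number
}

// Hypothetical helper: wraps the arguments in an MCP `tools/call` request.
function buildToolCall(id: number, args: DatasetParquetArgs) {
    return {
        jsonrpc: "2.0",
        id,
        method: "tools/call",
        params: { name: "hf_get_dataset_parquet", arguments: args },
    };
}

// Placeholder values for illustration only.
const allFiles = buildToolCall(1, { repo_id: "example-org/example-dataset" });
const oneShard = buildToolCall(2, {
    repo_id: "example-org/example-dataset",
    subset: "default",
    split: "train",
    n: 0,
});
```

Only `repo_id` is required, so the first request asks for every parquet file of the dataset, while the second narrows the request to one shard of one split.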

Implementation Reference

  • The handler function that performs argument validation using isDatasetParquetArgs and delegates to HuggingFaceClient.getDatasetParquet to fetch parquet file information, formatting the result as MCP CallToolResult.
    export async function handleGetDatasetParquet(client: HuggingFaceClient, args: unknown): Promise<CallToolResult> {
        try {
            if (!isDatasetParquetArgs(args)) {
                throw new Error("Invalid arguments for hf_get_dataset_parquet");
            }
    
            const { repo_id, subset, split, n } = args;
            const results = await client.getDatasetParquet(repo_id, subset, split, n);
            
            return {
                content: [{ type: "text", text: results }],
                isError: false,
            };
        } catch (error) {
            return {
                content: [
                    {
                        type: "text",
                        text: `Error: ${error instanceof Error ? error.message : String(error)}`,
                    },
                ],
                isError: true,
            };
        }
    }
  • The Tool definition including name, description, and inputSchema for validating tool arguments.
    export const getDatasetParquetToolDefinition: Tool = {
        name: "hf_get_dataset_parquet", 
        description:
            "Get the list of auto-converted parquet files for a dataset. Can specify subset (config) and split to get specific files.",
        inputSchema: {
            type: "object",
            properties: {
                repo_id: {
                    type: "string",
                    description: "Dataset repository ID"
                },
                subset: {
                    type: "string",
                    description: "Optional dataset subset/config name"
                },
                split: {
                    type: "string",
                    description: "Optional dataset split (train, test, validation, etc.)"
                },
                n: {
                    type: "number",
                    description: "Optional shard number to get the nth parquet file"
                }
            },
            required: ["repo_id"]
        }
    };
  • Core utility method in HuggingFaceClient that constructs the HF API endpoint for parquet files and performs the HTTP GET request using axios.
    async getDatasetParquet(repoId: string, subset?: string, split?: string, n?: number): Promise<string> {
        try {
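            // Base path: lists every auto-converted parquet file for the dataset.
            let endpoint = `/api/datasets/${repoId}/parquet`;
            // Path segments are nested: split is only appended when subset is set,
            // and n only when both subset and split are set.
            if (subset) {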
                endpoint += `/${subset}`;
                if (split) {
                    endpoint += `/${split}`;
                    if (n !== undefined) {
                        endpoint += `/${n}.parquet`;
                    }
                }
            }
            const response: AxiosResponse = await this.httpClient.get(endpoint);
            return JSON.stringify(response.data, null, 2);
        } catch (error) {
            throw new Error(`Failed to fetch dataset parquet: ${error instanceof Error ? error.message : String(error)}`);
        }
    }
  • src/server.ts:87-88 (registration)
    Registration of the tool handler in the MCP server's CallToolRequestSchema switch statement.
    case 'hf_get_dataset_parquet':
        return handleGetDatasetParquet(this.client, args);
  • Type guard that checks the input is a non-null object with a string repo_id. The optional subset, split, and n fields are not validated.
    function isDatasetParquetArgs(args: unknown): args is DatasetParquetArgs {
        return (
            typeof args === "object" &&
            args !== null &&
            "repo_id" in args &&
            typeof (args as { repo_id: string }).repo_id === "string"
        );
    }
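
Two behaviors of the code above are easy to miss: `split` is only appended to the endpoint when `subset` is present (and `n` only when both are), and the type guard checks `repo_id` alone. The sketch below isolates both. `buildParquetEndpoint` is a hypothetical pure function that mirrors the path logic of `getDatasetParquet` without issuing a request, and `isStrictDatasetParquetArgs` is a hypothetical stricter variant of the guard; neither is part of the server's source, and `org/data` is a placeholder repository ID.

```typescript
// Hypothetical mirror of the path logic in getDatasetParquet (no HTTP call).
function buildParquetEndpoint(repoId: string, subset?: string, split?: string, n?: number): string {
    let endpoint = `/api/datasets/${repoId}/parquet`;
    if (subset) {
        endpoint += `/${subset}`;
        if (split) {
            endpoint += `/${split}`;
            if (n !== undefined) {
                endpoint += `/${n}.parquet`;
            }
        }
    }
    return endpoint;
}

// Hypothetical stricter guard: also checks the types of the optional fields.
function isStrictDatasetParquetArgs(args: unknown): boolean {
    if (typeof args !== "object" || args === null) return false;
    const a = args as Record<string, unknown>;
    const { repo_id, subset, split, n } = a;
    return (
        typeof repo_id === "string" &&
        (subset === undefined || typeof subset === "string") &&
        (split === undefined || typeof split === "string") &&
        (n === undefined || typeof n === "number")
    );
}

// All three optional arguments present: the shard path is built in full.
const full = buildParquetEndpoint("org/data", "default", "train", 0);
// split without subset: the split is dropped and the base path is returned.
const splitOnly = buildParquetEndpoint("org/data", undefined, "train");
```

Under these assumptions, a caller who passes `split` without `subset` receives the listing for the whole dataset rather than an error, so a client may want to validate that combination before calling the tool.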

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/michaelwaves/hf-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.