hf_list_datasets
Search, filter, and retrieve detailed metadata for datasets on the Hugging Face Hub, including downloads, likes, and tags. Refine results by author, search terms, or tags for targeted exploration.
Instructions
Get information from all datasets in the Hub. Supports filtering by search terms, authors, tags, and more. Returns paginated results with dataset metadata including downloads, likes, and tags.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| author | No | Filter datasets by author or organization (e.g., 'huggingface', 'microsoft') | |
| config | No | Whether to also fetch the repo config | |
| direction | No | Sort direction: '-1' for descending, anything else for ascending | |
| filter | No | Filter based on tags (e.g., 'task_categories:text-classification', 'languages:en') | |
| full | No | Whether to fetch most dataset data including all tags and files | |
| limit | No | Limit the number of datasets fetched | |
| search | No | Filter based on substrings for repos and their usernames (e.g., 'pets', 'microsoft') | |
| sort | No | Property to use when sorting (e.g., 'downloads', 'author') | |
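As an illustration of the parameters in the table above, the following sketch builds one possible argument object and renders it as a query string. The `ListDatasetsArgs` interface and the `toQueryString` helper are hypothetical names introduced here for the example; they are not part of the server, and the actual client may serialize parameters differently.

```typescript
// Hypothetical argument shape mirroring the Input Schema table (all fields optional).
interface ListDatasetsArgs {
  search?: string;
  author?: string;
  filter?: string;
  sort?: string;
  direction?: string;
  limit?: number;
  full?: boolean;
  config?: boolean;
}

// Example request: up to ten text-classification datasets, most downloaded first.
const exampleArgs: ListDatasetsArgs = {
  filter: "task_categories:text-classification",
  sort: "downloads",
  direction: "-1", // the string '-1' requests descending order
  limit: 10,
};

// Illustration only: percent-encode each defined argument as key=value.
function toQueryString(args: ListDatasetsArgs): string {
  const pairs: string[] = [];
  for (const key of Object.keys(args) as (keyof ListDatasetsArgs)[]) {
    const value = args[key];
    if (value !== undefined) {
      pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
    }
  }
  return pairs.join("&");
}

const exampleQuery: string = toQueryString(exampleArgs);
```

Because every parameter is optional, an empty object is also a valid argument set and yields an empty query string in this sketch.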
Implementation Reference
- src/tools/datasets.ts:173-196 (handler) The MCP tool handler for 'hf_list_datasets': validates arguments using isDatasetSearchArgs, calls client.getDatasets(), and formats the CallToolResult.

  ```typescript
  export async function handleListDatasets(client: HuggingFaceClient, args: unknown): Promise<CallToolResult> {
    try {
      if (!isDatasetSearchArgs(args)) {
        throw new Error("Invalid arguments for hf_list_datasets");
      }

      const results = await client.getDatasets(args as Record<string, any>);
      return {
        content: [{ type: "text", text: results }],
        isError: false,
      };
    } catch (error) {
      return {
        content: [
          {
            type: "text",
            text: `Error: ${error instanceof Error ? error.message : String(error)}`,
          },
        ],
        isError: true,
      };
    }
  }
  ```
- src/tools/datasets.ts:8-51 (schema) The tool definition for 'hf_list_datasets', including name, description, and a detailed inputSchema for filtering and pagination parameters.

  ```typescript
  export const listDatasetsToolDefinition: Tool = {
    name: "hf_list_datasets",
    description:
      "Get information from all datasets in the Hub. Supports filtering by search terms, authors, tags, and more. " +
      "Returns paginated results with dataset metadata including downloads, likes, and tags.",
    inputSchema: {
      type: "object",
      properties: {
        search: {
          type: "string",
          description: "Filter based on substrings for repos and their usernames (e.g., 'pets', 'microsoft')"
        },
        author: {
          type: "string",
          description: "Filter datasets by author or organization (e.g., 'huggingface', 'microsoft')"
        },
        filter: {
          type: "string",
          description: "Filter based on tags (e.g., 'task_categories:text-classification', 'languages:en')"
        },
        sort: {
          type: "string",
          description: "Property to use when sorting (e.g., 'downloads', 'author')"
        },
        direction: {
          type: "string",
          description: "Sort direction: '-1' for descending, anything else for ascending"
        },
        limit: {
          type: "number",
          description: "Limit the number of datasets fetched"
        },
        full: {
          type: "boolean",
          description: "Whether to fetch most dataset data including all tags and files"
        },
        config: {
          type: "boolean",
          description: "Whether to also fetch the repo config"
        }
      },
      required: []
    }
  };
  ```
- src/client.ts:70-77 (helper) Core helper method in HuggingFaceClient: performs an HTTP GET to the Hugging Face Hub API '/api/datasets' endpoint with query params and returns the response data as a pretty-printed JSON string.

  ```typescript
  async getDatasets(params: Record<string, any> = {}): Promise<string> {
    try {
      const response: AxiosResponse = await this.httpClient.get('/api/datasets', { params });
      return JSON.stringify(response.data, null, 2);
    } catch (error) {
      throw new Error(`Failed to fetch datasets: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  ```
- src/server.ts:81-82 (registration) Registration and dispatch of the 'hf_list_datasets' handler in the main HuggingFaceServer's CallToolRequestHandler switch statement.

  ```typescript
  case 'hf_list_datasets':
    return handleListDatasets(this.client, args);
  ```
- src/server.ts:55-66 (registration) Registration of the 'hf_list_datasets' tool definition (listDatasetsToolDefinition) in the ListToolsRequestHandler for tool discovery.

  ```typescript
  this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: [
      listModelsToolDefinition,
      getModelInfoToolDefinition,
      getModelTagsToolDefinition,
      listDatasetsToolDefinition,
      getDatasetInfoToolDefinition,
      getDatasetParquetToolDefinition,
      getCroissantToolDefinition,
      getDatasetTagsToolDefinition
    ],
  }));
  ```
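The handler above relies on isDatasetSearchArgs, whose body is not shown in this reference. The following is a minimal sketch of what such a type guard could look like, assuming it only checks that each known property, when present, has the type declared in the inputSchema. The names `DatasetSearchArgs`, `EXPECTED_TYPES`, and `isDatasetSearchArgsSketch` are hypothetical, and the real implementation may be stricter or looser.

```typescript
// Hypothetical argument type derived from the inputSchema properties.
interface DatasetSearchArgs {
  search?: string;
  author?: string;
  filter?: string;
  sort?: string;
  direction?: string;
  limit?: number;
  full?: boolean;
  config?: boolean;
}

// Expected primitive type for each known property (assumption based on the schema).
const EXPECTED_TYPES: Record<string, "string" | "number" | "boolean"> = {
  search: "string",
  author: "string",
  filter: "string",
  sort: "string",
  direction: "string",
  limit: "number",
  full: "boolean",
  config: "boolean",
};

// Sketch of a type guard: accepts plain objects whose known properties,
// if present, match the expected primitive type. Unknown keys are ignored.
function isDatasetSearchArgsSketch(args: unknown): args is DatasetSearchArgs {
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return false;
  }
  const record = args as Record<string, unknown>;
  for (const key of Object.keys(EXPECTED_TYPES)) {
    const value = record[key];
    if (value !== undefined && typeof value !== EXPECTED_TYPES[key]) {
      return false;
    }
  }
  return true;
}
```

Under these assumptions, `{ limit: 5 }` passes the guard while `{ limit: "5" }` does not, so a handler using it would return the "Invalid arguments" error result for the latter instead of calling the Hub API.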