HF Dataset MCP

by cfahlgren1

search_datasets

Search and filter Hugging Face datasets by name, tags, author, or description to find relevant data for machine learning projects.

Instructions

Find datasets on the Hugging Face Hub by name, tag, or author

Input Schema

Name       Required  Description                                                             Default
search     No        Query to match against dataset names and descriptions
author     No        Filter by dataset owner (user or organization)
filter     No        Tag filters (e.g., task_categories:text-classification, language:en)
sort       No        Sort order for results
direction  No        Sort direction (default: desc)
limit      No        Max results to return (default: 20, max: 100)
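
As a rough illustration of the parameters above, the sketch below shows one possible set of arguments a client might send to search_datasets, together with a small check of the documented limit bounds. The type name SearchDatasetsArgs, the helper isValidLimit, and the example values are hypothetical and not part of the server; the sort values are taken from the Zod schema in the Implementation Reference.

```typescript
// Hypothetical type mirroring the documented input parameters (all optional).
type SearchDatasetsArgs = {
  search?: string;
  author?: string;
  filter?: string[];
  sort?: "trending_score" | "downloads" | "likes" | "created_at" | "last_modified";
  direction?: "asc" | "desc";
  limit?: number;
};

// Illustrative helper: the documented limit is an integer between 1 and 100,
// and may be omitted entirely.
function isValidLimit(limit: number | undefined): boolean {
  return limit === undefined || (Number.isInteger(limit) && limit >= 1 && limit <= 100);
}

// Example arguments: English text-classification datasets, most downloaded first.
const exampleArgs: SearchDatasetsArgs = {
  search: "sentiment",
  filter: ["task_categories:text-classification", "language:en"],
  sort: "downloads",
  direction: "desc",
  limit: 10,
};
```

Every field is optional, so an empty object is also a valid argument set; in that case the handler falls back to a limit of 20.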

Implementation Reference

  • The handler function for the search_datasets tool, which processes arguments and fetches datasets from the Hugging Face Hub.
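    async ({ search, author, filter, sort, direction, limit }) => {
      // Query parameters for the Hub request; optional fields stay undefined when the caller omits them.
      const params: Record<string, string | number | string[] | undefined> = {
        search,
        author,
        filter,
        sort,
        // Translate "asc"/"desc" into the "1"/"-1" strings passed as `direction`;
        // leave it undefined when the caller gives no direction.
        direction: direction === "asc" ? "1" : direction === "desc" ? "-1" : undefined,
        // Fall back to 20 results when no limit is given.
        limit: limit ?? 20,
      };
    
      // Request the matching datasets from the /api/datasets endpoint.
      const datasets = await fetchHub<DatasetInfo[]>("/api/datasets", params);
    
      // Keep a compact summary per dataset: description cut to 200 characters, at most 10 tags.
      const results = datasets.map((d) => ({
        id: d.id,
        author: d.author,
        description: d.description?.slice(0, 200),
        downloads: d.downloads,
        likes: d.likes,
        trending_score: d.trendingScore,
        tags: d.tags?.slice(0, 10),
        last_modified: d.lastModified,
        private: d.private,
        gated: d.gated,
      }));
    
      // Return the summaries as pretty-printed JSON in a single text content item.
      return {
        content: [
          {
            type: "text" as const,
            text: JSON.stringify(results, null, 2),
          },
        ],
      };
    }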
  • Zod schema defining the input parameters for the search_datasets tool.
    {
      search: z
        .string()
        .optional()
        .describe("Query to match against dataset names and descriptions"),
      author: z
        .string()
        .optional()
        .describe("Filter by dataset owner (user or organization)"),
      filter: z
        .array(z.string())
        .optional()
        .describe(
          "Tag filters (e.g., task_categories:text-classification, language:en)"
        ),
      sort: z
        .enum([
          "trending_score",
          "downloads",
          "likes",
          "created_at",
          "last_modified",
        ])
        .optional()
        .describe("Sort order for results"),
      direction: z
        .enum(["asc", "desc"])
        .optional()
        .describe("Sort direction (default: desc)"),
      limit: z
        .number()
        .int()
        .min(1)
        .max(100)
        .optional()
        .describe("Max results to return (default: 20, max: 100)"),
    },
  • Registration function for the search_datasets tool.
    export function registerSearchDatasets(server: McpServer) {
      server.tool(
        "search_datasets",
        "Find datasets on the Hugging Face Hub by name, tag, or author",
        {
          search: z
            .string()
            .optional()
            .describe("Query to match against dataset names and descriptions"),
          author: z
            .string()
            .optional()
            .describe("Filter by dataset owner (user or organization)"),
          filter: z
            .array(z.string())
            .optional()
            .describe(
              "Tag filters (e.g., task_categories:text-classification, language:en)"
            ),
          sort: z
            .enum([
              "trending_score",
              "downloads",
              "likes",
              "created_at",
              "last_modified",
            ])
            .optional()
            .describe("Sort order for results"),
          direction: z
            .enum(["asc", "desc"])
            .optional()
            .describe("Sort direction (default: desc)"),
          limit: z
            .number()
            .int()
            .min(1)
            .max(100)
            .optional()
            .describe("Max results to return (default: 20, max: 100)"),
        },
        async ({ search, author, filter, sort, direction, limit }) => {
          const params: Record<string, string | number | string[] | undefined> = {
            search,
            author,
            filter,
            sort,
            direction: direction === "asc" ? "1" : direction === "desc" ? "-1" : undefined,
            limit: limit ?? 20,
          };
    
          const datasets = await fetchHub<DatasetInfo[]>("/api/datasets", params);
    
          const results = datasets.map((d) => ({
            id: d.id,
            author: d.author,
            description: d.description?.slice(0, 200),
            downloads: d.downloads,
            likes: d.likes,
            trending_score: d.trendingScore,
            tags: d.tags?.slice(0, 10),
            last_modified: d.lastModified,
            private: d.private,
            gated: d.gated,
          }));
    
          return {
            content: [
              {
                type: "text" as const,
                text: JSON.stringify(results, null, 2),
              },
            ],
          };
        }
      );
    }
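
The handler passes its params object to fetchHub, whose implementation is not shown here. The sketch below is one plausible way such a helper could turn those params into a request URL, assuming undefined values are dropped and array values are sent as repeated query keys. The names buildHubUrl and toHubDirection and the base URL https://huggingface.co are assumptions made for illustration and may differ from what the server actually does.

```typescript
type ParamValue = string | number | string[] | undefined;

// Mirrors the handler's mapping: "asc" -> "1", "desc" -> "-1", otherwise undefined.
function toHubDirection(direction?: "asc" | "desc"): string | undefined {
  return direction === "asc" ? "1" : direction === "desc" ? "-1" : undefined;
}

// Hypothetical stand-in for the URL-building step of fetchHub (assumed behavior).
function buildHubUrl(
  path: string,
  params: Record<string, ParamValue>,
  base: string = "https://huggingface.co" // assumed base URL
): string {
  const url = new URL(path, base);
  for (const [key, value] of Object.entries(params)) {
    // Skip parameters the caller did not provide.
    if (value === undefined) continue;
    if (Array.isArray(value)) {
      // Send each array entry as its own repeated query key.
      for (const item of value) url.searchParams.append(key, item);
    } else {
      url.searchParams.append(key, String(value));
    }
  }
  return url.toString();
}

// Example: the same shape of params object the handler builds.
const exampleUrl = buildHubUrl("/api/datasets", {
  search: "sentiment",
  author: undefined,
  filter: ["task_categories:text-classification", "language:en"],
  sort: "downloads",
  direction: toHubDirection("desc"),
  limit: 20,
});
```

Under these assumptions, omitting direction leaves it out of the query string entirely, so the ordering would then be whatever the endpoint applies by default.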

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cfahlgren1/hf-dataset-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.