search-datasets
Find and filter datasets on Hugging Face Hub using search terms, authors, or tags to access machine learning data resources.
Instructions
Search for datasets on Hugging Face Hub
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | No | Search term | |
| author | No | Filter by author/organization | |
| tags | No | Filter by tags | |
| limit | No | Maximum number of results to return | 10 |
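To make the schema concrete, here is a minimal sketch of an arguments payload and a type check against the table above. The helper name `check_search_args` and the sample values are hypothetical and are not part of the server. The check only mirrors the declared JSON types (`string` for `query`, `author`, and `tags`; `integer` for `limit`).

```python
from typing import Any

# Expected Python types for each argument, mirroring the JSON types in the schema.
SCHEMA_TYPES: dict[str, type] = {
    "query": str,
    "author": str,
    "tags": str,
    "limit": int,
}


def check_search_args(arguments: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments fit the schema."""
    problems = []
    for key, value in arguments.items():
        expected = SCHEMA_TYPES.get(key)
        if expected is None:
            # The schema does not forbid extra properties, so unknown keys are ignored.
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems


# Illustrative values only; every field is optional.
example_arguments = {"query": "sentiment", "tags": "language:en", "limit": 5}
print(check_search_args(example_arguments))
```

Because no field is required, an empty arguments object is also valid; the handler then falls back to its own default for `limit`.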
Implementation Reference
- src/huggingface/server.py:98-116 (registration) Registration of the 'search-datasets' tool in the list_tools handler, defining its name, description, and input schema.

      types.Tool(
          name="search-datasets",
          description="Search for datasets on Hugging Face Hub",
          inputSchema={
              "type": "object",
              "properties": {
                  "query": {"type": "string", "description": "Search term"},
                  "author": {
                      "type": "string",
                      "description": "Filter by author/organization",
                  },
                  "tags": {"type": "string", "description": "Filter by tags"},
                  "limit": {
                      "type": "integer",
                      "description": "Maximum number of results to return",
                  },
              },
          },
      ),
- src/huggingface/server.py:321-358 (handler) Handler implementation for the 'search-datasets' tool within the call_tool function. It extracts the parameters, calls the Hugging Face API endpoint '/datasets', handles errors, formats the dataset results as JSON, and returns them as text content.

      elif name == "search-datasets":
          query = arguments.get("query")
          author = arguments.get("author")
          tags = arguments.get("tags")
          limit = arguments.get("limit", 10)

          params = {"limit": limit}
          if query:
              params["search"] = query
          if author:
              params["author"] = author
          if tags:
              params["filter"] = tags

          data = await make_hf_request("datasets", params)

          if "error" in data:
              return [
                  types.TextContent(
                      type="text", text=f"Error searching datasets: {data['error']}"
                  )
              ]

          # Format the results
          results = []
          for dataset in data:
              dataset_info = {
                  "id": dataset.get("id", ""),
                  "name": dataset.get("datasetId", ""),
                  "author": dataset.get("author", ""),
                  "tags": dataset.get("tags", []),
                  "downloads": dataset.get("downloads", 0),
                  "likes": dataset.get("likes", 0),
                  "lastModified": dataset.get("lastModified", ""),
              }
              results.append(dataset_info)

          return [types.TextContent(type="text", text=json.dumps(results, indent=2))]
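The handler's two pure steps, mapping tool arguments to query parameters and shaping the response, can be sketched as standalone functions that run without a network call. The function names `build_dataset_params` and `format_datasets` are hypothetical and do not appear in the server; the logic is copied from the handler above. The `make_hf_request` helper is not reproduced here because its implementation is not shown in this reference.

```python
import json
from typing import Any


def build_dataset_params(arguments: dict[str, Any]) -> dict[str, Any]:
    """Map tool arguments to query parameters, as the handler does."""
    params: dict[str, Any] = {"limit": arguments.get("limit", 10)}
    # Tool argument names differ from the parameter names sent to the API.
    if arguments.get("query"):
        params["search"] = arguments["query"]
    if arguments.get("author"):
        params["author"] = arguments["author"]
    if arguments.get("tags"):
        params["filter"] = arguments["tags"]
    return params


def format_datasets(data: list[dict[str, Any]]) -> str:
    """Reduce each dataset record to the fields the tool returns, as JSON text."""
    results = [
        {
            "id": dataset.get("id", ""),
            "name": dataset.get("datasetId", ""),
            "author": dataset.get("author", ""),
            "tags": dataset.get("tags", []),
            "downloads": dataset.get("downloads", 0),
            "likes": dataset.get("likes", 0),
            "lastModified": dataset.get("lastModified", ""),
        }
        for dataset in data
    ]
    return json.dumps(results, indent=2)


# Made-up record for illustration; missing fields fall back to their defaults.
print(build_dataset_params({"query": "squad", "tags": "language:en"}))
print(format_datasets([{"id": "example/dataset", "downloads": 3}]))
```

Two details carry over from the handler: `query` is sent as `search` and `tags` as `filter`, and empty or missing values are dropped rather than sent as blank parameters.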