get_dataset_info
Retrieve comprehensive metadata for a specific dataset, including title, description, organization, tags, resource count, dates, and license details to evaluate content before accessing files.
Instructions
Get detailed information about a specific dataset.
Returns comprehensive metadata including title, description, organization, tags, resource count, creation/update dates, license, and other details. Use this after finding a dataset with search_datasets to get more context before exploring its resources.
Typical workflow:

1. Use search_datasets to find datasets of interest
2. Use get_dataset_info to get detailed information about a specific dataset
3. Use list_dataset_resources to see what files are available in the dataset
Args:

- dataset_id: The ID of the dataset to get information about (obtained from search_datasets)

Returns:

- Formatted text with detailed dataset information
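The three-step workflow can be sketched as a plain async chain. Everything below is a hypothetical stand-in — the stub tools and the example dataset id are not this server's real handlers — and only illustrates how one tool's output feeds the next:

```python
import asyncio

# Hypothetical stand-ins for the three MCP tools; the real handlers call
# the data.gouv.fr API, these stubs only illustrate the chaining.
async def search_datasets(query: str) -> list[dict]:
    return [{"id": "abc123", "title": "Example dataset"}]

async def get_dataset_info(dataset_id: str) -> str:
    return f"Dataset Information: Example dataset\nID: {dataset_id}"

async def list_dataset_resources(dataset_id: str) -> list[str]:
    return ["data.csv"]

async def workflow(query: str) -> list[str]:
    hits = await search_datasets(query)               # 1. find datasets
    dataset_id = hits[0]["id"]                        # pick a result
    info = await get_dataset_info(dataset_id)         # 2. inspect metadata
    assert info.startswith("Dataset Information:")
    return await list_dataset_resources(dataset_id)   # 3. list files

print(asyncio.run(workflow("air quality")))  # → ['data.csv']
```

The id returned by search_datasets is the only value the later tools need, which is why the description above stresses obtaining dataset_id from search_datasets first.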
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset_id | Yes | The ID of the dataset to get information about (obtained from search_datasets) | (none) |
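The tool returns formatted text rather than structured JSON. A stripped-down sketch of how such a summary can be assembled — format_summary and the sample payload are illustrative, not part of the module — showing the accumulate-then-join pattern and the tag normalization the handler performs:

```python
def format_summary(data: dict) -> str:
    # Accumulate display lines, then join once at the end.
    content_parts = [f"Dataset Information: {data.get('title', 'Unknown')}"]

    # Tags may arrive as plain strings or as {"name": ...} dicts; accept both.
    tags = []
    for tag in data.get("tags", []):
        if isinstance(tag, str):
            tags.append(tag)
        elif isinstance(tag, dict) and tag.get("name"):
            tags.append(tag["name"])
    if tags:
        content_parts.append(f"Tags: {', '.join(tags[:10])}")

    resources = data.get("resources", [])
    content_parts.append(f"Resources: {len(resources)} file(s)")
    return "\n".join(content_parts)

sample = {
    "title": "Qualité de l'air",
    "tags": ["air", {"name": "environnement"}, {"name": ""}],
    "resources": [{"url": "data.csv"}],
}
print(format_summary(sample))
```

Note that the empty-name tag dict is silently dropped, so the output lists only the two usable tags.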
Implementation Reference
- tools/get_dataset_info.py:9-104 (handler)

  The main handler for the get_dataset_info tool. It fetches the dataset from the API via a helper function and builds a formatted string with key metadata (title, description, organization, tags, resource count, dates, license), with error handling for not-found and other failures.

  ```python
  async def get_dataset_info(dataset_id: str) -> str:
      """
      Get detailed information about a specific dataset.

      Returns comprehensive metadata including title, description, organization,
      tags, resource count, creation/update dates, license, and other details.
      Use this after finding a dataset with search_datasets to get more context
      before exploring its resources.

      Typical workflow:
      1. Use search_datasets to find datasets of interest
      2. Use get_dataset_info to get detailed information about a specific dataset
      3. Use list_dataset_resources to see what files are available in the dataset

      Args:
          dataset_id: The ID of the dataset to get information about (obtained from search_datasets)

      Returns:
          Formatted text with detailed dataset information
      """
      try:
          # Get full dataset data from API v1 via helper
          data = await datagouv_api_client.get_dataset_details(dataset_id)

          content_parts = [f"Dataset Information: {data.get('title', 'Unknown')}", ""]

          if data.get("id"):
              content_parts.append(f"ID: {data.get('id')}")
          if data.get("slug"):
              content_parts.append(f"Slug: {data.get('slug')}")
              content_parts.append(
                  f"URL: {env_config.get_base_url('site')}datasets/{data.get('slug')}/"
              )

          if data.get("description_short"):
              content_parts.append("")
              content_parts.append(f"Description: {data.get('description_short')}")

          if data.get("description") and data.get("description") != data.get(
              "description_short"
          ):
              content_parts.append("")
              content_parts.append(
                  f"Full description: {data.get('description')[:500]}..."
              )

          if data.get("organization"):
              org = data.get("organization", {})
              if isinstance(org, dict):
                  content_parts.append("")
                  content_parts.append(f"Organization: {org.get('name', 'Unknown')}")
                  if org.get("id"):
                      content_parts.append(f"  Organization ID: {org.get('id')}")

          # Handle tags
          tags = []
          for tag in data.get("tags", []):
              if isinstance(tag, str):
                  tags.append(tag)
              elif isinstance(tag, dict):
                  tag_name = tag.get("name", "")
                  if tag_name:
                      tags.append(tag_name)
          if tags:
              content_parts.append("")
              content_parts.append(f"Tags: {', '.join(tags[:10])}")

          # Resources info
          resources = data.get("resources", [])
          content_parts.append("")
          content_parts.append(f"Resources: {len(resources)} file(s)")

          # Dates
          if data.get("created_at"):
              content_parts.append("")
              content_parts.append(f"Created: {data.get('created_at')}")
          if data.get("last_update"):
              content_parts.append(f"Last updated: {data.get('last_update')}")

          # License
          if data.get("license"):
              content_parts.append("")
              content_parts.append(f"License: {data.get('license')}")

          # Frequency
          if data.get("frequency"):
              content_parts.append(f"Update frequency: {data.get('frequency')}")

          return "\n".join(content_parts)

      except httpx.HTTPStatusError as e:
          if e.response.status_code == 404:
              return f"Error: Dataset with ID '{dataset_id}' not found."
          return f"Error: HTTP {e.response.status_code} - {str(e)}"
      except Exception as e:  # noqa: BLE001
          return f"Error: {str(e)}"
  ```
- tools/__init__.py:14-23 (registration)

  Top-level registration function that calls register_get_dataset_info_tool(mcp), among others, to register all tools with the MCP server.

  ```python
  def register_tools(mcp: FastMCP) -> None:
      """Register all MCP tools with the provided FastMCP instance."""
      register_search_datasets_tool(mcp)
      register_query_resource_data_tool(mcp)
      register_get_dataset_info_tool(mcp)
      register_list_dataset_resources_tool(mcp)
      register_get_resource_info_tool(mcp)
      register_download_and_parse_resource_tool(mcp)
      register_get_metrics_tool(mcp)
  ```
- tools/get_dataset_info.py:7-8 (registration)

  Local registration function that defines and registers the get_dataset_info tool using the @mcp.tool() decorator.

  ```python
  def register_get_dataset_info_tool(mcp: FastMCP) -> None:
      @mcp.tool()
  ```
- tools/get_dataset_info.py:9-28 (schema)

  Function signature and docstring defining the tool's input (dataset_id: str) and output (str), used by MCP for schema generation.

  ```python
  async def get_dataset_info(dataset_id: str) -> str:
      """
      Get detailed information about a specific dataset.

      Returns comprehensive metadata including title, description, organization,
      tags, resource count, creation/update dates, license, and other details.
      Use this after finding a dataset with search_datasets to get more context
      before exploring its resources.

      Typical workflow:
      1. Use search_datasets to find datasets of interest
      2. Use get_dataset_info to get detailed information about a specific dataset
      3. Use list_dataset_resources to see what files are available in the dataset

      Args:
          dataset_id: The ID of the dataset to get information about (obtained from search_datasets)

      Returns:
          Formatted text with detailed dataset information
      """
  ```
- helpers/datagouv_api_client.py:62-79 (helper)

  Helper function called by the handler to retrieve raw dataset data from the data.gouv.fr API v1 endpoint.

  ```python
  async def get_dataset_details(
      dataset_id: str, session: httpx.AsyncClient | None = None
  ) -> dict[str, Any]:
      """
      Fetch the complete dataset payload from the API v1 endpoint.
      """
      own = session is None
      if own:
          session = httpx.AsyncClient()
      assert session is not None
      try:
          base_url: str = env_config.get_base_url("datagouv_api")
          url = f"{base_url}1/datasets/{dataset_id}/"
          return await _fetch_json(session, url)
      finally:
          if own:
              await session.aclose()
  ```