get_dataset_info

Retrieve comprehensive metadata for a specific dataset, including title, description, organization, tags, resource count, dates, and license details to evaluate content before accessing files.

Instructions

Get detailed information about a specific dataset.

Returns comprehensive metadata including title, description, organization, tags, resource count, creation/update dates, license, and other details. Use this after finding a dataset with search_datasets to get more context before exploring its resources.

Typical workflow:

1. Use search_datasets to find datasets of interest
2. Use get_dataset_info to get detailed information about a specific dataset
3. Use list_dataset_resources to see what files are available in the dataset

Args:
    dataset_id: The ID of the dataset to get information about (obtained from search_datasets)

Returns:
    Formatted text with detailed dataset information

Input Schema

Name	Required	Description	Default
`dataset_id`	Yes
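
The table above lists only the parameter name. As a rough sketch, the JSON Schema that a server could derive from the signature `dataset_id: str` might look like the dictionary below. The exact schema emitted by the server is an assumption here, and `validate_arguments` is a hypothetical helper written for illustration, not part of datagouv-mcp.

```python
from typing import Any

# Assumed shape of the input schema, inferred from the signature
# `get_dataset_info(dataset_id: str)`; the server's real output may differ.
INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"dataset_id": {"type": "string"}},
    "required": ["dataset_id"],
}


def validate_arguments(arguments: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems found in `arguments`."""
    problems = []
    # Every required name must be present.
    for name in INPUT_SCHEMA["required"]:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # Every supplied name must be known and have the declared type.
    for name, value in arguments.items():
        spec = INPUT_SCHEMA["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
    return problems


print(validate_arguments({"dataset_id": "abc123"}))
print(validate_arguments({}))
```

An empty list means the arguments satisfy the assumed schema; anything else describes what a caller would need to fix before invoking the tool.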

Output Schema

Name	Required	Description	Default
`result`	Yes

Implementation Reference

tools/get_dataset_info.py:9-104 (handler)

The main handler for the get_dataset_info tool. It fetches the dataset details from the API through a helper function and builds a formatted string from key metadata: title, description, organization, tags, resource count, dates, license, and update frequency. A 404 response, any other HTTP error, and any unexpected exception are each returned as an error string rather than raised.

async def get_dataset_info(dataset_id: str) -> str:
    """
    Get detailed information about a specific dataset.

    Returns comprehensive metadata including title, description, organization,
    tags, resource count, creation/update dates, license, and other details.
    Use this after finding a dataset with search_datasets to get more context
    before exploring its resources.

    Typical workflow:
    1. Use search_datasets to find datasets of interest
    2. Use get_dataset_info to get detailed information about a specific dataset
    3. Use list_dataset_resources to see what files are available in the dataset

    Args:
        dataset_id: The ID of the dataset to get information about (obtained from search_datasets)

    Returns:
        Formatted text with detailed dataset information
    """
    try:
        # Get full dataset data from API v1 via helper
        data = await datagouv_api_client.get_dataset_details(dataset_id)

        content_parts = [f"Dataset Information: {data.get('title', 'Unknown')}", ""]

        if data.get("id"):
            content_parts.append(f"ID: {data.get('id')}")
        if data.get("slug"):
            content_parts.append(f"Slug: {data.get('slug')}")
            content_parts.append(
                f"URL: {env_config.get_base_url('site')}datasets/{data.get('slug')}/"
            )

        if data.get("description_short"):
            content_parts.append("")
            content_parts.append(f"Description: {data.get('description_short')}")

        if data.get("description") and data.get("description") != data.get(
            "description_short"
        ):
            content_parts.append("")
            content_parts.append(
                f"Full description: {data.get('description')[:500]}..."
            )

        if data.get("organization"):
            org = data.get("organization", {})
            if isinstance(org, dict):
                content_parts.append("")
                content_parts.append(f"Organization: {org.get('name', 'Unknown')}")
                if org.get("id"):
                    content_parts.append(f"  Organization ID: {org.get('id')}")

        # Handle tags
        tags = []
        for tag in data.get("tags", []):
            if isinstance(tag, str):
                tags.append(tag)
            elif isinstance(tag, dict):
                tag_name = tag.get("name", "")
                if tag_name:
                    tags.append(tag_name)
        if tags:
            content_parts.append("")
            content_parts.append(f"Tags: {', '.join(tags[:10])}")

        # Resources info
        resources = data.get("resources", [])
        content_parts.append("")
        content_parts.append(f"Resources: {len(resources)} file(s)")

        # Dates
        if data.get("created_at"):
            content_parts.append("")
            content_parts.append(f"Created: {data.get('created_at')}")
        if data.get("last_update"):
            content_parts.append(f"Last updated: {data.get('last_update')}")

        # License
        if data.get("license"):
            content_parts.append("")
            content_parts.append(f"License: {data.get('license')}")

        # Frequency
        if data.get("frequency"):
            content_parts.append(f"Update frequency: {data.get('frequency')}")

        return "\n".join(content_parts)

    except httpx.HTTPStatusError as e:
        if e.response.status_code == 404:
            return f"Error: Dataset with ID '{dataset_id}' not found."
        return f"Error: HTTP {e.response.status_code} - {str(e)}"
    except Exception as e:  # noqa: BLE001
        return f"Error: {str(e)}"
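
To see what the formatting step produces without calling the API, the sketch below applies a reduced version of the same logic to an in-memory payload. `summarize_dataset` and the sample payload are hypothetical and written only for illustration; the sketch covers a subset of the fields, and, unlike the handler above, it appends the ellipsis only when the description was actually truncated.

```python
from typing import Any


def summarize_dataset(data: dict[str, Any], max_description: int = 500) -> str:
    """Illustrative stand-in for the handler's formatting step (no network)."""
    lines = [f"Dataset Information: {data.get('title', 'Unknown')}", ""]

    if data.get("id"):
        lines.append(f"ID: {data['id']}")

    description = data.get("description") or ""
    if description:
        # Unlike the handler, add the ellipsis only when text was actually cut.
        suffix = "..." if len(description) > max_description else ""
        lines.append(f"Full description: {description[:max_description]}{suffix}")

    # Tags may arrive as plain strings or as {"name": ...} objects.
    tags = []
    for tag in data.get("tags", []):
        if isinstance(tag, str):
            tags.append(tag)
        elif isinstance(tag, dict) and tag.get("name"):
            tags.append(tag["name"])
    if tags:
        lines.append(f"Tags: {', '.join(tags[:10])}")

    # The resource count is always reported, even when it is zero.
    lines.append(f"Resources: {len(data.get('resources', []))} file(s)")
    return "\n".join(lines)


# Hypothetical payload; the field values are invented for illustration.
sample = {
    "id": "abc123",
    "title": "Example dataset",
    "description": "A short description.",
    "tags": ["transport", {"name": "open-data"}],
    "resources": [{}, {}],
}
print(summarize_dataset(sample))
```

Every optional field is guarded, so an empty payload still yields a title line and a resource count rather than an error.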

tools/__init__.py:14-23 (registration)

Top-level registration function that calls register_get_dataset_info_tool(mcp) among others to register all tools with the MCP server.

def register_tools(mcp: FastMCP) -> None:
    """Register all MCP tools with the provided FastMCP instance."""
    register_search_datasets_tool(mcp)
    register_query_resource_data_tool(mcp)
    register_get_dataset_info_tool(mcp)
    register_list_dataset_resources_tool(mcp)
    register_get_resource_info_tool(mcp)
    register_download_and_parse_resource_tool(mcp)
    register_get_metrics_tool(mcp)

tools/get_dataset_info.py:7-8 (registration)

Local registration function that defines and registers the get_dataset_info tool using the @mcp.tool() decorator.

```
def register_get_dataset_info_tool(mcp: FastMCP) -> None:
    @mcp.tool()
```
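
The two registration excerpts above follow a common pattern: a top-level function calls one small `register_*` function per tool, and each of those defines its handler inside a decorator call. The sketch below imitates that pattern with `MiniServer`, a hypothetical stand-in that only records decorated functions. It is not FastMCP and makes no claim about how FastMCP stores tools internally.

```python
import asyncio
from typing import Awaitable, Callable

ToolFn = Callable[..., Awaitable[str]]


class MiniServer:
    """Hypothetical stand-in for FastMCP that only records registered tools."""

    def __init__(self) -> None:
        self.tools: dict[str, ToolFn] = {}

    def tool(self) -> Callable[[ToolFn], ToolFn]:
        def decorator(fn: ToolFn) -> ToolFn:
            self.tools[fn.__name__] = fn  # register under the function name
            return fn

        return decorator


def register_get_dataset_info_tool(mcp: MiniServer) -> None:
    @mcp.tool()
    async def get_dataset_info(dataset_id: str) -> str:
        # Placeholder body; the real handler calls the data.gouv.fr API.
        return f"Dataset Information: {dataset_id}"


def register_tools(mcp: MiniServer) -> None:
    # The real function registers several tools; one is enough here.
    register_get_dataset_info_tool(mcp)


server = MiniServer()
register_tools(server)
result = asyncio.run(server.tools["get_dataset_info"]("abc123"))
print(sorted(server.tools), result)
```

Defining the handler inside the registration function keeps it bound to the specific server instance passed in, which is presumably why each tool module exposes a `register_*` function rather than a module-level decorated handler.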

tools/get_dataset_info.py:9-28 (schema)

Function signature and docstring defining the tool's input (dataset_id: str) and output (str), used by MCP for schema generation.

async def get_dataset_info(dataset_id: str) -> str:
    """
    Get detailed information about a specific dataset.

    Returns comprehensive metadata including title, description, organization,
    tags, resource count, creation/update dates, license, and other details.
    Use this after finding a dataset with search_datasets to get more context
    before exploring its resources.

    Typical workflow:
    1. Use search_datasets to find datasets of interest
    2. Use get_dataset_info to get detailed information about a specific dataset
    3. Use list_dataset_resources to see what files are available in the dataset

    Args:
        dataset_id: The ID of the dataset to get information about (obtained from search_datasets)

    Returns:
        Formatted text with detailed dataset information
    """

helpers/datagouv_api_client.py:62-79 (helper)

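Helper function called by the handler to retrieve the raw dataset payload from the data.gouv.fr API v1 endpoint. It creates its own httpx.AsyncClient when no session is passed in and closes only a client it created itself.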

async def get_dataset_details(
    dataset_id: str, session: httpx.AsyncClient | None = None
) -> dict[str, Any]:
    """
    Fetch the complete dataset payload from the API v1 endpoint.
    """
    own = session is None
    if own:
        session = httpx.AsyncClient()
    assert session is not None
    try:
        base_url: str = env_config.get_base_url("datagouv_api")
        url = f"{base_url}1/datasets/{dataset_id}/"
        return await _fetch_json(session, url)
    finally:
        if own:
            await session.aclose()
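
The helper builds the request URL by appending `1/datasets/{dataset_id}/` directly to the configured base URL, so the result is only well-formed if that base ends with a slash. The sketch below makes this explicit. `build_dataset_url` is a hypothetical function, and the base URL shown is an assumption used for illustration; the value actually returned by `env_config.get_base_url("datagouv_api")` is not shown in the source.

```python
def build_dataset_url(base_url: str, dataset_id: str) -> str:
    """Mirror the helper's f-string, normalizing the trailing slash first."""
    if not base_url.endswith("/"):
        base_url += "/"
    return f"{base_url}1/datasets/{dataset_id}/"


# Assumed base URL, used only for illustration.
ASSUMED_BASE = "https://www.data.gouv.fr/api/"
print(build_dataset_url(ASSUMED_BASE, "abc123"))
```

With the assumed base, this prints `https://www.data.gouv.fr/api/1/datasets/abc123/`, and the same URL results whether or not the base is given with its trailing slash.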

datagouv-mcp
