bolinocroustibat

datagouv-mcp

get_dataset_info

Retrieve comprehensive metadata for a specific dataset, including title, description, organization, tags, resource count, dates, and license details to evaluate content before accessing files.

Instructions

Get detailed information about a specific dataset.

Returns comprehensive metadata including title, description, organization, tags, resource count, creation/update dates, license, and other details. Use this after finding a dataset with search_datasets to get more context before exploring its resources.

Typical workflow:

  1. Use search_datasets to find datasets of interest

  2. Use get_dataset_info to get detailed information about a specific dataset

  3. Use list_dataset_resources to see what files are available in the dataset

Args:
  dataset_id: The ID of the dataset to get information about (obtained from search_datasets).

Returns:
  Formatted text with detailed dataset information.
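The returned text is a line-oriented summary: one labelled field per line, with blank lines between groups. Below is a minimal sketch of that formatting step written as a pure function, kept separate from any network call so it can be tested offline. The names format_dataset_info, normalize_tags, and truncate are hypothetical, the sample payload is invented, and only a subset of the fields handled by the implementation reference below is covered.

```python
from typing import Any

MAX_DESCRIPTION = 500  # same truncation length the handler uses
MAX_TAGS = 10          # same tag limit the handler uses


def normalize_tags(raw_tags: Any) -> list[str]:
    """Accept tags as plain strings or {"name": ...} dicts; skip anything else."""
    tags: list[str] = []
    for tag in raw_tags or []:  # tolerate a missing or null "tags" field
        if isinstance(tag, str):
            tags.append(tag)
        elif isinstance(tag, dict) and tag.get("name"):
            tags.append(tag["name"])
    return tags


def truncate(text: str, limit: int = MAX_DESCRIPTION) -> str:
    """Cut text to `limit` characters, adding an ellipsis only when something was cut."""
    return text if len(text) <= limit else text[:limit] + "..."


def format_dataset_info(data: dict[str, Any]) -> str:
    """Render a subset of a dataset payload in the same layout as get_dataset_info."""
    parts = [f"Dataset Information: {data.get('title', 'Unknown')}", ""]
    if data.get("id"):
        parts.append(f"ID: {data['id']}")
    if data.get("description"):
        parts += ["", f"Full description: {truncate(data['description'])}"]
    tags = normalize_tags(data.get("tags"))
    if tags:
        parts += ["", f"Tags: {', '.join(tags[:MAX_TAGS])}"]
    parts += ["", f"Resources: {len(data.get('resources') or [])} file(s)"]
    if data.get("license"):
        parts += ["", f"License: {data['license']}"]
    return "\n".join(parts)


# Hypothetical payload for illustration only, not real data.gouv.fr output.
sample = {
    "id": "abc123",
    "title": "Example dataset",
    "description": "x" * 600,
    "tags": ["transport", {"name": "open-data"}, {"name": ""}],
    "resources": [{}, {}],
    "license": "example-license",
}
print(format_dataset_info(sample))
```

Separating formatting from fetching is a design choice made for this sketch; the actual handler does both in one function.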

Input Schema

Name        Required  Description  Default
dataset_id  Yes       (none)       (none)
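The input schema has a single required parameter, dataset_id, taken from the tool's Python signature. A rough, stdlib-only sketch of how such a schema can be derived from a signature follows. schema_from_signature is a hypothetical helper and covers only a few types; FastMCP's real generator works differently (it relies on Pydantic) and emits extra keys such as titles.

```python
import inspect
from typing import Any, get_type_hints

# Map a few Python annotations to JSON Schema types (illustrative subset only).
_JSON_TYPES: dict[type, str] = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_signature(func: Any) -> dict[str, Any]:
    """Build a minimal JSON Schema object describing a function's parameters."""
    hints = get_type_hints(func)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unmapped parameters fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name, str), "string")}
        # Parameters without a default value are required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


async def get_dataset_info(dataset_id: str) -> str:
    """Stand-in with the same signature as the real tool."""
    return ""


print(schema_from_signature(get_dataset_info))
# {'type': 'object', 'properties': {'dataset_id': {'type': 'string'}}, 'required': ['dataset_id']}
```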

Implementation Reference

  • The main handler for the get_dataset_info tool. It fetches the dataset through a helper function, then builds a formatted string from key metadata (title, description, organization, tags, resource count, dates, license, update frequency). Instead of raising, it returns an error message when the dataset is not found (HTTP 404), on other HTTP errors, and on any other exception.
    async def get_dataset_info(dataset_id: str) -> str:
        """
        Get detailed information about a specific dataset.
    
        Returns comprehensive metadata including title, description, organization,
        tags, resource count, creation/update dates, license, and other details.
        Use this after finding a dataset with search_datasets to get more context
        before exploring its resources.
    
        Typical workflow:
        1. Use search_datasets to find datasets of interest
        2. Use get_dataset_info to get detailed information about a specific dataset
        3. Use list_dataset_resources to see what files are available in the dataset
    
        Args:
            dataset_id: The ID of the dataset to get information about (obtained from search_datasets)
    
        Returns:
            Formatted text with detailed dataset information
        """
        try:
            # Get full dataset data from API v1 via helper
            data = await datagouv_api_client.get_dataset_details(dataset_id)
    
            content_parts = [f"Dataset Information: {data.get('title', 'Unknown')}", ""]
    
            if data.get("id"):
                content_parts.append(f"ID: {data.get('id')}")
            if data.get("slug"):
                content_parts.append(f"Slug: {data.get('slug')}")
                content_parts.append(
                    f"URL: {env_config.get_base_url('site')}datasets/{data.get('slug')}/"
                )
    
            if data.get("description_short"):
                content_parts.append("")
                content_parts.append(f"Description: {data.get('description_short')}")
    
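            description = data.get("description")
            if description and description != data.get("description_short"):
                content_parts.append("")
                # Truncate long descriptions; add an ellipsis only when text was cut
                suffix = "..." if len(description) > 500 else ""
                content_parts.append(f"Full description: {description[:500]}{suffix}")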
    
            if data.get("organization"):
                org = data.get("organization", {})
                if isinstance(org, dict):
                    content_parts.append("")
                    content_parts.append(f"Organization: {org.get('name', 'Unknown')}")
                    if org.get("id"):
                        content_parts.append(f"  Organization ID: {org.get('id')}")
    
            # Handle tags
            tags = []
            for tag in data.get("tags") or []:  # tolerate a null "tags" field
                if isinstance(tag, str):
                    tags.append(tag)
                elif isinstance(tag, dict):
                    tag_name = tag.get("name", "")
                    if tag_name:
                        tags.append(tag_name)
            if tags:
                content_parts.append("")
                content_parts.append(f"Tags: {', '.join(tags[:10])}")
    
            # Resources info
            resources = data.get("resources", [])
            content_parts.append("")
            content_parts.append(f"Resources: {len(resources)} file(s)")
    
            # Dates
            if data.get("created_at"):
                content_parts.append("")
                content_parts.append(f"Created: {data.get('created_at')}")
            if data.get("last_update"):
                content_parts.append(f"Last updated: {data.get('last_update')}")
    
            # License
            if data.get("license"):
                content_parts.append("")
                content_parts.append(f"License: {data.get('license')}")
    
            # Frequency
            if data.get("frequency"):
                content_parts.append(f"Update frequency: {data.get('frequency')}")
    
            return "\n".join(content_parts)
    
        except httpx.HTTPStatusError as e:
            if e.response.status_code == 404:
                return f"Error: Dataset with ID '{dataset_id}' not found."
            return f"Error: HTTP {e.response.status_code} - {str(e)}"
        except Exception as e:  # noqa: BLE001
            return f"Error: {str(e)}"
  • Top-level registration function that calls register_get_dataset_info_tool(mcp) among others to register all tools with the MCP server.
    def register_tools(mcp: FastMCP) -> None:
        """Register all MCP tools with the provided FastMCP instance."""
        register_search_datasets_tool(mcp)
        register_query_resource_data_tool(mcp)
        register_get_dataset_info_tool(mcp)
        register_list_dataset_resources_tool(mcp)
        register_get_resource_info_tool(mcp)
        register_download_and_parse_resource_tool(mcp)
        register_get_metrics_tool(mcp)
  • Local registration function that defines the get_dataset_info tool and registers it with the @mcp.tool() decorator. The decorated function's signature (input dataset_id: str, output str) and docstring are identical to those of the main handler above and are used by MCP for schema generation.
    def register_get_dataset_info_tool(mcp: FastMCP) -> None:
        @mcp.tool()
        async def get_dataset_info(dataset_id: str) -> str:
            ...  # docstring and body as shown in the main handler above
  • Helper function called by the handler to retrieve the raw dataset payload from the data.gouv.fr API v1 endpoint.
    async def get_dataset_details(
        dataset_id: str, session: httpx.AsyncClient | None = None
    ) -> dict[str, Any]:
        """
        Fetch the complete dataset payload from the API v1 endpoint.
        """
        own = session is None
        if own:
            session = httpx.AsyncClient()
        assert session is not None
        try:
            base_url: str = env_config.get_base_url("datagouv_api")
            url = f"{base_url}1/datasets/{dataset_id}/"
            return await _fetch_json(session, url)
        finally:
            if own:
                await session.aclose()
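get_dataset_details either borrows a client passed in by the caller or creates its own and closes it when done. The same ownership rule can be sketched with the standard library only. FakeClient and fetch_details are hypothetical stand-ins for httpx.AsyncClient and the helper, and BASE_URL is an assumed value for env_config.get_base_url("datagouv_api").

```python
from __future__ import annotations

import asyncio
from typing import Any

BASE_URL = "https://www.data.gouv.fr/api/"  # assumption: value of get_base_url("datagouv_api")


class FakeClient:
    """Hypothetical stand-in for httpx.AsyncClient that records URLs instead of fetching."""

    created: list[FakeClient] = []  # every instance, so owned clients can be inspected

    def __init__(self) -> None:
        self.requested: list[str] = []
        self.closed = False
        FakeClient.created.append(self)

    async def get_json(self, url: str) -> dict[str, Any]:
        self.requested.append(url)
        return {"url": url}

    async def aclose(self) -> None:
        self.closed = True


def dataset_url(dataset_id: str, base_url: str = BASE_URL) -> str:
    # Same pattern as the helper: f"{base_url}1/datasets/{dataset_id}/"
    return f"{base_url}1/datasets/{dataset_id}/"


async def fetch_details(dataset_id: str, session: FakeClient | None = None) -> dict[str, Any]:
    own = session is None  # close the client only if this function created it
    client = FakeClient() if own else session
    try:
        return await client.get_json(dataset_url(dataset_id))
    finally:
        if own:
            await client.aclose()


shared = FakeClient()
asyncio.run(fetch_details("abc123", session=shared))
print(shared.requested)  # ['https://www.data.gouv.fr/api/1/datasets/abc123/']
print(shared.closed)     # False: a borrowed client stays open for reuse
```

Passing a shared client lets a caller reuse one connection pool across several calls, while letting the helper own the client keeps one-off calls from leaking connections.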
