get_dataset_info

Retrieve comprehensive metadata for a specific dataset, including title, description, organization, tags, resource count, dates, and license details to evaluate content before accessing files.

Instructions

Get detailed information about a specific dataset.

Returns comprehensive metadata including title, description, organization, tags, resource count, creation/update dates, license, and other details. Use this after finding a dataset with search_datasets to get more context before exploring its resources.

Typical workflow:

  1. Use search_datasets to find datasets of interest

  2. Use get_dataset_info to get detailed information about a specific dataset

  3. Use list_dataset_resources to see what files are available in the dataset

Args: dataset_id: The ID of the dataset to get information about (obtained from search_datasets)

Returns: Formatted text with detailed dataset information
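The three-step workflow above can be sketched as a sequence of tool calls. `call_tool` below is a stand-in for a real MCP client call (e.g. `ClientSession.call_tool` in the MCP Python SDK), and the canned responses are illustrative placeholders, not real API output:

```python
# Hedged sketch of the typical workflow. `call_tool` and its canned
# responses are stand-ins; a real client would talk to the MCP server.
def call_tool(name: str, arguments: dict) -> str:
    canned = {
        "search_datasets": "1. Air quality (id: ds-123)",
        "get_dataset_info": "Dataset Information: Air quality\nID: ds-123\nResources: 2 file(s)",
        "list_dataset_resources": "measures.csv\nstations.csv",
    }
    return canned[name]

# 1. Find candidate datasets
hits = call_tool("search_datasets", {"query": "air quality"})
# 2. Inspect one dataset before touching its files
info = call_tool("get_dataset_info", {"dataset_id": "ds-123"})
# 3. List the files it contains
files = call_tool("list_dataset_resources", {"dataset_id": "ds-123"})
print(info.splitlines()[0])  # → Dataset Information: Air quality
```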

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| dataset_id | Yes | The ID of the dataset to get information about (obtained from search_datasets) | (none) |
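A `tools/call` request for this tool carries the single required argument as a JSON object. A minimal sketch of the request body, where the dataset ID is a made-up placeholder:

```python
import json

# Minimal MCP "tools/call" request body for this tool. The dataset ID
# is a made-up placeholder, not a real dataset.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_dataset_info",
        "arguments": {"dataset_id": "example-dataset-id"},
    },
}
print(json.dumps(request, indent=2))
```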

Implementation Reference

  • The main handler for the get_dataset_info tool. It fetches dataset details from the API via a helper function and builds a formatted string with key metadata (title, description, organization, tags, resource count, dates, license), with error handling for not-found and other exceptions.

    ```python
    async def get_dataset_info(dataset_id: str) -> str:
        """
        Get detailed information about a specific dataset.

        Returns comprehensive metadata including title, description, organization,
        tags, resource count, creation/update dates, license, and other details.
        Use this after finding a dataset with search_datasets to get more context
        before exploring its resources.

        Typical workflow:
        1. Use search_datasets to find datasets of interest
        2. Use get_dataset_info to get detailed information about a specific dataset
        3. Use list_dataset_resources to see what files are available in the dataset

        Args:
            dataset_id: The ID of the dataset to get information about
                (obtained from search_datasets)

        Returns:
            Formatted text with detailed dataset information
        """
        try:
            # Get full dataset data from API v1 via helper
            data = await datagouv_api_client.get_dataset_details(dataset_id)

            content_parts = [f"Dataset Information: {data.get('title', 'Unknown')}", ""]

            if data.get("id"):
                content_parts.append(f"ID: {data.get('id')}")
            if data.get("slug"):
                content_parts.append(f"Slug: {data.get('slug')}")
                content_parts.append(
                    f"URL: {env_config.get_base_url('site')}datasets/{data.get('slug')}/"
                )

            if data.get("description_short"):
                content_parts.append("")
                content_parts.append(f"Description: {data.get('description_short')}")
            if data.get("description") and data.get("description") != data.get(
                "description_short"
            ):
                content_parts.append("")
                content_parts.append(
                    f"Full description: {data.get('description')[:500]}..."
                )

            if data.get("organization"):
                org = data.get("organization", {})
                if isinstance(org, dict):
                    content_parts.append("")
                    content_parts.append(f"Organization: {org.get('name', 'Unknown')}")
                    if org.get("id"):
                        content_parts.append(f"  Organization ID: {org.get('id')}")

            # Handle tags
            tags = []
            for tag in data.get("tags", []):
                if isinstance(tag, str):
                    tags.append(tag)
                elif isinstance(tag, dict):
                    tag_name = tag.get("name", "")
                    if tag_name:
                        tags.append(tag_name)
            if tags:
                content_parts.append("")
                content_parts.append(f"Tags: {', '.join(tags[:10])}")

            # Resources info
            resources = data.get("resources", [])
            content_parts.append("")
            content_parts.append(f"Resources: {len(resources)} file(s)")

            # Dates
            if data.get("created_at"):
                content_parts.append("")
                content_parts.append(f"Created: {data.get('created_at')}")
            if data.get("last_update"):
                content_parts.append(f"Last updated: {data.get('last_update')}")

            # License
            if data.get("license"):
                content_parts.append("")
                content_parts.append(f"License: {data.get('license')}")

            # Frequency
            if data.get("frequency"):
                content_parts.append(f"Update frequency: {data.get('frequency')}")

            return "\n".join(content_parts)

        except httpx.HTTPStatusError as e:
            if e.response.status_code == 404:
                return f"Error: Dataset with ID '{dataset_id}' not found."
            return f"Error: HTTP {e.response.status_code} - {str(e)}"
        except Exception as e:  # noqa: BLE001
            return f"Error: {str(e)}"
    ```
  • Top-level registration function that calls register_get_dataset_info_tool(mcp), among others, to register all tools with the MCP server.

    ```python
    def register_tools(mcp: FastMCP) -> None:
        """Register all MCP tools with the provided FastMCP instance."""
        register_search_datasets_tool(mcp)
        register_query_resource_data_tool(mcp)
        register_get_dataset_info_tool(mcp)
        register_list_dataset_resources_tool(mcp)
        register_get_resource_info_tool(mcp)
        register_download_and_parse_resource_tool(mcp)
        register_get_metrics_tool(mcp)
    ```
  • Local registration function that defines and registers the get_dataset_info tool using the @mcp.tool() decorator (excerpt):

    ```python
    def register_get_dataset_info_tool(mcp: FastMCP) -> None:
        @mcp.tool()
    ```
  • Function signature and docstring defining the tool's input (dataset_id: str) and output (str), used by MCP for schema generation.

    ```python
    async def get_dataset_info(dataset_id: str) -> str:
        """
        Get detailed information about a specific dataset.

        Returns comprehensive metadata including title, description, organization,
        tags, resource count, creation/update dates, license, and other details.
        Use this after finding a dataset with search_datasets to get more context
        before exploring its resources.

        Typical workflow:
        1. Use search_datasets to find datasets of interest
        2. Use get_dataset_info to get detailed information about a specific dataset
        3. Use list_dataset_resources to see what files are available in the dataset

        Args:
            dataset_id: The ID of the dataset to get information about
                (obtained from search_datasets)

        Returns:
            Formatted text with detailed dataset information
        """
    ```
  • Helper function called by the handler to retrieve raw dataset data from the data.gouv.fr API v1 endpoint.

    ```python
    async def get_dataset_details(
        dataset_id: str, session: httpx.AsyncClient | None = None
    ) -> dict[str, Any]:
        """
        Fetch the complete dataset payload from the API v1 endpoint.
        """
        own = session is None
        if own:
            session = httpx.AsyncClient()
        assert session is not None
        try:
            base_url: str = env_config.get_base_url("datagouv_api")
            url = f"{base_url}1/datasets/{dataset_id}/"
            return await _fetch_json(session, url)
        finally:
            if own:
                await session.aclose()
    ```
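The helper's optional-session pattern (create an httpx client only when the caller did not pass one, and close only what you opened) can be illustrated with a self-contained sketch. `FakeClient`, `get_details`, and the URL below are stand-ins, not the project's real API:

```python
import asyncio

class FakeClient:
    """Stand-in for httpx.AsyncClient, to keep the sketch self-contained."""
    def __init__(self):
        self.closed = False
    async def get_json(self, url: str) -> dict:
        return {"url": url, "title": "demo"}
    async def aclose(self):
        self.closed = True

async def get_details(dataset_id: str, session=None) -> dict:
    own = session is None          # remember whether we created the client
    if own:
        session = FakeClient()
    try:
        # Hypothetical base URL, for illustration only
        return await session.get_json(f"https://example.invalid/1/datasets/{dataset_id}/")
    finally:
        if own:                    # close only clients we opened ourselves
            await session.aclose()

data = asyncio.run(get_details("abc123"))
print(data["title"])  # → demo
```

Passing a shared client instead lets a caller reuse one connection pool across many requests, which is the point of the optional parameter.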

MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/bolinocroustibat/datagouv-mcp'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.