
datagouv-mcp

by datagouv

list_dataset_resources

Discover and access all files within a dataset from France's open data platform. View file details like format, size, and type to identify data for analysis.

Instructions

List all resources (files) in a dataset with their metadata.

Returns information about each resource including ID, title, format, size, and type. This is a key step before querying data from resources.

Typical workflow:

  1. Use search_datasets to find datasets

  2. Use list_dataset_resources to see what files are in a dataset

  3. Use get_resource_info to check if a resource is available via Tabular API

  4. Use query_resource_data (for Tabular API) or download_and_parse_resource (for large/unsupported files)

Args: dataset_id: The ID of the dataset to list resources from (obtained from search_datasets or get_dataset_info)

Returns: Formatted text listing all resources with their metadata, including resource IDs for data queries
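The four-step workflow above can be sketched as a short async script. This is a hypothetical illustration, not project code: `call_tool` stands in for an MCP client's tool invocation and is stubbed here with canned responses so the control flow runs locally; the IDs and outputs are made up.

```python
import asyncio

# Canned stand-ins for each tool's text output (illustrative only).
CANNED = {
    "search_datasets": "Dataset ID: abc123",
    "list_dataset_resources": "Resource ID: res-1 (Format: csv)",
    "get_resource_info": "Tabular API: available",
    "query_resource_data": "col_a,col_b\n1,2",
    "download_and_parse_resource": "parsed 2 rows",
}

async def call_tool(name: str, **kwargs) -> str:
    """Stub for an MCP client's tool call; returns a canned response."""
    return CANNED[name]

async def workflow() -> str:
    # 1. Find a dataset
    await call_tool("search_datasets", query="population")
    # 2. See what files are in the dataset
    await call_tool("list_dataset_resources", dataset_id="abc123")
    # 3. Check whether a resource is available via the Tabular API
    info = await call_tool("get_resource_info", resource_id="res-1")
    # 4. Query via the Tabular API if available, otherwise download and parse
    if "available" in info:
        return await call_tool("query_resource_data", resource_id="res-1")
    return await call_tool("download_and_parse_resource", resource_id="res-1")

print(asyncio.run(workflow()))
```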

Input Schema

Name        Required  Description  Default
dataset_id  Yes

Output Schema

Name    Required  Description  Default
result  Yes

Implementation Reference

  • The main asynchronous tool handler that lists all resources in a dataset. It fetches the resource list and dataset metadata via datagouv_api_client.get_resources_for_dataset, then retrieves details for each resource and formats human-readable output with ID, title, format, size (human-readable), MIME type, resource type, and URL.
    async def list_dataset_resources(dataset_id: str) -> str:
        """
        List all resources (files) in a dataset with their metadata.
    
        Returns information about each resource including ID, title, format, size,
        and type. This is a key step before querying data from resources.
    
        Typical workflow:
        1. Use search_datasets to find datasets
        2. Use list_dataset_resources to see what files are in a dataset
        3. Use get_resource_info to check if a resource is available via Tabular API
        4. Use query_resource_data (for Tabular API) or download_and_parse_resource (for large/unsupported files)
    
        Args:
            dataset_id: The ID of the dataset to list resources from (obtained from search_datasets or get_dataset_info)
    
        Returns:
            Formatted text listing all resources with their metadata, including resource IDs for data queries
        """
        try:
            result = await datagouv_api_client.get_resources_for_dataset(dataset_id)
            dataset = result.get("dataset", {})
            resources = result.get("resources", [])
    
            if not dataset.get("id"):
                return f"Error: Dataset with ID '{dataset_id}' not found."
    
            dataset_title = dataset.get("title", "Unknown")
    
            content_parts = [
                f"Resources in dataset: {dataset_title}",
                f"Dataset ID: {dataset_id}",
                f"Total resources: {len(resources)}\n",
            ]
    
            if not resources:
                content_parts.append("This dataset has no resources.")
                return "\n".join(content_parts)
    
            # Get detailed info for each resource
            async with httpx.AsyncClient() as session:
                for i, (resource_id, resource_title) in enumerate(resources, 1):
                    content_parts.append(f"{i}. {resource_title or 'Untitled'}")
                    content_parts.append(f"   Resource ID: {resource_id}")
    
                    try:
                        resource_data = await datagouv_api_client.get_resource_details(
                            resource_id, session=session
                        )
                        resource = resource_data.get("resource", {})
    
                        if resource.get("format"):
                            content_parts.append(f"   Format: {resource.get('format')}")
                        if resource.get("filesize"):
                            size = resource.get("filesize")
                            if isinstance(size, int):
                                # Format size in human-readable format
                                if size < 1024:
                                    size_str = f"{size} B"
                                elif size < 1024 * 1024:
                                    size_str = f"{size / 1024:.1f} KB"
                                elif size < 1024 * 1024 * 1024:
                                    size_str = f"{size / (1024 * 1024):.1f} MB"
                                else:
                                    size_str = f"{size / (1024 * 1024 * 1024):.1f} GB"
                                content_parts.append(f"   Size: {size_str}")
                        if resource.get("mime"):
                            content_parts.append(
                                f"   MIME type: {resource.get('mime')}"
                            )
                        if resource.get("type"):
                            content_parts.append(f"   Type: {resource.get('type')}")
                        if resource.get("url"):
                            content_parts.append(f"   URL: {resource.get('url')}")
                    except Exception as e:  # noqa: BLE001
                        logger.warning(
                            f"Could not fetch details for resource {resource_id}: {e}"
                        )
    
                    content_parts.append("")
    
            return "\n".join(content_parts)
    
        except httpx.HTTPStatusError as e:
            return f"Error: HTTP {e.response.status_code} - {str(e)}"
        except Exception as e:  # noqa: BLE001
            return f"Error: {str(e)}"
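The inline size formatting in the handler can be exercised in isolation. A minimal extraction of that logic follows; the helper name `format_size` is hypothetical and not part of the project:

```python
def format_size(size: int) -> str:
    """Format a byte count the same way the handler does (B, KB, MB, GB)."""
    if size < 1024:
        return f"{size} B"
    if size < 1024 * 1024:
        return f"{size / 1024:.1f} KB"
    if size < 1024 * 1024 * 1024:
        return f"{size / (1024 * 1024):.1f} MB"
    return f"{size / (1024 * 1024 * 1024):.1f} GB"

print(format_size(1536))  # → 1.5 KB
```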
  • Local registration function for the tool using the @mcp.tool() decorator on the handler function.
    def register_list_dataset_resources_tool(mcp: FastMCP) -> None:
        @mcp.tool()
        async def list_dataset_resources(dataset_id: str) -> str:
            ...  # handler body as shown above
  • tools/__init__.py:9-23 (registration)
    Central tool registration: imports the register function and calls it within register_tools(mcp), which sets up all tools.
    from tools.list_dataset_resources import register_list_dataset_resources_tool
    from tools.query_resource_data import register_query_resource_data_tool
    from tools.search_datasets import register_search_datasets_tool
    
    
    def register_tools(mcp: FastMCP) -> None:
        """Register all MCP tools with the provided FastMCP instance."""
        register_search_datasets_tool(mcp)
        register_query_resource_data_tool(mcp)
        register_get_dataset_info_tool(mcp)
        register_list_dataset_resources_tool(mcp)
        register_get_resource_info_tool(mcp)
        register_download_and_parse_resource_tool(mcp)
        register_get_metrics_tool(mcp)
  • Key helper utility invoked by the handler to fetch the dataset's resources list (ID and title tuples) and basic dataset metadata from the data.gouv.fr API v1 endpoint.
    async def get_resources_for_dataset(
        dataset_id: str, session: httpx.AsyncClient | None = None
    ) -> dict[str, Any]:
        """
        Get all resources for a given dataset.
    
        Returns:
            dict with 'dataset' metadata and 'resources' list of resource IDs and titles
        """
        own = session is None
        if own:
            session = httpx.AsyncClient()
        try:
            ds = await get_dataset_metadata(dataset_id, session=session)
            base_url: str = env_config.get_base_url("datagouv_api")
            # Fetch resources from API v1
            url = f"{base_url}1/datasets/{dataset_id}/"
            data = await _fetch_json(session, url)
            resources: list[dict[str, Any]] = data.get("resources", [])
            res_list: list[tuple[str, str]] = [
                (res.get("id"), res.get("title", "") or res.get("name", ""))
                for res in resources
                if res.get("id")
            ]
            return {"dataset": ds, "resources": res_list}
        finally:
            if own and session:
                await session.aclose()
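The comprehension at the heart of this helper can be checked offline against a sample payload. The function name and data below are illustrative, not part of the project:

```python
from typing import Any

def extract_resources(data: dict[str, Any]) -> list[tuple[str, str]]:
    """Keep (id, title) pairs, falling back to 'name' when 'title' is
    empty and skipping entries without an id, as the helper above does."""
    return [
        (res.get("id"), res.get("title", "") or res.get("name", ""))
        for res in data.get("resources", [])
        if res.get("id")
    ]

sample = {
    "resources": [
        {"id": "r1", "title": "population.csv"},
        {"id": "r2", "title": "", "name": "notes.pdf"},
        {"title": "entry-without-id"},
    ]
}
print(extract_resources(sample))  # → [('r1', 'population.csv'), ('r2', 'notes.pdf')]
```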
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that this is a read operation (lists resources), describes the return format (formatted text with metadata), and mentions it's part of a workflow. However, it doesn't cover potential behavioral aspects like pagination, rate limits, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear purpose statement, workflow context, and separate Args/Returns sections. While slightly longer than minimal, every sentence adds value. The workflow section could be more concise, but overall it's efficiently organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema (so return values are documented elsewhere), no annotations, and a simple single parameter, the description provides excellent context. It explains the tool's role in the ecosystem, how to use it with siblings, and adds parameter semantics missing from the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage for its single parameter. The description compensates by explaining that dataset_id is 'The ID of the dataset to list resources from (obtained from search_datasets or get_dataset_info)', adding crucial context about where to obtain this value. It doesn't provide format examples, but adds meaningful semantic information.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List' and resource 'all resources (files) in a dataset with their metadata', distinguishing it from siblings like search_datasets (finds datasets) or get_resource_info (checks availability). It specifies it's for listing files with metadata, not querying or downloading them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool in a typical workflow (step 2 after search_datasets) and when to use alternatives like get_resource_info or query_resource_data. It clearly positions this as a key step before querying data from resources.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/datagouv/datagouv-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.