
datagouv-mcp

by datagouv

query_resource_data

Fetch structured data rows from specific tabular files on France's open data platform using the Tabular API, retrieving up to 200 rows per query to analyze CSV or XLSX resources without downloading entire files.

Instructions

Query data from a specific resource (file) via the Tabular API.

The Tabular API is data.gouv.fr's API for parsing and querying the content of resources (files) on the platform. It allows you to access structured data from tabular files (CSV, XLSX, etc.) without downloading the entire file. This tool fetches rows from a specific resource using this API.

Each call retrieves up to 200 rows (the maximum allowed by the API).

Note: The Tabular API has size limits: CSV files larger than 100 MB and XLSX files larger than 12.5 MB are not supported. For larger files or unsupported formats, use download_and_parse_resource. You can use get_resource_info to check whether a resource is available via the Tabular API.
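The routing rule in the note above can be sketched as a small helper. The limits come from the description; the function name, signature, and the treatment of unknown formats are illustrative assumptions, not part of the server:

```python
# Illustrative helper (not part of the server) that applies the documented
# Tabular API size limits to decide which tool an agent should call.

TABULAR_LIMITS_BYTES = {
    "csv": 100 * 1024**2,          # CSV above 100 MB is not supported
    "xlsx": int(12.5 * 1024**2),   # XLSX above 12.5 MB is not supported
}


def pick_tool(file_format: str, size_bytes: int) -> str:
    """Route to query_resource_data when the Tabular API can handle the file."""
    limit = TABULAR_LIMITS_BYTES.get(file_format.lower())
    if limit is not None and size_bytes <= limit:
        return "query_resource_data"
    # Larger files or formats the Tabular API does not parse fall back
    # to downloading and parsing the whole file.
    return "download_and_parse_resource"
```

In practice get_resource_info is the authoritative check; this sketch only encodes the documented thresholds.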

Recommended workflow:

  1. Use search_datasets to find the appropriate dataset

  2. Use list_dataset_resources to see available resources (files) in the dataset

  3. (Optional) Use get_resource_info to verify Tabular API availability

  4. Use query_resource_data with the chosen resource_id to fetch data

  5. If the answer is not in the first page, use query_resource_data with page=2, page=3, etc.
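Since each call caps out at 200 rows, step 5 reduces to ceiling-division arithmetic over the row count reported by the API. A minimal sketch (the helper name is illustrative):

```python
def pages_needed(total_rows: int, page_size: int = 200) -> list[int]:
    """1-based page numbers required to cover every row at the API's page size."""
    if total_rows <= 0 or page_size <= 0:
        return []
    total_pages = (total_rows + page_size - 1) // page_size  # ceiling division
    return list(range(1, total_pages + 1))
```

For example, a 450-row resource needs pages 1, 2, and 3.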

Args:

  • question: The question or description of what data you're looking for (for context)

  • resource_id: Resource ID (use list_dataset_resources to find resource IDs)

  • page: Page number to retrieve (default: 1). Use this to navigate through large datasets. Each page contains up to 200 rows.

Returns: Formatted text with the data found from the resource, including pagination info

Input Schema

Name        | Required | Default
question    | Yes      | —
resource_id | Yes      | —
page        | No       | 1

Output Schema

Name   | Required
result | Yes

Implementation Reference

  • The core handler function for the 'query_resource_data' tool. It queries the Tabular API for a given resource_id, retrieves up to 200 rows per page, includes resource/dataset metadata, shows sample data, and provides pagination guidance.
    async def query_resource_data(
        question: str,
        resource_id: str,
        page: int = 1,
    ) -> str:
        """
        Query data from a specific resource (file) via the Tabular API.
    
        The Tabular API is data.gouv.fr's API for parsing and querying the content of
        resources (files) on the platform. It allows you to access structured data from
        tabular files (CSV, XLSX, etc.) without downloading the entire file. This tool
        fetches rows from a specific resource using this API.
    
        Each call retrieves up to 200 rows (the maximum allowed by the API).
    
        Note: The Tabular API has size limits (CSV > 100 MB, XLSX > 12.5 MB are not
        supported). For larger files or unsupported formats, use download_and_parse_resource.
        You can use get_resource_info to check if a resource is available via Tabular API.
    
        Recommended workflow:
        1. Use search_datasets to find the appropriate dataset
        2. Use list_dataset_resources to see available resources (files) in the dataset
        3. (Optional) Use get_resource_info to verify Tabular API availability
        4. Use query_resource_data with the chosen resource_id to fetch data
        5. If the answer is not in the first page, use query_resource_data with page=2, page=3, etc.
    
        Args:
            question: The question or description of what data you're looking for (for context)
            resource_id: Resource ID (use list_dataset_resources to find resource IDs)
            page: Page number to retrieve (default: 1). Use this to navigate through large datasets.
                  Each page contains up to 200 rows.
    
        Returns:
            Formatted text with the data found from the resource, including pagination info
        """
        try:
            # Get resource metadata to display context
            try:
                resource_metadata = await datagouv_api_client.get_resource_metadata(
                    resource_id
                )
                resource_title = resource_metadata.get("title", "Unknown")
                dataset_id = resource_metadata.get("dataset_id")
            except Exception:  # noqa: BLE001
                resource_title = "Unknown"
                dataset_id = None
    
            # Get dataset title if available
            dataset_title = "Unknown"
            if dataset_id:
                try:
                    dataset_metadata = await datagouv_api_client.get_dataset_metadata(
                        str(dataset_id)
                    )
                    dataset_title = dataset_metadata.get("title", "Unknown")
                except Exception:  # noqa: BLE001
                    pass
    
            content_parts = [
                f"Querying resource: {resource_title}",
                f"Resource ID: {resource_id}",
            ]
            if dataset_id:
                content_parts.append(f"Dataset: {dataset_title} (ID: {dataset_id})")
            content_parts.extend(
                [
                    f"Question: {question}",
                    "",
                ]
            )
    
            # Fetch data via the Tabular API (always use max page size of 200)
            page_size = 200
            logger.info(
                f"Querying Tabular API for resource: {resource_title} "
                f"(ID: {resource_id}), page: {page}, page_size: {page_size}"
            )
    
            try:
                tabular_data = await tabular_api_client.fetch_resource_data(
                    resource_id, page=page, page_size=page_size
                )
                rows = tabular_data.get("data", [])
                meta = tabular_data.get("meta", {})
                total_count = meta.get("total")
                page_info = meta.get("page")
                page_size_meta = meta.get("page_size")
    
                if not rows:
                    content_parts.append(
                        "⚠️  No rows available (resource may be empty or filtered)."
                    )
                    return "\n".join(content_parts)
    
                if total_count is not None:
                    content_parts.append(f"Total rows (Tabular API): {total_count}")
                    # Calculate total pages
                    if page_size_meta and page_size_meta > 0:
                        total_pages = (
                            total_count + page_size_meta - 1
                        ) // page_size_meta
                        content_parts.append(
                            f"Total pages: {total_pages} (page size: {page_size_meta})"
                        )
                content_parts.append(
                    f"Retrieved: {len(rows)} row(s) from page {page_info or page}"
                )
    
                # Show column names
                if rows:
                    columns = [str(k) if k is not None else "" for k in rows[0].keys()]
                    content_parts.append(f"Columns: {', '.join(columns)}")
    
                # Show sample data (first few rows)
                content_parts.append("\nSample data (first 3 rows):")
                for i, row in enumerate(rows[:3], 1):
                    content_parts.append(f"  Row {i}:")
                    for key, value in row.items():
                        val_str = str(value) if value is not None else ""
                        if len(val_str) > 100:
                            val_str = val_str[:100] + "..."
                        content_parts.append(f"    {key}: {val_str}")
    
                if len(rows) > 3:
                    content_parts.append(
                        f"  ... ({len(rows) - 3} more row(s) available)"
                    )
    
                links = tabular_data.get("links", {})
                if links.get("next"):
                    next_page = page + 1
                    content_parts.append("")
                    content_parts.append(
                        f"📄 More data available! To see the next page, call query_resource_data "
                        f"again with page={next_page} (and the same resource_id and question)."
                    )
                    if total_count and page_size_meta:
                        remaining_pages = (
                            (total_count + page_size_meta - 1) // page_size_meta
                        ) - page
                        if remaining_pages > 1:
                            content_parts.append(
                                f"   There are {remaining_pages} more page(s) available after this one."
                            )
    
            except tabular_api_client.ResourceNotAvailableError as e:
                logger.warning(f"Resource not available: {resource_id} - {str(e)}")
                content_parts.append(f"⚠️  {str(e)}")
            except httpx.HTTPStatusError as e:
                error_details = f"HTTP {e.response.status_code}: {str(e)}"
                if e.request:
                    error_details += f" - URL: {e.request.url}"
                logger.error(
                    f"Tabular API HTTP error for resource {resource_id}: {error_details}"
                )
                content_parts.append(f"❌ Tabular API error ({error_details})")
            except Exception as e:  # noqa: BLE001
                logger.exception(f"Unexpected error querying resource {resource_id}")
                content_parts.append(f"❌ Error querying resource: {str(e)}")
    
            return "\n".join(content_parts)
    
        except httpx.HTTPStatusError as e:
            return f"Error: HTTP {e.response.status_code} - {str(e)}"
        except Exception as e:  # noqa: BLE001
            return f"Error: {str(e)}"
  • Imports and calls the register_query_resource_data_tool function as part of registering all tools in the MCP server.
    from tools.query_resource_data import register_query_resource_data_tool
    from tools.search_datasets import register_search_datasets_tool
    
    
    def register_tools(mcp: FastMCP) -> None:
        """Register all MCP tools with the provided FastMCP instance."""
        register_search_datasets_tool(mcp)
        register_query_resource_data_tool(mcp)
        register_get_dataset_info_tool(mcp)
        register_list_dataset_resources_tool(mcp)
        register_get_resource_info_tool(mcp)
        register_download_and_parse_resource_tool(mcp)
        register_get_metrics_tool(mcp)
  • The registration function that decorates the handler with @mcp.tool(), effectively registering 'query_resource_data' with the FastMCP instance.
    def register_query_resource_data_tool(mcp: FastMCP) -> None:
        @mcp.tool()
        async def query_resource_data(
  • The docstring defines the tool's input schema (question: str, resource_id: str, page: int=1) and output description, used by MCP for tool schema generation.
    """
    Query data from a specific resource (file) via the Tabular API.
    
    The Tabular API is data.gouv.fr's API for parsing and querying the content of
    resources (files) on the platform. It allows you to access structured data from
    tabular files (CSV, XLSX, etc.) without downloading the entire file. This tool
    fetches rows from a specific resource using this API.
    
    Each call retrieves up to 200 rows (the maximum allowed by the API).
    
    Note: The Tabular API has size limits (CSV > 100 MB, XLSX > 12.5 MB are not
    supported). For larger files or unsupported formats, use download_and_parse_resource.
    You can use get_resource_info to check if a resource is available via Tabular API.
    
    Recommended workflow:
    1. Use search_datasets to find the appropriate dataset
    2. Use list_dataset_resources to see available resources (files) in the dataset
    3. (Optional) Use get_resource_info to verify Tabular API availability
    4. Use query_resource_data with the chosen resource_id to fetch data
    5. If the answer is not in the first page, use query_resource_data with page=2, page=3, etc.
    
    Args:
        question: The question or description of what data you're looking for (for context)
        resource_id: Resource ID (use list_dataset_resources to find resource IDs)
        page: Page number to retrieve (default: 1). Use this to navigate through large datasets.
              Each page contains up to 200 rows.
    
    Returns:
        Formatted text with the data found from the resource, including pagination info
    """
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: 'Each call retrieves up to 200 rows (the maximum allowed by the API)', 'The Tabular API has size limits (CSV > 100 MB, XLSX > 12.5 MB are not supported)', and pagination behavior with the page parameter. It doesn't mention rate limits or authentication needs, but covers most critical operational constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and well-structured, with clear sections (overview, limitations, workflow, args, returns). Every sentence adds value, though the workflow section is somewhat detailed. It's front-loaded with the core purpose, but could be slightly more concise in the workflow explanation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (querying tabular data with pagination), no annotations, 0% schema coverage, but with an output schema present, the description is remarkably complete. It covers purpose, limitations, workflow with siblings, parameter semantics, and behavioral constraints. The output schema handles return values, so the description appropriately focuses on usage context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must fully compensate. It provides clear semantics for all three parameters: 'question: The question or description of what data you're looking for (for context)', 'resource_id: Resource ID (use list_dataset_resources to find resource IDs)', and 'page: Page number to retrieve (default: 1)... Each page contains up to 200 rows.' This adds substantial meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Query data from a specific resource (file) via the Tabular API' and 'fetches rows from a specific resource using this API.' It specifies the verb ('query', 'fetch'), resource ('resource', 'tabular files'), and distinguishes from siblings by mentioning the Tabular API and contrasting with download_and_parse_resource for larger files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool versus alternatives: 'For larger files or unsupported formats, use download_and_parse_resource' and 'You can use get_resource_info to check if a resource is available via Tabular API.' It also includes a detailed recommended workflow with sibling tools, clearly outlining the context and prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
