# data.gouv.fr MCP Server

Model Context Protocol (MCP) server that allows AI chatbots to search, explore, and analyze datasets from [data.gouv.fr](https://www.data.gouv.fr), the French national Open Data platform, directly through conversation.

## What is this?

The data.gouv.fr MCP server is a tool that allows AI chatbots (like Claude, Gemini, or Cursor) to interact with datasets from [data.gouv.fr](https://www.data.gouv.fr). Instead of manually browsing the website, you can simply ask questions like "Quels jeux de données sont disponibles sur les prix de l'immobilier ?" ("Which datasets are available on real-estate prices?") or "Montre-moi les dernières données de population pour Paris" ("Show me the latest population data for Paris") and get instant answers.

This is currently a **proof of concept (POC)** and is meant to be run **locally on your machine** for now, until it is put into production later. Since it runs locally, you'll need a few basic tech skills to set it up, but Docker makes the process straightforward.

The server is built using the [official Python SDK for MCP servers and clients](https://github.com/modelcontextprotocol/python-sdk) and uses the Streamable HTTP transport protocol.

## 1. Run the MCP server

Before starting, clone this repository and browse into it:

```shell
git clone git@github.com:datagouv/datagouv-mcp.git
cd datagouv-mcp
```

Docker is required for the recommended setup. Install it via [Docker Desktop](https://www.docker.com/products/docker-desktop/) or any compatible Docker Engine before continuing.

### 🐳 With Docker (Recommended)

```shell
# With default settings (port 8000, prod environment)
docker compose up -d

# With custom environment variables
MCP_PORT=8007 DATAGOUV_ENV=demo docker compose up -d

# Stop
docker compose down
```

**Environment variables:**

- `MCP_PORT`: port for the MCP HTTP server (defaults to `8000` when unset).
- `DATAGOUV_ENV`: `prod` (default) or `demo`. This controls which data.gouv.fr environment the server pulls data from (https://www.data.gouv.fr or https://demo.data.gouv.fr).
By default the MCP server talks to the production data.gouv.fr. Set `DATAGOUV_ENV=demo` if you specifically need the demo environment.

### Manual Installation

You will need [uv](https://github.com/astral-sh/uv) to install dependencies and run the server.

1. **Install dependencies**

   ```shell
   uv sync
   ```

2. **Prepare the environment file**

   Copy the [example environment file](.env.example) to create your own `.env` file:

   ```shell
   cp .env.example .env
   ```

   Then optionally edit `.env` and set the variables that matter for your run:

   ```
   MCP_PORT=8007      # (defaults to 8000 when unset)
   DATAGOUV_ENV=prod  # Allowed values: demo | prod (defaults to prod when unset)
   ```

   Load the variables with your preferred method, e.g.:

   ```shell
   set -a && source .env && set +a
   ```

3. **Start the HTTP MCP server**

   ```shell
   uv run main.py
   ```

## 2. Connect your chatbot to the MCP server

The configuration format depends on your MCP client:

### Cursor

Cursor supports MCP servers through its settings. To configure the server:

1. Open Cursor Settings
2. Search for "MCP" or "Model Context Protocol"
3.
Add a new MCP server with the following configuration:

```json
{
  "mcpServers": {
    "datagouv": {
      "url": "http://127.0.0.1:8000/mcp",
      "transport": "http"
    }
  }
}
```

### Gemini CLI

Add the following to your `~/.gemini/settings.json` file:

```json
{
  "mcpServers": {
    "datagouv": {
      "transport": "http",
      "httpUrl": "http://127.0.0.1:8000/mcp"
    }
  }
}
```

### Claude Desktop

Add the following to your Claude Desktop configuration file (typically `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, or `%APPDATA%\Claude\claude_desktop_config.json` on Windows):

```json
{
  "mcpServers": {
    "datagouv": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://127.0.0.1:8000/mcp"
      ]
    }
  }
}
```

### Claude Code

Use the `claude mcp` command to add the MCP server:

```shell
claude mcp add --transport http datagouv http://127.0.0.1:8000/mcp
```

### VS Code

Add the following to your VS Code `settings.json`:

```json
{
  "servers": {
    "datagouv": {
      "url": "http://127.0.0.1:8000/mcp",
      "type": "http"
    }
  }
}
```

### Windsurf

Add the following to your `~/.codeium/mcp_config.json`:

```json
{
  "mcpServers": {
    "datagouv": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "http://127.0.0.1:8000/mcp"
      ]
    }
  }
}
```

**Note:**

- Replace `http://127.0.0.1:8000/mcp` with your actual server URL if running on a different host or port. For production deployments, use `https://` and configure the appropriate hostname.
- This MCP server only exposes read-only tools for now, so no API key is required.

## 🚚 Transport support

This MCP server uses FastMCP and implements the **Streamable HTTP transport only**. **STDIO and SSE are not supported**.

## 📋 Available Endpoints

**Streamable HTTP transport (standards-compliant):**

- `POST /mcp` - JSON-RPC messages (client → server)
- `GET /health` - Simple JSON health probe (`{"status":"ok","timestamp":"..."}`)

## 🛠️ Available Tools

The MCP server provides tools to interact with data.gouv.fr datasets:

- **`search_datasets`** - Search for datasets by keywords.
  Returns datasets with metadata (title, description, organization, tags, resource count).
  Parameters: `query` (required), `page` (optional, default: 1), `page_size` (optional, default: 20, max: 100)

- **`get_dataset_info`** - Get detailed information about a specific dataset (metadata, organization, tags, dates, license, etc.).
  Parameters: `dataset_id` (required)

- **`list_dataset_resources`** - List all resources (files) in a dataset with their metadata (format, size, type, URL).
  Parameters: `dataset_id` (required)

- **`get_resource_info`** - Get detailed information about a specific resource (format, size, MIME type, URL, dataset association, Tabular API availability).
  Parameters: `resource_id` (required)

- **`query_dataset_data`** - Query data from a dataset via the Tabular API. Finds a dataset, retrieves its resources, and fetches rows to answer questions.
  Parameters: `question` (required), `dataset_id` (optional), `dataset_query` (optional), `limit_per_resource` (optional, default: 100)
  Note: Either `dataset_id` or `dataset_query` must be provided. Works for CSV/XLS resources within Tabular API size limits (CSV ≤ 100 MB, XLSX ≤ 12.5 MB).

- **`download_and_parse_resource`** - Download and parse a resource that is not accessible via the Tabular API (files too large, formats not supported, external URLs).
  Parameters: `resource_id` (required), `max_rows` (optional, default: 1000), `max_size_mb` (optional, default: 500)
  Supported formats: CSV, CSV.GZ, JSON, JSONL. Useful for files exceeding Tabular API limits or formats the Tabular API does not support.

- **`get_metrics`** - Get metrics (visits, downloads) for a dataset and/or a resource.
  Parameters: `dataset_id` (optional), `resource_id` (optional), `limit` (optional, default: 12, max: 100)
  Returns monthly statistics including visits and downloads, sorted by month in descending order (most recent first). At least one of `dataset_id` or `resource_id` must be provided.
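When a chatbot invokes one of these tools, its MCP client ultimately POSTs a JSON-RPC 2.0 `tools/call` message to the `/mcp` endpoint. A minimal sketch of such a payload for `get_metrics` (the `<dataset-id>` value is a placeholder, and real clients also handle session negotiation and `Accept` headers for you):

```python
import json

# JSON-RPC 2.0 message an MCP client would POST to http://127.0.0.1:8000/mcp
# to call the get_metrics tool. "<dataset-id>" is a placeholder, not a real ID.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_metrics",
        "arguments": {"dataset_id": "<dataset-id>", "limit": 12},
    },
}

print(json.dumps(payload, indent=2))
```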
**Note:** The `get_metrics` tool only works with the production environment (`DATAGOUV_ENV=prod`). The Metrics API does not have a demo/preprod environment.

## 🧪 Tests

### Automated Tests with pytest

Run the tests with pytest (these cover helper modules; the MCP server wiring is best exercised via the MCP Inspector):

```shell
# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_tabular_api.py

# Run with custom resource ID
RESOURCE_ID=3b6b2281-b9d9-4959-ae9d-c2c166dff118 uv run pytest tests/test_tabular_api.py

# Run with prod environment
DATAGOUV_ENV=prod uv run pytest
```

### Interactive Testing with MCP Inspector

Use the official [MCP Inspector](https://modelcontextprotocol.io/docs/tools/inspector) to interactively test the server tools and resources.

Prerequisites:

- Node.js with `npx` available

Steps:

1. Start the MCP server (see above)
2. In another terminal, launch the inspector:

```shell
npx @modelcontextprotocol/inspector --http-url "http://127.0.0.1:${MCP_PORT}/mcp"
```

Adjust the URL if you exposed the server on another host/port.

## 🤝 Contributing

### 🧹 Code Linting and Formatting

This project follows PEP 8 style guidelines using [Ruff](https://astral.sh/ruff/) for linting and formatting. **Either running these commands manually or [installing the pre-commit hook](#-pre-commit-hooks) is required before submitting contributions.**

```shell
# Lint and sort imports, and format code
uv run ruff check --select I --fix && uv run ruff format
```

### 🔗 Pre-commit Hooks

This repository uses a [pre-commit](https://pre-commit.com/) hook that lints and formats code before each commit.
**Installing the pre-commit hook is required for contributions.**

**Install pre-commit hooks:**

```shell
uv run pre-commit install
```

The pre-commit hooks automatically:

- Check YAML syntax
- Fix end-of-file issues
- Remove trailing whitespace
- Check for large files
- Run Ruff linting and formatting

### 🏷️ Releases and versioning

The release process uses the [`tag_version.sh`](tag_version.sh) script to create git tags and GitHub releases and to update [CHANGELOG.md](CHANGELOG.md) automatically. Package version numbers are automatically derived from git tags using [setuptools_scm](https://github.com/pypa/setuptools_scm), so no manual version updates are needed in `pyproject.toml`.

**Prerequisites**: [GitHub CLI](https://cli.github.com/) must be installed and authenticated, and you must be on the main branch with a clean working directory.

```shell
# Create a new release
./tag_version.sh <version>

# Example
./tag_version.sh 2.5.0

# Dry run to see what would happen
./tag_version.sh 2.5.0 --dry-run
```

The script automatically:

- Extracts commits since the last tag and formats them for CHANGELOG.md
- Identifies breaking changes (commits with `!:` in the subject)
- Creates a git tag and pushes it to the remote repository
- Creates a GitHub release with the changelog content

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
