# data.gouv.fr MCP Server

Model Context Protocol (MCP) server that allows AI chatbots to search, explore, and analyze datasets from [data.gouv.fr](https://www.data.gouv.fr), the French national Open Data platform, directly through conversation.

## What is this?

The data.gouv.fr MCP server is a tool that allows AI chatbots (like Claude, Gemini, or Cursor) to interact with datasets from [data.gouv.fr](https://www.data.gouv.fr). Instead of manually browsing the website, you can simply ask questions like "Quels jeux de données sont disponibles sur les prix de l'immobilier ?" ("Which datasets are available on real-estate prices?") or "Montre-moi les dernières données de population pour Paris" ("Show me the latest population data for Paris") and get instant answers.

This is currently a **proof of concept (POC)** and is meant to be run **locally on your machine** for now, until it is put into production later. Since it runs locally, you'll need a few basic tech skills to set it up, but Docker makes the process straightforward.

The server is built using the [official Python SDK for MCP servers and clients](https://github.com/modelcontextprotocol/python-sdk) and uses the Streamable HTTP transport protocol.

## 1. Run the MCP server

Before starting, clone this repository and browse into it:

```shell
git clone git@github.com:datagouv/datagouv-mcp.git
cd datagouv-mcp
```

Docker is required for the recommended setup. Install it via [Docker Desktop](https://www.docker.com/products/docker-desktop/) or any compatible Docker Engine before continuing.

### 🐳 With Docker (Recommended)

```shell
# With default settings (port 8000, prod environment)
docker compose up -d

# With custom environment variables
MCP_PORT=8007 DATAGOUV_ENV=demo docker compose up -d

# Stop
docker compose down
```

**Environment variables:**

- `MCP_PORT`: port for the MCP HTTP server (defaults to `8000` when unset).
- `DATAGOUV_ENV`: `prod` (default) or `demo`. This controls which data.gouv.fr environment the server pulls data from (https://www.data.gouv.fr or https://demo.data.gouv.fr).
By default the MCP server talks to the production data.gouv.fr. Set `DATAGOUV_ENV=demo` if you specifically need the demo environment.

### Manual Installation

You will need [uv](https://github.com/astral-sh/uv) to install dependencies and run the server.

1. **Install dependencies**

   ```shell
   uv sync
   ```

2. **Prepare the environment file**

   Copy the [example environment file](.env.example) to create your own `.env` file:

   ```shell
   cp .env.example .env
   ```

   Then optionally edit `.env` and set the variables that matter for your run:

   ```
   MCP_PORT=8007      # (defaults to 8000 when unset)
   DATAGOUV_ENV=prod  # Allowed values: demo | prod (defaults to prod when unset)
   ```

   Load the variables with your preferred method, e.g.:

   ```shell
   set -a && source .env && set +a
   ```

3. **Start the HTTP MCP server**

   ```shell
   uv run main.py
   ```

## 2. Connect your chatbot to the MCP server

The configuration format depends on your MCP client:

### Cursor

Cursor supports MCP servers through its settings. To configure the server:

1. Open Cursor Settings
2. Search for "MCP" or "Model Context Protocol"
3.
Add a new MCP server with the following configuration:

```json
{
  "mcpServers": {
    "datagouv": {
      "url": "http://127.0.0.1:8000/mcp",
      "transport": "http"
    }
  }
}
```

### Gemini CLI

Add the following to your `~/.gemini/settings.json` file:

```json
{
  "mcpServers": {
    "datagouv": {
      "transport": "http",
      "httpUrl": "http://127.0.0.1:8000/mcp"
    }
  }
}
```

### Claude Desktop

Add the following to your Claude Desktop configuration file (typically `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, or `%APPDATA%\Claude\claude_desktop_config.json` on Windows):

```json
{
  "mcpServers": {
    "datagouv": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://127.0.0.1:8000/mcp"
      ]
    }
  }
}
```

### Claude Code

Use the `claude mcp` command to add the MCP server:

```shell
claude mcp add --transport http datagouv http://127.0.0.1:8000/mcp
```

### VS Code

Add the following to your VS Code `settings.json`:

```json
{
  "servers": {
    "datagouv": {
      "url": "http://127.0.0.1:8000/mcp",
      "type": "http"
    }
  }
}
```

### Windsurf

Add the following to your `~/.codeium/mcp_config.json`:

```json
{
  "mcpServers": {
    "datagouv": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "http://127.0.0.1:8000/mcp"
      ]
    }
  }
}
```

**Note:**

- Replace `http://127.0.0.1:8000/mcp` with your actual server URL if running on a different host or port. For production deployments, use `https://` and configure the appropriate hostname.
- This MCP server only exposes read-only tools for now, so no API key is required.

## 🚚 Transport support

This MCP server uses FastMCP and implements the **Streamable HTTP transport only**. **STDIO and SSE are not supported**.

## 📋 Available Endpoints

**Streamable HTTP transport (standards-compliant):**

- `POST /mcp` - JSON-RPC messages (client → server)
- `GET /health` - Simple JSON health probe (`{"status":"ok","timestamp":"..."}`)

## 🛠️ Available Tools

The MCP server provides tools to interact with data.gouv.fr datasets:

- **`search_datasets`** - Search for datasets by keywords.
  Returns datasets with metadata (title, description, organization, tags, resource count).
  Parameters: `query` (required), `page` (optional, default: 1), `page_size` (optional, default: 20, max: 100)

- **`get_dataset_info`** - Get detailed information about a specific dataset (metadata, organization, tags, dates, license, etc.).
  Parameters: `dataset_id` (required)

- **`list_dataset_resources`** - List all resources (files) in a dataset with their metadata (format, size, type, URL).
  Parameters: `dataset_id` (required)

- **`get_resource_info`** - Get detailed information about a specific resource (format, size, MIME type, URL, dataset association, Tabular API availability).
  Parameters: `resource_id` (required)

- **`query_dataset_data`** - Query data from a dataset via the Tabular API. Finds a dataset, retrieves its resources, and fetches rows to answer questions.
  Parameters: `question` (required), `dataset_id` (optional), `dataset_query` (optional), `limit_per_resource` (optional, default: 100)
  Note: Either `dataset_id` or `dataset_query` must be provided. Works for CSV/XLS resources within Tabular API size limits (CSV ≤ 100 MB, XLSX ≤ 12.5 MB).

- **`download_and_parse_resource`** - Download and parse a resource that is not accessible via the Tabular API (files too large, formats not supported, external URLs).
  Parameters: `resource_id` (required), `max_rows` (optional, default: 1000), `max_size_mb` (optional, default: 500)
  Supported formats: CSV, CSV.GZ, JSON, JSONL. Useful for files exceeding Tabular API limits or formats the Tabular API does not support.

- **`get_metrics`** - Get metrics (visits, downloads) for a dataset and/or a resource.
  Parameters: `dataset_id` (optional), `resource_id` (optional), `limit` (optional, default: 12, max: 100)
  Returns monthly statistics including visits and downloads, sorted by month in descending order (most recent first). At least one of `dataset_id` or `resource_id` must be provided.
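When a chatbot invokes one of these tools, its MCP client ultimately POSTs a JSON-RPC 2.0 `tools/call` message to the `/mcp` endpoint. A minimal sketch of such a payload for `get_metrics` (the `<dataset-id>` value is a placeholder, and real clients also handle session negotiation and `Accept` headers for you):

```python
import json

# JSON-RPC 2.0 message an MCP client would POST to http://127.0.0.1:8000/mcp
# to call the get_metrics tool. "<dataset-id>" is a placeholder, not a real ID.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_metrics",
        "arguments": {"dataset_id": "<dataset-id>", "limit": 12},
    },
}

print(json.dumps(payload, indent=2))
```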
**Note:** The `get_metrics` tool only works with the production environment (`DATAGOUV_ENV=prod`). The Metrics API does not have a demo/preprod environment.

## 🧪 Tests

### Automated Tests with pytest

Run the tests with pytest (these cover helper modules; the MCP server wiring is best exercised via the MCP Inspector):

```shell
# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_tabular_api.py

# Run with custom resource ID
RESOURCE_ID=3b6b2281-b9d9-4959-ae9d-c2c166dff118 uv run pytest tests/test_tabular_api.py

# Run with prod environment
DATAGOUV_ENV=prod uv run pytest
```

### Interactive Testing with MCP Inspector

Use the official [MCP Inspector](https://modelcontextprotocol.io/docs/tools/inspector) to interactively test the server tools and resources.

Prerequisites:

- Node.js with `npx` available

Steps:

1. Start the MCP server (see above)
2. In another terminal, launch the inspector:

```shell
npx @modelcontextprotocol/inspector --http-url "http://127.0.0.1:${MCP_PORT}/mcp"
```

Adjust the URL if you exposed the server on another host/port.

## 🤝 Contributing

### 🧹 Code Linting and Formatting

This project follows PEP 8 style guidelines using [Ruff](https://astral.sh/ruff/) for linting and formatting. **Either running these commands manually or [installing the pre-commit hook](#-pre-commit-hooks) is required before submitting contributions.**

```shell
# Lint and sort imports, and format code
uv run ruff check --select I --fix && uv run ruff format
```

### 🔗 Pre-commit Hooks

This repository uses a [pre-commit](https://pre-commit.com/) hook that lints and formats code before each commit.
**Installing the pre-commit hook is required for contributions.**

**Install pre-commit hooks:**

```shell
uv run pre-commit install
```

The pre-commit hooks automatically:

- Check YAML syntax
- Fix end-of-file issues
- Remove trailing whitespace
- Check for large files
- Run Ruff linting and formatting

### 🏷️ Releases and versioning

The release process uses the [`tag_version.sh`](tag_version.sh) script to create git tags and GitHub releases and to update [CHANGELOG.md](CHANGELOG.md) automatically. Package version numbers are automatically derived from git tags using [setuptools_scm](https://github.com/pypa/setuptools_scm), so no manual version updates are needed in `pyproject.toml`.

**Prerequisites**: [GitHub CLI](https://cli.github.com/) must be installed and authenticated, and you must be on the main branch with a clean working directory.

```shell
# Create a new release
./tag_version.sh <version>

# Example
./tag_version.sh 2.5.0

# Dry run to see what would happen
./tag_version.sh 2.5.0 --dry-run
```

The script automatically:

- Extracts commits since the last tag and formats them for CHANGELOG.md
- Identifies breaking changes (commits with `!:` in the subject)
- Creates a git tag and pushes it to the remote repository
- Creates a GitHub release with the changelog content

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
