# CKAN MCP Server (Python)
A Model Context Protocol (MCP) server that exposes more than 600[^1][^2] CKAN open-data portals worldwide to AI assistants, CLI tools, and other MCP-aware clients. It bundles curated portal presets, strict Pydantic models, and a batteries-included tool suite so analysts, application developers, and operators can explore public datasets without writing bespoke CKAN integrations.
## Persona-centric Guidance
| Persona | Primary questions | Suggested section |
| --- | --- | --- |
| Curious evaluators | "What can this server do?" "Which CKAN actions are covered?" | [Potential users](#potential-users-what-this-server-does) |
| Data analysts / MCP end-users | "How do I set it up locally?" "How do I connect to a remote MCP server?" | [Data analysts](#data-analysts-getting-insights-fast) |
| Contributors / maintainers | "How is the code organized?" "How do I run tests?" | [Developers](#developers-extend-and-contribute) |
| Platform / infra teams | "Can I deploy this to Cloud Run?" | [Production deployment](#production-deployment-google-cloud-run-example) |
---
## Potential Users: What This Server Does
### Why it exists
- **Purpose-built CKAN interface**: Wraps the CKAN Action & Datastore APIs behind MCP tools that AI agents and CLI clients understand.
- **Consistent insights**: Presents dataset summaries, freshness analysis, schemas, and download helpers so exploratory conversations stay grounded in real CKAN metadata.
- **Portal-aware behavior**: Curated overrides (transport method, dataset URL templates, helper prompts) keep the experience consistent across CKAN portals that deviate from defaults.
### Tool catalog (14 tools)
| Category | Tool | What it returns |
| --- | --- | --- |
| Session configuration | `ckan_api_initialise` | Selects a portal (country/location + overrides) and stores API keys/session metadata. |
| | `ckan_api_availability` | Lists the configured CKAN portals and reports the current session's selection (when set). |
| | `audit_ckan_api` | Probes GET/POST behavior, datastore aliases, and helper metadata; emits recommended overrides for future sessions. |
| Dataset retrieval | `get_package` | Full CKAN dataset metadata (resources, organization, extras). |
| | `list_datasets` | Paginated package list with optional total counts. |
| | `search_datasets` | Action API `package_search` wrapper with passthrough Solr parameters. |
| | `get_data_categories` | Organizations and groups for navigation. |
| Datastore access | `get_first_datastore_resource_records` | Pulls preview rows from the first active datastore resource. |
| | `get_resource_records` | Targeted datastore search with filters, sorts, distinct, etc. |
| | `download_dataset_locally` | Metadata-rich archive/download helper with MIME detection, extraction, and how-to snippets. |
| Analysis | `find_relevant_datasets` | Weighted scoring across title/description/tags/org/resource metadata. |
| | `analyze_dataset_updates` | Frequency heuristics plus CKAN update timestamps. |
| | `analyze_dataset_structure` | Schema summaries, record counts, sample fields. |
| | `get_dataset_insights` | Combines discovery, updates, structure, and helper prompts into one rich response. |
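For orientation, the request that `search_datasets` wraps can be sketched as below. This is a minimal illustration of CKAN's `package_search` action with pass-through Solr parameters, not the server's actual implementation; the helper name `build_package_search_url` and its defaults are assumptions made for this example.

```python
from urllib.parse import urlencode


def build_package_search_url(
    base_url: str, query: str, rows: int = 10, start: int = 0, **solr_params: str
) -> str:
    """Build a CKAN Action API package_search URL (hypothetical helper).

    base_url is the Action API root, e.g. ".../api/3/action".
    Extra keyword arguments (fq, sort, ...) are passed through unchanged.
    """
    params = {"q": query, "rows": rows, "start": start, **solr_params}
    return f"{base_url.rstrip('/')}/package_search?{urlencode(params)}"


if __name__ == "__main__":
    # Base URL taken from the Toronto example used elsewhere in this README.
    print(
        build_package_search_url(
            "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action",
            "bike lanes",
            rows=5,
            fq="res_format:CSV",
        )
    )
```

Keeping the Solr parameters as plain keyword arguments mirrors the "passthrough" idea: the caller decides which filters to send, and the helper only handles URL encoding.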
### Architecture at a glance
- `src/ckan_mcp/main.py` – MCP server entry point supporting stdio and HTTP (SSE) transports.
- `src/ckan_mcp/ckan_tools.py` – Tool implementations, transport probes, download helpers, and archive extraction.
- `src/ckan_mcp/helpers.py` – Relevance scoring, update frequency analysis, summary builders.
- `src/ckan_mcp/types.py` – Strict Pydantic models with `extra="allow"` for portal-specific metadata.
- `src/ckan_mcp/config_selection.py` & `src/ckan_mcp/data/ckan_config_selection.json` – Curated CKAN portal catalog and overrides consumed by `ckan_api_initialise`.
- `tests/` & `test_runner.py` – Pytest suite plus quick smoke runner mirroring production behaviors.
Tip: start with `ckan_api_initialise` to choose a portal, then call the analysis tools to see the depth of insights returned.
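To give a feel for the update frequency analysis mentioned for `helpers.py`, here is a minimal sketch that labels a dataset by the median gap between its update timestamps (for example, values of CKAN's `metadata_modified` field). The function name and the day thresholds are assumptions chosen for illustration; the project's real heuristics may differ.

```python
from datetime import datetime
from statistics import median


def classify_update_frequency(timestamps: list[str]) -> str:
    """Label a dataset by the median gap between ISO 8601 update timestamps.

    Assumes all timestamps share the same timezone awareness.
    """
    moments = sorted(datetime.fromisoformat(ts) for ts in timestamps)
    if len(moments) < 2:
        return "unknown"  # a single timestamp says nothing about cadence
    gaps_in_days = [
        (later - earlier).total_seconds() / 86400
        for earlier, later in zip(moments, moments[1:])
    ]
    typical_gap = median(gaps_in_days)
    # Hypothetical thresholds, for illustration only.
    if typical_gap <= 1.5:
        return "daily"
    if typical_gap <= 10:
        return "weekly"
    if typical_gap <= 45:
        return "monthly"
    if typical_gap <= 400:
        return "yearly"
    return "infrequent"
```

The median is used rather than the mean so that one long pause in publishing does not dominate the label.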
---
## Data Analysts: Getting Insights Fast
### Shared prerequisites
- CKAN portal URL (or use the curated list during initialization).
- Python 3.11+ and [`uv`](https://github.com/astral-sh/uv) or `pip` for installing dependencies.
- [curl](https://curl.se/) and the POSIX [`file`](https://man7.org/linux/man-pages/man1/file.1.html) command on your `PATH` (the `download_dataset_locally` tool shells out to both binaries).
- An MCP-compatible client (Claude CLI, Gemini CLI, etc.).
### Option A – Run the MCP server locally
1. **Clone & create a virtual environment**
```bash
git clone https://github.com/<org>/ckan-mcp.git
cd ckan-mcp
uv venv venv
source venv/bin/activate
```
2. **Install runtime dependencies**
```bash
uv pip install -e .
```
(Add `".[dev]"` for development tooling and `".[examples]"` if you want to run the sample scripts that load `.env` files.)
3. **Optional defaults** – export CKAN env vars if you always talk to the same portal:
```bash
export CKAN_BASE_URL="https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action"
export CKAN_SITE_URL="https://ckan0.cf.opendata.inter.prod-toronto.ca"
export CKAN_DATASET_URL_TEMPLATE="https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/{name}"
```
These are fallback values; interactive sessions normally rely on `ckan_api_initialise` to pick a portal.
4. **Launch in stdio mode** (best for desktop MCP clients):
```bash
python -m ckan_mcp.main
```
5. **Connect your MCP client** – example Claude CLI snippet (see [core environment variables](#core-environment-variables) for transport overrides):
```json
{
"mcpServers": {
"ckan-mcp": {
"command": "python",
"args": ["-m", "ckan_mcp.main"],
"env": {
"CKAN_MCP_LOCAL_DATASTORE": "~/dataset-store/"
}
}
}
}
```
6. **Start a session** – ask your assistant to "Initialize a CKAN connection"; it will call `ckan_api_initialise` and then the discovery tools.
### Option B – Use a remote MCP server with a local client
This flow is perfect when someone else operates the MCP server on shared infrastructure and you only need local MCP tooling.
1. **Server operator** sets up HTTP transport:
```bash
export CKAN_MCP_MODE=http
export CKAN_MCP_HOST=0.0.0.0
export CKAN_MCP_PORT=8000
python -m ckan_mcp.main
```
or run `docker compose up --build` to expose `http://localhost:8000/mcp` and front it with your preferred reverse proxy.
2. **Expose the `/mcp` endpoint** via HTTPS (Cloud Run, Fly.io, Tailscale, etc.) and share the URL with analysts.
3. **Analyst registers the remote MCP server** (Claude CLI example):
```bash
claude mcp add --transport http ckan-mcp https://mcp.example.com/mcp
claude mcp list
```
Gemini CLI uses `gemini mcp add --transport http ckan-mcp https://mcp.example.com/mcp` with the same URL.
4. **Use normally** – all CLI/desktop prompts now tunnel through the remote MCP server. The analyst still decides which CKAN portal to inspect via `ckan_api_initialise`.
### Day-to-day usage tips
- `ckan_api_availability` lists every CKAN portal packaged with this MCP build and reports which portal is currently selected (if any); check it before issuing expensive searches.
- `find_relevant_datasets` quickly surfaces top matches for natural-language prompts; follow up with `get_dataset_insights` for a detailed brief.
- `download_dataset_locally` writes metadata, datastore previews, and shell instructions to `~/.cache/ckan-mcp/...` so you can pivot to pandas immediately.
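The weighted scoring behind `find_relevant_datasets` can be pictured with a small sketch like the one below. The weights, the field names in `FIELD_WEIGHTS`, and both function names are hypothetical; they illustrate the general technique (per-field weights summed over matching query terms), not the project's actual scoring.

```python
# Hypothetical weights: a title match counts more than a description match.
FIELD_WEIGHTS = {
    "title": 3.0,
    "tags": 2.0,
    "description": 1.5,
    "organization": 1.0,
}


def score_dataset(query: str, dataset: dict) -> float:
    """Add a field's weight once for every query term found in that field."""
    terms = query.lower().split()
    score = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        text = str(dataset.get(field_name, "")).lower()
        score += weight * sum(1 for term in terms if term in text)
    return score


def rank_datasets(query: str, datasets: list[dict]) -> list[dict]:
    """Return datasets with a non-zero score, best match first."""
    scored = [(score_dataset(query, d), d) for d in datasets]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [dataset for _, dataset in matches]
```

A substring match keeps the sketch short; a production scorer would more likely tokenize fields and normalize scores.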
---
## Developers: Extend and Contribute
### Repository map
```
src/ckan_mcp/
├── main.py # MCP entry point + HTTP transport
├── ckan_tools.py # Tool implementations & download helpers
├── helpers.py # Scoring, frequency, and summary helpers
├── types.py # Pydantic models
├── config_selection.py # Catalog loader & helper utilities
├── data/
│ └── ckan_config_selection.json # Curated CKAN catalog & overrides
└── __init__.py
```
Supporting files: `pyproject.toml` (uv/poetry style metadata), `tests/`, `test_runner.py`, `examples/` for fixtures, and Docker/Make targets for container workflows.
### Local development workflow
1. **Activate the virtualenv** and install dev dependencies:
```bash
source venv/bin/activate
uv pip install -e ".[dev]"
```
2. **Run formatters and linters** (Black first, then Ruff as required by the project guidelines):
```bash
black src/ tests/
ruff check src/ tests/ --fix
```
3. **Type checking**:
```bash
mypy src/
```
4. **Tests**:
```bash
pytest tests/ -v
python test_runner.py # lightweight smoke run
# Live integration tests against the curated CKAN portals
CKAN_RUN_INTEGRATION_TESTS=1 pytest tests/ -m integration -v
```
Integration tests talk to the public CKAN portal configured via `CKAN_TEST_COUNTRY`/`CKAN_TEST_LOCATION` (defaults to Canada/Toronto) and accept overrides such as `CKAN_TEST_BASE_URL`, `CKAN_TEST_SITE_URL`, `CKAN_TEST_DATASET_URL_TEMPLATE`, or `CKAN_TEST_SEARCH_TERMS` for custom portals.
5. **GitHub Actions verification** (optional, requires the [GitHub CLI](https://cli.github.com/) authenticated against `openascot/ckan-mcp-private`):
```bash
# trigger the full workflow (lint/unit + integration jobs) for your current branch
gh workflow run ci.yml --ref "$(git rev-parse --abbrev-ref HEAD)"
# tail the logs for the most recent run
gh run watch
gh run view --log-failed
```
The workflow only runs when triggered manually; the `quality-checks` job runs Black/Ruff/mypy, and the dependent `pytest-suite` job reuses `.github/workflows/pytest.yml` to execute the standard pytest run plus the integration suite (with `CKAN_RUN_INTEGRATION_TESTS=1`). Trigger the standalone `Pytest` workflow directly if you only need the testing jobs.
6. **Docker-based workflow** (optional, HTTP transport exposed at `http://localhost:8000/mcp`):
```bash
make dev # foreground dev stack with reload
make quick-start # background stack, uses docker-compose.dev.yml
make dev-tools # helper container via the tools profile
make shell # attach to the running dev app container
make test-production # builds docker-compose.yml and curls /mcp
```
Follow the AGENTS.md guidance for naming, docstrings, and how to place fixtures under `examples/`. Always update or add pytest coverage alongside new tools or helper behaviors.
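The environment-variable gating described for the integration tests can be sketched as follows. This is an illustration only: the helper name `integration_settings`, the returned dictionary keys, and the exact default strings (`"Canada"`, `"Toronto"`) are assumptions, and the real test suite may read these variables differently.

```python
import os
from typing import Mapping, Optional

# Optional override variables named in this README.
OVERRIDE_VARS = (
    "CKAN_TEST_BASE_URL",
    "CKAN_TEST_SITE_URL",
    "CKAN_TEST_DATASET_URL_TEMPLATE",
    "CKAN_TEST_SEARCH_TERMS",
)


def integration_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return test-portal settings, or None when integration tests are off."""
    if env.get("CKAN_RUN_INTEGRATION_TESTS") != "1":
        return None
    settings = {
        # Assumed default values; the README states Canada/Toronto.
        "country": env.get("CKAN_TEST_COUNTRY", "Canada"),
        "location": env.get("CKAN_TEST_LOCATION", "Toronto"),
    }
    # Overrides are included only when explicitly set.
    for name in OVERRIDE_VARS:
        if env.get(name):
            settings[name[len("CKAN_TEST_"):].lower()] = env[name]
    return settings
```

In a pytest module, a check such as `integration_settings() is None` could serve as the condition for `pytest.mark.skipif`, so live tests are skipped unless `CKAN_RUN_INTEGRATION_TESTS=1` is set.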
See [CHANGELOG.md](CHANGELOG.md) for release history and public milestone notes.
### Contributing checklist
- Create a feature branch and keep commits focused.
- Add or update tests under `tests/` mirroring the target module name (e.g., `tests/test_ckan_tools.py`).
- Run `pytest`, `black`, `ruff`, and `mypy` locally (or via the docker helpers) before opening a PR.
- Document new environment variables or tool behaviors in this README or `EVALUATION_GUIDE.md` as appropriate.
---
## Production Deployment (Google Cloud Run Example)
Cloud Run pairs nicely with the built-in HTTP transport. The following example assumes you have the Google Cloud CLI configured and Artifact Registry enabled.
1. **Build and push the container** (uses the included multi-stage `Dockerfile`):
```bash
export PROJECT_ID="my-gcp-project"
gcloud auth configure-docker
gcloud builds submit --tag gcr.io/$PROJECT_ID/ckan-mcp
```
2. **Deploy to Cloud Run**:
```bash
gcloud run deploy ckan-mcp \
--image gcr.io/$PROJECT_ID/ckan-mcp \
--region us-central1 \
--platform managed \
--allow-unauthenticated \
--port 8000 \
--set-env-vars "CKAN_MCP_MODE=http,CKAN_MCP_HTTP_PATH=/mcp,CKAN_MCP_HTTP_ALLOW_ORIGINS=*" \
--set-env-vars CKAN_BASE_URL=https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action,CKAN_SITE_URL=https://ckan0.cf.opendata.inter.prod-toronto.ca
```
Adjust env vars for your preferred portal or omit them so analysts always call `ckan_api_initialise`.
3. **Share the endpoint** – Cloud Run will emit a URL such as `https://ckan-mcp-12345-uc.a.run.app`. Provide the `/mcp` path to clients (`https://ckan-mcp-12345-uc.a.run.app/mcp`).
4. **Register with MCP clients** – same `claude mcp add --transport http ...` flow as in the analyst section.
5. **Operational tips**:
- Set `CKAN_MCP_HTTP_JSON_RESPONSE=true` if your proxy expects JSON instead of SSE.
- Use Secret Manager to supply `CKAN_API_KEY` for locked-down portals.
- Monitor Cloud Run metrics; the server makes outbound HTTPS calls to CKAN only when tools are invoked.
---
## Configuration Reference
### Core environment variables
| Variable | Default | Purpose |
| --- | --- | --- |
| `CKAN_BASE_URL` | none | Optional default Action API base; sessions can override via `ckan_api_initialise`. |
| `CKAN_SITE_URL` | none | Root site URL used for dataset links. |
| `CKAN_DATASET_URL_TEMPLATE` | none | Overrides dataset page URL format (`{name}` and `{id}` supported). |
| `CKAN_API_KEY` | none | API key used when the selected portal requires authentication. |
| `CKAN_MCP_MODE` | `stdio` | `stdio` for CLI integrations, `http` for streamable HTTP transport. |
| `CKAN_MCP_HOST` | `0.0.0.0` (HTTP mode) | Bind host when `CKAN_MCP_MODE=http`. |
| `CKAN_MCP_PORT` | `8000` | Bind port for HTTP mode. |
| `CKAN_MCP_HTTP_PATH` | `/mcp` | Mount path for HTTP transport (used by both the built-in HTTP server and Cloud Run deployments). |
| `CKAN_MCP_HTTP_ALLOW_ORIGINS` | `*` | CORS allowlist for HTTP mode. |
| `CKAN_MCP_HTTP_JSON_RESPONSE` | `false` | Emit JSON responses instead of SSE when `true`. |
| `CKAN_MCP_HTTP_LOG_LEVEL` | `info` | Log verbosity for HTTP transport. |
| `CKAN_MCP_LOCAL_DATASTORE` | `./` (current directory) | Local directory path where downloaded datasets are stored; defaults to the current working directory if not set. |
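As a rough illustration of how the defaults in the table above fit together, the sketch below reads the variables into a small settings object. The `ServerSettings` class, the `load_settings` function, the accepted boolean spellings, and the comma-separated format for `CKAN_MCP_HTTP_ALLOW_ORIGINS` are all assumptions for this example, not the server's actual loader.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical container for the core environment variables."""

    mode: str
    host: str
    port: int
    http_path: str
    allow_origins: Tuple[str, ...]
    json_response: bool
    log_level: str
    local_datastore: str
    base_url: Optional[str]


def _as_bool(value: str) -> bool:
    # Assumed truthy spellings; the real parser may be stricter.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str]) -> ServerSettings:
    """Apply the documented defaults to whatever the environment provides."""
    mode = env.get("CKAN_MCP_MODE", "stdio").lower()
    if mode not in {"stdio", "http"}:
        raise ValueError(f"Unsupported CKAN_MCP_MODE: {mode!r}")
    raw_origins = env.get("CKAN_MCP_HTTP_ALLOW_ORIGINS", "*")
    return ServerSettings(
        mode=mode,
        host=env.get("CKAN_MCP_HOST", "0.0.0.0"),
        port=int(env.get("CKAN_MCP_PORT", "8000")),
        http_path=env.get("CKAN_MCP_HTTP_PATH", "/mcp"),
        # Assumption: multiple origins are separated by commas.
        allow_origins=tuple(o.strip() for o in raw_origins.split(",") if o.strip()),
        json_response=_as_bool(env.get("CKAN_MCP_HTTP_JSON_RESPONSE", "false")),
        log_level=env.get("CKAN_MCP_HTTP_LOG_LEVEL", "info"),
        local_datastore=env.get("CKAN_MCP_LOCAL_DATASTORE", "./"),
        base_url=env.get("CKAN_BASE_URL") or None,
    )
```

Calling `load_settings(os.environ)` after `import os` would resolve the live environment; passing a plain dictionary makes the defaults easy to inspect in isolation.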
### CKAN portal overrides
The curated catalog in `src/ckan_mcp/data/ckan_config_selection.json` contains entries such as Toronto, NYC, etc. Each location can provide overrides like:
- `action_transport`: force GET vs POST for `/api/3/action` calls.
- `datastore_id_alias`: whether `datastore_search` accepts `id` instead of `resource_id`.
- `requires_api_key`: block initialization until an API key is supplied.
- `helper_prompt`: user-facing reminder echoed in tool responses.
- Pagination settings (`default_search_rows`, `max_search_rows`, `default_preview_limit`).
Call `audit_ckan_api` after selecting a portal to get automatically generated override recommendations, helper prompt text, and config snippets that can be pasted back into the catalog or used ad hoc via `ckan_api_initialise(overrides={...})`.
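To make the effect of `action_transport` and `datastore_id_alias` concrete, the sketch below shows how such overrides might shape a `datastore_search` request. The function name, the default values, the `"get"`/`"post"` spellings, and treating `datastore_id_alias` as a boolean are assumptions for illustration; consult the catalog JSON for the actual value formats.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode

# Assumed defaults when a portal defines no overrides.
DEFAULT_OVERRIDES = {"action_transport": "get", "datastore_id_alias": False}


def build_datastore_search(
    base_url: str,
    resource_id: str,
    overrides: Optional[dict] = None,
    filters: Optional[dict] = None,
    limit: int = 5,
) -> dict:
    """Describe a datastore_search request shaped by portal overrides."""
    settings = {**DEFAULT_OVERRIDES, **(overrides or {})}
    # Some portals accept `id` in place of `resource_id`.
    id_key = "id" if settings["datastore_id_alias"] else "resource_id"
    params: dict[str, Any] = {id_key: resource_id, "limit": limit}
    if filters:
        params["filters"] = filters
    url = f"{base_url.rstrip('/')}/datastore_search"
    if str(settings["action_transport"]).lower() == "post":
        return {"method": "POST", "url": url, "json": params}
    # For GET, nested values such as filters are sent as JSON strings.
    query = {k: json.dumps(v) if isinstance(v, dict) else v for k, v in params.items()}
    return {"method": "GET", "url": f"{url}?{urlencode(query)}"}
```

The function only describes the request rather than sending it, which keeps the override logic separate from whichever HTTP client issues the call.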
## Sources / References
[^1]: DataShades, *CKAN Instances*, accessed November 30, 2025, https://datashades.info/.
[^2]: commondataio/dataportals-registry, accessed November 30, 2025, https://raw.githubusercontent.com/commondataio/dataportals-registry/refs/heads/main/data/datasets/bysoftware/ckan.jsonl.