CKAN MCP Server (Python)
A Model Context Protocol (MCP) server that exposes the 600-plus global CKAN open-data portals [1][2] to AI assistants, CLI tools, and other MCP-aware clients. It bundles curated portal presets, strict Pydantic models, and a batteries-included tool suite so analysts, application developers, and operators can explore public datasets without writing bespoke CKAN integrations.
Persona-centric Guidance
Persona | Primary questions | Suggested section |
Curious evaluators | "What can this server do?" "Which CKAN actions are covered?" | Potential Users: What This Server Does |
Data analysts / MCP end-users | "How do I set it up locally?" "How do I connect to a remote MCP server?" | Data Analysts: Getting Insights Fast |
Contributors / maintainers | "How is the code organized?" "How do I run tests?" | Developers: Extend and Contribute |
Platform / infra teams | "Can I deploy this to Cloud Run?" | Production Deployment (Google Cloud Run Example) |
Potential Users: What This Server Does
Why it exists
Purpose-built CKAN interface: Wraps the CKAN Action & Datastore APIs behind MCP tools that AI agents and CLI clients understand.
Consistent insights: Presents dataset summaries, freshness analysis, schemas, and download helpers so exploratory conversations stay grounded in real CKAN metadata.
Portal-aware behavior: Curated overrides (transport method, dataset URL templates, helper prompts) keep the experience consistent across CKAN portals that deviate from defaults.
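The freshness analysis mentioned above can be pictured with a small sketch. It is not the server's actual helper: the frequency labels and grace periods in EXPECTED_GAP are assumptions chosen for illustration, and the only CKAN detail relied on is the metadata_modified timestamp that CKAN attaches to each dataset.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from a declared update frequency to the longest
# acceptable gap between updates. Labels and grace periods are assumptions.
EXPECTED_GAP = {
    "daily": timedelta(days=2),
    "weekly": timedelta(days=10),
    "monthly": timedelta(days=45),
    "annually": timedelta(days=400),
}


def freshness(metadata_modified: str, frequency: str, now: datetime) -> str:
    """Classify a dataset as 'fresh', 'stale', or 'unknown'."""
    allowed = EXPECTED_GAP.get(frequency.strip().lower())
    if allowed is None:
        # No usable frequency hint, so no judgment is made.
        return "unknown"
    last_update = datetime.fromisoformat(metadata_modified)
    return "fresh" if now - last_update <= allowed else "stale"
```

For example, a dataset declared as "Daily" and last modified sixteen hours before now is classified as fresh, while a "Monthly" dataset untouched for ten months is classified as stale.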
Tool catalog (14 tools)
Category | Tool | What it returns |
Session configuration | ckan_api_initialise | Selects a portal (country/location + overrides) and stores API keys/session metadata. |
| ckan_api_availability | Lists the configured CKAN portals and reports the current session's selection (when set). |
| audit_ckan_api | Probes GET/POST behavior, datastore aliases, and helper metadata; emits recommended overrides for future sessions. |
Dataset retrieval | | Full CKAN dataset metadata (resources, organization, extras). |
| | Paginated package list with optional total counts. |
| | Action API |
| | Organizations and groups for navigation. |
Datastore access | | Pulls preview rows from the first active datastore resource. |
| | Targeted datastore search with filters, sorts, distinct, etc. |
| download_dataset_locally | Metadata-rich archive/download helper with MIME detection, extraction, and how-to snippets. |
Analysis | find_relevant_datasets | Weighted scoring across title/description/tags/org/resource metadata. |
| | Frequency heuristics plus CKAN update timestamps. |
| | Schema summaries, record counts, sample fields. |
| get_dataset_insights | Combines discovery, updates, structure, and helper prompts into one rich response. |
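To make "weighted scoring" concrete, here is a minimal sketch of how a relevance score over CKAN dataset metadata could be computed. The weights and the simple substring matching are assumptions for illustration; the server's real scoring lives in helpers.py and may differ. The field names (title, notes, tags, organization) follow the standard CKAN package dictionary.

```python
# Hypothetical field weights; the project's real weights are not documented here.
WEIGHTS = {"title": 3.0, "tags": 2.0, "notes": 1.0, "organization": 1.0}


def relevance(dataset: dict, query: str) -> float:
    """Score a CKAN package dict against a free-text query."""
    terms = query.lower().split()
    fields = {
        "title": dataset.get("title") or "",
        "notes": dataset.get("notes") or "",
        "tags": " ".join(tag.get("name", "") for tag in dataset.get("tags") or []),
        "organization": (dataset.get("organization") or {}).get("title") or "",
    }
    score = 0.0
    for name, text in fields.items():
        lowered = text.lower()
        # Each query term that appears in a field adds that field's weight once.
        hits = sum(1 for term in terms if term in lowered)
        score += WEIGHTS[name] * hits
    return score


def rank(datasets: list, query: str) -> list:
    """Return datasets ordered from most to least relevant."""
    return sorted(datasets, key=lambda d: relevance(d, query), reverse=True)
```

Weighting the title above the description reflects the usual intuition that a match in a short, curated field is a stronger signal than a match in long free text.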
Architecture at a glance
src/ckan_mcp/main.py – MCP server entry point supporting stdio and HTTP (SSE) transports.
src/ckan_mcp/ckan_tools.py – Tool implementations, transport probes, download helpers, and archive extraction.
src/ckan_mcp/helpers.py – Relevance scoring, update frequency analysis, summary builders.
src/ckan_mcp/types.py – Strict Pydantic models with extra="allow" for portal-specific metadata.
src/ckan_mcp/config_selection.py & src/ckan_mcp/data/ckan_config_selection.json – Curated CKAN portal catalog and overrides consumed by ckan_api_initialise.
tests/ & test_runner.py – Pytest suite plus quick smoke runner mirroring production behaviors.
Tip: start with ckan_api_initialise to choose a portal, then call the analysis tools to see the depth of insights returned.
Data Analysts: Getting Insights Fast
Shared prerequisites
CKAN portal URL (or use the curated list during initialization).
Python 3.11+ and uv or pip for installing dependencies.
curl and the POSIX file command on your PATH (the download_dataset_locally tool shells out to both binaries).
An MCP-compatible client (Claude CLI, Gemini CLI, etc.).
Option A – Run the MCP server locally
Clone & create a virtual environment
    git clone https://github.com/<org>/ckan-mcp.git
    cd ckan-mcp
    uv venv venv
    source venv/bin/activate
Install runtime dependencies
    uv pip install -e .
(Add ".[dev]" for development tooling and ".[examples]" if you want to run the sample scripts that load .env files.)
Optional defaults – export CKAN env vars if you always talk to the same portal:
    export CKAN_BASE_URL="https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action"
    export CKAN_SITE_URL="https://ckan0.cf.opendata.inter.prod-toronto.ca"
    export CKAN_DATASET_URL_TEMPLATE="https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/{name}"
These are fallback values; interactive sessions normally rely on ckan_api_initialise to pick a portal.
Launch in stdio mode (best for desktop MCP clients):
    python -m ckan_mcp.main
Connect your MCP client – example Claude CLI snippet (see Core environment variables for transport overrides):
    {
      "mcpServers": {
        "ckan-mcp": {
          "command": "python",
          "args": ["-m", "ckan_mcp.main"],
          "env": {
            "CKAN_MCP_LOCAL_DATASTORE": "~/dataset-store/"
          }
        }
      }
    }
Start a session – ask your assistant to "Initialize a CKAN connection"; it will call ckan_api_initialise and then the discovery tools.
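If you would rather generate the client configuration than edit JSON by hand, a sketch along these lines could work. It mirrors the snippet from the steps above; the build_client_config helper is hypothetical, and expanding ~ up front is a precaution based on the assumption that values in an env block are passed to the server without shell expansion.

```python
import json
from pathlib import Path


def build_client_config(datastore_dir: str) -> dict:
    """Build an mcpServers entry for a locally launched ckan-mcp server."""
    # Expand "~" here because env values in a JSON config are not shell-expanded.
    resolved = str(Path(datastore_dir).expanduser())
    return {
        "mcpServers": {
            "ckan-mcp": {
                "command": "python",
                "args": ["-m", "ckan_mcp.main"],
                "env": {"CKAN_MCP_LOCAL_DATASTORE": resolved},
            }
        }
    }


def render_client_config(datastore_dir: str) -> str:
    """Serialize the configuration; json.dumps never emits trailing commas."""
    return json.dumps(build_client_config(datastore_dir), indent=2)
```

Calling render_client_config("~/dataset-store/") returns text that can be pasted into the client's configuration file.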
Option B – Use a remote MCP server with a local client
This flow is perfect when someone else operates the MCP server on shared infrastructure and you only need local MCP tooling.
Server operator sets up HTTP transport:
    export CKAN_MCP_MODE=http
    export CKAN_MCP_HOST=0.0.0.0
    export CKAN_MCP_PORT=8000
    python -m ckan_mcp.main
or run docker compose up --build to expose http://localhost:8000/mcp and front it with your preferred reverse proxy.
Expose the server via HTTPS (Cloud Run, Fly.io, Tailscale, etc.) and share the URL with analysts.
Analyst registers the remote MCP server (Claude CLI example):
    claude mcp add --transport http ckan-mcp https://mcp.example.com/mcp
    claude mcp list
Gemini CLI uses gemini mcp add --transport http ckan-mcp https://mcp.example.com/mcp with the same URL.
Use normally – all CLI/desktop prompts now tunnel through the remote MCP server. The analyst still decides which CKAN portal to inspect via ckan_api_initialise.
Day-to-day usage tips
ckan_api_availability lists every CKAN portal packaged with this MCP build and reiterates which portal is currently selected (if any); call it before issuing expensive searches.
find_relevant_datasets quickly surfaces top matches for natural-language prompts; follow up with get_dataset_insights for a detailed brief.
download_dataset_locally writes metadata, datastore previews, and shell instructions to ~/.cache/ckan-mcp/... so you can pivot to pandas immediately.
Developers: Extend and Contribute
Repository map
Supporting files: pyproject.toml (uv/poetry style metadata), tests/, test_runner.py, examples/ for fixtures, and Docker/Make targets for container workflows.
Local development workflow
Activate the virtualenv and install dev dependencies:
    source venv/bin/activate
    uv pip install -e ".[dev]"
Run formatters and linters (Black first, then Ruff as required by the project guidelines):
    black src/ tests/
    ruff check src/ tests/ --fix
Type checking:
    mypy src/
Tests:
    pytest tests/ -v
    python test_runner.py  # lightweight smoke run
    # Live integration tests against the curated CKAN portals
    CKAN_RUN_INTEGRATION_TESTS=1 pytest tests/ -m integration -v
Integration tests talk to the public CKAN portal configured via CKAN_TEST_COUNTRY / CKAN_TEST_LOCATION (defaults to Canada/Toronto) and accept overrides such as CKAN_TEST_BASE_URL, CKAN_TEST_SITE_URL, CKAN_TEST_DATASET_URL_TEMPLATE, or CKAN_TEST_SEARCH_TERMS for custom portals.
GitHub Actions verification (optional, requires the GitHub CLI authenticated against openascot/ckan-mcp-private):
    # trigger the full workflow (lint/unit + integration jobs) for your current branch
    gh workflow run ci.yml --ref "$(git rev-parse --abbrev-ref HEAD)"
    # tail the logs for the most recent run
    gh run watch
    gh run view --log-failed
The workflow only runs when triggered manually; the quality-checks job runs Black/Ruff/mypy, and the dependent pytest-suite job reuses .github/workflows/pytest.yml to execute the standard pytest run plus the integration suite (with CKAN_RUN_INTEGRATION_TESTS=1). Trigger the standalone Pytest workflow directly if you only need the testing jobs.
Docker-based workflow (optional, HTTP transport exposed at http://localhost:8000/mcp):
    make dev              # foreground dev stack with reload
    make quick-start      # background stack, uses docker-compose.dev.yml
    make dev-tools        # helper container via the tools profile
    make shell            # attach to the running dev app container
    make test-production  # builds docker-compose.yml and curls /mcp
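The integration-test switches described in the Tests step can be summarized in code. The sketch below only illustrates how those environment variables might be resolved; the integration_settings function is hypothetical and the real test suite may parse them differently. The Canada/Toronto defaults come from the description above.

```python
import os
from typing import Mapping


def integration_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve integration-test settings from environment variables."""
    return {
        # Live tests stay off unless the flag is set to exactly "1".
        "enabled": env.get("CKAN_RUN_INTEGRATION_TESTS") == "1",
        "country": env.get("CKAN_TEST_COUNTRY", "Canada"),
        "location": env.get("CKAN_TEST_LOCATION", "Toronto"),
        # Optional overrides for portals outside the curated catalog.
        "base_url": env.get("CKAN_TEST_BASE_URL"),
        "site_url": env.get("CKAN_TEST_SITE_URL"),
        "dataset_url_template": env.get("CKAN_TEST_DATASET_URL_TEMPLATE"),
    }
```

Passing a plain dictionary instead of os.environ keeps the function easy to exercise in unit tests without touching the real environment.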
Follow the AGENTS.md guidance for naming, docstrings, and how to place fixtures under examples/. Always update or add pytest coverage alongside new tools or helper behaviors.
See CHANGELOG.md for release history and public milestone notes.
Contributing checklist
Create a feature branch and keep commits focused.
Add or update tests under tests/ mirroring the target module name (e.g., tests/test_ckan_tools.py).
Run pytest, black, ruff, and mypy locally (or via the docker helpers) before opening a PR.
Document new environment variables or tool behaviors in this README or EVALUATION_GUIDE.md as appropriate.
Production Deployment (Google Cloud Run Example)
Cloud Run pairs nicely with the built-in HTTP transport. The following example assumes you have the Google Cloud CLI configured and Artifact Registry enabled.
Build and push the container (uses the included multi-stage Dockerfile):
    export PROJECT_ID="my-gcp-project"
    gcloud auth configure-docker
    gcloud builds submit --tag gcr.io/$PROJECT_ID/ckan-mcp
Deploy to Cloud Run:
    gcloud run deploy ckan-mcp \
      --image gcr.io/$PROJECT_ID/ckan-mcp \
      --region us-central1 \
      --platform managed \
      --allow-unauthenticated \
      --port 8000 \
      --set-env-vars "CKAN_MCP_MODE=http,CKAN_MCP_HTTP_PATH=/mcp,CKAN_MCP_HTTP_ALLOW_ORIGINS=*" \
      --set-env-vars CKAN_BASE_URL=https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action,CKAN_SITE_URL=https://ckan0.cf.opendata.inter.prod-toronto.ca
Adjust env vars for your preferred portal or omit them so analysts always call ckan_api_initialise.
Share the endpoint – Cloud Run will emit a URL such as https://ckan-mcp-12345-uc.a.run.app. Provide the /mcp path to clients (https://ckan-mcp-12345-uc.a.run.app/mcp).
Register with MCP clients – same claude mcp add --transport http ... flow as in the analyst section.
Operational tips:
Set CKAN_MCP_HTTP_JSON_RESPONSE=true if your proxy expects JSON instead of SSE.
Use Secret Manager to supply CKAN_API_KEY for locked-down portals.
Monitor Cloud Run metrics; the server makes outbound HTTPS calls to CKAN only when tools are invoked.
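When handing the endpoint to analysts, it is easy to forget the /mcp suffix or to leave a stray trailing slash. The hypothetical helper below sketches one way to derive the client-facing URL from the Cloud Run service URL and the mount path; it assumes the path matches whatever CKAN_MCP_HTTP_PATH was set to at deploy time.

```python
from urllib.parse import urlsplit, urlunsplit


def mcp_endpoint(service_url: str, http_path: str = "/mcp") -> str:
    """Combine a service URL and the MCP mount path into one endpoint URL."""
    parts = urlsplit(service_url)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("expected an https:// service URL")
    # Normalize the mount path so it has exactly one leading slash.
    path = "/" + http_path.strip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

With the example URL from the steps above, mcp_endpoint("https://ckan-mcp-12345-uc.a.run.app/") yields https://ckan-mcp-12345-uc.a.run.app/mcp.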
Configuration Reference
Core environment variables
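Variable | Default | Purpose |
CKAN_BASE_URL | none | Optional default Action API base; sessions can override via ckan_api_initialise. |
CKAN_SITE_URL | none | Root site URL used for dataset links. |
CKAN_DATASET_URL_TEMPLATE | none | Overrides dataset page URL format (e.g., https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/{name}). |
CKAN_API_KEY | none | API key used when the selected portal requires authentication. |
CKAN_MCP_MODE | | Transport mode: stdio or http. |
CKAN_MCP_HOST | | Bind host when CKAN_MCP_MODE=http. |
CKAN_MCP_PORT | | Bind port for HTTP mode. |
CKAN_MCP_HTTP_PATH | | Mount path for HTTP transport (used both by builtin HTTP server and Cloud Run deployments). |
CKAN_MCP_HTTP_ALLOW_ORIGINS | | CORS allowlist for HTTP mode. |
CKAN_MCP_HTTP_JSON_RESPONSE | | Emit JSON responses instead of SSE when set to true. |
| | Log verbosity for HTTP transport. |
CKAN_MCP_LOCAL_DATASTORE | | Local directory path where downloaded datasets are stored. Defaults to the current working directory if not set. |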
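As an illustration of how the URL-related variables interact, the sketch below resolves a dataset page link from the environment. It is an assumption about precedence (template first, then site URL), not the server's documented behavior, and the /dataset/{name} fallback simply follows the conventional CKAN URL layout.

```python
import os
from typing import Mapping, Optional


def dataset_url(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Build a dataset page URL from CKAN_* environment variables."""
    template = env.get("CKAN_DATASET_URL_TEMPLATE")
    if template:
        # The template carries a {name} placeholder for the dataset slug.
        return template.replace("{name}", name)
    site_url = env.get("CKAN_SITE_URL")
    if site_url:
        # Assumed fallback: the conventional CKAN dataset path.
        return f"{site_url.rstrip('/')}/dataset/{name}"
    return None
```

Returning None when neither variable is set leaves the decision to the caller, which matches the idea that these values are optional fallbacks.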
CKAN portal overrides
The curated catalog in src/ckan_mcp/data/ckan_config_selection.json contains entries such as Toronto, NYC, etc. Each location can provide overrides like:
action_transport: force GET vs POST for /api/3/action calls.
datastore_id_alias: whether datastore_search accepts id instead of resource_id.
requires_api_key: block initialization until an API key is supplied.
helper_prompt: user-facing reminder echoed in tool responses.
Pagination settings (default_search_rows, max_search_rows, default_preview_limit).
Call audit_ckan_api after selecting a portal to get automatically generated override recommendations, helper prompt text, and config snippets that can be pasted back into the catalog or used ad hoc via ckan_api_initialise(overrides={...}).
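To show how such overrides could shape an actual request, here is a minimal sketch that builds a datastore_search call from an overrides dictionary. The value formats are assumptions (action_transport as "GET" or "POST", datastore_id_alias as a boolean, a fallback preview limit of 10), and build_datastore_search is a hypothetical helper rather than part of the project.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode


def build_datastore_search(
    base_url: str,
    resource_id: str,
    overrides: dict,
    filters: Optional[dict] = None,
) -> Tuple[str, str, Optional[str]]:
    """Return (method, url, body) for a datastore_search call."""
    # Some portals expect "id" where the default API uses "resource_id".
    id_key = "id" if overrides.get("datastore_id_alias") else "resource_id"
    params = {id_key: resource_id, "limit": overrides.get("default_preview_limit", 10)}
    if filters:
        params["filters"] = filters
    url = f"{base_url.rstrip('/')}/datastore_search"
    if str(overrides.get("action_transport", "GET")).upper() == "POST":
        # POST sends the parameters as a JSON body.
        return "POST", url, json.dumps(params)
    # GET needs the filters dict serialized to a JSON string in the query.
    query = dict(params)
    if "filters" in query:
        query["filters"] = json.dumps(query["filters"])
    return "GET", f"{url}?{urlencode(query)}", None
```

Keeping the request construction in one pure function makes it straightforward to test each override in isolation before any network call is made.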
Sources / References
[1] DataShades, CKAN Instances, accessed November 30, 2025, https://datashades.info/.
[2] commondataio/dataportals-registry, accessed November 30, 2025, https://raw.githubusercontent.com/commondataio/dataportals-registry/refs/heads/main/data/datasets/bysoftware/ckan.jsonl.