Search CKAN Datasets
ckan_package_search
Search datasets on any CKAN open data portal using advanced Solr query syntax. Supports filters, facets, sorting, and pagination to find relevant data.
Instructions
Search for datasets (packages) on a CKAN server using Solr query syntax.
Supports full Solr search capabilities including filters, facets, and sorting. Use this to discover datasets matching specific criteria.
Note on parser behavior: Some CKAN portals use a restrictive default query parser that can break long OR queries. For those portals, this tool may force the query into 'text:(...)' based on per-portal config. You can override with 'query_parser' to force or disable this behavior per request.
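The parser override can be pictured with a small sketch. It assumes, per the input schema, that 'text' wraps only non-fielded queries in text:(...). The helper name apply_text_parser and the regex used to detect fielded queries are hypothetical, not the tool's actual implementation.

```python
import re

# Hypothetical heuristic: a query counts as "fielded" if it contains name:
FIELDED = re.compile(r"\b[A-Za-z_][\w.]*:")

def apply_text_parser(q: str, query_parser: str = "default") -> str:
    """Wrap a non-fielded query in text:(...) when query_parser is 'text'."""
    if query_parser != "text":
        return q
    stripped = q.strip()
    # Leave empty, match-all, and already fielded queries untouched.
    if not stripped or stripped == "*:*" or FIELDED.search(stripped):
        return q
    return f"text:({stripped})"

print(apply_text_parser("acqua OR clima OR rifiuti", "text"))
```

A long OR query such as the one printed here is the case the note describes; fielded queries like title:water pass through unchanged.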
Important - Date field semantics:
issued: publisher's content publish date when available (best proxy for "created/published")
modified: publisher's content update date when available
metadata_created: CKAN record creation timestamp (publish time on source portals, harvest time on aggregators; fallback for "created" if issued missing)
metadata_modified: CKAN record update timestamp (publish time on source portals, harvest time on aggregators; use for "updated/modified in last X")
Natural language mapping (important for tool callers):
"created"/"published" -> prefer issued; fallback to metadata_created
"updated"/"modified" -> prefer modified; fallback to metadata_modified
For "recent in last X", consider using content_recent (issued with metadata_created fallback)
Content-recent helper:
content_recent: if true, rewrites the query to use issued with a fallback to metadata_created when issued is missing.
content_recent_days: window for content_recent (default 30 days).
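As a rough illustration of the mapping above, a caller might pick the date field from the user's wording and build either a range filter or a content_recent request. The names DATE_FIELDS, date_range_fq, and recent_args are hypothetical helpers for this sketch; only the field names and argument names come from this page.

```python
# Preferred field first, fallback second, following the mapping described above.
DATE_FIELDS = {
    "created": ("issued", "metadata_created"),
    "published": ("issued", "metadata_created"),
    "updated": ("modified", "metadata_modified"),
    "modified": ("modified", "metadata_modified"),
}

def date_range_fq(intent: str, start_iso: str, end_iso: str = "*",
                  use_fallback: bool = False) -> str:
    """Build an fq range on the preferred (or fallback) date field."""
    preferred, fallback = DATE_FIELDS[intent]
    field = fallback if use_fallback else preferred
    return f"{field}:[{start_iso} TO {end_iso}]"

def recent_args(server_url: str, days: int = 30) -> dict:
    """Arguments for a 'recent in last X days' request via content_recent."""
    return {"server_url": server_url, "q": "*:*",
            "content_recent": True, "content_recent_days": days}

print(date_range_fq("published", "2025-01-01T00:00:00Z", "2025-12-31T23:59:59Z"))
```

Setting use_fallback=True switches to the CKAN record timestamps for portals where issued or modified is not populated.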
Args:
server_url (string): Base URL of CKAN server (e.g., "https://dati.gov.it/opendata")
q (string): Search query using Solr syntax (default: "*:*" for all)
fq (string): Filter query (e.g., "organization:comune-palermo") IMPORTANT — Solr fq syntax rules:
OR inside a single field: use field:(val1 OR val2), NOT field:val1 OR field:val2. Wrong: fq=type:"A" OR type:"B" → silently ignored, returns entire catalog. Right: fq=type:("A" OR "B")
CKAN extras fields are indexed as extras_fieldname, not fieldname. e.g. to filter on extra field "hvd_category" use fq=extras_hvd_category:""
rows (number): Number of results to return (default: 10, max: 1000)
start (number): Offset for pagination (default: 0)
page (number): Page number (1-based); alias for start. Overrides start if provided.
page_size (number): Results per page when using page (default: 10, max: 1000)
sort (string): Sort field and direction (e.g., "metadata_modified desc")
facet_field (array): Fields to facet on (e.g., ["organization", "tags"])
facet_limit (number): Max facet values per field (default: 50)
include_drafts (boolean): Include draft datasets (default: false)
query_parser ('default' | 'text'): Override search parser behavior
response_format ('markdown' | 'json'): Output format
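The relationship between page, page_size, start, and rows can be sketched as follows. This assumes that page_size takes the place of rows when page is given and that the documented maximum of 1000 is applied as a cap; the function resolve_pagination is a hypothetical illustration, not the tool's code.

```python
def resolve_pagination(start: int = 0, rows: int = 10, page=None,
                       page_size: int = 10, max_rows: int = 1000) -> dict:
    """Translate page/page_size into start/rows; page overrides start."""
    if page is not None:
        if page < 1:
            raise ValueError("page is 1-based")
        size = min(page_size, max_rows)
        return {"start": (page - 1) * size, "rows": size}
    return {"start": start, "rows": min(rows, max_rows)}

print(resolve_pagination(page=3, page_size=20))
```

With 20 results per page, page 3 begins at offset 40, which is what the printed dictionary shows.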
Returns: Search results with:
count: Number of results found
results: Array of dataset objects
facets: Facet counts (if facet_field specified)
search_facets: Detailed facet information
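A caller using response_format 'json' might condense the result like this. The sketch assumes the payload mirrors CKAN's package_search result, where each dataset carries name and title and facets maps a field to value counts; the exact shape returned by this tool is not specified here, so treat summarize_search as illustrative.

```python
def summarize_search(result: dict, max_titles: int = 5) -> list:
    """Condense a search result into a few readable lines."""
    lines = [f"{result.get('count', 0)} datasets found"]
    for dataset in result.get("results", [])[:max_titles]:
        # Fall back to the dataset slug when no title is present.
        lines.append(f"- {dataset.get('title') or dataset.get('name', '(untitled)')}")
    for field, counts in result.get("facets", {}).items():
        top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        lines.append(f"{field}: " + ", ".join(f"{v} ({n})" for v, n in top))
    return lines

sample = {
    "count": 2,
    "results": [{"name": "a", "title": "A"}, {"name": "b"}],
    "facets": {"organization": {"x": 1, "y": 3}},
}
print("\n".join(summarize_search(sample)))
```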
Query Syntax (parameter q):
Boolean operators:
- AND / &&: "water AND climate"
- OR / ||: "health OR sanità"
- NOT / !: "data NOT personal"
- +required -excluded: "+title:water -title:sea"
- Grouping: "(title:water OR title:climate) AND tags:environment"
Wildcards:
- *: "title:environment*" (matches environmental, environments, etc.)
- Note: Left truncation (*water) not supported
Fuzzy search (edit distance):
- ~: "title:rest~" or "title:rest~1" (finds "test", "best", "rest")
Proximity search (words within N positions):
- "phrase"~N: "title:"climate change"~5"
Range queries:
- Inclusive [a TO b]: "num_resources:[5 TO 10]"
- Exclusive {a TO b}: "num_resources:{0 TO 100}"
- One side open: "metadata_modified:[2024-01-01T00:00:00Z TO *]"
Date math:
- NOW-1YEAR, NOW-6MONTHS, NOW-7DAYS, NOW-1HOUR
- NOW/DAY, NOW/MONTH (round down)
- Combined: "metadata_modified:[NOW-2MONTHS TO NOW]"
- Example: "metadata_created:[NOW-1YEAR TO *]"
- IMPORTANT: NOW syntax works on metadata_modified and metadata_created fields
- For 'modified' and 'issued' fields, NOW syntax is auto-converted to ISO dates
- Manual ISO dates always work: "modified:[2026-01-15T00:00:00Z TO *]"
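The auto-conversion for 'modified' and 'issued' could look roughly like the sketch below. It is an assumption about the behavior, not the tool's code: it handles only NOW and NOW plus or minus N units inside range brackets on those two fields, and does not implement rounding such as NOW/DAY. All function names are hypothetical.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone

NOW_EXPR = re.compile(r"\bNOW(?:([+-])(\d+)(YEAR|MONTH|DAY|HOUR)S?)?\b")
# \b keeps metadata_modified and metadata_created out of the match.
CONTENT_RANGE = re.compile(r"\b(modified|issued):\[([^\]]*)\]")

def shift_months(dt: datetime, months: int) -> datetime:
    """Move dt by whole months, clamping the day to the target month."""
    total = dt.year * 12 + (dt.month - 1) + months
    year, month0 = divmod(total, 12)
    day = min(dt.day, calendar.monthrange(year, month0 + 1)[1])
    return dt.replace(year=year, month=month0 + 1, day=day)

def resolve_now(match: re.Match, now: datetime) -> str:
    """Turn one NOW expression into an ISO 8601 UTC timestamp."""
    sign, amount, unit = match.groups()
    result = now
    if sign is not None:
        n = int(amount) * (1 if sign == "+" else -1)
        if unit == "YEAR":
            result = shift_months(now, 12 * n)
        elif unit == "MONTH":
            result = shift_months(now, n)
        elif unit == "DAY":
            result = now + timedelta(days=n)
        else:
            result = now + timedelta(hours=n)
    return result.strftime("%Y-%m-%dT%H:%M:%SZ")

def convert_content_dates(q: str, now: datetime = None) -> str:
    """Rewrite NOW math to ISO dates, only in modified:/issued: ranges."""
    now = now or datetime.now(timezone.utc)
    def fix_range(m: re.Match) -> str:
        body = NOW_EXPR.sub(lambda n: resolve_now(n, now), m.group(2))
        return f"{m.group(1)}:[{body}]"
    return CONTENT_RANGE.sub(fix_range, q)

fixed_now = datetime(2026, 3, 15, 12, 0, 0, tzinfo=timezone.utc)
print(convert_content_dates("modified:[NOW-30DAYS TO NOW]", fixed_now))
```

Queries on metadata_modified and metadata_created are returned unchanged, since NOW syntax works natively on those fields.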
Field existence:
- Exists: "field:*" or "field:[* TO *]"
- Not exists: "NOT field:*" or "-field:*"
Boosting (relevance scoring):
- Boost term: "title:water^2 OR notes:water" (title matches score higher)
- Constant score: "title:water^=1.5"
Examples:
Search all: { q: "*:*" }
By tag: { q: "tags:sanità" }
Boolean: { q: "(title:water OR title:climate) AND NOT title:sea" }
Wildcard: { q: "title:environment*" }
Fuzzy: { q: "title:health~2" }
Proximity: { q: "notes:\"open data\"~3" }
Date range: { q: "metadata_modified:[2024-01-01T00:00:00Z TO 2024-12-31T23:59:59Z]" }
Date math: { q: "metadata_modified:[NOW-6MONTHS TO *]" }
Date math (auto-converted): { q: "modified:[NOW-30DAYS TO NOW]" }
Published in 2025 (content date): { fq: "issued:[2025-01-01T00:00:00Z TO 2025-12-31T23:59:59Z]" }
First appeared on portal in 2025: { fq: "metadata_created:[2025-01-01T00:00:00Z TO 2025-12-31T23:59:59Z]" }
Recent content (issued w/ fallback): { q: "*:*", content_recent: true, content_recent_days: 180 }
Field exists: { q: "organization:* AND num_resources:[1 TO *]" }
Boosting: { q: "title:climate^2 OR notes:climate" }
Filter org: { fq: "organization:regione-siciliana" }
Filter extras field (correct): { fq: "extras_hvd_category:\"http://data.europa.eu/bna/c_ac64a52d\"" }
Filter extras OR (correct): { fq: "extras_hvd_category:(\"http://data.europa.eu/bna/c_ac64a52d\" OR \"http://data.europa.eu/bna/c_dd313021\")" }
Get facets: { facet_field: ["organization"], rows: 0 }
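The fq rules for OR on a single field and for extras fields can be wrapped in a small builder so the grouped form is always produced. This is a minimal sketch; fq_any_of and quote_value are hypothetical helpers, and the quoting covers only backslashes and double quotes.

```python
def quote_value(value: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def fq_any_of(field: str, values: list, extra: bool = False) -> str:
    """Build field:("a" OR "b"); never field:"a" OR field:"b"."""
    if not values:
        raise ValueError("at least one value is required")
    # CKAN extras are indexed under the extras_ prefix.
    name = f"extras_{field}" if extra else field
    if len(values) == 1:
        return f"{name}:{quote_value(values[0])}"
    return f"{name}:({' OR '.join(quote_value(v) for v in values)})"

print(fq_any_of(
    "hvd_category",
    ["http://data.europa.eu/bna/c_ac64a52d", "http://data.europa.eu/bna/c_dd313021"],
    extra=True,
))
```

The printed string matches the "Filter extras OR (correct)" example and can be passed as the fq argument.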
Query language: Before searching a portal, check its locale via ckan_status_show (field: "Portal Locale" / locale_default). Translate query terms to the portal's language — searching in English on a non-English portal returns 0 results. Examples: locale "it" → Italian terms; "uk_UA" → Ukrainian (Cyrillic); "fr_FR" → French. Exception: multilingual portals (e.g. data.europa.eu, open.canada.ca) accept EN + native terms joined with OR.
Typical workflow: ckan_status_show (check locale) → ckan_package_search (query in portal's language) → ckan_package_show (get full metadata + resource IDs) → ckan_datastore_search (query tabular data)
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| q | No | Search query in Solr syntax | *:* |
| fq | No | Filter query in Solr syntax; applied after scoring, does not affect relevance. CKAN extras fields use prefix 'extras_' (e.g. extras_hvd_category). For OR on same field use field:(val1 OR val2), never field:val1 OR field:val2 (silently breaks). Examples: 'organization:comune-palermo', 'res_format:CSV', 'extras_hvd_category:("uri1" OR "uri2")'. | |
| page | No | Page number (1-based); alias for start. Overrides start if provided. | |
| rows | No | Number of results to return | |
| sort | No | Sort field and direction (e.g., 'metadata_modified desc') | |
| start | No | Offset for pagination | |
| page_size | No | Results per page when using page (default: 10) | |
| server_url | Yes | Base URL of the CKAN server | |
| facet_field | No | Fields to facet on | |
| facet_limit | No | Maximum facet values per field | |
| query_parser | No | Override search parser ('text' forces text:(...) on non-fielded queries) | |
| content_recent | No | Use issued date with fallback to metadata_created for recent content | |
| include_drafts | No | Include draft datasets | |
| response_format | No | Output format: 'markdown' for human-readable or 'json' for machine-readable | markdown |
| content_recent_days | No | Day window for content_recent (default 30) | |