Find Relevant CKAN Datasets
ckan_find_relevant_datasets
Find and rank open datasets by relevance to a natural language query, using weighted scoring across title, tags, notes, and organization fields.
Instructions
Find and rank datasets by relevance to a query using weighted fields.
Use this instead of ckan_package_search when you want relevance-ranked results with explicit scoring across title, notes, tags, and organization fields. Use ckan_package_search instead when you need Solr filter syntax, facets, or pagination.
Uses package_search for discovery and applies a local scoring model.
Args:
server_url (string): Base URL of CKAN server (e.g., "https://dati.gov.it/opendata")
query (string): Natural language or keyword query (e.g., "mobilità urbana", "air quality")
limit (number): Number of datasets to return (default: 10)
weights (object): Field weights for scoring; a higher weight means more influence on rank
  Default: title=4, tags=3, notes=2, organization=1, holder=4, publisher=2
  Note on holder vs organization: on federated catalogs (e.g. dati.gov.it), organization is the harvesting catalog (e.g. Regione Puglia), while holder (DCAT-AP_IT dct:rightsHolder) is the actual data owner (e.g. Comune di Lecce). Queries like "datasets from a specific Comune" match holder correctly; matching only organization misses datasets harvested via aggregators. publisher (dct:publisher) is scored separately at lower weight as it can contain technical roles ("Redazione OD") rather than the institutional owner.
query_parser ('default' | 'text'): Override search parser behavior
response_format ('markdown' | 'json'): Output format
Returns: Ranked datasets with relevance scores and per-field score breakdowns
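The weighted scoring described above can be illustrated with a minimal sketch. This is not the tool's actual implementation: the tokenization, the "count of distinct matching query terms" rule, and the names `tokenize`, `score_dataset`, and `rank` are all assumptions made for illustration. Only the default weights and the rule that unspecified weights fall back to defaults come from the description.

```python
import re

# Default weights as listed in the tool description.
DEFAULT_WEIGHTS = {
    "title": 4, "tags": 3, "notes": 2,
    "organization": 1, "holder": 4, "publisher": 2,
}


def tokenize(text):
    """Lowercase and split on non-word characters (assumed tokenization)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def score_dataset(query, fields, weights=None):
    """Return (total, breakdown) for one dataset.

    `fields` maps a field name to its text. Each field contributes
    weight * (number of distinct query terms found in that field).
    This matching rule is an assumption, not the tool's documented model.
    """
    merged = {**DEFAULT_WEIGHTS, **(weights or {})}  # overrides win, rest default
    terms = set(tokenize(query))
    breakdown = {}
    for name, weight in merged.items():
        field_terms = set(tokenize(fields.get(name, "")))
        breakdown[name] = weight * len(terms & field_terms)
    return sum(breakdown.values()), breakdown


def rank(query, datasets, weights=None, limit=10):
    """Sort datasets by descending score and keep the top `limit`."""
    scored = [(score_dataset(query, d, weights)[0], d) for d in datasets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:limit]]
```

Under this sketch, a dataset whose holder is "Comune di Lecce" but whose organization is "Regione Puglia" still scores well for the query "defibrillatori Comune di Lecce", because the match lands on the higher-weighted holder field rather than on organization.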
Examples:
{ server_url: "https://dati.gov.it/opendata", query: "mobilità" }
{ server_url: "...", query: "trasporti", limit: 5, weights: { title: 5, notes: 2 } }
{ server_url: "...", query: "defibrillatori Comune di Lecce", weights: { holder: 5 } }
Typical workflow: ckan_find_relevant_datasets → ckan_package_show (inspect top results) → ckan_datastore_search (query data)
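As a rough sketch of what this workflow looks like at the CKAN Action API level, the snippet below builds the three request URLs without sending them. It assumes the tools map onto the standard CKAN actions `package_search`, `package_show`, and `datastore_search`; the helper names `action_url` and `workflow_urls`, and the dataset and resource identifiers passed in, are hypothetical placeholders.

```python
from urllib.parse import urlencode


def action_url(server_url, action, **params):
    """Build a CKAN Action API URL of the form <server>/api/3/action/<action>."""
    base = server_url.rstrip("/") + "/api/3/action/" + action
    return base + ("?" + urlencode(params) if params else "")


def workflow_urls(server_url, query, dataset_id, resource_id, rows=10):
    """Return the three requests the workflow would issue, in order.

    dataset_id and resource_id are placeholders: in practice they come
    from the results of the previous step.
    """
    return [
        # 1. Discover candidate datasets.
        action_url(server_url, "package_search", q=query, rows=rows),
        # 2. Inspect one of the top-ranked datasets and its resources.
        action_url(server_url, "package_show", id=dataset_id),
        # 3. Query rows from one of its DataStore-enabled resources.
        action_url(server_url, "datastore_search", resource_id=resource_id, limit=5),
    ]
```

Note that `datastore_search` only works for resources that have actually been loaded into the DataStore, so the third step may not apply to every dataset.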
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Number of datasets to return | 10 |
| query | Yes | Natural language or keyword query to match against dataset title, notes, tags, organization, holder and publisher | |
| weights | No | Per-field scoring weights; unspecified fields use defaults | |
| server_url | Yes | Base URL of the CKAN server (e.g., https://dati.gov.it/opendata) | |
| query_parser | No | Override search parser ('text' forces text:(...) on non-fielded queries) | |
| response_format | No | Output format: 'markdown' for human-readable or 'json' for machine-readable | markdown |
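The `query_parser` behavior described in the table can be sketched as follows. The table only states that 'text' forces `text:(...)` on non-fielded queries; how the tool decides that a query is "fielded" is not documented here, so the `name:` pattern below and the function name `apply_query_parser` are assumptions.

```python
import re

# Assumed heuristic: a query is "fielded" if it contains a token like `name:`.
FIELDED = re.compile(r"\b\w+:")


def apply_query_parser(query, query_parser="default"):
    """Wrap a non-fielded query in text:(...) when the 'text' parser is forced."""
    if query_parser == "text" and not FIELDED.search(query):
        return f"text:({query})"
    return query  # default parser, or query already targets a field
```

With this sketch, a plain query such as "air quality" is wrapped when `query_parser` is 'text', while a query that already names a field, such as "title:aria", passes through unchanged.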