Skip to main content
Glama
sprine

ontario-data-mcp

by sprine

Server Configuration

Describes the environment variables required to run the server.

NameRequiredDescriptionDefault
ONTARIO_DATA_TIMEOUTNoHTTP timeout in seconds30
ONTARIO_DATA_CACHE_DIRNoDuckDB storage + log file location~/.cache/ontario-data
ONTARIO_DATA_RATE_LIMITNoMax CKAN requests per second10

Capabilities

Features and capabilities supported by this server

CapabilityDetails
tools
{
  "listChanged": true
}
prompts
{
  "listChanged": false
}
resources
{
  "subscribe": false,
  "listChanged": false
}
extensions
{
  "io.modelcontextprotocol/ui": {}
}
experimental
{}

Tools

Functions exposed to the LLM to take actions

NameDescription
search_datasetsA

Search for datasets across all open data portals (Ontario, Toronto, Ottawa).

Results include portal-prefixed IDs (e.g. toronto:abc123). Use these IDs with get_dataset_info to view full metadata, then download_resource to cache locally. Use get_resource_schema to check column types before writing queries.

Searches fan out to all portals by default. Narrow to one portal with the portal param.

Args: query: Search terms (e.g. "covid cases", "housing prices", "school enrollment") organization: Filter by ministry/org (e.g. "health", "education") resource_format: Filter by file format (e.g. "CSV", "JSON", "SHP") update_frequency: Filter by frequency (e.g. "yearly", "monthly", "daily") sort_by: Sort order (default: relevance) limit: Max results per portal (1-50) portal: Narrow to one portal (e.g. "ontario", "toronto"). Default: all portals.

list_organizationsA

List government ministries and organizations with dataset counts across all portals.

Args: include_counts: Include dataset counts per organization portal: Narrow to one portal. Default: all portals.

list_topicsA

List top tags/topics used across data portals, ranked by dataset count.

Args: query: Optional filter to match tag names portal: Narrow to one portal. Default: all portals. limit: Max tags to return (default 100). Use a higher value to see more.

find_related_datasetsA

Find datasets related to a given dataset by shared tags and organization.

Searches within the same portal as the source dataset only.

Args: dataset_id: Prefixed dataset ID (e.g. "toronto:ttc-ridership") or bare ID limit: Max related datasets to return

list_portalsA

List all available data portals with their platform type and descriptions.

get_dataset_infoA

Get full metadata for a dataset including all resources.

Args: dataset_id: Prefixed dataset ID (e.g. "toronto:ttc-ridership") or bare ID

list_resourcesA

List all resources (files) in a dataset with their formats and sizes.

Args: dataset_id: Prefixed dataset ID (e.g. "toronto:ttc-ridership") or bare ID

get_resource_schemaA

Get the column schema and sample values for a datastore resource.

Args: resource_id: Prefixed resource ID (e.g. "toronto:abc123") or bare ID sample_size: Number of sample rows to include

compare_datasetsA

Compare metadata side-by-side for multiple datasets (can be cross-portal).

Args: dataset_ids: List of prefixed dataset IDs (e.g. ["toronto:abc", "ontario:def"]) to compare (2-5)

download_resourceA

Download a dataset resource and cache it locally in DuckDB for fast querying.

Supports CSV, XLSX, JSON, and datastore-active resources. If already cached, returns staleness info so you can decide whether to refresh. Numeric columns stored as text are automatically cast to DOUBLE.

Workflow: search_datasets → get_dataset_info → download_resource → query_cached.

Args: resource_id: Prefixed resource ID (e.g. "toronto:abc123") or bare ID

cache_infoA

Get cache statistics and list all cached datasets.

Returns size, table count, and details for every cached resource.

cache_manageA

Manage the local DuckDB cache: remove or clear cached data.

Args: action: One of "remove" (single resource) or "clear" (all) resource_id: Required for "remove" action. Prefixed or bare ID accepted.

refresh_cacheA

Re-download cached resources to get the latest data.

Args: resource_id: Specific resource to refresh (prefixed or bare ID), or omit to refresh all

query_resourceA

Query a resource via the CKAN Datastore API (remote, no download needed).

Only works for resources with datastore_active=True.

Args: resource_id: Prefixed resource ID (e.g. "toronto:abc123") or bare ID filters: Column filters as {column: value} pairs fields: List of columns to return (default: all) sort: Sort string (e.g. "date desc", "name asc") limit: Max rows (1-1000) offset: Row offset for pagination

sql_queryA

Run a SQL query against the CKAN Datastore (remote).

NOTE: Prefer download_resource + query_cached for repeated queries — the remote API has rate limits (429 errors). Use this tool only for quick one-off queries on datastore-active resources.

Use resource IDs as table names in double quotes. Example: SELECT "Column Name" FROM "resource-id-here" WHERE "Year" > 2020 LIMIT 10

Args: sql: SQL query string (read-only, SELECT only) portal: Portal to query (default: "ontario"). Required because SQL embeds resource IDs directly.

query_cachedA

Run a SQL query against locally cached data in DuckDB.

Use table names from download_resource or cache_info. Supports full DuckDB SQL: aggregations, window functions, CTEs, JOINs across tables.

Use SUM(quantity_col) not COUNT() when rows contain per-row counts (e.g. a "count" or "number_of" column). COUNT() counts rows, not quantities. Column names vary across resources in the same dataset — always DESCRIBE first. Use SELECT * RENAME ("old col" AS new_col) to normalize column names when joining across datasets. Values containing semicolons should be matched with LIKE patterns, not = equality. Quote table names with double quotes in SQL.

Args: sql: SQL query (e.g. SELECT * FROM "ds_my_table_abc12345" LIMIT 10)

preview_dataA

Quick preview of the first N rows of a resource (fetched remotely).

Args: resource_id: Prefixed resource ID (e.g. "toronto:abc123") or bare ID rows: Number of rows to preview (1-100)

check_freshnessA

Check if a dataset is current by comparing its update frequency to its last modification date.

Args: dataset_id: Prefixed dataset ID (e.g. "toronto:ttc-ridership") or bare ID

profile_dataA

Statistical profile and quality check of a cached dataset.

Uses DuckDB SUMMARIZE for column-level statistics (min, max, avg, std, nulls, unique counts). Also checks for duplicate rows.

Args: resource_id: Resource ID (must be cached)

validate_resultA

Validate that a claim is supported by query results.

Call this after making a data claim to verify it against the source. Re-executes the SQL, extracts numbers and terms from the claim, and checks them against the actual data.

Args: sql: The SQL query that produced the data backing the claim claim: The natural-language claim to verify (e.g. "Toronto had 12,345 building permits in 2023")

load_geodataA

Download and cache a geospatial resource (SHP, KML, GeoJSON) into DuckDB with spatial support.

Args: resource_id: Prefixed resource ID (e.g. "toronto:abc123") or bare ID force_refresh: Re-download even if cached

spatial_queryA

Run spatial queries against cached geospatial data.

Args: resource_id: Resource ID (must be cached via load_geodata) operation: "contains_point", "within_bbox", or "within_radius" latitude: Latitude for point queries (-90 to 90) longitude: Longitude for point queries (-180 to 180) radius_km: Radius in kilometers (for within_radius, must be > 0) bbox: Bounding box as [min_lng, min_lat, max_lng, max_lat] (for within_bbox) limit: Max results

list_geo_datasetsA

Find all datasets that contain geospatial resources (SHP, KML, GeoJSON).

Args: format_filter: Filter to specific format: "SHP", "KML", "GEOJSON", or None for all limit: Max results per portal portal: Narrow to one portal. Default: all portals.

Prompts

Interactive templates invoked by user choice

NameDescription
explore_topicGuided exploration of a topic across all open data portals. Searches for datasets, summarizes what's available, and suggests deep dives.
data_investigationDeep investigation of a specific dataset: schema, quality, statistics, insights.
compare_dataSide-by-side analysis of multiple datasets (comma-separated IDs, can be cross-portal).

Resources

Contextual data attached and managed by the client

NameDescription
cache_indexList of all locally cached datasets with freshness info.
portal_statsOverview statistics across all data portals.
duckdb_sql_guideDuckDB SQL reference for Ontario open data analysis.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sprine/ontario-data-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server