ontario-data-mcp
Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| ONTARIO_DATA_TIMEOUT | No | HTTP timeout in seconds | 30 |
| ONTARIO_DATA_CACHE_DIR | No | DuckDB storage + log file location | ~/.cache/ontario-data |
| ONTARIO_DATA_RATE_LIMIT | No | Max CKAN requests per second | 10 |
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {
"listChanged": true
} |
| prompts | {
"listChanged": false
} |
| resources | {
"subscribe": false,
"listChanged": false
} |
| extensions | {
"io.modelcontextprotocol/ui": {}
} |
| experimental | {} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| search_datasetsA | Search for datasets across all open data portals (Ontario, Toronto, Ottawa). Results include portal-prefixed IDs (e.g. toronto:abc123). Use these IDs with get_dataset_info to view full metadata, then download_resource to cache locally. Use get_resource_schema to check column types before writing queries. Searches fan out to all portals by default. Narrow to one portal with the portal param. Args: query: Search terms (e.g. "covid cases", "housing prices", "school enrollment") organization: Filter by ministry/org (e.g. "health", "education") resource_format: Filter by file format (e.g. "CSV", "JSON", "SHP") update_frequency: Filter by frequency (e.g. "yearly", "monthly", "daily") sort_by: Sort order (default: relevance) limit: Max results per portal (1-50) portal: Narrow to one portal (e.g. "ontario", "toronto"). Default: all portals. |
| list_organizationsA | List government ministries and organizations with dataset counts across all portals. Args: include_counts: Include dataset counts per organization portal: Narrow to one portal. Default: all portals. |
| list_topicsA | List top tags/topics used across data portals, ranked by dataset count. Args: query: Optional filter to match tag names portal: Narrow to one portal. Default: all portals. limit: Max tags to return (default 100). Use a higher value to see more. |
| find_related_datasetsA | Find datasets related to a given dataset by shared tags and organization. Searches within the same portal as the source dataset only. Args: dataset_id: Prefixed dataset ID (e.g. "toronto:ttc-ridership") or bare ID limit: Max related datasets to return |
| list_portalsA | List all available data portals with their platform type and descriptions. |
| get_dataset_infoA | Get full metadata for a dataset including all resources. Args: dataset_id: Prefixed dataset ID (e.g. "toronto:ttc-ridership") or bare ID |
| list_resourcesA | List all resources (files) in a dataset with their formats and sizes. Args: dataset_id: Prefixed dataset ID (e.g. "toronto:ttc-ridership") or bare ID |
| get_resource_schemaA | Get the column schema and sample values for a datastore resource. Args: resource_id: Prefixed resource ID (e.g. "toronto:abc123") or bare ID sample_size: Number of sample rows to include |
| compare_datasetsA | Compare metadata side-by-side for multiple datasets (can be cross-portal). Args: dataset_ids: List of prefixed dataset IDs (e.g. ["toronto:abc", "ontario:def"]) to compare (2-5) |
| download_resourceA | Download a dataset resource and cache it locally in DuckDB for fast querying. Supports CSV, XLSX, JSON, and datastore-active resources. If already cached, returns staleness info so you can decide whether to refresh. Numeric columns stored as text are automatically cast to DOUBLE. Workflow: search_datasets → get_dataset_info → download_resource → query_cached. Args: resource_id: Prefixed resource ID (e.g. "toronto:abc123") or bare ID |
| cache_infoA | Get cache statistics and list all cached datasets. Returns size, table count, and details for every cached resource. |
| cache_manageA | Manage the local DuckDB cache: remove or clear cached data. Args: action: One of "remove" (single resource) or "clear" (all) resource_id: Required for "remove" action. Prefixed or bare ID accepted. |
| refresh_cacheA | Re-download cached resources to get the latest data. Args: resource_id: Specific resource to refresh (prefixed or bare ID), or omit to refresh all |
| query_resourceA | Query a resource via the CKAN Datastore API (remote, no download needed). Only works for resources with datastore_active=True. Args: resource_id: Prefixed resource ID (e.g. "toronto:abc123") or bare ID filters: Column filters as {column: value} pairs fields: List of columns to return (default: all) sort: Sort string (e.g. "date desc", "name asc") limit: Max rows (1-1000) offset: Row offset for pagination |
| sql_queryA | Run a SQL query against the CKAN Datastore (remote). NOTE: Prefer download_resource + query_cached for repeated queries — the remote API has rate limits (429 errors). Use this tool only for quick one-off queries on datastore-active resources. Use resource IDs as table names in double quotes. Example: SELECT "Column Name" FROM "resource-id-here" WHERE "Year" > 2020 LIMIT 10 Args: sql: SQL query string (read-only, SELECT only) portal: Portal to query (default: "ontario"). Required because SQL embeds resource IDs directly. |
| query_cachedA | Run a SQL query against locally cached data in DuckDB. Use table names from download_resource or cache_info. Supports full DuckDB SQL: aggregations, window functions, CTEs, JOINs across tables. Use SUM(quantity_col) not COUNT() when rows contain per-row counts (e.g. a "count" or "number_of" column). COUNT() counts rows, not quantities. Column names vary across resources in the same dataset — always DESCRIBE first. Use SELECT * RENAME ("old col" AS new_col) to normalize column names when joining across datasets. Values containing semicolons should be matched with LIKE patterns, not = equality. Quote table names with double quotes in SQL. Args: sql: SQL query (e.g. SELECT * FROM "ds_my_table_abc12345" LIMIT 10) |
| preview_dataA | Quick preview of the first N rows of a resource (fetched remotely). Args: resource_id: Prefixed resource ID (e.g. "toronto:abc123") or bare ID rows: Number of rows to preview (1-100) |
| check_freshnessA | Check if a dataset is current by comparing its update frequency to its last modification date. Args: dataset_id: Prefixed dataset ID (e.g. "toronto:ttc-ridership") or bare ID |
| profile_dataA | Statistical profile and quality check of a cached dataset. Uses DuckDB SUMMARIZE for column-level statistics (min, max, avg, std, nulls, unique counts). Also checks for duplicate rows. Args: resource_id: Resource ID (must be cached) |
| validate_resultA | Validate that a claim is supported by query results. Call this after making a data claim to verify it against the source. Re-executes the SQL, extracts numbers and terms from the claim, and checks them against the actual data. Args: sql: The SQL query that produced the data backing the claim claim: The natural-language claim to verify (e.g. "Toronto had 12,345 building permits in 2023") |
| load_geodataA | Download and cache a geospatial resource (SHP, KML, GeoJSON) into DuckDB with spatial support. Args: resource_id: Prefixed resource ID (e.g. "toronto:abc123") or bare ID force_refresh: Re-download even if cached |
| spatial_queryA | Run spatial queries against cached geospatial data. Args: resource_id: Resource ID (must be cached via load_geodata) operation: "contains_point", "within_bbox", or "within_radius" latitude: Latitude for point queries (-90 to 90) longitude: Longitude for point queries (-180 to 180) radius_km: Radius in kilometers (for within_radius, must be > 0) bbox: Bounding box as [min_lng, min_lat, max_lng, max_lat] (for within_bbox) limit: Max results |
| list_geo_datasetsA | Find all datasets that contain geospatial resources (SHP, KML, GeoJSON). Args: format_filter: Filter to specific format: "SHP", "KML", "GEOJSON", or None for all limit: Max results per portal portal: Narrow to one portal. Default: all portals. |
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| explore_topic | Guided exploration of a topic across all open data portals. Searches for datasets, summarizes what's available, and suggests deep dives. |
| data_investigation | Deep investigation of a specific dataset: schema, quality, statistics, insights. |
| compare_data | Side-by-side analysis of multiple datasets (comma-separated IDs, can be cross-portal). |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| cache_index | List of all locally cached datasets with freshness info. |
| portal_stats | Overview statistics across all data portals. |
| duckdb_sql_guide | DuckDB SQL reference for Ontario open data analysis. |
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/sprine/ontario-data-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server