socrata-mcp-server
Server Details
Search and query government open-data portals (Socrata SODA API).
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
- Repository: cyanheads/socrata-mcp-server
- GitHub Stars: 1
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.5/5 across 6 of 6 tools scored.
Each tool targets a distinct action: listing portals, searching datasets, fetching metadata, querying, and DataCanvas operations. No overlap in purpose.
All tools follow the 'socrata_verb_noun' pattern in snake_case, consistent and predictable.
6 tools cover the full workflow from portal discovery to dataset querying and large-result handling, well-scoped for the domain.
Covers portal listing, dataset search, metadata retrieval, query execution, and DataCanvas integration. No obvious gaps for standard Socrata interactions.
Available Tools
6 tools
socrata_dataframe_describe
Describe DataCanvas Tables
A, Read-only, Idempotent
List registered tables in a DataCanvas session — schema, row count, column names, and registration time. Shows what datasets are available for SQL queries via socrata_dataframe_query. Only meaningful when CANVAS_PROVIDER_TYPE=duckdb is set. Use after socrata_query_dataset spills a large result set to canvas.
| Name | Required | Description | Default |
|---|---|---|---|
| canvas_id | No | Canvas ID returned from socrata_query_dataset. Omit to list all tables visible in the current session. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| tables | Yes | Tables available for SQL queries. Empty when none registered. |
| message | No | Status message when canvas is not enabled or no tables are registered. Absent when tables are present. |
| canvas_id | No | Canvas ID resolved, when canvas is enabled. |
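As a rough illustration of how a client might invoke this tool, the sketch below builds a JSON-RPC 2.0 request body for the MCP `tools/call` method. The helper names `build_tool_call` and `describe_canvas_call` and the request `id` are illustrative assumptions; only the tool name and the optional `canvas_id` parameter come from the listing above.

```python
import json
from typing import Any, Optional


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 body for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # arbitrary request id chosen by the client
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def describe_canvas_call(canvas_id: Optional[str] = None) -> dict[str, Any]:
    """Omit canvas_id to list every table visible in the current session."""
    arguments = {} if canvas_id is None else {"canvas_id": canvas_id}
    return build_tool_call("socrata_dataframe_describe", arguments)


if __name__ == "__main__":
    print(json.dumps(describe_canvas_call(), indent=2))
```

How the body is sent depends on the client and transport in use; that part is left out here.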
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint and idempotentHint. The description adds context: the dependency on DuckDB environment, the intended sequencing after socrata_query_dataset, and the specific output fields (schema, row count, column names, registration time). This enriches the agent's understanding beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is four sentences, each serving a distinct purpose: stating what the tool does, linking to a sibling tool, stating a prerequisite, and indicating when to call it. It is front-loaded with the core functionality and has no redundant words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simplicity of the tool (one optional parameter, output schema present), the description covers purpose, usage context, and a prerequisite condition. It does not discuss error cases or authentication, but these are minor omissions for a read-only listing tool. The presence of an output schema means return values need not be detailed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% and the schema description for canvas_id is already informative: 'Canvas ID returned from socrata_query_dataset. Omit to list all tables visible in the current session.' The tool description does not add further parameter details beyond the schema, so the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool lists registered tables in a DataCanvas session, providing schema, row count, column names, and registration time. It explicitly relates to socrata_dataframe_query, distinguishing it from sibling tools like socrata_find_datasets or socrata_query_dataset.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides specific usage guidance: 'Use after socrata_query_dataset spills a large result set to canvas' and 'Only meaningful when CANVAS_PROVIDER_TYPE=duckdb is set.' It implies an order of operations but does not compare to alternatives or state when not to use it, which would be needed for a 5.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
socrata_dataframe_query
Query DataCanvas Table
A, Read-only, Idempotent
Run SELECT-only SQL against a DataCanvas table populated by socrata_query_dataset. DuckDB infers types from spilled data, so numeric columns that SODA returned as strings become queryable with numeric comparisons (year > 2020, amount < 500). Only works when CANVAS_PROVIDER_TYPE=duckdb is set. Use socrata_dataframe_describe to see registered tables and their schemas.
| Name | Required | Description | Default |
|---|---|---|---|
| sql | Yes | SELECT-only SQL to run against registered canvas tables. DDL, DML, and file-reading functions are rejected. Use table names from socrata_dataframe_describe. | |
| limit | No | Max rows to return (1–10000). Default 1000. | |
| canvas_id | Yes | Canvas ID returned from socrata_query_dataset or socrata_dataframe_describe. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| sql | Yes | SQL that was executed. |
| rows | Yes | Query result rows. DuckDB may return native JS types (number, boolean, null) for numeric/boolean columns. |
| canvas_id | Yes | Canvas ID queried. |
| row_count | Yes | Number of rows returned. |
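The constraints above (SELECT-only SQL, `limit` between 1 and 10000) can be checked on the client before a call is made. The sketch below is one conservative way to do that. The server's own rejection logic is not shown in this listing, so the keyword list and the function name `build_canvas_query_args` are assumptions, and the check may reject valid queries that contain these keywords inside string literals.

```python
import re

# Keywords that suggest DDL/DML or file access; an illustrative list (assumption).
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|create|alter|copy|attach|install|load)\b",
    re.IGNORECASE,
)


def build_canvas_query_args(canvas_id: str, sql: str, limit: int = 1000) -> dict:
    """Validate and assemble arguments for socrata_dataframe_query."""
    # Drop surrounding whitespace and a single trailing semicolon.
    statement = sql.strip().rstrip(";").strip()
    if not re.match(r"^select\b", statement, re.IGNORECASE):
        raise ValueError("only SELECT statements are accepted")
    if ";" in statement or _FORBIDDEN.search(statement):
        raise ValueError("statement looks like DDL/DML or contains multiple statements")
    if not 1 <= limit <= 10000:
        raise ValueError("limit must be between 1 and 10000")
    return {"canvas_id": canvas_id, "sql": statement, "limit": limit}
```

A pre-flight check like this only saves a round trip; the server remains the authority on what it accepts.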
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already mark the tool as read-only and idempotent. The description adds behavioral details about DuckDB type inference (e.g., numeric columns from string data become queryable with comparisons) and reinforces the SELECT-only restriction, providing additional context beyond the annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise: four sentences that front-load the primary purpose, explain type inference, state the prerequisite, and point to a sibling tool, with no redundant information. Every sentence contributes to the tool's understanding.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema and high schema coverage, the description adequately covers prerequisites, type inference behavior, and complementary tools. It may lack details on error handling or performance, but for a query tool it is sufficiently complete.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage for all three parameters. The description adds value by clarifying that SQL must be SELECT-only and that table names come from socrata_dataframe_describe, complementing the schema without repeating it.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool runs SELECT-only SQL against a DataCanvas table, specifying the resource and the type of operation. It distinguishes from sibling tools like socrata_dataframe_describe and socrata_query_dataset by noting that it works on tables populated by the latter, making its purpose unambiguous.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use the tool by stating it only works when CANVAS_PROVIDER_TYPE=duckdb is set, and directs to socrata_dataframe_describe for schema exploration. However, it does not explicitly note when not to use it, though the condition is clear.
socrata_find_datasets
Find Socrata Datasets
A, Read-only, Idempotent
Search for datasets across all Socrata-powered government open-data portals, or scope to one portal with the domain parameter. Returns dataset IDs, names, abbreviated column lists, domains, and update timestamps. Use socrata_get_dataset to fetch the full typed column schema before writing queries — columnNames here are preview-only and lack type information.
| Name | Required | Description | Default |
|---|---|---|---|
| only | No | Filter by asset type. Omit to include all types. Usually "datasets" is what you want. | |
| tags | No | Filter by tags (e.g. ["covid19", "permits"]). | |
| limit | No | Number of results to return (1–100). Default 10. | |
| order | No | Sort order. Defaults to relevance. Use updated_at to surface recently-refreshed datasets. | |
| query | No | Full-text search across dataset names and descriptions. Omit to browse without filtering. | |
| domain | No | Scope search to a single portal (e.g. data.seattle.gov, data.cityofnewyork.us). Omit to search all portals. | |
| offset | No | Pagination offset. Default 0. | |
| categories | No | Filter by domain categories (e.g. ["Public Safety", "Transportation"]). | |
Output Schema
| Name | Required | Description |
|---|---|---|
| query | No | Search query applied, for reference. |
| message | No | Recovery hint when results are empty — echoes filters and suggests how to broaden. Absent on non-empty result pages. |
| results | Yes | Matching datasets. Empty when no results. |
| total_count | Yes | Total matches before pagination. 0 when empty. |
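The `limit`, `offset`, and `total_count` fields above suggest a simple paging loop. The sketch below shows one way to walk all matches. `call_tool` is a hypothetical callable standing in for whatever MCP client is in use; it is assumed to take a tool name and an arguments dict and to return the structured output described in the Output Schema.

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical client hook: (tool_name, arguments) -> structured tool output.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def iter_datasets(
    call_tool: ToolCaller,
    query: str,
    domain: Optional[str] = None,
    page_size: int = 100,
) -> Iterator[dict[str, Any]]:
    """Yield every matching dataset, one page at a time."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    offset = 0
    while True:
        arguments: dict[str, Any] = {"query": query, "limit": page_size, "offset": offset}
        if domain:
            arguments["domain"] = domain  # scope the search to one portal
        page = call_tool("socrata_find_datasets", arguments)
        results = page["results"]
        yield from results
        offset += len(results)
        # Stop on an empty page or once every match has been seen.
        if not results or offset >= page["total_count"]:
            break
```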
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds that column names lack type information (preview-only), which is critical behavioral context beyond annotations. No contradictions.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is three well-structured sentences: the first states purpose and scope, the second lists the output, and the third gives usage guidance. No filler, every sentence earns its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 8 optional parameters, full schema coverage, and an output schema, the description adequately covers return values, scoping, and limitation warnings. No gaps remain.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with detailed parameter descriptions (enum values, defaults, ranges). The description references the 'domain' parameter but does not add significant meaning beyond the schema. Baseline 3 is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool searches for datasets across Socrata portals, lists return fields (IDs, names, columns, etc.), and contrasts with sibling socrata_get_dataset. The verb 'search' and resource 'datasets' are explicit and distinguishable.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly advises to use socrata_get_dataset for full schema before writing queries, and notes that columnNames here are preview-only. This is a clear when-to-use and when-not-to-use directive, referencing an alternative sibling.
socrata_get_dataset
Get Dataset Schema
A, Read-only, Idempotent
Fetch full metadata and column schema for a Socrata dataset by ID. Returns field names, data types, descriptions, row count, and licensing. Always call this before writing a socrata_query_dataset — the column types determine correct WHERE clause syntax: Number columns accept bare literals (year=2023) while Text columns require single-quoted strings (year='2023').
| Name | Required | Description | Default |
|---|---|---|---|
| domain | No | Portal domain (e.g. data.seattle.gov). Defaults to SOCRATA_DEFAULT_DOMAIN env var or data.seattle.gov. | |
| dataset_id | Yes | Four-by-four dataset ID matching pattern like kzjm-xkqj. Obtain from socrata_find_datasets. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| name | Yes | Dataset display name. |
| tags | Yes | Associated tags. |
| domain | Yes | Portal domain hosting this dataset. |
| columns | Yes | Column schema. Computed region columns (:@computed_region_*) are excluded to reduce noise. |
| license | No | License name when available. |
| category | No | Domain category when available. |
| row_count | No | Approximate row count when available. |
| dataset_id | Yes | Four-by-four dataset ID. |
| description | No | Dataset description when available. |
| data_updated_at | No | ISO 8601 timestamp of last data update when available. |
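The quoting rule in the description (bare literals for Number columns, single-quoted strings for Text columns) can be captured in a small helper. The sketch below is a minimal version covering only those two types. The function names are hypothetical, and escaping an embedded single quote by doubling it is an assumption about SoQL string literals that should be checked against the SODA documentation.

```python
def soql_literal(data_type: str, value) -> str:
    """Render a value as a SoQL literal according to the column's data type."""
    kind = data_type.strip().lower()
    if kind == "number":
        # Validate first so arbitrary text cannot slip into the clause unquoted.
        float(value)  # raises ValueError for non-numeric input
        return str(value)
    if kind == "text":
        # Assumption: a single quote inside a string is escaped by doubling it.
        return "'" + str(value).replace("'", "''") + "'"
    raise ValueError(f"unsupported column type: {data_type}")


def where_equals(column: str, data_type: str, value) -> str:
    """Build a simple equality filter for the `where` parameter."""
    return f"{column}={soql_literal(data_type, value)}"
```

With a Number column, `where_equals("year", "Number", 2023)` yields `year=2023`; with a Text column holding the same values, `where_equals("year", "Text", "2023")` yields `year='2023'`.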
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and idempotentHint=true. The description adds valuable behavioral context: returns field names, data types, descriptions, row count, and licensing. Also warns about query implications. No contradiction.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with purpose. The second lists the returned fields and the third provides essential guidance. No redundant information. Every word earns its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, description does not need to detail return values. It already mentions key return fields and provides critical usage context. The tool is simple and well-described.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% coverage with descriptions for both parameters. Description adds beyond schema: explains domain default behavior and source for dataset_id (from socrata_find_datasets). This adds meaningful context, justifying a score above baseline 3.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool fetches full metadata and column schema for a Socrata dataset by ID. The verb 'Fetch' and resource 'metadata and column schema' are specific. It distinguishes from siblings like socrata_find_datasets (which finds datasets) and socrata_query_dataset (which queries data).
Does the description explain when to use this tool, when not to, or what alternatives exist?
Description explicitly directs 'Always call this before writing a socrata_query_dataset' and explains why: column types determine correct WHERE clause syntax. Provides concrete examples (Number vs Text columns). This is excellent usage guidance.
socrata_list_portals
List Socrata Portals
A, Read-only, Idempotent
List known Socrata-powered government open-data portals with their domain, organization name, and dataset count. Backed by the Discovery API domains catalog. Filtering is client-side substring match on the query parameter. Use this first when you do not know which portal to target, then pass the domain to socrata_find_datasets.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max portals to return (1–200). Default 50. | |
| query | No | Keyword to filter portal names or organization names (case-insensitive substring match). Omit to list all portals. | |
| offset | No | Pagination offset. Default 0. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| message | No | Recovery hint when no portals matched the filter. Absent on non-empty pages. |
| portals | Yes | Matching portals. Empty when no results. |
| total_count | Yes | Total portals before pagination. 0 when empty. |
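The filtering behavior described for this tool (case-insensitive substring match, then `limit`/`offset` paging) could look roughly like the sketch below. This is not the server's code: the record keys `domain` and `organization`, the sample records, and the wording of the recovery message are all assumptions made for illustration.

```python
from typing import Any, Optional


def filter_portals(
    portals: list[dict[str, Any]],
    query: Optional[str] = None,
    limit: int = 50,
    offset: int = 0,
) -> dict[str, Any]:
    """Filter portal records by substring, then apply limit/offset paging."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if query:
        needle = query.casefold()
        matched = [
            p for p in portals
            if needle in p["domain"].casefold() or needle in p["organization"].casefold()
        ]
    else:
        matched = list(portals)  # no query: list everything
    result: dict[str, Any] = {
        "portals": matched[offset:offset + limit],
        "total_count": len(matched),  # total before pagination
    }
    if not matched:
        # Hypothetical hint text; the real message is not shown in the listing.
        result["message"] = "No portals matched the filter; try a broader query."
    return result
```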
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses that filtering is client-side substring match, adding behavioral context beyond annotations (readOnlyHint, idempotentHint). Could also mention data freshness or performance implications, but overall informative.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four focused sentences with a front-loaded purpose, followed by the data source, filtering behavior, and usage guidance. No wasted words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 100% schema description coverage, output schema present, and annotations covering safety, the description leaves no gaps: explains what is returned, client-side filtering, and tool workflow.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema covers all parameters; description adds value by specifying that query filtering is client-side and clarifying use of limit/offset. Enhances understanding beyond schema alone.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states the verb 'List' and resource 'portals', specifies returned fields (domain, organization name, dataset count), and distinguishes itself from siblings like socrata_find_datasets by being the first step when portal is unknown.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly advises to 'Use this first when you do not know which portal to target, then pass the domain to socrata_find_datasets', providing clear when-to-use and next step.
socrata_query_dataset
Query Dataset
A, Read-only, Idempotent
Execute a SoQL query against any dataset on any Socrata portal. Use the search parameter for quick full-text lookup, or combine select/where/group/having/order for full analytical control. Returns rows plus the assembled SoQL string so you can learn the pattern. All SODA 2.1 row values are strings even for numeric columns — check dataType from socrata_get_dataset to determine correct WHERE quoting: Number columns use bare literals (year=2023), Text columns use single-quoted strings (year='2023'). To enumerate distinct values, use select="col, count(*) as n" with group="col" and order="n DESC". When CANVAS_PROVIDER_TYPE=duckdb and rows fill the limit, results spill to a DataCanvas table for SQL-based analysis.
| Name | Required | Description | Default |
|---|---|---|---|
| group | No | SoQL GROUP BY clause. Requires an aggregate function in select. | |
| limit | No | Max rows to return (1–5000). Default 100. Use with offset for pagination. | |
| order | No | SoQL ORDER BY clause, e.g. "total_deaths DESC" or "date ASC". | |
| where | No | SoQL WHERE clause. Check column dataType from socrata_get_dataset first — Number columns: year=2023, Text columns: year='2023'. Operators: =, !=, >, <, LIKE, IN(...), BETWEEN, IS NULL, starts_with(), contains(), AND, OR, NOT. | |
| domain | No | Portal domain (e.g. data.seattle.gov). Defaults to SOCRATA_DEFAULT_DOMAIN or data.seattle.gov. | |
| having | No | SoQL HAVING clause. Filters on aggregated results, e.g. count > 100. | |
| offset | No | Row offset for pagination. Default 0. | |
| search | No | Full-text search across all text columns ($q). For field-specific filtering, use where instead. | |
| select | No | SoQL SELECT clause — column names, aliases, aggregates: "state, sum(deaths) as total_deaths". Omit for all columns. | |
| canvas_id | No | Optional 10-char DataCanvas token from a prior call. Omit on first call when CANVAS_PROVIDER_TYPE=duckdb to mint a fresh canvas. Large result sets spill here automatically. | |
| dataset_id | Yes | Four-by-four dataset ID (e.g. kzjm-xkqj). Obtain from socrata_find_datasets. | |
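To make the parameter table concrete, the sketch below builds the distinct-values arguments mentioned in the description and maps the tool's clause parameters onto standard SODA query parameters (`$select`, `$where`, `$group`, `$having`, `$order`, `$q`, `$limit`, `$offset`). The mapping to a `/resource/<id>.json` URL is an assumption about what such a server does internally, and both function names are hypothetical.

```python
import os
from typing import Optional
from urllib.parse import urlencode

# Tool parameter name -> SODA query parameter name.
_CLAUSES = {
    "select": "$select", "where": "$where", "group": "$group",
    "having": "$having", "order": "$order", "search": "$q",
    "limit": "$limit", "offset": "$offset",
}


def distinct_values_args(dataset_id: str, column: str) -> dict:
    """Arguments that enumerate a column's distinct values with counts."""
    return {
        "dataset_id": dataset_id,
        "select": f"{column}, count(*) as n",
        "group": column,
        "order": "n DESC",
    }


def soda_url(dataset_id: str, domain: Optional[str] = None, **clauses) -> str:
    """Assemble a SODA resource URL from the tool's clause parameters."""
    unknown = set(clauses) - set(_CLAUSES)
    if unknown:
        raise ValueError(f"unknown clause(s): {sorted(unknown)}")
    # Same fallback order as the `domain` parameter description.
    host = domain or os.environ.get("SOCRATA_DEFAULT_DOMAIN", "data.seattle.gov")
    params = {_CLAUSES[k]: v for k, v in clauses.items() if v is not None}
    return f"https://{host}/resource/{dataset_id}.json?{urlencode(params)}"
```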
Output Schema
| Name | Required | Description |
|---|---|---|
| rows | Yes | Result rows. Scalar values are strings (SODA 2.1); geo/location columns return nested objects. Use column schema from socrata_get_dataset for type context. |
| domain | Yes | Portal domain queried. |
| canvas_id | No | DataCanvas token when results spilled (requires CANVAS_PROVIDER_TYPE=duckdb). Pass to socrata_dataframe_query for SQL over the full result set. |
| row_count | Yes | Rows returned in this response. |
| dataset_id | Yes | Dataset ID queried. |
| total_count | No | Total matching rows when result is truncated (row_count < total_count). Absent when the full result fits. |
| assembled_query | Yes | SoQL clauses assembled for this request — useful for learning the syntax. |
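The output fields above imply two follow-up paths: when `canvas_id` is present the result spilled to a DataCanvas table, and when `total_count` exceeds `row_count` the result was truncated. The sketch below picks the next call accordingly. The function name `next_step` and the returned dict shape are illustrative assumptions, not part of the server's API.

```python
from typing import Any, Optional


def next_step(result: dict[str, Any], arguments: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Suggest a follow-up tool call for a socrata_query_dataset result."""
    canvas_id = result.get("canvas_id")
    if canvas_id:
        # Results spilled: inspect the canvas tables before writing SQL.
        return {"tool": "socrata_dataframe_describe", "arguments": {"canvas_id": canvas_id}}
    total = result.get("total_count")
    if total is not None and result["row_count"] < total:
        # Truncated: request the next page with an advanced offset.
        following = dict(arguments)
        following["offset"] = arguments.get("offset", 0) + result["row_count"]
        return {"tool": "socrata_query_dataset", "arguments": following}
    return None  # the full result fit in this response
```

Checking `canvas_id` first is a design choice in this sketch: once a result has spilled, SQL over the canvas is likely more convenient than paging through SoQL results.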
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only and idempotent. The description adds significant behavioral details: return of SoQL string, string formatting for all values (including numeric), dependence on dataType for quoting, and spillover to DataCanvas under specific conditions. No contradictions with annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is detailed but well-structured, front-loading the core purpose. Every sentence adds useful information, though the length could be trimmed slightly without losing meaning. It earns its space with critical quoting and spillover details.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (11 parameters, output schema present), the description covers all essential behavioral aspects: quoting rules, spillover mechanism, return of SoQL string, and caveats about string values. No major gaps are apparent.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema descriptions cover all parameters (100% coverage), providing a baseline of 3. The description adds extra context for several parameters, such as search being full-text ($q), domain defaulting to SOCRATA_DEFAULT_DOMAIN, and canvas_id usage for spillover, which increases value.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool executes SoQL queries on any Socrata portal, specifying both quick search and full analytical control. However, it does not explicitly differentiate from the sibling tool socrata_dataframe_query, which likely also performs queries.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Guidance is provided on when to use search vs. full SoQL clauses, and tips for quoting and distinct enumeration. It also mentions spillover behavior. However, it lacks explicit when-not-to-use or comparisons with sibling tools like socrata_dataframe_query.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
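For servers whose static files are deployed from a directory, the claim file can be generated with a few lines of script. The sketch below is one way to do it; the function name `write_glama_json` and the idea of a local `web_root` directory are assumptions about the deployment, while the JSON structure is the one given above.

```python
import json
from pathlib import Path


def write_glama_json(web_root: Path, email: str) -> Path:
    """Write .well-known/glama.json under the given web root and return its path."""
    target = web_root / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],  # must match the Glama account email
    }
    target.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return target
```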
- Control your server's listing on Glama, including description and metadata
- Access analytics and receive server usage reports
- Get monitoring and health status updates for your server
- Feature your server to boost visibility and reach more users
For users:
- Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
- Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
- Centralized credential management – store and rotate API keys and OAuth tokens in one place
- Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
- Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
- Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
- Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
- The server is experiencing an outage
- The URL of the server is wrong
- Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.