socrata-mcp-server

Server Details

Search and query government open-data portals (Socrata SODA API).

Status: Healthy
Transport: Streamable HTTP
Repository: cyanheads/socrata-mcp-server
GitHub Stars: 1

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.5/5 across 6 of 6 tools scored.

Server Coherence: A

Disambiguation: 5/5

Each tool has a clear, distinct purpose: listing portals, searching datasets, fetching metadata, querying, and handling large results via DataFrame. No overlap or ambiguity.

Naming Consistency: 5/5

All tools follow a consistent snake_case pattern, 'socrata_<verb>_<noun>'. The two dataframe tools diverge slightly by placing the noun first ('socrata_dataframe_<verb>'), but the naming remains predictable.

Tool Count: 5/5

With 6 tools, the server is well-scoped for the domain of accessing Socrata open data portals. Each tool serves a necessary step in the workflow without redundancy or missing coverage.

Completeness: 5/5

The tool set covers the full lifecycle: discover portals, find datasets, examine schemas, query data, and handle large results via DataFrame. No obvious gaps for the stated purpose.

Available Tools

6 tools
socrata_dataframe_describe: Describe DataCanvas Tables (grade A)
Annotations: Read-only, Idempotent

List registered tables in a DataCanvas session — schema, row count, column names, and registration time. Shows what datasets are available for SQL queries via socrata_dataframe_query. Only meaningful when CANVAS_PROVIDER_TYPE=duckdb is set. Use after socrata_query_dataset spills a large result set to canvas.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| canvas_id | No | Canvas ID returned by socrata_query_dataset when a large result spills to canvas. Required in practice when canvas is enabled: canvases cannot be enumerated, so omitting it fails with canvas_id_required instead of listing tables. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| notice | No | Status message when canvas is not enabled or no tables are registered. Absent when tables are present. |
| tables | Yes | Tables available for SQL queries. Empty when none registered. |
| canvas_id | No | Canvas ID resolved, when canvas is enabled. |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and idempotentHint. The description adds value by explaining the prerequisite (duckdb) and error behavior when canvas_id is omitted. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, each conveying essential information without redundancy. It is front-loaded with the primary purpose and efficiently covers prerequisites and usage context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple parameter, existing output schema, and clear annotations, the description sufficiently covers usage context, prerequisites, and relationship to sibling tools (socrata_dataframe_query and socrata_query_dataset).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add additional meaning to the parameter beyond what the schema provides; it only references the parameter indirectly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists registered tables in a DataCanvas session, specifying details like schema, row count, column names, and registration time. It differentiates from siblings by mentioning that the listed tables are available for SQL queries via socrata_dataframe_query.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit context: 'Only meaningful when CANVAS_PROVIDER_TYPE=duckdb is set' and 'Use after socrata_query_dataset spills a large result set to canvas.' This tells the agent when to use the tool, though it could be more explicit about when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

socrata_dataframe_query: Query DataCanvas Table (grade A)
Annotations: Read-only, Idempotent

Run SELECT-only SQL against a DataCanvas table populated by socrata_query_dataset. DuckDB infers types from spilled data, so numeric columns that SODA returned as strings become queryable with numeric comparisons (year > 2020, amount < 500). Only works when CANVAS_PROVIDER_TYPE=duckdb is set. Use socrata_dataframe_describe to see registered tables and their schemas.
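
To see why the type inference matters, here is a minimal sketch in plain Python (not DuckDB). The sample rows and the `coerce` helper are made up for illustration; the sketch only contrasts comparing values as strings, the way SODA returns them, with comparing the same values after numeric coercion.

```python
# Illustrative rows shaped like SODA 2.1 output: every scalar is a string.
rows = [
    {"year": "2019", "amount": "1000"},
    {"year": "2021", "amount": "75"},
    {"year": "2022", "amount": "450"},
]


def coerce(value: str):
    """Hypothetical helper: turn numeric-looking strings into int or float."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value  # leave non-numeric text untouched


# String comparison is lexicographic, so "1000" sorts before "500".
as_strings = [r for r in rows if r["amount"] < "500"]

# After coercion the same filter behaves numerically (amount < 500).
typed = [{k: coerce(v) for k, v in r.items()} for r in rows]
as_numbers = [r for r in typed if r["amount"] < 500]
```

The string filter wrongly keeps the "1000" row and drops the "75" row, which is the class of mistake that typed canvas columns avoid.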

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| sql | Yes | SELECT-only SQL to run against registered canvas tables. DDL, DML, and file-reading functions are rejected. Use table names from socrata_dataframe_describe. | |
| limit | No | Max rows to return (1–10000). Default 1000. | |
| canvas_id | Yes | Canvas ID returned from socrata_query_dataset or socrata_dataframe_describe. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| cap | No | The row limit that was applied when capped. |
| sql | Yes | SQL that was executed. |
| rows | Yes | Query result rows. DuckDB may return native JS types (number, boolean, null) for numeric/boolean columns. |
| shown | No | Rows returned in this response when capped. |
| notice | No | Guidance when the SQL returned zero rows. Absent when rows are present. |
| canvas_id | Yes | Canvas ID queried. |
| row_count | Yes | Number of rows returned. |
| truncated | No | True when results were capped at the limit; more rows match the query. |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (readOnlyHint, idempotentHint), the description adds valuable behavioral detail: DuckDB infers types from spilled data, which makes numeric columns queryable with numeric comparisons. The sql parameter schema further notes that DDL, DML, and file-reading functions are rejected. Together these give the agent clear expectations about tool execution.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences: the core action, a key benefit, a prerequisite, and a pointer to a sibling tool. Nothing is wasted; every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with an output schema, the description covers purpose, usage condition, behavioral detail, and a pointer to a complementary tool. It is complete given the tool's complexity and the richness of annotations and schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description repeats the schema's guidance for the 'sql' parameter (use table names from socrata_dataframe_describe) and adds DuckDB inference context, but does not provide new parameter-specific semantics beyond what is already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Run SELECT-only SQL against a DataCanvas table populated by socrata_query_dataset.' It specifies the verb (Run SQL), the resource (DataCanvas table), and the condition (SELECT-only). It differentiates from siblings by referencing socrata_query_dataset for population and socrata_dataframe_describe for schema discovery.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit usage context: 'Only works when CANVAS_PROVIDER_TYPE=duckdb is set' and recommends using socrata_dataframe_describe to find tables. It implies when to use (after populating with socrata_query_dataset) but does not explicitly state when not to use or list alternatives, though sibling tools are provided separately.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

socrata_find_datasets: Find Socrata Datasets (grade A)
Annotations: Read-only, Idempotent

Search for datasets across all Socrata-powered government open-data portals, or scope to one portal with the domain parameter. Returns dataset IDs, names, abbreviated column lists, domains, and update timestamps. Use socrata_get_dataset to fetch the full typed column schema before writing queries — columnNames here are preview-only and lack type information.
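
As a rough sketch of what invoking this tool looks like on the wire, the snippet below builds an MCP `tools/call` JSON-RPC request. The `build_tool_call` helper and the argument values (search text, limit) are illustrative assumptions; only the tool name and parameter names come from the listing.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Hypothetical helper: wrap tool arguments in an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "socrata_find_datasets",
    {
        "query": "building permits",   # illustrative search text
        "domain": "data.seattle.gov",  # omit to search all portals
        "only": "datasets",            # asset-type filter
        "order": "updated_at",         # surface recently refreshed datasets
        "limit": 5,
    },
)
payload = json.dumps(request)  # string to send over the chosen transport
```

An MCP client library would normally construct this envelope itself; the sketch is only meant to show which arguments map to which parameters.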

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| only | No | Filter by asset type. Omit to include all types. Usually "datasets" is what you want. | |
| tags | No | Filter by tags (e.g. ["covid19", "permits"]). | |
| limit | No | Number of results to return (1–100). Default 10. | |
| order | No | Sort order. Defaults to relevance. Use updated_at to surface recently-refreshed datasets. | |
| query | No | Full-text search across dataset names and descriptions. Omit to browse without filtering. | |
| domain | No | Scope search to a single portal (e.g. data.seattle.gov, data.cityofnewyork.us). Omit to search all portals. | |
| offset | No | Pagination offset. Default 0. | |
| categories | No | Filter by domain categories (e.g. ["Public Safety", "Transportation"]). | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| notice | No | Recovery hint when results are empty; echoes filters and suggests how to broaden. Absent on non-empty result pages. |
| results | Yes | Matching datasets. Empty when no results. |
| totalCount | Yes | Total matches before pagination. 0 when empty. |
| effectiveQuery | No | Search query applied, for reference. |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. The description adds behavioral context: it returns dataset IDs, names, abbreviated column lists, domains, and update timestamps, and warns that columnNames are preview-only and lack type information. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: the first states the core purpose with its scoping option, the second lists what is returned, and the third gives critical usage guidance linking to a sibling tool. Every sentence earns its place, and the primary action is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 8 optional parameters, full schema coverage, and presence of output schema, the description covers purpose, return values, and necessary guidance (e.g., use sibling for full schema). It doesn't need to explain return values as output schema exists. A minor gap is no mention of pagination beyond offset/limit, but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. Together, the description and parameter docs explain that domain scopes the search to a single portal, that order defaults to relevance (with updated_at suggested for recently refreshed datasets), and that for the only filter, "datasets" is usually what you want.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verbs ('search for datasets') and clearly states the resource (Socrata datasets across all portals or scoped to a domain). It distinguishes from sibling tool socrata_get_dataset by noting that this tool returns preview-only column lists without type information, while the sibling fetches full typed schema.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance: use this tool to find datasets, then use socrata_get_dataset to get full schema before writing queries. It explains the domain parameter for scoping and gives tips on sort order. It does not explicitly list when not to use, but the contrast with siblings is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

socrata_get_dataset: Get Dataset Schema (grade A)
Annotations: Read-only, Idempotent

Fetch full metadata and column schema for a Socrata dataset by ID. Returns field names, data types, descriptions, row count, and licensing. Always call this before writing a socrata_query_dataset — the column types determine correct WHERE clause syntax: Number columns accept bare literals (year=2023) while Text columns require single-quoted strings (year='2023').
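
The quoting rule can be sketched as a small helper. `soql_literal` and `equals` are hypothetical names, not part of the server; the type names "Number" and "Text" follow the description above, and doubling embedded single quotes is assumed to be the right escaping for SoQL string literals.

```python
def soql_literal(value, data_type: str) -> str:
    """Hypothetical helper: format a value for a SoQL WHERE clause by column type."""
    if data_type.lower() == "number":
        return str(value)  # bare literal, e.g. 2023
    # Treat everything else as Text: single-quote it and double embedded quotes
    # (assumed SQL-style escaping).
    return "'" + str(value).replace("'", "''") + "'"


def equals(column: str, value, data_type: str) -> str:
    """Build a simple equality predicate for the where parameter."""
    return f"{column}={soql_literal(value, data_type)}"


number_clause = equals("year", 2023, "Number")  # year=2023
text_clause = equals("year", "2023", "Text")    # year='2023'
```

In practice the `data_type` argument would come from the columns array this tool returns, so the same column name yields the right predicate on portals that store it differently.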

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| domain | No | Portal domain (e.g. data.seattle.gov). Defaults to SOCRATA_DEFAULT_DOMAIN env var or data.seattle.gov. | |
| dataset_id | Yes | Four-by-four dataset ID matching pattern like kzjm-xkqj. Obtain from socrata_find_datasets. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| name | Yes | Dataset display name. |
| tags | Yes | Associated tags. |
| domain | Yes | Portal domain hosting this dataset. |
| columns | Yes | Column schema. Computed region columns (:@computed_region_*) are excluded to reduce noise. |
| license | No | License name when available. |
| category | No | Domain category when available. |
| row_count | No | Approximate row count when available. See row_count_source for provenance. |
| dataset_id | Yes | Four-by-four dataset ID. |
| description | No | Dataset description when available. |
| data_updated_at | No | ISO 8601 timestamp of last data update when available. |
| row_count_source | No | How row_count was obtained. 'top_level_cached_contents': reported directly by the portal's views metadata; 'column_cached_contents': derived as the maximum per-column cached count when the top-level value is absent. Absent when row_count is absent. |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true; description adds value by explaining the output's role in query construction. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose; every word earns its place. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists, description appropriately focuses on usage context (precondition from socrata_find_datasets, post-usage for queries). Might benefit from mentioning error cases, but overall complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters have schema descriptions (100% coverage). Description adds context: domain default behavior and that dataset_id comes from socrata_find_datasets, enhancing schema info.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states 'Fetch full metadata and column schema for a Socrata dataset by ID', listing specific return fields. It clearly differentiates from sibling tools by specifying its role before socrata_query_dataset.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance: 'Always call this before writing a socrata_query_dataset' and explains how column types affect query syntax. Does not explicitly state when not to use, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

socrata_list_portals: List Socrata Portals (grade A)
Annotations: Read-only, Idempotent

List known Socrata-powered government open-data portals with their domain, organization name, and approximate dataset count. The catalog is a curated list of 40 well-known portals; dataset counts are fetched from the Discovery API and cached for ~24 hours. Filtering is client-side substring match on the query parameter. Use this first when you do not know which portal to target, then pass the domain to socrata_find_datasets.
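
The filtering and pagination behavior described here might look roughly like the following. This is a sketch under assumptions, not the server's code: the `Portal` class, the two catalog entries, and the `list_portals` function are illustrative stand-ins, and the case-insensitive matching follows the query parameter's documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Portal:
    domain: str
    organization: str


# Illustrative entries only; the real curated catalog is not shown in this listing.
CATALOG = [
    Portal("data.seattle.gov", "City of Seattle"),
    Portal("data.cityofnewyork.us", "City of New York"),
]


def list_portals(query: Optional[str] = None, limit: int = 50, offset: int = 0) -> dict:
    """Filter by case-insensitive substring, then apply offset/limit."""
    needle = (query or "").lower()  # empty needle matches every portal
    matches = [
        p for p in CATALOG
        if needle in p.domain.lower() or needle in p.organization.lower()
    ]
    return {"portals": matches[offset:offset + limit], "totalCount": len(matches)}


seattle = list_portals("SEATTLE")
```

Because the catalog is small and filtered in memory, `totalCount` is simply the number of matches before slicing.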

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| limit | No | Max portals to return (1–200). Default 50. | |
| query | No | Keyword to filter portal names or organization names (case-insensitive substring match). Omit to list all portals. | |
| offset | No | Pagination offset. Default 0. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| notice | No | Recovery hint when no portals matched the filter. Absent on non-empty pages. |
| portals | Yes | Matching portals. Empty when no results. |
| totalCount | Yes | Total portals before pagination. 0 when empty. |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and idempotentHint. The description adds valuable context: curated list of 40 portals, client-side substring filtering, and 24-hour caching of dataset counts. This goes beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences covering purpose, catalog and caching behavior, filtering, and usage guidance. No wasted words; well structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given full schema coverage, annotations, and an output schema, the description perfectly complements the structured data. Caching and filtering details make it complete for an agent to decide when and how to invoke.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all 3 parameters with descriptions. The description adds value by explaining the query parameter does case-insensitive substring match on portal or org name, bridging schema to real usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists Socrata portals with domain, org name, and dataset count. It distinguishes from sibling tools like socrata_find_datasets, which targets datasets within a portal.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises using this tool first when unsure of the portal, then passing the domain to socrata_find_datasets. Also describes filtering and caching behavior, guiding proper use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

socrata_query_dataset: Query Dataset (grade A)
Annotations: Read-only, Idempotent

Execute a SoQL query against any dataset on any Socrata portal. Use the search parameter for quick full-text lookup, or combine select/where/group/having/order for full analytical control. Returns rows plus the assembled SoQL string so you can learn the pattern. All SODA 2.1 row values are strings even for numeric columns — check dataType from socrata_get_dataset to determine correct WHERE quoting: Number columns use bare literals (year=2023), Text columns use single-quoted strings (year='2023'). To enumerate distinct values, use select="col, count(*) as n" with group="col" and order="n DESC". When CANVAS_PROVIDER_TYPE=duckdb and rows fill the limit, results spill to a DataCanvas table for SQL-based analysis.
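
For orientation, the distinct-values pattern above corresponds to a SODA request along these lines. The sketch assumes the public SODA conventions (a `/resource/<id>.json` endpoint and `$`-prefixed clause parameters); how this server assembles requests internally is not shown in the listing, and `soda_url`, the column name `col`, and the dataset ID are illustrative.

```python
from urllib.parse import urlencode


def soda_url(domain: str, dataset_id: str, **clauses: str) -> str:
    """Hypothetical helper: map clause names to SODA $-parameters in a URL."""
    # Skip empty clauses so omitted parameters do not appear in the query string.
    params = {f"${name}": value for name, value in clauses.items() if value}
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Distinct values of a column named "col", most frequent first.
url = soda_url(
    "data.seattle.gov",
    "kzjm-xkqj",  # example four-by-four ID from the parameter docs
    select="col, count(*) as n",
    group="col",
    order="n DESC",
)
```

The tool's assembled_query output serves a similar purpose: it lets the caller see which clauses were actually sent.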

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| group | No | SoQL GROUP BY clause. Requires an aggregate function in select. | |
| limit | No | Max rows to return (1–5000). Default 100. Use with offset for pagination. | |
| order | No | SoQL ORDER BY clause, e.g. "total_deaths DESC" or "date ASC". | |
| where | No | SoQL WHERE clause. Check column dataType from socrata_get_dataset first. Number columns: year=2023; Text columns: year='2023'. Operators: =, !=, >, <, LIKE, IN(...), BETWEEN, IS NULL, starts_with(), contains(), AND, OR, NOT. | |
| domain | No | Portal domain (e.g. data.seattle.gov). Defaults to SOCRATA_DEFAULT_DOMAIN or data.seattle.gov. | |
| having | No | SoQL HAVING clause. Filters on aggregated results, e.g. count > 100. | |
| offset | No | Row offset for pagination. Default 0. | |
| search | No | Full-text search across all text columns ($q). For field-specific filtering, use where instead. | |
| select | No | SoQL SELECT clause (column names, aliases, aggregates), e.g. "state, sum(deaths) as total_deaths". Omit for all columns. | |
| canvas_id | No | Optional 10-char DataCanvas token from a prior call. Omit on first call when CANVAS_PROVIDER_TYPE=duckdb to mint a fresh canvas. Large result sets spill here automatically. | |
| dataset_id | Yes | Four-by-four dataset ID (e.g. kzjm-xkqj). Obtain from socrata_find_datasets. | |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| cap | No | The row limit that was applied when capped. |
| rows | Yes | Result rows. Scalar values are strings (SODA 2.1); geo/location columns return nested objects. Use column schema from socrata_get_dataset for type context. |
| shown | No | Rows returned in this response when capped. |
| domain | Yes | Portal domain queried. |
| notice | No | Guidance when the query returned zero rows; suggests narrowing or reviewing the SoQL. Absent on non-empty result sets. |
| canvas_id | No | DataCanvas token when results spilled (requires CANVAS_PROVIDER_TYPE=duckdb). Pass to socrata_dataframe_query for SQL over the full result set. |
| row_count | Yes | Rows returned in this response. |
| truncated | No | True when rows filled the limit; more rows may match (see total_count when present). Spills to canvas when enabled. |
| dataset_id | Yes | Dataset ID queried. |
| total_count | No | Total matching source rows when a plain row query is truncated (row_count < total_count). Absent when the full result fits and for grouped/aggregate queries (group set), where a source-row count would not describe the returned groups. |
| assembled_query | Yes | SoQL clauses assembled for this request; useful for learning the syntax. |
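
When canvas spilling is not enabled, a client can page through a truncated result using limit, offset, and the truncated flag. The loop below is a sketch under assumptions: `call_tool` stands for whatever function a client uses to invoke socrata_query_dataset, and `fake_call` is a stand-in backed by made-up rows so the loop can run without a server.

```python
from typing import Callable, Iterator


def iter_rows(call_tool: Callable[[dict], dict], args: dict, page_size: int = 100) -> Iterator[dict]:
    """Yield every row by advancing offset until a page comes back untruncated."""
    offset = 0
    while True:
        result = call_tool({**args, "limit": page_size, "offset": offset})
        yield from result["rows"]
        # Stop when the page did not fill the limit (or came back empty).
        if not result.get("truncated") or result["row_count"] == 0:
            break
        offset += result["row_count"]


def fake_call(args: dict) -> dict:
    """Stand-in for a real MCP client call, backed by 250 made-up rows."""
    data = [{"id": str(i)} for i in range(250)]
    page = data[args["offset"]:args["offset"] + args["limit"]]
    return {
        "rows": page,
        "row_count": len(page),
        "truncated": len(page) == args["limit"],
    }


all_rows = list(iter_rows(fake_call, {"dataset_id": "kzjm-xkqj"}))
```

Passing an order clause in `args` is generally advisable when paging, so that consecutive pages are drawn from a stable row ordering.
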
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint as true. The description adds critical behavioral details beyond that: rows return as strings even for numeric columns, the tool returns the assembled SoQL string, and result spillover to DataCanvas occurs under certain conditions. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is six sentences long, front-loaded with the core purpose. It is well-structured and avoids redundancy, though it could be slightly more concise without losing key details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (11 parameters, 1 required, output schema exists), the description covers purpose, parameter usage guidance, quoting behavior, and spillover behavior. With an output schema present, it does not need to detail return values. It adequately addresses the main usage scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds value with usage patterns like how to use where clause based on data type, how to enumerate distinct values, and interaction between search and where. This exceeds the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Execute a SoQL query against any dataset on any Socrata portal,' identifying the specific verb (execute/query), resource (any dataset), and scope (Socrata portal). It distinguishes from sibling tools like socrata_dataframe_describe or socrata_find_datasets by focusing on SoQL query execution.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use the search parameter versus combining other parameters, how to handle WHERE quoting based on column type, and how to enumerate distinct values. However, it does not compare directly with sibling tools or specify when not to use this tool, missing full comparative guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
