Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@DataSF MCP Server search for datasets about police incidents in San Francisco".
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
DataSF MCP Server
A Model Context Protocol (MCP) server that provides LLMs with seamless access to San Francisco's open data portal (DataSF), powered by the Socrata platform.
Overview
This MCP server enables AI assistants like Claude to search, explore, and query San Francisco's public datasets through a simple, standardized interface. It handles the complexity of the Socrata API, provides intelligent column name correction, and includes schema caching for optimal performance.
Key Features
Dataset Search & Discovery - Find datasets by keywords or browse by category
Schema Retrieval - Get column names and data types before querying
SoQL Query Execution - Run SQL-like queries against any dataset
Fuzzy Column Matching - Auto-corrects typos in column names
Schema Caching - Reduces API calls with intelligent caching
Optional Authentication - Supports Socrata App Tokens for higher rate limits
Property-Based Testing - Comprehensive correctness guarantees
Available Tools
1. search_datasf
Search for datasets by keywords.
Parameters:
query (string, required): Search keywords (1-500 characters)
limit (number, optional): Max results (default: 5, max: 20)
Example:
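A call's arguments might look like this (values are illustrative):

```json
{
  "query": "police incidents",
  "limit": 5
}
```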
2. list_datasf
Browse available datasets, optionally filtered by category.
Parameters:
category (string, optional): Filter by category
limit (number, optional): Max results (default: 5, max: 20)
Example:
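An illustrative call (the category name is an assumption — use whatever categories the portal actually exposes):

```json
{
  "category": "Public Safety",
  "limit": 5
}
```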
3. get_schema
Get the schema (columns and data types) for a specific dataset.
Parameters:
dataset_id (string, required): Dataset 4x4 ID (format: xxxx-xxxx)
Example:
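For instance, using the police incidents dataset ID mentioned later in this README:

```json
{
  "dataset_id": "wg3w-h783"
}
```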
4. query_datasf
Execute a SoQL (Socrata Query Language) query against a dataset.
Parameters:
dataset_id (string, required): Dataset 4x4 ID
soql (string, required): SoQL query (1-4000 characters)
auto_correct (boolean, optional): Enable column name correction (default: true)
Example:
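A sketch of a call that aggregates incident categories (the column name `incident_category` is assumed — confirm it with get_schema first):

```json
{
  "dataset_id": "wg3w-h783",
  "soql": "SELECT incident_category, count(*) AS total GROUP BY incident_category ORDER BY total DESC LIMIT 10"
}
```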
Installation
Prerequisites
Node.js 18 or higher
npm or yarn
Local Setup (Optional)
If you want to run or modify the server locally:
Clone the repository:
Install dependencies:
Run the server:
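The three steps above might look like this (the repository URL is a placeholder — substitute the project's actual GitHub location):

```
# Clone the repository (placeholder URL)
git clone https://github.com/OWNER/datasf-mcp.git
cd datasf-mcp

# Install dependencies
npm install

# Run the server directly with tsx (no build step needed)
npx tsx src/index.ts
```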
The server uses tsx to run TypeScript directly without a build step.
Usage
Testing with MCP Inspector
For the MCP Inspector, you'll need to use the local installation:
In the inspector UI, use:
Command: tsx
Arguments: src/index.ts (or an absolute path if running from outside the directory)
Quick Start with npx (Recommended)
The easiest way to use the server is directly from GitHub using npx:
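A sketch of the command (the GitHub owner/repo is a placeholder — use the project's actual repository):

```
npx -y github:OWNER/datasf-mcp
```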
This will automatically download and run the latest version from GitHub without any manual installation.
Local Installation
Alternatively, clone and install locally:
Then use the absolute path in your MCP configuration (see below).
Configuration for Claude Desktop
Add to your Claude Desktop config file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
Option 1: Using npx (recommended)
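A minimal sketch of the npx entry (the server key name and the GitHub owner/repo are placeholders — use the project's actual repository):

```json
{
  "mcpServers": {
    "datasf": {
      "command": "npx",
      "args": ["-y", "github:OWNER/datasf-mcp"]
    }
  }
}
```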
Option 2: Using local installation
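A sketch of the local-path entry, assuming the server is launched with tsx as described in the local setup section (the path placeholder matches the note below):

```json
{
  "mcpServers": {
    "datasf": {
      "command": "npx",
      "args": ["tsx", "/absolute/path/to/datasf-mcp/src/index.ts"]
    }
  }
}
```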
Important: Replace /absolute/path/to/datasf-mcp with the actual full path to where you cloned this project.
Configuration for Kiro IDE
Create or edit .kiro/settings/mcp.json:
Option 1: Using npx from GitHub (recommended)
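A sketch of a Kiro entry, assuming Kiro's mcp.json follows the common mcpServers shape with its `disabled` and `autoApprove` fields (the key name and owner/repo are placeholders):

```json
{
  "mcpServers": {
    "datasf": {
      "command": "npx",
      "args": ["-y", "github:OWNER/datasf-mcp"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
```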
Option 2: Using local installation
Getting a Socrata App Token
The server works without authentication for public data, but an App Token increases rate limits:
Visit https://data.sfgov.org/
Sign up for a free account
Navigate to Developer Settings
Create a new App Token
Add it to your MCP configuration
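The token is typically passed to the server via an environment variable in the config entry. The variable name below is an assumption — check the project's source for the exact name it reads:

```json
{
  "mcpServers": {
    "datasf": {
      "command": "npx",
      "args": ["-y", "github:OWNER/datasf-mcp"],
      "env": {
        "SOCRATA_APP_TOKEN": "your-app-token-here"
      }
    }
  }
}
```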
Development
Project Structure
Available Scripts
npm run build - Compile TypeScript to JavaScript
npm start - Run the compiled server
npm test - Run all tests
npm run test:watch - Run tests in watch mode
Running Tests
The project uses property-based testing with fast-check to ensure correctness across a wide range of inputs.
Architecture
The server follows a modular architecture:
MCP Server - Handles protocol communication via stdio
Socrata Client - Manages HTTP requests to Socrata APIs
Validator - Validates all inputs using Zod schemas
Fuzzy Matcher - Corrects column name typos using Fuse.js
Schema Cache - Caches dataset schemas in memory (5-minute TTL)
Error Handler - Classifies and formats errors for LLM consumption
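To illustrate the fuzzy-matching idea, here is a simplified sketch using plain edit distance; the actual server uses Fuse.js, so treat this only as a model of the behavior, not the implementation:

```typescript
// Classic Levenshtein edit distance between two strings.
function editDistance(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Pick the schema column closest to the (possibly misspelled) input,
// rejecting matches too distant to be a plausible typo.
function correctColumn(input: string, columns: string[], maxDistance = 3): string | null {
  let best: string | null = null;
  let bestDist = Infinity;
  for (const col of columns) {
    const d = editDistance(input.toLowerCase(), col.toLowerCase());
    if (d < bestDist) { bestDist = d; best = col; }
  }
  return bestDist <= maxDistance ? best : null;
}

console.log(correctColumn("incident_catagory", ["incident_category", "incident_date"]));
// → "incident_category"
```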
Example Queries
Once configured in your LLM, you can ask questions like:
"Search for datasets about housing in San Francisco"
"What's the schema for the police incidents dataset (wg3w-h783)?"
"Show me the top 10 incident categories from the police incidents dataset"
"Find all building permits issued in 2024"
"What datasets are available about transportation?"
API Endpoints Used
The server interacts with three Socrata APIs:
Discovery API: https://api.us.socrata.com/api/catalog/v1 - Dataset search and browsing
Views API: https://data.sfgov.org/api/views/{id}.json - Schema retrieval
Resource API: https://data.sfgov.org/resource/{id}.json - Data querying
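For example, the Resource API can be queried directly with curl, using the police incidents dataset ID from above ($limit is a standard SoQL parameter):

```
curl "https://data.sfgov.org/resource/wg3w-h783.json?\$limit=5"
```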
Error Handling
The server provides descriptive error messages for:
Validation errors - Invalid input format or length
Not found - Dataset doesn't exist
Rate limiting - Too many requests (add App Token to resolve)
Timeouts - Request exceeded 30 seconds
API errors - Socrata-specific errors (e.g., SoQL syntax errors)
Contributing
Contributions are welcome! The project uses:
TypeScript for type safety
Zod for runtime validation
fast-check for property-based testing
Vitest as the test runner
License
MIT
Troubleshooting
Server not starting
Ensure you ran npm run build first
Check that Node.js 18+ is installed
Tools not showing up in LLM
Verify the path in your config is absolute
Restart your LLM application after adding the config
Check the LLM's logs for connection errors
Rate limiting errors
Add a Socrata App Token to your configuration
Reduce the frequency of requests
Column name errors in queries
Use get_schema first to see valid column names
Enable auto_correct: true (default) for automatic typo correction