NYC Open Data Capital Projects MCP Server
The NYC Open Data Capital Projects MCP Server lets you query, analyze, and explore NYC capital projects data — covering schedules, budgets, history, and variances — using 16 domain-aware tools backed by a local DuckDB database.
Query & Explore Data
Run custom SQL queries against the local DuckDB, including tables for project state, schedule/budget history, funding, variance, and agency/category dimensions
Check data freshness — view per-dataset row counts, current reporting period, and key caveats (data is refreshed ~3×/year: Jan/May/Sep)
Look up field definitions — access the official NYC Open Data data dictionary for any column or dataset
Browse & Resolve
List agencies and categories — see active agencies, their schedule-executor roles, and budget/line counts by program type
Resolve project references — map any project name, PID, FMS ID, or partial identifier to matching schedule and budget entities
Drill Into Projects
Get project schedules — retrieve phase, lifecycle status, signed schedule variance, delay reasons, and linked budget lines
Get project budgets — retrieve total budget, spend-to-date, variance, and linked schedules for a given FMS line
Aggregate & Analyze
Break down schedules or budgets — aggregate counts, averages, or totals by agency, sponsor, borough, phase, or category with role-aware attribution
Track schedule changes — identify projects newly completed or newly delayed between any two reporting periods
Analyze delay reasons — view the distribution of delay drivers for the current period or across all history
Measure budget changes — compute budget/spend deltas for an agency or specific FMS line between two reporting periods
Rank projects — rank schedules or budget lines by total budget, spend %, variance, or cumulative change, with optional filters
Analyze project durations — compute duration distributions between actual milestones, grouped by agency, borough, or lifecycle status
Browse the project portfolio — filter and list projects by category, borough, community board, lifecycle status, and/or agency
Domain Rules Built In
PID ↔ FMS many-to-many relationships handled correctly
Role-aware agency attribution: sponsor (owner) view for normal agencies; managing (builder) view for DDC/DCAS/EDC
Signed, neutral variance reporting (e.g., "moved 45 days later", "budget grew $2.1M")
3×-a-year reporting cadence acknowledged in all period-based tools
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@NYC Open Data Capital Projects MCP ServerShow me the capital projects with largest schedule delays"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
NYC Open Data — Capital Projects MCP Server
A local MCP server over the NYC Capital Projects Dashboard (CPD) datasets on NYC Open Data. It ingests four public Socrata datasets into a single local DuckDB and exposes 16 tools so an AI assistant can answer schedule, budget, and lifecycle questions about NYC capital projects — with the domain rules (PID↔FMS many-to-many, role-aware agency attribution, signed variance reporting) baked into the tools instead of left for the caller to rediscover.
Source datasets (Socrata)
ID | Dataset |
| Citywide Capital Project List Detail (the schedule↔budget edge) |
| Citywide Budget & Spend by FY |
| Citywide Budget Spend History & Variance |
| Citywide Schedule History & Variance |
Related MCP server: Personal Task Manager MCP
What's an MCP — and why use one?
MCP (Model Context Protocol) is a standard way to give an AI assistant a new, trusted skill. Instead of pasting a spreadsheet into a chat and hoping the model reads it right, you hand it a set of well-defined tools it can call — with the domain rules already baked in. It's the difference between telling an analyst "here's a spreadsheet, good luck" and hiring one who already knows the data cold.
Why not just ChatGPT + a CSV? Public data is messy in ways a generic chatbot can't see. Ask a raw LLM "what's NYC's biggest library project?" and it'll happily double-count a budget line shared by several projects, call a long-finished branch "still under construction" because its funding line is still open, or — seeing only three reporting periods a year (Jan / May / Sep) — assume months of data have gone missing. It sounds confident — and it's wrong. This server encodes the guardrails once — the PID↔FMS many-to-many, role-aware agency attribution, the 3×-a-year reporting cadence, signed reporting — so every answer is consistent, sourced, and reproducible.
What one prompt can build
This isn't only a query tool. Point an AI agent at it and a single prompt produces a polished, self-contained interactive HTML report — with the domain rules already applied. Three real examples (one prompt → one file; click to open the live report):
1 · Schedule ↔ Budget topology — the many-to-many anatomy of the portfolio
Analyze the schedule and budget many-to-many relationship across NYC capital projects and build a single interactive HTML report — the 1:1-vs-fan-out split, the outlier extremes, a per-agency breakdown, and budget concentration.
▶ Open the report — fan-out rings, a bipartite diagram, the "tangled few" outliers (hover to see the real schedules and budget lines), an agency scatter, and a budget concentration curve.
2 · Parks projects over $50M — every big build, and what funds it
Build an interactive one-file HTML report on NYC Parks projects over $50M. For each budget line, show every schedule associated with it, with phase and forecast completion.
▶ Open the report — 23 budget lines; hover any to reveal its linked schedules. Quietly applies the category taxonomy, so the $1.9B "Park Pedestrian Bridges" route to Bridges, not Parks.
3 · Budget & schedule change monitor — what moved this period, by agency
Build an interactive one-file HTML monitor of NYC capital projects' budget and schedule changes by managing agency, with a click-through detail view for each project's schedule and budget history.
▶ Open the report — KPIs, a trend chart, a sortable watchlist, and a per-project popup with schedule-variance bars and a stacked budget-vs-spend chart.
Each report was generated from the prompt shown, then lightly polished. The figures are a snapshot of reporting period 202601 — browse all three in the report gallery.
🚀 Quick Start
Want the data without the setup? If your AI can run commands on your computer, just ask it to install everything for you.
✅ Let your AI install it (easiest)
Works with AI agents that can run terminal commands — Claude Code, Claude cowork (Claude Desktop's local-agent mode), Codex CLI, or another coding agent like Cursor.
Start your AI agent on this computer.
Paste the message below.
Approve each step (Allow, or press y) as it clones, installs, and connects the server.
Message to paste:
Install the MCP server at
https://github.com/WillHsiaoNYC/NYC-Opendata-Capital-Projects-MCP on this
machine — follow its README to clone the repo, install it with uv, run
`od-cpd init` to download the four NYC Open Data datasets into a local
database, and wire it into my MCP client config. Then run a verification
query to confirm it works.What "done" looks like: your AI reports the loaded reporting period (e.g.
202601), confirms od-cpd is connected with its 16 tools, and answers a
test question like "What's the biggest NYC capital project right now?" Takes a
few minutes, mostly the dataset download.
🖥️ Claude Desktop (chat)
Claude Desktop can use a local server but can't install one itself. Run the
Manual install, then add od-cpd to its own config (with
the absolute path to uv) and fully restart — see
Connect an MCP client.
☁️ claude.ai or ChatGPT (web)
These connect only to remote MCP servers, not a local one like this — so they
can't run od-cpd directly. Use one of the options above.
Manual install
Requires Python ≥ 3.12 and uv.
git clone https://github.com/WillHsiaoNYC/NYC-Opendata-Capital-Projects-MCP.git
cd NYC-Opendata-Capital-Projects-MCP
uv sync
uv run od-cpd init # download + materialize all 4 datasets → ./var/cpd.duckdb
uv run od-cpd status # confirm the loaded reporting periodOptional: set OD_CPD_SOCRATA_APP_TOKEN to a free
Socrata app token to avoid
anonymous rate limits during ingest.
Connect an MCP client
The server speaks stdio. Use the absolute path to uv (run which uv to
find it) — GUI apps like Claude Desktop don't inherit your shell PATH, so a
bare uv command fails silently.
Claude Code — from inside the repo folder:
claude mcp add od-cpd --env PYTHONPATH="$(pwd)/src" -- \
"$(which uv)" run --directory "$(pwd)" od-cpd-serverClaude Desktop — edit its config (macOS:
~/Library/Application Support/Claude/claude_desktop_config.json), add the block
below, then fully quit (⌘Q) and reopen Claude Desktop:
{
"mcpServers": {
"od-cpd": {
"command": "/absolute/path/to/uv",
"args": ["run", "--directory", "/absolute/path/to/repo", "od-cpd-server"],
"env": { "PYTHONPATH": "/absolute/path/to/repo/src" }
}
}
}(PYTHONPATH keeps the launch robust when uv's editable install is flaky — e.g.
on iCloud-synced paths.)
Keeping data fresh
The source datasets report on a Jan / May / Sep cycle, and Socrata
typically publishes each period ~2.5–3 months later (so new data usually
lands around April, August, and December). od-cpd update is a no-op when
nothing is newer, so it's safe to run any time — check around those months:
uv run od-cpd status # what period is loaded now
uv run od-cpd update # re-ingest only if Socrata is newerTo keep it fresh automatically, schedule update (e.g. monthly via cron):
# 9am on the 1st of each month
0 9 1 * * cd /path/to/repo && uv run od-cpd updateWhat's inside
docs/FEATURES.md— the canonical inventory: all 16 tools and every domain rule the server encodes. Start here.
The headline domain rules, briefly:
"Project" is ambiguous. A PID identifies a schedule; an FMS ID identifies a budget line. They are many-to-many (~3% fan out), so the tools list all counterparts rather than silently picking one.
Agency attribution is role-aware. "Agency X's projects" means the sponsor (owner) view for normal agencies, but the managing (builder) view for the three construction-manager agencies (DDC/DCAS/EDC).
Values are reported signed and neutral ("moved 45 days later", "budget grew $2.1M") rather than only surfacing one direction.
Layout
src/od_cpd/— ingest, materialization, and the MCP server + toolsdata/— curated agency/category dictionaries (YAML, tracked)tests/— unit tests + golden evals (uv run pytest)var/,exports/— runtime DuckDB + exports (gitignored, regenerable)
Develop
uv run pytest # fallback: PYTHONPATH=src python -m pytestClassification is dictionary-driven: edit data/agencies.yaml /
data/categories.yaml (not Python) to adjust agency or category mappings,
then re-materialize. See CLAUDE.md for the atomic-swap pattern that applies
materialization changes without re-downloading.
Data caveats
This is an independent project, not affiliated with the City of New York.
Figures reflect whatever reporting period the underlying Socrata datasets
carry at ingest time; always check dataset_info for the current period and
per-dataset caveats.
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
- Your AI Chatbot Just Exposed Your CEO's Salary to an InternBy Om-Shree-0709 on .Agent IdentityMCP SecurityOAuth Delegation
- Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)By Om-Shree-0709 on .Agentic AiPrompt InjectionWebAssembly
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/WillHsiaoNYC/NYC-Opendata-Capital-Projects-MCP'
If you have feedback or need assistance with the MCP directory API, please join our Discord server