
# Stata MCP Server <a href="https://cursor.com/en-US/install-mcp?name=mcp-stata&config=eyJjb21tYW5kIjogInV2eCAtLXJlZnJlc2ggLS1yZWZyZXNoLXBhY2thZ2UgbWNwLXN0YXRhIC0tZnJvbSBtY3Atc3RhdGFAbGF0ZXN0IG1jcC1zdGF0YSJ9"><img src="https://cursor.com/deeplink/mcp-install-dark.svg" alt="Install MCP Server" height="20"></a>  <a href="https://pypi.org/project/mcp-stata/"><img src="https://img.shields.io/pypi/v/mcp-stata?style=flat&color=black" alt="PyPI - Version" height="20"></a>

A [Model Context Protocol](https://github.com/modelcontextprotocol) (MCP) server that connects AI agents to a local Stata installation.

> If you'd like a fully integrated VS Code extension to run Stata code without leaving your IDE, and also allow AI agent interaction, check out my other project: [<img src="https://raw.githubusercontent.com/tmonk/stata-workbench/refs/heads/main/img/icon.png" height="12px"> Stata Workbench](https://github.com/tmonk/stata-workbench/).

Built by <a href="https://tdmonk.com">Thomas Monk</a>, London School of Economics.

This server enables LLMs to:

- **Execute Stata code**: run any Stata command (e.g. `sysuse auto`, `regress price mpg`).
- **Inspect data**: retrieve dataset summaries and variable codebooks.
- **Export graphics**: generate and view Stata graphs (histograms, scatterplots).
- **Streaming graph caching**: automatically cache graphs during command execution for instant exports.
- **Verify results**: programmatically check stored results (`r()`, `e()`) for accurate validation.

## Prerequisites

- **Stata 17+** (Stata MP, SE, or BE). Must be licensed and installed locally.
- **Python 3.11+**
- **uv** (recommended)

> **Note on `pystata`**: This server uses the proprietary `pystata` module that is included with your Stata installation. There is a third-party package named `pystata` on PyPI that is **not** the official Stata package and should not be installed. MCP-Stata handles finding and loading the official module from your Stata directory automatically.
## Installation

### Run as a published tool with `uvx`

```bash
uvx --refresh --refresh-package mcp-stata --from mcp-stata@latest mcp-stata
```

`uvx` is an alias for `uv tool run` and runs the tool in an isolated, cached environment.

## Configuration

This server attempts to automatically discover your Stata installation (supporting standard paths and StataNow). If auto-discovery fails, set the `STATA_PATH` environment variable to your Stata executable:

```bash
# macOS example
export STATA_PATH="/Applications/StataNow/StataMP.app/Contents/MacOS/stata-mp"

# Windows example (cmd.exe): quote the whole assignment so the quotes are not stored in the value
set "STATA_PATH=C:\Program Files\Stata18\StataMP-64.exe"
```

If you encounter write permission issues with temporary files (common on Windows), you can override the temporary directory location by setting `MCP_STATA_TEMP`:

```bash
# Example
export MCP_STATA_TEMP="/path/to/writable/temp"
```

The server will automatically try the following locations in order of preference:

1. `MCP_STATA_TEMP` environment variable
2. System temporary directory
3. `~/.mcp-stata/temp`
4. Current working directory subdirectory (`.tmp/`)

### Startup Do Files

When a session starts, MCP-Stata loads startup do files in the same order as native Stata:

1. **`MCP_STATA_STARTUP_DO_FILE`** (env var): one or more custom do files, separated by `:` (Unix) or `;` (Windows).
2. **`sysprofile.do`**: the first one found along the Stata search path.
3. **`profile.do`**: the first one found along the Stata search path.

The search path mirrors native Stata: Stata install directory, current working directory, then the ado-path (PERSONAL, SITE, PLUS, OLDPLACE, ...). Only the first `sysprofile.do` and first `profile.do` found are executed, matching native Stata behavior. All paths are deduplicated so the same file is never run twice.

If a command clears programs (`clear all`, `clear programs`, or `program drop _all`), MCP-Stata automatically re-executes the startup files so that any programs they defined remain available.
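The ordering and reload rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MCP-Stata's implementation: `startup_do_files` and `needs_startup_reload` are hypothetical names, the caller passes the search path (install directory, working directory, ado-path) as a plain list of directories, and the regex only approximates how the server detects program-clearing commands.

```python
import os
import re
from pathlib import Path


def startup_do_files(search_path: list[str]) -> list[Path]:
    """Return startup do files in the documented order, deduplicated.

    Hypothetical sketch: `search_path` stands in for the Stata install
    directory, the working directory, and the ado-path, in that order.
    """
    candidates: list[Path] = []

    # 1. Custom files; os.pathsep is ':' on Unix and ';' on Windows.
    custom = os.environ.get("MCP_STATA_STARTUP_DO_FILE", "")
    candidates += [Path(p) for p in custom.split(os.pathsep) if p]

    # 2-3. Only the first sysprofile.do and the first profile.do found are used.
    for name in ("sysprofile.do", "profile.do"):
        for directory in search_path:
            path = Path(directory) / name
            if path.is_file():
                candidates.append(path)
                break

    # Deduplicate by resolved path, keeping each file at its first position.
    seen: set[Path] = set()
    ordered: list[Path] = []
    for path in candidates:
        key = path.resolve()
        if key not in seen:
            seen.add(key)
            ordered.append(path)
    return ordered


# Assumed approximation of the commands the README lists as clearing programs.
_CLEARS_PROGRAMS = re.compile(
    r"^\s*(clear\s+all|clear\s+programs|program\s+drop\s+_all)\b",
    re.MULTILINE,
)


def needs_startup_reload(code: str) -> bool:
    """True if `code` clears programs and reloading has not been disabled."""
    if os.environ.get("MCP_STATA_NO_RELOAD_ON_CLEAR") == "1":
        return False
    return bool(_CLEARS_PROGRAMS.search(code))
```

Deduplicating on the resolved path (rather than the string) means the same file listed twice, or reached through different relative paths, still runs only once.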
To disable this automatic reload and let `clear all` behave exactly as in native Stata (programs are lost), set:

```
MCP_STATA_NO_RELOAD_ON_CLEAR=1
```

If you prefer, add these variables to the `env` block of your MCP config for any IDE shown below. Setting `STATA_PATH` is optional and only needed when auto-discovery cannot find Stata.

Optional `env` example (add inside your MCP server entry):

```json
"env": {
  "STATA_PATH": "/Applications/StataNow/StataMP.app/Contents/MacOS/stata-mp",
  "MCP_STATA_STARTUP_DO_FILE": "/path/to/my/startup.do",
  "MCP_STATA_NO_RELOAD_ON_CLEAR": "1"
}
```

## IDE Setup (MCP)

This MCP server uses the **stdio** transport (the IDE launches the process and communicates over stdin/stdout).

---

### Claude Desktop

Open Claude Desktop → **Settings** → **Developer** → **Edit Config**. Config file locations include:

* macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
* Windows: `%APPDATA%\Claude\claude_desktop_config.json`

#### Published tool (uvx)

```json
{
  "mcpServers": {
    "mcp-stata": {
      "command": "uvx",
      "args": [
        "--refresh",
        "--refresh-package",
        "mcp-stata",
        "--from",
        "mcp-stata@latest",
        "mcp-stata"
      ]
    }
  }
}
```

After editing, fully quit and restart Claude Desktop to reload MCP servers.

---

### Cursor

Cursor supports MCP config at:

* Global: `~/.cursor/mcp.json`
* Project: `.cursor/mcp.json`

#### Published tool (uvx)

```json
{
  "mcpServers": {
    "mcp-stata": {
      "command": "uvx",
      "args": [
        "--refresh",
        "--refresh-package",
        "mcp-stata",
        "--from",
        "mcp-stata@latest",
        "mcp-stata"
      ]
    }
  }
}
```

---

### Windsurf

Windsurf supports MCP plugins and also allows manual editing of `mcp_config.json`. After adding or editing a server, use the UI's refresh so it re-reads the config. A common location is `~/.codeium/windsurf/mcp_config.json`.
#### Published tool (uvx)

```json
{
  "mcpServers": {
    "mcp-stata": {
      "command": "uvx",
      "args": [
        "--refresh",
        "--refresh-package",
        "mcp-stata",
        "--from",
        "mcp-stata@latest",
        "mcp-stata"
      ]
    }
  }
}
```

---

### Google Antigravity

In Antigravity, MCP servers are managed from the MCP store/menu; you can open **Manage MCP Servers** and then **View raw config** to edit `mcp_config.json`.

#### Published tool (uvx)

```json
{
  "mcpServers": {
    "mcp-stata": {
      "command": "uvx",
      "args": [
        "--refresh",
        "--refresh-package",
        "mcp-stata",
        "--from",
        "mcp-stata@latest",
        "mcp-stata"
      ]
    }
  }
}
```

---

### Visual Studio Code

VS Code supports MCP servers via a `.vscode/mcp.json` file. The top-level key is **`servers`** (not `mcpServers`).

Create `.vscode/mcp.json`:

#### Published tool (uvx)

```json
{
  "servers": {
    "mcp-stata": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--refresh",
        "--refresh-package",
        "mcp-stata",
        "--from",
        "mcp-stata@latest",
        "mcp-stata"
      ]
    }
  }
}
```

VS Code documents `.vscode/mcp.json` and the `servers` schema, including `type` and `command`/`args`.

---

## Skills

- Skill file (for Claude/Codex): [skill/SKILL.md](skill/SKILL.md)

## Tools Available (from server.py)

* `run_command(code, echo=True, as_json=True, trace=False, raw=False, max_output_lines=None, session_id="default")`: Execute Stata syntax in the specified session.
  - Always writes output to a temporary log file and emits a single `notifications/logMessage` containing `{"event":"log_path","path":"..."}` so the client can tail it locally.
  - May emit `notifications/progress` when the client provides a progress token/callback.
* `read_log(path, offset=0, max_bytes=65536)`: Read a slice of a previously-provided log file (JSON: `path`, `offset`, `next_offset`, `data`).
* `find_in_log(path, query, start_offset=0, max_bytes=5_000_000, before=2, after=2, case_sensitive=False, regex=False, max_matches=50)`: Search a log file for text and return context windows.
  - Returns JSON with `matches` (context lines, line indices), `next_offset`, and `truncated` if `max_matches` is hit.
  - Supports literal or regex search with a bounded read window for large logs.
* `load_data(source, clear=True, as_json=True, raw=False, max_output_lines=None, session_id="default")`: Heuristic loader (sysuse/webuse/use/path/URL) for the specified session.
* `get_ui_channel(session_id="default")`: Return a short-lived localhost HTTP endpoint and bearer token for the UI-only data browser, targeting the specified session.
* `describe(session_id="default")`: View dataset structure via Stata `describe`.
* `list_graphs(session_id="default")`: See available graphs in memory (JSON list with an `active` flag).
* `export_graph(graph_name=None, format="pdf", session_id="default")`: Export a graph to a file path.
* `export_graphs_all(session_id="default")`: Export all in-memory graphs. Returns file paths.
* `get_help(topic, plain_text=False, session_id="default")`: Markdown-rendered Stata help.
* `codebook(variable, as_json=True, trace=False, raw=False, max_output_lines=None, session_id="default")`: Variable-level metadata.
* `run_do_file(path, echo=True, as_json=True, trace=False, raw=False, max_output_lines=None, session_id="default")`: Execute a .do file in the specified session.
* `get_stored_results(session_id="default")`: Get `r()` and `e()` scalars/macros as JSON.
* `get_variable_list(session_id="default")`: JSON list of variables and labels.
* `create_session(session_id)`: Manually create a new Stata session.
* `list_sessions()`: List all active sessions and their status.
* `stop_session(session_id)`: Terminate a specific session.
* `break_session(session_id="default")`: Interrupt (break) the command currently running in a specific session. Use this if a command is taking too long and you want to stop it without closing the session and losing your data.
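A client tailing the `log_path` announced by `run_command` pages through the file by feeding each `next_offset` from `read_log` back in as the next `offset`. The sketch below mimics that contract locally with plain file I/O so the paging logic is easy to see. It is not the server's code: `read_log_slice` and `read_whole_log` are hypothetical helpers, and it assumes `next_offset` is the byte position just past the returned slice and that `data` is UTF-8 text.

```python
def read_log_slice(path: str, offset: int = 0, max_bytes: int = 65536) -> dict:
    """Local stand-in for the JSON shape returned by `read_log` (assumed)."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read(max_bytes)
    return {
        "path": path,
        "offset": offset,
        "next_offset": offset + len(chunk),
        # Decode leniently: a slice boundary may split a multi-byte character.
        "data": chunk.decode("utf-8", errors="replace"),
    }


def read_whole_log(path: str, max_bytes: int = 65536) -> str:
    """Page through a log by passing each `next_offset` into the next call."""
    parts: list[str] = []
    offset = 0
    while True:
        result = read_log_slice(path, offset, max_bytes)
        if result["next_offset"] == offset:  # no new bytes (yet)
            break
        parts.append(result["data"])
        offset = result["next_offset"]
    return "".join(parts)
```

For a live log, a client would keep the last `next_offset` and poll again later instead of stopping at the first empty read.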
### Cancellation

- Clients may cancel an in-flight request by sending the MCP notification `notifications/cancelled` with `params.requestId` set to the original tool call ID.
- Client guidance:
  1. Pass a `_meta.progressToken` when invoking the tool if you want progress updates (optional).
  2. If you need to cancel, send `notifications/cancelled` with the same `requestId`. You may also stop tailing the log file path once you receive cancellation confirmation (the tool call will return an error indicating cancellation).
  3. Be prepared for partial output in the log file; cancellation is best-effort and depends on Stata surfacing `BreakError`.

Resources exposed for MCP clients:

* `stata://data/summary` → `summarize`
* `stata://data/metadata` → `describe`
* `stata://graphs/list` → graph list (resource handler delegates to `list_graphs` tool)
* `stata://variables/list` → variable list (resource wrapper)
* `stata://results/stored` → stored `r()`/`e()` results

## UI-only Data Browser (Local HTTP API)

This server also hosts a **localhost-only HTTP API** intended for a VS Code extension UI to browse data at high volume (paging, filtering) without sending large payloads over MCP.

Important properties:

- **Loopback only**: binds to `127.0.0.1`.
- **Bearer auth**: every request requires an `Authorization: Bearer <token>` header.
- **Short-lived tokens**: clients should call `get_ui_channel()` to obtain a fresh token as needed.
- **Session isolation**: caches (views, sorting) are isolated per `sessionId`.
- **No Stata dataset mutation** for browsing/filtering:
  - No generated variables.
  - Paging uses `sfi.Data.get`.
  - Filtering is evaluated in Python over chunked reads.
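To make "evaluated in Python over chunked reads" concrete, here is a minimal sketch of the idea. It is not the server's code: `read_rows(start, count)` is a hypothetical callback standing in for a block read via `sfi.Data.get`, and the filter is an ordinary Python callable rather than a parsed filter expression.

```python
from typing import Callable, Iterator, Sequence

Row = dict[str, float]
ReadRows = Callable[[int, int], Sequence[Row]]  # (start, count) -> rows


def iter_chunks(read_rows: ReadRows, n_obs: int, chunk_size: int) -> Iterator[tuple[int, Sequence[Row]]]:
    """Yield (start, rows) blocks that together cover observations 0..n_obs-1."""
    for start in range(0, n_obs, chunk_size):
        count = min(chunk_size, n_obs - start)
        yield start, read_rows(start, count)


def filter_indices(
    read_rows: ReadRows,
    n_obs: int,
    predicate: Callable[[Row], bool],
    chunk_size: int = 10_000,
) -> list[int]:
    """Return 0-based indices of rows satisfying `predicate`, one chunk at a time.

    Only the matching indices are kept; nothing is written back to the
    dataset, so no generated variables are needed.
    """
    matches: list[int] = []
    for start, rows in iter_chunks(read_rows, n_obs, chunk_size):
        matches.extend(start + i for i, row in enumerate(rows) if predicate(row))
    return matches
```

Keeping only an index list is what lets a filtered view be re-paged or re-sorted later without copying the data or touching the Stata dataset.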
### Discovery via MCP (`get_ui_channel`)

Call the MCP tool `get_ui_channel()` and parse the JSON:

```json
{
  "baseUrl": "http://127.0.0.1:53741",
  "token": "...",
  "expiresAt": 1730000000,
  "capabilities": {
    "dataBrowser": true,
    "filtering": true,
    "sorting": true,
    "arrowStream": true
  }
}
```

Server-enforced limits (current defaults):

- **maxLimit**: 500
- **maxVars**: 32,767
- **maxChars**: 500
- **maxRequestBytes**: 1,000,000
- **maxArrowLimit**: 1,000,000 (specific to `/v1/arrow`)

### Endpoints

All endpoints are under `baseUrl` and require the bearer token.

- `GET /v1/dataset?sessionId=default`
  - Returns dataset identity and basic state (`id`, `frame`, `n`, `k`) for the given session.
- `GET /v1/vars?sessionId=default`
  - Returns the full variable list with labels, types, and formats.
- `POST /v1/page`
  - Paged data retrieval. Supports `sortBy`, `filterExpr` (ephemeral), and `sessionId`.
- `POST /v1/arrow`
  - Returns a binary Arrow IPC stream (same input as `/v1/page`).
- `POST /v1/views`
  - Creates a long-lived filtered view. Returns a `viewId`. Requires `sessionId`.
- `POST /v1/views/:viewId/page`
  - Paged retrieval from a previously created view. Supports `sortBy` and `sessionId`.
- `POST /v1/views/:viewId/arrow`
  - Returns a binary Arrow IPC stream from a filtered view.
- `DELETE /v1/views/:viewId`
  - Deletes a view handle.
- `POST /v1/filters/validate`
  - Validates a filter expression.
### Paging request example

```bash
curl -sS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"datasetId":"...","frame":"default","offset":0,"limit":50,"vars":["price","mpg"],"includeObsNo":true,"maxChars":200}' \
  "$BASE_URL/v1/page"
```

#### Sorting

The `/v1/page` and `/v1/views/:viewId/page` endpoints support sorting via the optional `sortBy` parameter:

```bash
# Sort by price ascending
curl -sS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"datasetId":"...","offset":0,"limit":50,"vars":["price","mpg"],"sortBy":["price"]}' \
  "$BASE_URL/v1/page"

# Sort by price descending
curl -sS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"datasetId":"...","offset":0,"limit":50,"vars":["price","mpg"],"sortBy":["-price"]}' \
  "$BASE_URL/v1/page"

# Multi-variable sort: foreign ascending, then price descending
curl -sS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"datasetId":"...","offset":0,"limit":50,"vars":["foreign","price","mpg"],"sortBy":["foreign","-price"]}' \
  "$BASE_URL/v1/page"
```

**Sort specification format:**

- `sortBy` is an array of strings (variable names with an optional prefix)
- No prefix or a `+` prefix = ascending order (e.g., `"price"` or `"+price"`)
- `-` prefix = descending order (e.g., `"-price"`)
- Multiple variables are supported for multi-level sorting
- Uses the native Rust sorter when available, with a Polars fallback

**Sorting with filtered views:**

- Sorting is fully supported with filtered views
- The sort is computed in memory over the sort columns, then the filtered indices are re-applied
- Example: filter for `price < 5000`, then sort descending by price

```bash
# Create a filtered view
curl -sS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"datasetId":"...","frame":"default","filterExpr":"price < 5000"}' \
  "$BASE_URL/v1/views"
# Returns: {"view": {"id": "view_abc123", "filteredN": 37}}

# Get sorted page from filtered view
curl -sS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"offset":0,"limit":50,"vars":["price","mpg"],"sortBy":["-price"]}' \
  "$BASE_URL/v1/views/view_abc123/page"
```

Notes:

- `datasetId` is used for cache invalidation. If the dataset changes due to running Stata commands, the server will report a new dataset id and existing view handles become invalid.
- Filter expressions are evaluated in Python using values read from Stata via `sfi.Data.get`. Use comparison operators such as `==`, `!=`, `<`, and `>`, combined with the logical operators `and`/`or` (Stata-style `&`/`|` are also accepted).
- Sorting does **not** mutate the dataset order in Stata; it computes sorted indices for the response and caches them for subsequent requests.
- The Rust sorter is the primary implementation; Polars is used only as a fallback when the native extension is unavailable.

## License

This project is licensed under the GNU Affero General Public License v3.0 or later. See the LICENSE file for the full text.

## Error reporting

- All tools that execute Stata commands support JSON envelopes (`as_json=true`) carrying:
  - `rc` (from `r()`/`c(rc)`), `stdout`, `stderr`, `message`, optional `line` (when Stata reports it), `command`, optional `log_path` (for log-file streaming), and a `snippet` excerpt of error output.
- Stata-specific cues are preserved:
  - `r(XXX)` codes are parsed when present in output.
  - "Red text" is captured via stderr where available.
- `trace=true` turns on `set trace on` before the command/do-file to surface errors raised inside programs, and turns tracing off afterward.

## Logging

Set `MCP_STATA_LOGLEVEL` (e.g., `DEBUG`, `INFO`) to control server logging. Logs include discovery details (edition/path) and command-init traces for easier troubleshooting.

## Development & Contributing

For detailed information on building, testing, and contributing to this project, see [CONTRIBUTING.md](CONTRIBUTING.md).
Quick setup:

```bash
# Install dependencies
uv sync --extra dev --no-install-project

# Run tests (requires Stata)
pytest

# Run tests without Stata
pytest -v -m "not requires_stata"

# Build the package
python -m build
```

[![Tests](https://github.com/tmonk/mcp-stata/actions/workflows/build-test.yml/badge.svg)](https://github.com/tmonk/mcp-stata/actions/workflows/build-test.yml)
