RunPod MCP Server
This MCP server enables comprehensive management of RunPod cloud infrastructure through the Model Context Protocol, allowing AI assistants like Claude to programmatically interact with the RunPod REST API.
Capabilities:
Pod Management: Create, list, retrieve details, update, start, stop, and delete pods with configurable options including GPU type/count, container images, Docker settings, environment variables, ports, storage volumes, data centers, and container disk sizes. Filter pods by name, GPU type, compute type, and data center.
Serverless Endpoint Management: Create, list, retrieve details, update, and delete serverless endpoints with auto-scaling configurations (min/max workers, scaler type, idle timeout), GPU configuration, data center selection, and template-based deployment.
Template Management: Create, list, retrieve details, update, and delete templates for reusable container configurations with Docker settings, environment variables, volumes, ports, and serverless options.
Network Volume Management: Create, list, retrieve details, update (name and size), and delete network volumes (1-4000 GB) for persistent storage across data centers.
Container Registry Authentication: Create, list, retrieve details, and delete container registry authentication credentials (username and password) for accessing private Docker images.
Runpod MCP server
This is the official Runpod Model Context Protocol (MCP) server, published to npm as @runpod/mcp-server. It lets MCP clients such as Claude Code, Claude Desktop, Cursor, Windsurf, and VS Code manage your Runpod Pods, Serverless endpoints, templates, network volumes, and more.
Quick start
The fastest way to get connected is the guided installer pointed at the hosted Runpod MCP server at https://mcp.getrunpod.io/. The installer detects the agents you have installed, asks which ones to configure, and writes the configuration for you:
npx @runpod/mcp-server@latest add
It supports Claude Code, Claude Desktop, Cursor, Windsurf, and Visual Studio Code, and offers two connection modes:
Hosted (recommended). Points the agent at the hosted server and authenticates with the "Sign in with Runpod" OAuth flow, so no API key is stored on disk.
Local. Runs the server through npx and stores a RUNPOD_API_KEY in the agent's config.
To undo the changes later, run:
npx @runpod/mcp-server@latest remove
Connect to the hosted server manually
If you'd rather configure your client by hand, point it at the hosted server over HTTP (no local process, no API key stored).
Claude Code — add it as an HTTP server:
claude mcp add --transport http runpod -s user https://mcp.getrunpod.io/
Other clients (Cursor, VS Code, Claude Desktop connectors, …) use a URL-based MCP entry:
{
"mcpServers": {
"runpod": {
"url": "https://mcp.getrunpod.io/"
}
}
}
The hosted server uses the "Sign in with Runpod" OAuth flow for authentication, so no API key is stored. An OAuth-capable client starts the sign-in flow automatically on first connect. See Sign in with Runpod (authorize flow) for details.
Prefer your own API key instead of OAuth? Append --header "Authorization: Bearer YOUR_API_KEY" to the claude mcp add command (or add a headers block in the JSON), and the server forwards that key to the Runpod API directly.
Requirements
Node.js 18 or higher.
A Runpod account and API key: https://www.runpod.io/console/user/settings
Run locally with npx
To run the server as a local stdio process with your own API key:
RUNPOD_API_KEY=YOUR_API_KEY npx -y @runpod/mcp-server@latest
Install via Smithery
npx -y @smithery/cli install @runpod/runpod-mcp-ts --client claude
Local MCP client setup
Local MCP clients should use the default package entrypoint, which is the stdio server. The caller sets RUNPOD_API_KEY in the environment, and the server forwards it directly to the Runpod API.
Claude Code
claude mcp add runpod -s user \
-e RUNPOD_API_KEY=YOUR_API_KEY \
-- npx -y @runpod/mcp-server@latest
For a project-local server:
claude mcp add runpod -s project \
-e RUNPOD_API_KEY=YOUR_API_KEY \
-- npx -y @runpod/mcp-server@latest
Verify with claude mcp list. In an active session, use /mcp to reconnect.
Claude Desktop
Local MCP servers in Claude Desktop still use claude_desktop_config.json.
macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
Windows:
%APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"runpod": {
"command": "npx",
"args": ["-y", "@runpod/mcp-server@latest"],
"env": {
"RUNPOD_API_KEY": "YOUR_API_KEY"
}
}
}
}
Restart Claude Desktop after saving.
For remote clients such as Claude's connector, use the hosted HTTP server instead (the Quick start above). The client then runs the OAuth "Sign in with Runpod" flow, and the resulting Runpod API key is forwarded to the Runpod API on each request.
Cursor
Add this to .cursor/mcp.json in your project or ~/.cursor/mcp.json globally:
{
"mcpServers": {
"runpod": {
"command": "npx",
"args": ["-y", "@runpod/mcp-server@latest"],
"env": {
"RUNPOD_API_KEY": "YOUR_API_KEY"
}
}
}
}
VS Code, Windsurf, Cline, JetBrains, and other local clients
Use the same pattern:
command: npx
args: ["-y", "@runpod/mcp-server@latest"]
env: RUNPOD_API_KEY=YOUR_API_KEY
For a broader list of MCP clients, see https://modelcontextprotocol.io/clients
Usage examples
List all Pods
Can you list all my Runpod Pods?
Create a new Pod
Create a new Runpod Pod with the following specifications:
- Name: test-pod
- Image: runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
- GPU Type: NVIDIA GeForce RTX 4090
- GPU Count: 1
Create a Serverless endpoint
Create a Runpod Serverless endpoint with the following configuration:
- Name: my-endpoint
- Image: runpod/test-output:0.0.1
- GPU pool: AMPERE_80 (a "pool" value from list-gpu-types)
- Minimum workers: 0
- Maximum workers: 3
On the v2 API (the default), endpoints are image-based: pass an image and a GPU pool, not a template. To use the legacy template-based model, pin RUNPOD_REST_VERSION=v1 and provide a Template ID instead.
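As a rough illustration of the image-based shape, the sketch below checks a set of create-endpoint arguments before they are sent. The field names imageName, gpuPoolIds, workersMin, workersMax, and templateId come from this README; the name field, the types, and the helper checkCreateEndpointArgs are assumptions made for the example, not part of the server's API.

```typescript
// Hypothetical argument shape for a v2 create-endpoint call.
// Field names follow the README; the types are assumptions.
type CreateEndpointV2Args = {
  name?: string;
  imageName: string;
  gpuPoolIds: string[];
  workersMin?: number;
  workersMax?: number;
};

// Hypothetical helper: returns a list of problems.
// An empty list means the arguments look like a v2 (image-based) request.
function checkCreateEndpointArgs(args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  if ("templateId" in args) {
    problems.push("templateId is a v1 field; v2 expects imageName and gpuPoolIds");
  }
  const imageName = args["imageName"];
  if (typeof imageName !== "string" || imageName.length === 0) {
    problems.push("imageName is required");
  }
  const pools = args["gpuPoolIds"];
  if (!Array.isArray(pools) || pools.length === 0) {
    problems.push("gpuPoolIds must list at least one GPU pool");
  }
  return problems;
}

// The example prompt above, expressed as arguments.
const exampleArgs: CreateEndpointV2Args = {
  name: "my-endpoint",
  imageName: "runpod/test-output:0.0.1",
  gpuPoolIds: ["AMPERE_80"],
  workersMin: 0,
  workersMax: 3,
};
```

Calling checkCreateEndpointArgs({ ...exampleArgs }) yields no problems, while a legacy { templateId } payload is flagged on all three checks.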
Hosted HTTP deployment
The package also exports a Streamable HTTP entrypoint at @runpod/mcp-server/http, and this repo includes a Vercel function in api/index.ts. This is what powers the hosted server at https://mcp.getrunpod.io/.
The hosted transport is stateless:
Each request creates a fresh MCP server instance.
The caller must send Authorization: Bearer <token>.
No credentials are cached or shared server-side.
Hosted auth
The server forwards the caller's Bearer token directly to the Runpod API as the credential for that request. There is no server-side or shared key.
The token can be either:
a Runpod API key the caller configured manually, or
a Runpod API key obtained through the OAuth "Sign in with Runpod" flow (see below).
An unauthenticated request receives a 401 with a WWW-Authenticate header pointing at the protected-resource metadata, which tells an OAuth-capable client (such as Claude) to start the sign-in flow. A caller that brings its own API key as the bearer token never hits that path.
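The behavior described above can be sketched as a small pure function. This is only an illustration: extractBearerToken, authenticate, and the AuthResult type are hypothetical names, and the exact WWW-Authenticate value the hosted server emits is an assumption (the resource_metadata parameter follows the OAuth protected-resource metadata convention).

```typescript
// Hypothetical helper: pull the token out of an Authorization header value.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

type AuthResult =
  | { status: 200; token: string }
  | { status: 401; headers: { "WWW-Authenticate": string } };

// Either accept the caller's token for this one request, or answer 401 and
// point the client at the protected-resource metadata (assumed header format).
function authenticate(header: string | undefined, metadataUrl: string): AuthResult {
  const token = extractBearerToken(header);
  if (token === null) {
    return {
      status: 401,
      headers: { "WWW-Authenticate": `Bearer resource_metadata="${metadataUrl}"` },
    };
  }
  // The token is returned to the caller of this function and not stored anywhere.
  return { status: 200, token };
}
```

A request with a well-formed bearer token passes straight through; only a missing or malformed header produces the 401 that prompts an OAuth-capable client to sign in.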
Sign in with Runpod (authorize flow)
In OAuth mode the hosted server is itself the authorization server for Claude's connector. It advertises itself in /.well-known/oauth-protected-resource and /.well-known/oauth-authorization-server, pointing authorization_endpoint at its own GET /authorize, token_endpoint at its own POST /token, and registration_endpoint at its own POST /register.
The flow reuses the Runpod flash auth backend and mints a real Runpod API key:
The client registers via POST /register (OAuth Dynamic Client Registration, RFC 7591) to obtain a client_id. This is a public client; no client secret is used.
GET /authorize calls the guest createFlashAuthRequest mutation to get a request id (which serves as the OAuth authorization code), then 302-redirects the browser to the console handoff page (/integrations/mcp/login), carrying the request id plus the client's redirect_uri and state.
The user logs in and approves the request in the console. On approval the backend mints a Runpod API key for the request.
The console returns the browser to the client's redirect_uri with code=<request id>.
POST /token polls the guest flashAuthRequestStatus query for that id; once APPROVED, it returns the minted apiKey as the access_token. The client then sends that key as its bearer token on every MCP request, and the server forwards it to the Runpod API.
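The final token step can be pictured roughly as follows. This is a sketch under stated assumptions, not the server's code: FlashStatus, toTokenResponse, and pollForToken are hypothetical names, fetchStatus stands in for the flashAuthRequestStatus query, any status other than APPROVED is treated as "not ready yet", and the attempt count and delay are arbitrary.

```typescript
// Assumed result shape of the status query: a status string plus the minted key.
type FlashStatus = { status: string; apiKey?: string };
type TokenResponse = { access_token: string; token_type: "Bearer" };

// Map one status result to an OAuth token response, or null if not approved yet.
function toTokenResponse(result: FlashStatus): TokenResponse | null {
  if (result.status === "APPROVED" && result.apiKey) {
    return { access_token: result.apiKey, token_type: "Bearer" };
  }
  return null;
}

// Poll the (caller-supplied) status function until the request is approved
// or the attempt budget runs out.
async function pollForToken(
  requestId: string,
  fetchStatus: (id: string) => Promise<FlashStatus>,
  maxAttempts = 10,
  delayMs = 1000,
): Promise<TokenResponse | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const token = toTokenResponse(await fetchStatus(requestId));
    if (token) return token;
    if (attempt < maxAttempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return null;
}
```

Keeping the status-to-token mapping separate from the polling loop makes the mapping easy to test without any network access.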
This flow uses these environment variables:
RUNPOD_GRAPHQL_URL: flash auth backend endpoint (default https://api.runpod.io/graphql).
CONSOLE_BASE_URL: base URL of the console that hosts the handoff login page (default https://console.runpod.io).
RUNPOD_REST_API_URL / RUNPOD_SERVERLESS_API_URL: override the REST and Serverless API hosts so a deployment authenticating with non-production keys can target the matching environment.
RUNPOD_API_KEY_NAME: name for the minted key as shown in the user's dashboard. Defaults to runpod-mcp. Set it to "" to omit the name for a backend that does not support the apiKeyName argument (such backends reject the request when it is sent).
MCP_ALLOWED_REDIRECT_URIS: comma-separated extra redirect_uri values to allow, in addition to the built-in Claude callbacks. Loopback addresses (localhost / 127.0.0.1 / ::1, any port) are always allowed. /authorize and /token reject any redirect_uri not on this list, since the authorization code redeems into a real API key.
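The redirect_uri rule for MCP_ALLOWED_REDIRECT_URIS can be sketched as a simple check. The helper names, the exact-match comparison, and the restriction of loopback URIs to plain http are assumptions for illustration; the built-in Claude callbacks are passed in as a parameter because their values are not listed here.

```typescript
// Loopback hosts on any port (assumed to be http only for this sketch).
const LOOPBACK = /^http:\/\/(localhost|127\.0\.0\.1|\[::1\])(:\d+)?(\/.*)?$/;

// Split the comma-separated environment value into trimmed, non-empty entries.
function parseExtraRedirectUris(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Hypothetical check: loopback is always allowed; anything else must match
// a built-in callback or an entry from MCP_ALLOWED_REDIRECT_URIS exactly.
function isAllowedRedirectUri(
  uri: string,
  builtIn: string[],
  extraRaw: string | undefined,
): boolean {
  if (LOOPBACK.test(uri)) return true;
  return builtIn.includes(uri) || parseExtraRedirectUris(extraRaw).includes(uri);
}
```

Anchoring the pattern at both ends matters: a host such as localhost.evil.example must not pass as loopback just because it starts with "localhost".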
You can verify the entire flow end to end with MCP_SERVER_URL=<deployment-url> npx tsx scripts/oauth-e2e.ts (the harness uses a loopback callback).
Notes and current limitations:
PKCE is not supported: code_challenge is neither advertised nor enforced (the flash flow has no place to bind it). Security relies on the redirect_uri allowlist and the single-use, short-lived flash approval.
The minted key is named via RUNPOD_API_KEY_NAME (default runpod-mcp), which requires a flash backend that supports the apiKeyName argument. Against a backend without it, set RUNPOD_API_KEY_NAME="".
Vercel
This repo already contains vercel.json and the Vercel handler.
Deploy with the Vercel CLI:
vercel
vercel --prod
After deploy, test the endpoint:
curl -i https://YOUR-DEPLOYMENT.vercel.app/
Unauthenticated requests should return 401.
Hosted smoke test
MCP_SERVER_URL=https://YOUR-DEPLOYMENT.vercel.app \
RUNPOD_API_KEY=YOUR_API_KEY \
pnpm smoke:http
This validates:
MCP initialization.
tools/list.
A public GraphQL-backed tool call.
An authenticated REST-backed tool call.
Local development
git clone https://github.com/runpod/runpod-mcp.git
cd runpod-mcp
pnpm install
pnpm build
Run the local build directly:
RUNPOD_API_KEY=YOUR_API_KEY node dist/stdio.mjs
Or point a client at your local build:
{
"command": "node",
"args": ["/absolute/path/to/runpod-mcp/dist/stdio.mjs"],
"env": {
"RUNPOD_API_KEY": "YOUR_API_KEY"
}
}
Local smoke test
RUNPOD_API_KEY=YOUR_API_KEY pnpm build
RUNPOD_API_KEY=YOUR_API_KEY pnpm smoke:stdio
Contributing
The source is now split by responsibility:
src/stdio.ts: local stdio entrypoint (runs the v2 probe at startup for auto mode).
src/http.ts: bearer-token extraction and the per-request MCP session for the Streamable HTTP transport.
src/tools.ts: thin orchestrator that builds the shared runtime and calls each per-resource registrar.
src/tools/<resource>.ts: the tools for one resource (pods, endpoints, jobs, templates, network-volumes, registries, catalog), each with a description + MCP annotations.
src/tools/runtime.ts: the shared per-server runtime (caller-tracking, the authenticated REST/Serverless/GraphQL clients, the v1/v2 backend resolver) threaded into every registrar.
src/server.ts: shared server metadata and construction.
src/_shared/backend.ts: v1/v2 routing adapter covering version resolution, per-resource paths, list-envelope unwrap, and the v2 probe.
src/_shared/http.ts: unified authenticated JSON client + HttpError.
src/_shared/tracking.ts: caller-tracking header construction.
src/_shared/mappers.ts: v1→v2 request-body mappers.
api/index.ts: Vercel adapter and the OAuth authorization-server routes (/.well-known/*, /register, /authorize, /token).
After changes:
pnpm type-check
pnpm lint
pnpm test
pnpm build
pnpm test runs the offline unit suite (outbound-request goldens, adapter, mappers, http client); no network or API key is required.
v2 spec parity gate
tests/spec-parity.test.ts walks every operation in the vendored v2 OpenAPI spec (tests/fixtures/v2-openapi.yaml) and fails if a v2 endpoint has no MCP tool covering it (and isn't explicitly allowlisted) — the drift gate that flags when the API grows an endpoint we don't expose yet. It also runs the reverse check (no tool is left unaccounted for). It's part of pnpm test, hermetic (parses the committed spec; no network, no API key).
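The idea behind the gate can be sketched as two set differences. This is a simplified illustration, not the contents of tests/spec-parity.test.ts: the function names and the operation strings used below (such as "GET /pods") are hypothetical.

```typescript
// Forward check: spec operations that no tool covers and that are not allowlisted.
function findUncoveredOperations(
  specOps: string[],
  coveredOps: string[],
  allowlist: string[],
): string[] {
  const accounted = new Set([...coveredOps, ...allowlist]);
  return specOps.filter((op) => !accounted.has(op));
}

// Reverse check: tools that map to no spec operation and are not
// explicitly marked as intentionally spec-unmapped.
function findUnaccountedTools(
  toolToOp: Record<string, string | undefined>,
  specOps: string[],
  specUnmappedTools: string[],
): string[] {
  const known = new Set(specOps);
  return Object.keys(toolToOp).filter((tool) => {
    const op = toolToOp[tool];
    const mapped = op !== undefined && known.has(op);
    return !mapped && !specUnmappedTools.includes(tool);
  });
}
```

A test built on these would fail whenever either function returns a non-empty list, which is what makes newly added endpoints visible.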
Refresh the vendored spec when the v2 API changes:
pnpm tsx scripts/fetch-v2-spec.ts # re-fetch tests/fixtures/v2-openapi.yaml
pnpm test # parity test shows any newly-uncovered endpoint
The serverless runtime tools (run-endpoint, get-job-status, …) target api.runpod.ai/v2, which is a different service from the v2 REST control plane, so they are allowlisted as intentionally spec-unmapped. The spec has no logs/artifacts endpoints, so there are no such tools.
An optional live shape check (skipped by default) verifies the live endpoints still return the envelopes the list tools unwrap; run it with MCP_LIVE_V2_KEY=<dev-key> pnpm test (optionally MCP_LIVE_V2_BASE).
For transport validation:
pnpm smoke:stdio
pnpm smoke:http
To exercise a real create→get→delete lifecycle against a dev account (free resources only, namely templates and registry auths, with fail-closed teardown):
RUNPOD_API_KEY=YOUR_DEV_KEY \
RUNPOD_REST_V2_API_URL=https://v2-rest.runpod.dev/v2 \
pnpm smoke:crud v1 v2
This project uses changesets for versioning and npm publishing. Every PR with user-facing changes needs a changeset file at .changeset/DESCRIPTIVE_NAME.md.
See CLAUDE.md and docs/context.md for contributor guidance.
Reference
Deployment modes
The server supports two deployment modes:
Local stdio for Claude Desktop, Claude Code, Cursor, VS Code, and other MCP clients that launch a local process. The caller sets RUNPOD_API_KEY in the environment.
Hosted Streamable HTTP for Vercel or other HTTP-capable platforms. Each request carries its own Authorization: Bearer <token>, which the server forwards directly to the Runpod API. The token can be a Runpod API key or one obtained through the OAuth "Sign in with Runpod" flow.
The server never holds a credential of its own and never shares one across users.
REST API version (v1 / v2)
The server can target either the v1 REST API (rest.runpod.io/v1) or the newer v2 REST API (v2-rest.runpod.io/v2). It defaults to v2; set RUNPOD_REST_VERSION=v1 to pin the previous v1 behavior. These environment variables are read once at startup:
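| Variable | Values | Default | Effect |
| --- | --- | --- | --- |
| RUNPOD_REST_VERSION | v1, v2, auto | v2 | Version used for all resources. |
| … | … | - | Per-resource override (e.g. …). |
| RUNPOD_REST_V2_API_URL | URL | https://v2-rest.runpod.io/v2 | v2 base URL. Override to target a non-prod host. |
| RUNPOD_REST_API_URL | URL | https://rest.runpod.io/v1 | v1 base URL. |
| RUNPOD_SERVERLESS_API_URL | URL | https://api.runpod.ai/v2 | Serverless runtime base URL. |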
Notes:
Default is v2: with nothing set, every control-plane resource uses the v2 REST API. Set RUNPOD_REST_VERSION=v1 to pin the previous v1 behavior.
Pinning a deployment: the env var is read once at startup, so set RUNPOD_REST_VERSION (v1 or v2) in the host's environment (e.g. a Vercel project env var) and redeploy; no code change is needed. Hosted HTTP honors this default like any other transport.
What auto does: it probes v2 once at startup and falls back to v1, but only on the stdio transport (one process = one key). On hosted HTTP auto resolves to v1, because a warm instance serves many users and a cached probe verdict could leak across them. auto is a stdio-only convenience; on HTTP, rely on the default or pin the version explicitly.
jobs (serverless runtime) always uses v1 regardless of the setting, because it has no v2 REST home (it targets api.runpod.ai/v2, a different service).
The v2-only tools (list-cpu-types, get-gpu-type, restart-pod) return a clear "v2 only" notice when called under v1.
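The resolution rules in these notes can be condensed into one function. This is a sketch of the described behavior, not the code in src/_shared/backend.ts: resolveRestVersion and its parameters are hypothetical, per-resource overrides are left out, and treating an unrecognized value like the default is an assumption.

```typescript
type RestVersion = "v1" | "v2";
type Transport = "stdio" | "http";

// setting: the value of RUNPOD_REST_VERSION (undefined when unset).
// probeV2Ok: outcome of the one-time v2 probe, only meaningful on stdio.
function resolveRestVersion(
  setting: string | undefined,
  transport: Transport,
  resource: string,
  probeV2Ok: boolean,
): RestVersion {
  // The serverless runtime has no v2 REST home, so jobs always use v1.
  if (resource === "jobs") return "v1";
  if (setting === "v1") return "v1";
  if (setting === "auto") {
    // auto only probes on stdio; on hosted HTTP it resolves to v1.
    return transport === "stdio" && probeV2Ok ? "v2" : "v1";
  }
  // Unset, "v2", or (by assumption) any unrecognized value: the v2 default.
  return "v2";
}
```

For example, resolveRestVersion("auto", "http", "pods", true) gives "v1" even though the probe succeeded, which mirrors the note about warm HTTP instances serving many users.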
⚠️ Migration note: create-endpoint / update-endpoint changed shape in v2. Serverless endpoints now use the v2 API (/v2/serverless) with an inline config instead of a templateId. On v2, create-endpoint requires imageName + gpuPoolIds (GPU pool names from list-gpu-types, taken from the pool field, e.g. AMPERE_80), plus optional workersMin / workersMax, scalerType / scalerValue / idleTimeout, containerDiskInGb, env, flashboot, etc. It no longer accepts templateId. If you previously called create-endpoint with { templateId } and have no RUNPOD_REST_VERSION set, that call now returns a clean 400 after upgrading. Switch to the inline fields, or pin RUNPOD_REST_VERSION=v1 to keep the legacy template-based model.
create-pod for a CPU pod (computeType: "CPU") on v2 is transparently served by the v1 API, because v2 has no CPU pods yet, and the reply is flagged _servedBy: "v1". Because that fallback hits the v1 base, set RUNPOD_REST_API_URL to match your environment when running v2 against a non-prod host (otherwise CPU creates land on v1 prod). On v2 a create with neither gpuTypeIds nor computeType is rejected; absence is never silently turned into a CPU pod.
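The create-pod rules for v2 mode can be summarized in a small decision function. The names PodCreateArgs, PodRoute, and routeCreatePodOnV2 are hypothetical, as is the wording of the error; only the three outcomes (CPU goes to v1, missing compute information is rejected, everything else stays on v2) come from the description above.

```typescript
type PodCreateArgs = { gpuTypeIds?: string[]; computeType?: string };
type PodRoute =
  | { backend: "v2" }
  | { backend: "v1"; servedBy: "v1" }
  | { error: string };

// Decide where a create-pod call goes when the server is running in v2 mode.
function routeCreatePodOnV2(args: PodCreateArgs): PodRoute {
  // CPU pods do not exist on v2, so they fall back to the v1 API.
  if (args.computeType === "CPU") {
    return { backend: "v1", servedBy: "v1" };
  }
  const hasGpu = Array.isArray(args.gpuTypeIds) && args.gpuTypeIds.length > 0;
  // Neither GPU types nor a compute type: reject instead of guessing CPU.
  if (!hasGpu && args.computeType === undefined) {
    return { error: "specify gpuTypeIds or computeType" };
  }
  return { backend: "v2" };
}
```

Returning an explicit error for the empty case keeps the "never silently a CPU pod" guarantee visible in the type of the result.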
Example (stdio client config) — v2 against prod:
"env": {
"RUNPOD_API_KEY": "YOUR_API_KEY",
"RUNPOD_REST_VERSION": "v2"
}
Targeting a non-prod (dev) environment: point both bases at it (the v1 base matters even in v2 mode, for the CPU-pod fallback above):
"env": {
"RUNPOD_API_KEY": "YOUR_DEV_KEY",
"RUNPOD_REST_VERSION": "v2",
"RUNPOD_REST_V2_API_URL": "https://v2-rest.runpod.dev/v2",
"RUNPOD_REST_API_URL": "https://rest.runpod.dev/v1"
}
Large tool output
Resource lists are paginated (default 20 items, nextCursor), so a big account can't flood the agent's context. But serverless job output — run-endpoint, runsync-endpoint, get-job-status, and especially stream-job (which accumulates every chunk) — is returned as-is and is not size-capped. It's a single opaque payload, not a list, so there's no cursor to page. A very large or long-streaming result can exceed the context window. If output may be huge, have the agent write it to a file instead of returning it inline, or set s3Config on the job so large outputs go to object storage.
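Cursor pagination of the kind described for resource lists can be sketched as follows. Only the default page size of 20 and the nextCursor field come from the text above; the paginate helper, the items field, and the use of a stringified offset as the cursor are assumptions for illustration.

```typescript
type Page<T> = { items: T[]; nextCursor?: string };

// Return one page of results plus a cursor for the next page, if any.
// The cursor here is just the next start offset encoded as a string (assumed).
function paginate<T>(all: T[], cursor?: string, limit = 20): Page<T> {
  const parsed = cursor === undefined ? 0 : parseInt(cursor, 10);
  const start = isNaN(parsed) || parsed < 0 ? 0 : parsed;
  const end = start + limit;
  const page: Page<T> = { items: all.slice(start, end) };
  // Only advertise a cursor when more items remain.
  if (end < all.length) page.nextCursor = String(end);
  return page;
}
```

With 45 items, the first call returns 20 items and a cursor, and the third call returns the last 5 items with no cursor. A single job payload has no equivalent of this, which is why the file and s3Config options matter for large outputs.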
Security considerations
This server acts with the full permissions of the supplied RUNPOD_API_KEY.
Never share your API key.
Be deliberate about destructive tools.
Treat hosted deployments as sensitive infrastructure.
Each request authenticates with its own caller-supplied token, which is forwarded to the Runpod API and never persisted server-side.
License
Apache-2.0