
create-endpoint

Deploy scalable GPU or CPU endpoints on RunPod by specifying templates, compute resources, and worker configurations for AI inference workloads.

Input Schema

| Name          | Required | Description                  | Default |
|---------------|----------|------------------------------|---------|
| name          | No       | Name for the endpoint        |         |
| templateId    | Yes      | Template ID to use           |         |
| computeType   | No       | GPU or CPU endpoint          |         |
| gpuTypeIds    | No       | List of acceptable GPU types |         |
| gpuCount      | No       | Number of GPUs per worker    |         |
| workersMin    | No       | Minimum number of workers    |         |
| workersMax    | No       | Maximum number of workers    |         |
| dataCenterIds | No       | List of data centers         |         |
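
For illustration, a hypothetical arguments object matching this schema might look like the following. The template ID, GPU type IDs, and data center IDs are placeholders; valid values depend on your RunPod account.

    // Hypothetical create-endpoint arguments; all IDs are placeholders.
    const exampleArgs = {
      name: 'llama-inference',
      templateId: 'your-template-id',
      computeType: 'GPU',
      gpuTypeIds: ['NVIDIA GeForce RTX 4090'],
      gpuCount: 1,
      workersMin: 0,
      workersMax: 3,
      dataCenterIds: ['EU-RO-1'],
    };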

Implementation Reference

  • Handler function that sends a POST request to the RunPod /endpoints API route with the input parameters and formats the response as MCP text content.
    async (params) => {
      const result = await runpodRequest('/endpoints', 'POST', params);
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(result, null, 2),
          },
        ],
      };
    }
  • Zod input schema defining parameters for creating a RunPod endpoint.
    {
      name: z.string().optional().describe('Name for the endpoint'),
      templateId: z.string().describe('Template ID to use'),
      computeType: z
        .enum(['GPU', 'CPU'])
        .optional()
        .describe('GPU or CPU endpoint'),
      gpuTypeIds: z
        .array(z.string())
        .optional()
        .describe('List of acceptable GPU types'),
      gpuCount: z.number().optional().describe('Number of GPUs per worker'),
      workersMin: z.number().optional().describe('Minimum number of workers'),
      workersMax: z.number().optional().describe('Maximum number of workers'),
      dataCenterIds: z
        .array(z.string())
        .optional()
        .describe('List of data centers'),
    },
  • src/index.ts:399-432 (registration)
    Registration of the create-endpoint tool on the MCP server using server.tool(); a client-side invocation sketch follows this list.
    server.tool(
      'create-endpoint',
      {
        name: z.string().optional().describe('Name for the endpoint'),
        templateId: z.string().describe('Template ID to use'),
        computeType: z
          .enum(['GPU', 'CPU'])
          .optional()
          .describe('GPU or CPU endpoint'),
        gpuTypeIds: z
          .array(z.string())
          .optional()
          .describe('List of acceptable GPU types'),
        gpuCount: z.number().optional().describe('Number of GPUs per worker'),
        workersMin: z.number().optional().describe('Minimum number of workers'),
        workersMax: z.number().optional().describe('Maximum number of workers'),
        dataCenterIds: z
          .array(z.string())
          .optional()
          .describe('List of data centers'),
      },
      async (params) => {
        const result = await runpodRequest('/endpoints', 'POST', params);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      }
    );
  • Shared helper function for making authenticated HTTP requests to the RunPod API, used by all tools including create-endpoint; a direct-usage sketch follows this list.
    async function runpodRequest(
      endpoint: string,
      method: string = 'GET',
      body?: Record<string, unknown>
    ) {
      const url = `${API_BASE_URL}${endpoint}`;
      const headers = {
        Authorization: `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      };

      const options: NodeFetchRequestInit = {
        method,
        headers,
      };

      if (body && (method === 'POST' || method === 'PATCH')) {
        options.body = JSON.stringify(body);
      }

      try {
        const response = await fetch(url, options);

        if (!response.ok) {
          const errorText = await response.text();
          throw new Error(`RunPod API Error: ${response.status} - ${errorText}`);
        }

        // Some endpoints might not return JSON
        const contentType = response.headers.get('content-type');
        if (contentType && contentType.includes('application/json')) {
          return await response.json();
        }

        return { success: true, status: response.status };
      } catch (error) {
        console.error('Error calling RunPod API:', error);
        throw error;
      }
    }
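
As a rough illustration of how a client might invoke this tool, here is a minimal sketch using the official MCP TypeScript SDK. The build path, the RUNPOD_API_KEY variable name, and the argument values are assumptions and may differ for your checkout of runpod-mcp-ts.

    import { Client } from '@modelcontextprotocol/sdk/client/index.js';
    import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

    // Spawn the server over stdio. The command/path and the
    // RUNPOD_API_KEY variable name are assumptions.
    const transport = new StdioClientTransport({
      command: 'node',
      args: ['build/index.js'],
      env: { RUNPOD_API_KEY: process.env.RUNPOD_API_KEY ?? '' },
    });

    const client = new Client({ name: 'example-client', version: '1.0.0' });
    await client.connect(transport);

    // Call the tool; templateId is a placeholder.
    const result = await client.callTool({
      name: 'create-endpoint',
      arguments: {
        templateId: 'your-template-id',
        computeType: 'GPU',
        gpuCount: 1,
        workersMax: 2,
      },
    });
    console.log(result.content);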
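
The helper can also be called directly inside the server. The minimal sketch below assumes the RunPod REST API supports GET /endpoints for listing (only POST is shown above) and that API_BASE_URL and API_KEY are already configured.

    // Hypothetical direct usage of runpodRequest. GET /endpoints as a
    // listing route is an assumption; templateId is a placeholder.
    const created = await runpodRequest('/endpoints', 'POST', {
      templateId: 'your-template-id',
      workersMax: 1,
    });

    const listed = await runpodRequest('/endpoints'); // method defaults to GET
    console.log(created, listed);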


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/runpod/runpod-mcp-ts'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.