create-endpoint

Deploy scalable AI inference endpoints on RunPod by configuring compute resources, GPU specifications, and worker scaling parameters.

Input Schema

| Name | Required | Description |
| --- | --- | --- |
| name | No | Name for the endpoint |
| templateId | Yes | Template ID to use |
| computeType | No | GPU or CPU endpoint |
| gpuTypeIds | No | List of acceptable GPU types |
| gpuCount | No | Number of GPUs per worker |
| workersMin | No | Minimum number of workers |
| workersMax | No | Maximum number of workers |
| dataCenterIds | No | List of data centers |
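For illustration, a call to create-endpoint might pass a parameter object like the one below. Every identifier here is a made-up placeholder, not a real RunPod template or GPU type ID; only `templateId` is required.

```typescript
// Hypothetical input for the create-endpoint tool.
// All IDs and names below are illustrative placeholders.
const params = {
  name: 'llama-inference',
  templateId: 'my-template-id', // required
  computeType: 'GPU' as const,
  gpuTypeIds: ['NVIDIA GeForce RTX 4090'],
  gpuCount: 1,
  workersMin: 0, // scale to zero when idle
  workersMax: 3,
};
```

The server validates an object like this against the Zod schema shown under Implementation Reference before forwarding it to the RunPod API.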

Implementation Reference

  • Handler function for the 'create-endpoint' tool, which POSTs to the RunPod /endpoints API and returns the JSON response as formatted text.

    ```typescript
    async (params) => {
      const result = await runpodRequest('/endpoints', 'POST', params);
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(result, null, 2),
          },
        ],
      };
    }
    ```
  • Zod input schema defining parameters for creating a RunPod endpoint.
    ```typescript
    {
      name: z.string().optional().describe('Name for the endpoint'),
      templateId: z.string().describe('Template ID to use'),
      computeType: z
        .enum(['GPU', 'CPU'])
        .optional()
        .describe('GPU or CPU endpoint'),
      gpuTypeIds: z
        .array(z.string())
        .optional()
        .describe('List of acceptable GPU types'),
      gpuCount: z.number().optional().describe('Number of GPUs per worker'),
      workersMin: z.number().optional().describe('Minimum number of workers'),
      workersMax: z.number().optional().describe('Maximum number of workers'),
      dataCenterIds: z
        .array(z.string())
        .optional()
        .describe('List of data centers'),
    }
    ```
  • src/index.ts:399-432 (registration)
    MCP server.tool registration for the 'create-endpoint' tool, specifying name, input schema, and handler function.
    ```typescript
    server.tool(
      'create-endpoint',
      {
        name: z.string().optional().describe('Name for the endpoint'),
        templateId: z.string().describe('Template ID to use'),
        computeType: z
          .enum(['GPU', 'CPU'])
          .optional()
          .describe('GPU or CPU endpoint'),
        gpuTypeIds: z
          .array(z.string())
          .optional()
          .describe('List of acceptable GPU types'),
        gpuCount: z.number().optional().describe('Number of GPUs per worker'),
        workersMin: z.number().optional().describe('Minimum number of workers'),
        workersMax: z.number().optional().describe('Maximum number of workers'),
        dataCenterIds: z
          .array(z.string())
          .optional()
          .describe('List of data centers'),
      },
      async (params) => {
        const result = await runpodRequest('/endpoints', 'POST', params);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      }
    );
    ```
  • Reusable helper for authenticated RunPod API requests, used by create-endpoint and other tools.
    ```typescript
    async function runpodRequest(
      endpoint: string,
      method: string = 'GET',
      body?: Record<string, unknown>
    ) {
      const url = `${API_BASE_URL}${endpoint}`;
      const headers = {
        Authorization: `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      };
      const options: NodeFetchRequestInit = {
        method,
        headers,
      };
      if (body && (method === 'POST' || method === 'PATCH')) {
        options.body = JSON.stringify(body);
      }
      try {
        const response = await fetch(url, options);
        if (!response.ok) {
          const errorText = await response.text();
          throw new Error(`RunPod API Error: ${response.status} - ${errorText}`);
        }
        // Some endpoints might not return JSON
        const contentType = response.headers.get('content-type');
        if (contentType && contentType.includes('application/json')) {
          return await response.json();
        }
        return { success: true, status: response.status };
      } catch (error) {
        console.error('Error calling RunPod API:', error);
        throw error;
      }
    }
    ```
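The helper's request construction can be sketched in isolation, with no live API call. The `buildOptions` function below is our extraction of runpodRequest's header and body handling, not part of the source, and `API_KEY` is a stand-in value rather than the server's real configuration.

```typescript
// Sketch of the option-building logic inside runpodRequest, pulled out
// so it can be inspected without hitting the RunPod API.
// API_KEY is a placeholder; the real server reads its key from config.
const API_KEY = 'example-key';

function buildOptions(method: string = 'GET', body?: Record<string, unknown>) {
  const options: { method: string; headers: Record<string, string>; body?: string } = {
    method,
    headers: {
      Authorization: `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
  };
  // As in the helper, only POST and PATCH requests carry a JSON body.
  if (body && (method === 'POST' || method === 'PATCH')) {
    options.body = JSON.stringify(body);
  }
  return options;
}

const opts = buildOptions('POST', { templateId: 'my-template-id' });
```

Attaching the body only for POST and PATCH mirrors the helper and avoids serializing a payload onto GET requests.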

MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/runpod/runpod-mcp-ts'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.