create-endpoint
Deploy scalable AI inference endpoints on RunPod by configuring compute resources, GPU specifications, and worker scaling parameters.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| name | No | Name for the endpoint | |
| templateId | Yes | Template ID to use | |
| computeType | No | GPU or CPU endpoint | |
| gpuTypeIds | No | List of acceptable GPU types | |
| gpuCount | No | Number of GPUs per worker | |
| workersMin | No | Minimum number of workers | |
| workersMax | No | Maximum number of workers | |
| dataCenterIds | No | List of data centers |
Implementation Reference
- src/index.ts:420-431 (handler)Handler function for the 'create-endpoint' tool that POSTs to RunPod /endpoints API and returns the JSON response as formatted text.async (params) => { const result = await runpodRequest('/endpoints', 'POST', params); return { content: [ { type: 'text', text: JSON.stringify(result, null, 2), }, ], }; }
- src/index.ts:401-419 (schema)Zod input schema defining parameters for creating a RunPod endpoint.{ name: z.string().optional().describe('Name for the endpoint'), templateId: z.string().describe('Template ID to use'), computeType: z .enum(['GPU', 'CPU']) .optional() .describe('GPU or CPU endpoint'), gpuTypeIds: z .array(z.string()) .optional() .describe('List of acceptable GPU types'), gpuCount: z.number().optional().describe('Number of GPUs per worker'), workersMin: z.number().optional().describe('Minimum number of workers'), workersMax: z.number().optional().describe('Maximum number of workers'), dataCenterIds: z .array(z.string()) .optional() .describe('List of data centers'), },
- src/index.ts:399-432 (registration)MCP server.tool registration for the 'create-endpoint' tool, specifying name, input schema, and handler function.server.tool( 'create-endpoint', { name: z.string().optional().describe('Name for the endpoint'), templateId: z.string().describe('Template ID to use'), computeType: z .enum(['GPU', 'CPU']) .optional() .describe('GPU or CPU endpoint'), gpuTypeIds: z .array(z.string()) .optional() .describe('List of acceptable GPU types'), gpuCount: z.number().optional().describe('Number of GPUs per worker'), workersMin: z.number().optional().describe('Minimum number of workers'), workersMax: z.number().optional().describe('Maximum number of workers'), dataCenterIds: z .array(z.string()) .optional() .describe('List of data centers'), }, async (params) => { const result = await runpodRequest('/endpoints', 'POST', params); return { content: [ { type: 'text', text: JSON.stringify(result, null, 2), }, ], }; } );
- src/index.ts:27-66 (helper)Reusable helper for authenticated RunPod API requests, used by create-endpoint and other tools.async function runpodRequest( endpoint: string, method: string = 'GET', body?: Record<string, unknown> ) { const url = `${API_BASE_URL}${endpoint}`; const headers = { Authorization: `Bearer ${API_KEY}`, 'Content-Type': 'application/json', }; const options: NodeFetchRequestInit = { method, headers, }; if (body && (method === 'POST' || method === 'PATCH')) { options.body = JSON.stringify(body); } try { const response = await fetch(url, options); if (!response.ok) { const errorText = await response.text(); throw new Error(`RunPod API Error: ${response.status} - ${errorText}`); } // Some endpoints might not return JSON const contentType = response.headers.get('content-type'); if (contentType && contentType.includes('application/json')) { return await response.json(); } return { success: true, status: response.status }; } catch (error) { console.error('Error calling RunPod API:', error); throw error; } }