
gpu_run

Execute AI services including LLM inference, image/video generation, speech processing, and document analysis through a unified GPU compute gateway.

Instructions

Run any GPU-Bridge AI service. 30 services available: LLM inference (sub-second), image generation (FLUX, SD3.5), video generation, video enhancement (up to 4K), speech-to-text (Whisper, <1s), TTS (40+ voices), music generation, voice cloning, embeddings, document reranking (Jina), OCR, PDF/document parsing, NSFW detection, image captioning, visual Q&A, background removal, face restoration, upscaling, stickers, and more. Use gpu_catalog to see all available services.

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| service | Yes | Service key. Common ones: llm-4090 (text), image-4090 (image), video (video), whisper-l4 (speech-to-text), tts-l4 (text-to-speech), embedding-l4 (embeddings), rembg-l4 (bg removal), upscale-l4 (upscale), ocr (text extraction), caption (image caption), face-restore, musicgen-l4, llava-4090 (visual Q&A), sticker, whisperx (diarized STT), bark (expressive TTS), voice-clone, photomaker, ad-inpaint, animate, image-variation, inpaint, controlnet, clip, segmentation, rerank (document reranking), nsfw-detect (content moderation), video-enhance (video upscaling), pdf-parse (document parsing) | |
| input | Yes | Service-specific input. Examples: LLM {"prompt":"...","max_tokens":512,"model":"llama-3.3-70b-versatile"}, Image {"prompt":"..."}, Whisper {"audio_url":"https://..."}, TTS {"text":"...","voice":"af_alloy"}, Embedding {"text":"..."}, OCR/Rembg/Upscale/Caption {"image_url":"https://..."}, Video {"prompt":"..."} | |
| priority | No | Routing priority. "fast" = lowest latency (default), "cheap" = lowest cost. | |
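
As an illustration of the schema above, the following sketch checks a set of arguments against the documented fields and wraps them in an MCP `tools/call` request. The helper name `buildGpuRunRequest` and the `PRIORITIES` constant are hypothetical and not part of the server; only the field names, the `priority` enum, and the LLM input shape come from the schema.

```javascript
// Allowed values come from the "priority" enum in the input schema.
const PRIORITIES = ["fast", "cheap"];

// Hypothetical helper (not part of the server): validates arguments against
// the documented schema and wraps them in an MCP "tools/call" request.
function buildGpuRunRequest(id, service, input, priority) {
  if (typeof service !== "string" || service.length === 0) {
    throw new TypeError("service must be a non-empty string");
  }
  if (input === null || typeof input !== "object" || Array.isArray(input)) {
    throw new TypeError("input must be an object");
  }
  if (priority !== undefined && !PRIORITIES.includes(priority)) {
    throw new RangeError(`priority must be one of: ${PRIORITIES.join(", ")}`);
  }
  const args = { service, input };
  if (priority !== undefined) args.priority = priority; // optional field
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "gpu_run", arguments: args },
  };
}

// Example: an LLM request using the input shape from the schema description.
const request = buildGpuRunRequest(
  1,
  "llm-4090",
  { prompt: "Summarize MCP in one sentence.", max_tokens: 512, model: "llama-3.3-70b-versatile" },
  "cheap"
);
console.log(JSON.stringify(request, null, 2));
```

Omitting the fourth argument leaves `priority` out of the request, in which case the schema description says routing defaults to "fast".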

Implementation Reference

  • index.js:17-39 (registration)
    Registration of the 'gpu_run' tool, including its description and input schema.
    {
      name: "gpu_run",
      description: "Run any GPU-Bridge AI service. 30 services available: LLM inference (sub-second), image generation (FLUX, SD3.5), video generation, video enhancement (up to 4K), speech-to-text (Whisper, <1s), TTS (40+ voices), music generation, voice cloning, embeddings, document reranking (Jina), OCR, PDF/document parsing, NSFW detection, image captioning, visual Q&A, background removal, face restoration, upscaling, stickers, and more. Use gpu_catalog to see all available services.",
      inputSchema: {
        type: "object",
        properties: {
          service: {
            type: "string",
            description: "Service key. Common ones: llm-4090 (text), image-4090 (image), video (video), whisper-l4 (speech-to-text), tts-l4 (text-to-speech), embedding-l4 (embeddings), rembg-l4 (bg removal), upscale-l4 (upscale), ocr (text extraction), caption (image caption), face-restore, musicgen-l4, llava-4090 (visual Q&A), sticker, whisperx (diarized STT), bark (expressive TTS), voice-clone, photomaker, ad-inpaint, animate, image-variation, inpaint, controlnet, clip, segmentation, rerank (document reranking), nsfw-detect (content moderation), video-enhance (video upscaling), pdf-parse (document parsing)"
          },
          input: {
            type: "object",
            description: 'Service-specific input. Examples: LLM {"prompt":"...","max_tokens":512,"model":"llama-3.3-70b-versatile"}, Image {"prompt":"..."}, Whisper {"audio_url":"https://..."}, TTS {"text":"...","voice":"af_alloy"}, Embedding {"text":"..."}, OCR/Rembg/Upscale/Caption {"image_url":"https://..."}, Video {"prompt":"..."}'
          },
          priority: {
            type: "string",
            enum: ["fast", "cheap"],
            description: 'Routing priority. "fast" = lowest latency (default), "cheap" = lowest cost.'
          }
        },
        required: ["service", "input"]
      }
    },
  • Handler logic for the 'gpu_run' tool, which makes the API call and polls for the job result.
          case "gpu_run": {
            const { service, input, priority } = args;
            const headers = {};
            if (priority) headers["X-Priority"] = priority;
            const job = await apiCall("/run", "POST", { service, input }, headers);
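            if (job.error) {
              // Append the optional hint and service list on their own lines.
              const hint = job.hint ? `\nHint: ${job.hint}` : "";
              const available = job.available_services ? `\nAvailable: ${job.available_services.join(", ")}` : "";
              return { content: [{ type: "text", text: `Error: ${job.error}${hint}${available}` }], isError: true };
            }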
            const result = await pollJob(job.job_id);
            const output = result.output;
            let text;
            if (typeof output === "string") {
              text = output;
            } else if (output?.text) {
              text = output.text;
            } else if (output?.url) {
              text = output.url;
            } else if (output?.audio_url) {
              text = output.audio_url;
            } else if (output?.embedding) {
              text = `Embedding (${output.dimensions} dimensions): [${output.embedding.slice(0, 5).map((n) => n.toFixed(4)).join(", ")}...]`;
            } else {
              text = JSON.stringify(output, null, 2);
            }
            if (result.output_notice) {
              text += `\n\nNote: ${result.output_notice}`;
            }
            return { content: [{ type: "text", text }] };
          }
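
The output-to-text branches in the handler can be easier to follow, and to test, as a pure function. The sketch below mirrors those branches; the name `formatOutput` is hypothetical and does not appear in index.js, and the sample outputs are made-up values chosen only to exercise each branch.

```javascript
// Hypothetical standalone version of the handler's output formatting.
// Branch order matches the handler: string, text, url, audio_url, embedding, JSON fallback.
function formatOutput(output, outputNotice) {
  let text;
  if (typeof output === "string") {
    text = output;
  } else if (output?.text) {
    text = output.text;
  } else if (output?.url) {
    text = output.url;
  } else if (output?.audio_url) {
    text = output.audio_url;
  } else if (output?.embedding) {
    // Show only the first five components, rounded to four decimals.
    const preview = output.embedding.slice(0, 5).map((n) => n.toFixed(4)).join(", ");
    text = `Embedding (${output.dimensions} dimensions): [${preview}...]`;
  } else {
    // Anything else is returned as pretty-printed JSON.
    text = JSON.stringify(output, null, 2);
  }
  if (outputNotice) text += `\n\nNote: ${outputNotice}`;
  return text;
}

// Made-up sample outputs, one per branch of interest.
console.log(formatOutput("plain string result"));
console.log(formatOutput({ url: "https://example.com/out.png" }));
console.log(formatOutput({ embedding: [0.5, 0.25], dimensions: 2 }));
console.log(formatOutput({ text: "hi" }, "truncated"));
```

Because the branches are checked in order, an output that carries both `text` and `url` is reported by its `text` field only.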
