
caption_image

Generate descriptive captions for images from URLs or local files using AI vision models. Describe key subjects, scenes, and moods in concise language.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| backend | No | | |
| file_path | No | | |
| image_url | No | | |
| local_model_id | No | | |
| prompt | No | | Write a concise, vivid caption for this image. Describe key subjects, scene, and mood in 1-2 sentences. |

Input Schema (JSON Schema)

{
  "properties": {
    "backend": {
      "anyOf": [{ "type": "string" }, { "type": "null" }],
      "default": null,
      "title": "Backend"
    },
    "file_path": {
      "anyOf": [{ "type": "string" }, { "type": "null" }],
      "default": null,
      "title": "File Path"
    },
    "image_url": {
      "anyOf": [{ "type": "string" }, { "type": "null" }],
      "default": null,
      "title": "Image Url"
    },
    "local_model_id": {
      "anyOf": [{ "type": "string" }, { "type": "null" }],
      "default": null,
      "title": "Local Model Id"
    },
    "prompt": {
      "default": "Write a concise, vivid caption for this image. Describe key subjects, scene, and mood in 1-2 sentences.",
      "title": "Prompt",
      "type": "string"
    }
  },
  "title": "caption_imageArguments",
  "type": "object"
}
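As a sketch of how a caller might build an argument payload matching this schema (the URL and field values below are illustrative examples, not from the source):

```python
import json

# Hypothetical argument payload for the caption_image tool.
# Exactly one of image_url / file_path should be set (the handler rejects
# both-or-neither); prompt falls back to the default shown in the table above.
args = {
    "image_url": "https://example.com/photo.jpg",  # example URL only
    "backend": "openrouter",
    "prompt": (
        "Write a concise, vivid caption for this image. "
        "Describe key subjects, scene, and mood in 1-2 sentences."
    ),
}

print(json.dumps(args, indent=2))
```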

Implementation Reference

  • The MCP tool handler for 'caption_image'. It validates inputs, selects a backend ('openrouter' or 'local'), and delegates captioning to the appropriate client.
    @mcp.tool()
    def caption_image(
        image_url: Optional[str] = None,
        file_path: Optional[str] = None,
        prompt: str = DEFAULT_PROMPT,
        backend: Optional[str] = None,
        local_model_id: Optional[str] = None,
    ) -> str:
        if not image_url and not file_path:
            raise ValueError("Provide either image_url or file_path")
        if image_url and file_path:
            raise ValueError("Provide only one of image_url or file_path, not both")
        image_ref = image_url or file_path  # type: ignore

        # Resolve defaults from global config if not explicitly provided
        try:
            from cv_mcp.metadata.runner import _CFG as _GLOBAL_CFG  # type: ignore
        except Exception:
            _GLOBAL_CFG = {}
        backend = (backend or str(_GLOBAL_CFG.get("caption_backend", "openrouter"))).lower()
        local_model_id = local_model_id or str(_GLOBAL_CFG.get("local_vlm_id", "Qwen/Qwen2-VL-2B-Instruct"))

        if backend == "openrouter":
            client = OpenRouterClient()
            res = client.analyze_single_image(image_ref, prompt)
            if not res.get("success"):
                raise RuntimeError(str(res.get("error", "Captioning failed")))
            content = res.get("content", "")
            return str(content)
        elif backend == "local":
            try:
                from cv_mcp.captioning.local_captioner import LocalCaptioner
            except Exception as e:  # pragma: no cover
                raise RuntimeError(
                    "Local backend not available. Install optional deps with `pip install .[local]`."
                ) from e
            local = LocalCaptioner(model_id=local_model_id)
            return local.caption(image_ref, prompt)
        else:
            raise ValueError("Invalid backend. Use 'openrouter' or 'local'.")
  • OpenRouter client method called by the handler for remote captioning. Wraps analyze_images for a single image.
    def analyze_single_image(
        self,
        image: Union[str, Dict],
        prompt: str,
        *,
        model: Optional[str] = None,
        system: Optional[str] = None,
    ) -> Dict[str, Any]:
        return self.analyze_images([image], prompt, model=model, system=system)
  • Local captioner method called by the handler for local-model inference. Loads the image, runs it through the transformers processor and model, and decodes the generated caption.
    def caption(
        self,
        image: Union[str, "Image.Image"],
        prompt: str,
        max_new_tokens: int = 128,
    ) -> str:
        img = self._load_image(image)
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": img},
                    {"type": "text", "text": prompt},
                ],
            }
        ]
        text = self.processor.apply_chat_template(messages, add_generation_prompt=True)
        inputs = self.processor(text=[text], images=[img], return_tensors="pt").to(self.model.device)
        generate_ids = self.model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            use_cache=True,
        )
        out = self.processor.batch_decode(generate_ids, skip_special_tokens=True)[0]
        return out.strip()
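The handler's input validation and backend-resolution steps can be sketched in isolation. This is a minimal sketch, assuming a config dict shaped like the handler's `_GLOBAL_CFG`; the function name `resolve_caption_request` is illustrative, not from the source:

```python
from typing import Optional, Tuple

def resolve_caption_request(
    image_url: Optional[str],
    file_path: Optional[str],
    backend: Optional[str],
    cfg: dict,
) -> Tuple[str, str]:
    """Mirror the handler's checks: exactly one image reference is required,
    and the backend falls back to config, then to 'openrouter' (sketch only)."""
    if not image_url and not file_path:
        raise ValueError("Provide either image_url or file_path")
    if image_url and file_path:
        raise ValueError("Provide only one of image_url or file_path, not both")
    image_ref = image_url or file_path
    backend = (backend or str(cfg.get("caption_backend", "openrouter"))).lower()
    if backend not in ("openrouter", "local"):
        raise ValueError("Invalid backend. Use 'openrouter' or 'local'.")
    return image_ref, backend

# With no explicit backend and an empty config, "openrouter" is chosen.
print(resolve_caption_request("https://example.com/a.jpg", None, None, {}))
```

Note that the real handler lower-cases the backend before validating it, so `"LOCAL"` and `"local"` behave the same; the sketch preserves that behavior.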
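The message list the local captioner hands to `processor.apply_chat_template` is plain data and can be built independently of any model. A sketch of that structure (the helper name and placeholder path are hypothetical):

```python
def build_vlm_messages(image, prompt: str) -> list:
    """Build the single-turn, image-plus-text message list used by the
    local captioner's chat template (structure mirrors the code above)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]

msgs = build_vlm_messages("photo.jpg", "Describe this image.")
print(msgs[0]["role"])  # user
```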
