
caption_image

Generate descriptive captions for images from URLs or local files using AI vision models. Describe key subjects, scenes, and moods in concise language.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| backend | No | Captioning backend: "openrouter" or "local". Falls back to the configured caption_backend ("openrouter" if unset). | |
| file_path | No | Path to a local image file. Provide either this or image_url, not both. | |
| image_url | No | URL of the image to caption. Provide either this or file_path, not both. | |
| local_model_id | No | Model ID for the local backend. Falls back to the configured local_vlm_id ("Qwen/Qwen2-VL-2B-Instruct" if unset). | |
| prompt | No | Instruction sent to the vision model. | Write a concise, vivid caption for this image. Describe key subjects, scene, and mood in 1-2 sentences. |
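
As a quick orientation, here is a minimal sketch of argument payloads a client might pass when calling this tool. The URL, file path, and prompt values are placeholders, not part of the server; the exact transport depends on your MCP client.

```python
# Hypothetical argument payloads for the caption_image tool (placeholder values).

# Caption a remote image with the default backend and default prompt.
remote_args = {"image_url": "https://example.com/photo.jpg"}

# Caption a local file with the local backend and an explicit model ID.
local_args = {
    "file_path": "/path/to/photo.jpg",
    "backend": "local",
    "local_model_id": "Qwen/Qwen2-VL-2B-Instruct",
    "prompt": "Describe this image in one sentence.",
}

# Invalid: image_url and file_path are mutually exclusive, so the handler raises ValueError.
# bad_args = {"image_url": "https://example.com/photo.jpg", "file_path": "/path/to/photo.jpg"}
```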

Implementation Reference

  • The MCP tool handler for 'caption_image'. Validates inputs, selects the backend (openrouter or local), and delegates captioning to the appropriate client; the result contract it expects from the OpenRouter path is sketched after this list.
```python
@mcp.tool()
def caption_image(
    image_url: Optional[str] = None,
    file_path: Optional[str] = None,
    prompt: str = DEFAULT_PROMPT,
    backend: Optional[str] = None,
    local_model_id: Optional[str] = None,
) -> str:
    if not image_url and not file_path:
        raise ValueError("Provide either image_url or file_path")
    if image_url and file_path:
        raise ValueError("Provide only one of image_url or file_path, not both")

    image_ref = image_url or file_path  # type: ignore

    # Resolve defaults from global config if not explicitly provided
    try:
        from cv_mcp.metadata.runner import _CFG as _GLOBAL_CFG  # type: ignore
    except Exception:
        _GLOBAL_CFG = {}
    backend = (backend or str(_GLOBAL_CFG.get("caption_backend", "openrouter"))).lower()
    local_model_id = local_model_id or str(_GLOBAL_CFG.get("local_vlm_id", "Qwen/Qwen2-VL-2B-Instruct"))

    if backend == "openrouter":
        client = OpenRouterClient()
        res = client.analyze_single_image(image_ref, prompt)
        if not res.get("success"):
            raise RuntimeError(str(res.get("error", "Captioning failed")))
        content = res.get("content", "")
        return str(content)
    elif backend == "local":
        try:
            from cv_mcp.captioning.local_captioner import LocalCaptioner
        except Exception as e:  # pragma: no cover
            raise RuntimeError(
                "Local backend not available. Install optional deps with `pip install .[local]`."
            ) from e
        local = LocalCaptioner(model_id=local_model_id)
        return local.caption(image_ref, prompt)
    else:
        raise ValueError("Invalid backend. Use 'openrouter' or 'local'.")
```
  • OpenRouter client method called by the handler for remote captioning. Wraps analyze_images for a single image.
```python
def analyze_single_image(
    self,
    image: Union[str, Dict],
    prompt: str,
    *,
    model: Optional[str] = None,
    system: Optional[str] = None,
) -> Dict[str, Any]:
    return self.analyze_images([image], prompt, model=model, system=system)
```
  • Local captioner method called by the handler for local model inference. Loads the image, processes it with the transformers model, and generates a caption; a standalone usage sketch follows the list.
```python
def caption(
    self,
    image: Union[str, "Image.Image"],
    prompt: str,
    max_new_tokens: int = 128,
) -> str:
    img = self._load_image(image)
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": img},
                {"type": "text", "text": prompt},
            ],
        }
    ]
    text = self.processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = self.processor(text=[text], images=[img], return_tensors="pt").to(self.model.device)
    generate_ids = self.model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        use_cache=True,
    )
    out = self.processor.batch_decode(generate_ids, skip_special_tokens=True)[0]
    return out.strip()
```
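
On the OpenRouter path, the handler treats the response as a plain dict keyed by success, content, and error. The field names below come directly from how caption_image consumes the result; anything beyond that about the payload returned by analyze_images is an assumption, and the helper is only an illustrative stand-in for the handler's own logic.

```python
# Result shape the handler expects from OpenRouterClient.analyze_single_image
# (field names taken from the handler; example values are invented).
ok_result = {"success": True, "content": "A red kayak drifts across a misty alpine lake at dawn."}
err_result = {"success": False, "error": "Upstream model returned HTTP 429"}

def to_caption(res: dict) -> str:
    # Mirrors the handler's flow: raise on failure, otherwise return the content string.
    if not res.get("success"):
        raise RuntimeError(str(res.get("error", "Captioning failed")))
    return str(res.get("content", ""))

print(to_caption(ok_result))
```

On the local path, the handler simply constructs LocalCaptioner with a model ID and calls caption(). A minimal standalone sketch, assuming the optional local dependencies are installed (`pip install .[local]`, which presumably pulls in transformers, torch, and Pillow) and that the model weights can be downloaded:

```python
# Minimal sketch of the local captioning path; the file path is a placeholder.
from cv_mcp.captioning.local_captioner import LocalCaptioner

captioner = LocalCaptioner(model_id="Qwen/Qwen2-VL-2B-Instruct")
caption = captioner.caption(
    "/path/to/photo.jpg",  # a PIL Image is also accepted per the method signature
    "Write a concise, vivid caption for this image. Describe key subjects, scene, and mood in 1-2 sentences.",
)
print(caption)
```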

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/samhains/cv-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.