
dense_caption

Generates a detailed, factual caption for an image supplied as a URL or a local file path, using a vision-language model backend (local, Ollama, or OpenRouter).

Input Schema

Name      | Required | Description                        | Default
file_path | No       | Path to a local image file         | null
image_url | No       | URL of a remote image to caption   | null

Input Schema (JSON Schema)

{
  "properties": {
    "file_path": {
      "anyOf": [{"type": "string"}, {"type": "null"}],
      "default": null,
      "title": "File Path"
    },
    "image_url": {
      "anyOf": [{"type": "string"}, {"type": "null"}],
      "default": null,
      "title": "Image Url"
    }
  },
  "title": "dense_captionArguments",
  "type": "object"
}
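
As a concrete illustration (hypothetical URLs and paths), arguments objects that satisfy both the schema and the handler's exactly-one-input rule look like this:

```python
import json

# Hypothetical example payloads for dense_caption. Per the schema, both
# properties are optional strings (or null) defaulting to null; the tool
# handler additionally requires exactly one of them to be non-null.
by_url = {"image_url": "https://example.com/street.jpg", "file_path": None}
by_file = {"image_url": None, "file_path": "/tmp/street.jpg"}

for args in (by_url, by_file):
    # Exactly one non-null input, as the handler enforces.
    assert sum(v is not None for v in args.values()) == 1
    print(json.dumps(args))
```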

Implementation Reference

  • MCP tool handler function for 'dense_caption'. Validates image input (URL or file path) and calls the runner function.
    @mcp.tool()
    def dense_caption(
        image_url: Optional[str] = None,
        file_path: Optional[str] = None,
    ) -> str:
        if not image_url and not file_path:
            raise ValueError("Provide either image_url or file_path")
        if image_url and file_path:
            raise ValueError("Provide only one of image_url or file_path, not both")
        image_ref = image_url or file_path  # type: ignore
        return run_dense_caption(image_ref)
  • Core logic for generating dense captions using local, Ollama, or OpenRouter backends with specific prompts.
    def run_dense_caption(image_ref: str, *, model: Optional[str] = None) -> str:
        if _use_local_for("caption"):
            prompt = f"{prompts.CAPTION_SYSTEM}\n\n{prompts.CAPTION_USER}"
            return _local_gen(image_ref, prompt)
        if _use_ollama_for("caption"):
            from cv_mcp.captioning.ollama_client import OllamaClient

            client = OllamaClient(host=str(_cfg_value("ollama_host", "http://localhost:11434")))
            res = client.analyze_single_image(
                image_ref,
                prompts.CAPTION_USER,
                model=_cfg_value("caption_model"),
                system=prompts.CAPTION_SYSTEM,
            )
            if not res.get("success"):
                raise RuntimeError(str(res.get("error", "Dense caption generation failed (ollama)")))
            return str(res.get("content", "")).strip()
        client = OpenRouterClient()
        res = client.analyze_single_image(
            image_ref,
            prompts.CAPTION_USER,
            model=model or _cfg_value("caption_model"),
            system=prompts.CAPTION_SYSTEM,
        )
        if not res.get("success"):
            raise RuntimeError(str(res.get("error", "Dense caption generation failed")))
        return str(res.get("content", "")).strip()
  • Prompt templates (system and user) specifically for dense caption generation.
    CAPTION_SYSTEM = (
        "You carefully describe visual content without guessing. Mention salient text only if clearly readable."
    )
    CAPTION_USER = (
        "Write a factual, detailed caption (2–6 sentences) for this image. Cover:\n"
        "- Who/what is visible (counts if reliable).\n"
        "- Where/setting if visually indicated.\n"
        "- Salient readable text.\n"
        "- Relationships (e.g., 'person holding red umbrella near taxi').\n"
        "- Lighting/time cues if obvious (e.g., night, golden hour).\n"
        "If uncertain, say 'unclear'. Do not guess brands, species, or locations unless unmistakable. Avoid subjective adjectives."
    )
  • The @mcp.tool() decorator on the handler shown above registers the function as the 'dense_caption' tool with the FastMCP server.
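
The handler's exactly-one-input validation can be exercised in isolation. The sketch below replicates that logic with a stub in place of run_dense_caption (the stub name and its echoed return value are illustrative, not part of the real server):

```python
from typing import Optional

def _caption_stub(image_ref: str) -> str:
    # Stand-in for run_dense_caption: just echo the resolved reference.
    return f"caption for {image_ref}"

def dense_caption(image_url: Optional[str] = None, file_path: Optional[str] = None) -> str:
    # Same validation as the real handler: exactly one input must be given.
    if not image_url and not file_path:
        raise ValueError("Provide either image_url or file_path")
    if image_url and file_path:
        raise ValueError("Provide only one of image_url or file_path, not both")
    image_ref = image_url or file_path
    return _caption_stub(image_ref)

print(dense_caption(file_path="/tmp/cat.jpg"))  # caption for /tmp/cat.jpg
```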
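
run_dense_caption's backend selection (local first, then Ollama, with OpenRouter as the fallback) is a dispatch-by-config pattern. A minimal sketch with stubbed backends and a hypothetical CONFIG dict standing in for the real _use_local_for/_use_ollama_for/_cfg_value helpers:

```python
from typing import Callable, Dict

# Hypothetical config; the real code consults _use_local_for()/_use_ollama_for().
CONFIG = {"caption_backend": "ollama"}

def _local_backend(image_ref: str) -> str:
    return f"local:{image_ref}"

def _ollama_backend(image_ref: str) -> str:
    return f"ollama:{image_ref}"

def _openrouter_backend(image_ref: str) -> str:
    # Default path, mirroring the fall-through to OpenRouterClient.
    return f"openrouter:{image_ref}"

_BACKENDS: Dict[str, Callable[[str], str]] = {
    "local": _local_backend,
    "ollama": _ollama_backend,
}

def run_caption(image_ref: str) -> str:
    # Unknown or unset backend names fall through to OpenRouter.
    backend = _BACKENDS.get(CONFIG.get("caption_backend", ""), _openrouter_backend)
    return backend(image_ref)

print(run_caption("photo.jpg"))  # ollama:photo.jpg
```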
