
dense_caption

Generates a detailed caption describing the visual content of an image, supplied either as a URL or a local file path, using a vision-language model backend.

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| file_path | No | | |
| image_url | No | | |

Implementation Reference

  • MCP tool handler function for 'dense_caption'. Validates image input (URL or file path) and calls the runner function.
    @mcp.tool()
    def dense_caption(
        image_url: Optional[str] = None,
        file_path: Optional[str] = None,
    ) -> str:
        if not image_url and not file_path:
            raise ValueError("Provide either image_url or file_path")
        if image_url and file_path:
            raise ValueError("Provide only one of image_url or file_path, not both")
        image_ref = image_url or file_path  # type: ignore
        return run_dense_caption(image_ref)
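The handler's validation logic can be exercised in isolation. This is a minimal sketch in which `run_dense_caption_stub` is a hypothetical stand-in for the real runner (which depends on backend configuration), so only the exactly-one-input rule is demonstrated:

```python
from typing import Optional


def run_dense_caption_stub(image_ref: str) -> str:
    # Stand-in for run_dense_caption; just echoes the reference.
    return f"caption for {image_ref}"


def dense_caption(
    image_url: Optional[str] = None,
    file_path: Optional[str] = None,
) -> str:
    # Exactly one of the two inputs must be provided.
    if not image_url and not file_path:
        raise ValueError("Provide either image_url or file_path")
    if image_url and file_path:
        raise ValueError("Provide only one of image_url or file_path, not both")
    image_ref = image_url or file_path
    return run_dense_caption_stub(image_ref)
```

Calling `dense_caption()` with no arguments, or with both, raises `ValueError`; with exactly one argument it forwards that reference to the runner.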
  • Core logic for generating dense captions using local, Ollama, or OpenRouter backends with specific prompts.
    def run_dense_caption(image_ref: str, *, model: Optional[str] = None) -> str:
        # Local backend: system and user prompts are concatenated into one string.
        if _use_local_for("caption"):
            prompt = f"{prompts.CAPTION_SYSTEM}\n\n{prompts.CAPTION_USER}"
            return _local_gen(image_ref, prompt)
        # Ollama backend: the system prompt is passed separately.
        if _use_ollama_for("caption"):
            from cv_mcp.captioning.ollama_client import OllamaClient
            client = OllamaClient(host=str(_cfg_value("ollama_host", "http://localhost:11434")))
            res = client.analyze_single_image(
                image_ref,
                prompts.CAPTION_USER,
                model=_cfg_value("caption_model"),
                system=prompts.CAPTION_SYSTEM,
            )
            if not res.get("success"):
                raise RuntimeError(str(res.get("error", "Dense caption generation failed (ollama)")))
            return str(res.get("content", "")).strip()
        # Default backend: OpenRouter; an explicit model argument overrides the configured one.
        client = OpenRouterClient()
        res = client.analyze_single_image(
            image_ref,
            prompts.CAPTION_USER,
            model=model or _cfg_value("caption_model"),
            system=prompts.CAPTION_SYSTEM,
        )
        if not res.get("success"):
            raise RuntimeError(str(res.get("error", "Dense caption generation failed")))
        return str(res.get("content", "")).strip()
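The fallback chain above (local, then Ollama, then OpenRouter) can be summarized as a small dispatch sketch. The predicate names mirror the source's `_use_local_for`/`_use_ollama_for` helpers, but here they simply read a plain config dict, which is an assumption made for the sketch:

```python
from typing import Dict


def select_backend(task: str, config: Dict[str, str]) -> str:
    """Return which backend would serve the given task.

    Mirrors run_dense_caption's ordering: local is checked first,
    then Ollama; OpenRouter is the default when neither applies.
    """
    if config.get(f"{task}_backend") == "local":
        return "local"
    if config.get(f"{task}_backend") == "ollama":
        return "ollama"
    # Hosted OpenRouter API is the fallback.
    return "openrouter"
```

Keeping the ordering explicit means a misconfigured or absent `caption_backend` entry degrades gracefully to the hosted API rather than raising.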
  • Prompt templates (system and user) specifically for dense caption generation.
    CAPTION_SYSTEM = (
        "You carefully describe visual content without guessing. Mention salient text only if clearly readable."
    )
    
    CAPTION_USER = (
        "Write a factual, detailed caption (2–6 sentences) for this image. Cover:\n"
        "- Who/what is visible (counts if reliable).\n"
        "- Where/setting if visually indicated.\n"
        "- Salient readable text.\n"
        "- Relationships (e.g., 'person holding red umbrella near taxi').\n"
        "- Lighting/time cues if obvious (e.g., night, golden hour).\n"
        "If uncertain, say 'unclear'. Do not guess brands, species, or locations unless unmistakable. Avoid subjective adjectives."
    )
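Note that the two templates are delivered differently per backend: the local path joins them into a single prompt string, while the Ollama and OpenRouter paths pass the system prompt as a separate parameter. A short sketch (prompt text abbreviated here):

```python
# Abbreviated stand-ins for the full prompt templates above.
CAPTION_SYSTEM = "You carefully describe visual content without guessing."
CAPTION_USER = "Write a factual, detailed caption (2-6 sentences) for this image."

# Local path: one combined prompt string, as in run_dense_caption.
local_prompt = f"{CAPTION_SYSTEM}\n\n{CAPTION_USER}"

# API path (Ollama/OpenRouter): system and user prompts stay separate, e.g.
# client.analyze_single_image(image_ref, CAPTION_USER, system=CAPTION_SYSTEM)
```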
  • The @mcp.tool() decorator on the handler shown above registers the function as the 'dense_caption' tool in the FastMCP server.
