# dense_caption
Generates a detailed, factual caption for an image, supplied either as a URL or a local file path, using a configurable vision-model backend (local, Ollama, or OpenRouter).
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | No | Path to a local image file. Provide exactly one of `file_path` or `image_url`. | null |
| image_url | No | URL of the image to caption. Provide exactly one of `file_path` or `image_url`. | null |
### Input Schema (JSON Schema)

```json
{
  "properties": {
    "file_path": {
      "anyOf": [
        {"type": "string"},
        {"type": "null"}
      ],
      "default": null,
      "title": "File Path"
    },
    "image_url": {
      "anyOf": [
        {"type": "string"},
        {"type": "null"}
      ],
      "default": null,
      "title": "Image Url"
    }
  },
  "title": "dense_captionArguments",
  "type": "object"
}
```
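Both fields are optional in the schema, but the handler requires exactly one of them. For illustration, here is a minimal sketch of calling this tool over stdio with the official MCP Python SDK; the launch command (`python -m cv_mcp.mcp_server`) and the image URL are assumptions, so adjust them to your setup:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Assumption: the server is launchable as a module; adjust to your setup.
    params = StdioServerParameters(command="python", args=["-m", "cv_mcp.mcp_server"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Exactly one of image_url / file_path, per the validation below.
            result = await session.call_tool(
                "dense_caption",
                {"image_url": "https://example.com/street.jpg"},
            )
            print(result.content)


asyncio.run(main())
```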
## Implementation Reference
- src/cv_mcp/mcp_server.py:94-105 (handler): MCP tool handler for `dense_caption`. Validates that exactly one image input (URL or file path) is provided, then delegates to the runner function.

```python
@mcp.tool()
def dense_caption(
    image_url: Optional[str] = None,
    file_path: Optional[str] = None,
) -> str:
    if not image_url and not file_path:
        raise ValueError("Provide either image_url or file_path")
    if image_url and file_path:
        raise ValueError("Provide only one of image_url or file_path, not both")
    image_ref = image_url or file_path  # type: ignore
    return run_dense_caption(image_ref)
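The handler fails fast on ambiguous input. A quick sketch of the two failure modes, assuming the decorated function remains directly callable (FastMCP's tool decorator returns the original function in current SDK versions, but that is an assumption about the version in use):

```python
from cv_mcp.mcp_server import dense_caption

try:
    dense_caption()  # neither argument set
except ValueError as e:
    print(e)  # -> Provide either image_url or file_path

try:
    dense_caption(
        image_url="https://example.com/cat.jpg",  # example URL
        file_path="/data/cat.jpg",                # example path
    )
except ValueError as e:
    print(e)  # -> Provide only one of image_url or file_path, not both

# With exactly one argument, the call proceeds to run_dense_caption and
# hits whichever model backend is configured.
```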
- run_dense_caption (runner): core logic for dense caption generation, dispatching to the local, Ollama, or OpenRouter backend with the caption-specific prompts.

```python
def run_dense_caption(image_ref: str, *, model: Optional[str] = None) -> str:
    # Backend dispatch: local model first, then Ollama, else OpenRouter.
    if _use_local_for("caption"):
        prompt = f"{prompts.CAPTION_SYSTEM}\n\n{prompts.CAPTION_USER}"
        return _local_gen(image_ref, prompt)
    if _use_ollama_for("caption"):
        from cv_mcp.captioning.ollama_client import OllamaClient

        client = OllamaClient(host=str(_cfg_value("ollama_host", "http://localhost:11434")))
        res = client.analyze_single_image(
            image_ref,
            prompts.CAPTION_USER,
            model=_cfg_value("caption_model"),
            system=prompts.CAPTION_SYSTEM,
        )
        if not res.get("success"):
            raise RuntimeError(str(res.get("error", "Dense caption generation failed (ollama)")))
        return str(res.get("content", "")).strip()
    # Default: OpenRouter, with an optional per-call model override.
    client = OpenRouterClient()
    res = client.analyze_single_image(
        image_ref,
        prompts.CAPTION_USER,
        model=model or _cfg_value("caption_model"),
        system=prompts.CAPTION_SYSTEM,
    )
    if not res.get("success"):
        raise RuntimeError(str(res.get("error", "Dense caption generation failed")))
    return str(res.get("content", "")).strip()
```
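The internals of `OpenRouterClient.analyze_single_image` are not shown here. As a hypothetical sketch of an equivalent single-image request, the endpoint and payload shape below follow OpenRouter's documented OpenAI-compatible chat completions API, while the helper name and its exact behavior are assumptions, not the project's actual client:

```python
import os

import requests

from cv_mcp.metadata import prompts


def openrouter_caption(image_url: str, model: str) -> str:
    """Hypothetical stand-in for OpenRouterClient.analyze_single_image."""
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": model,  # any vision-capable model slug
            "messages": [
                {"role": "system", "content": prompts.CAPTION_SYSTEM},
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompts.CAPTION_USER},
                        {"type": "image_url", "image_url": {"url": image_url}},
                    ],
                },
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()
```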
- src/cv_mcp/metadata/prompts.py:15-27 (helper): system and user prompt templates specific to dense caption generation.

```python
CAPTION_SYSTEM = (
    "You carefully describe visual content without guessing. "
    "Mention salient text only if clearly readable."
)

CAPTION_USER = (
    "Write a factual, detailed caption (2–6 sentences) for this image. Cover:\n"
    "- Who/what is visible (counts if reliable).\n"
    "- Where/setting if visually indicated.\n"
    "- Salient readable text.\n"
    "- Relationships (e.g., 'person holding red umbrella near taxi').\n"
    "- Lighting/time cues if obvious (e.g., night, golden hour).\n"
    "If uncertain, say 'unclear'. Do not guess brands, species, or locations "
    "unless unmistakable. Avoid subjective adjectives."
)
```
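To iterate on these templates outside the MCP server, they can be exercised directly against a local Ollama instance. A minimal sketch using Ollama's documented `/api/chat` endpoint; the helper name, model name, and timeout are illustrative assumptions:

```python
import base64

import requests

from cv_mcp.metadata import prompts


def ollama_caption(path: str, model: str = "llava", host: str = "http://localhost:11434") -> str:
    """Hypothetical test harness for the caption prompts against Ollama."""
    # Ollama's chat API takes base64-encoded images alongside the user message.
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    resp = requests.post(
        f"{host}/api/chat",
        json={
            "model": model,  # example: any vision-capable Ollama model
            "stream": False,
            "messages": [
                {"role": "system", "content": prompts.CAPTION_SYSTEM},
                {"role": "user", "content": prompts.CAPTION_USER, "images": [image_b64]},
            ],
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"].strip()
```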
- src/cv_mcp/mcp_server.py:94-105 (registration): the `@mcp.tool()` decorator on the handler shown above registers the function as the `dense_caption` tool with the FastMCP server.
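For context, here is a minimal sketch of how a FastMCP server with this registration pattern is typically assembled and started; the server name and transport default are assumptions, not taken from this project's code:

```python
from typing import Optional

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("cv-mcp")  # assumed server name


@mcp.tool()
def dense_caption(
    image_url: Optional[str] = None,
    file_path: Optional[str] = None,
) -> str:
    ...  # body as shown in the handler above


if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```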