
caption

Generate descriptive captions for image files by processing them through the Florence-2 MCP Server, using a file path or URL as input.

Instructions

Processes an image file and generates captions for the image.

Input Schema

  • src (required, no default): A file path or URL to the image file that needs to be processed.
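Since the tool takes a single `src` argument, invoking it over MCP is straightforward. As a sketch (the file path is hypothetical), the JSON-RPC `tools/call` request an MCP client would send looks like this:

```python
import json

# Sketch of the JSON-RPC request an MCP client sends to invoke the
# "caption" tool; "/tmp/example.jpg" is a placeholder path.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "caption",
        "arguments": {"src": "/tmp/example.jpg"},
    },
}

payload = json.dumps(request)
```

The server responds with a list of caption strings, one per image (or per PDF page).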

Implementation Reference

  • Handler and registration for the MCP 'caption' tool. Processes src (file path or URL), loads images, and delegates to processor.caption with MORE_DETAILED level.
    @mcp.tool()
    def caption(
        ctx: Context,
        src: PathLike | str = Field(description="A file path or URL to the image file that needs to be processed."),
    ) -> list[str]:
        """Processes an image file and generates captions for the image."""
        with get_images(src) as images:
            app_ctx: AppContext = ctx.request_context.lifespan_context
            return app_ctx.processor.caption(images, CaptionLevel.MORE_DETAILED)
  • Core helper function in Florence2 that performs the actual model inference for generating captions or OCR text.
    def generate(self, prompt: str, images: list[Image]) -> list[str]:
        res = []
        for img in images:
            with img.convert("RGB") as rgb_img:
                inputs = self.processor(text=prompt, images=rgb_img, return_tensors="pt").to(
                    self.device, self.torch_dtype
                )
                generated_ids = self.model.generate(
                    input_ids=inputs["input_ids"],
                    pixel_values=inputs["pixel_values"],
                    max_new_tokens=1024,
                    num_beams=3,
                    do_sample=False,
                )
                generated_text = self.processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
                parsed_answer = self.processor.post_process_generation(
                    generated_text, task=prompt, image_size=(rgb_img.width, rgb_img.height)
                )
                res.append(parsed_answer[prompt].strip())
        return res
  • Helper context manager to load one or more PIL Images from a local file path, URL, or PDF (with multi-page support).
    @contextmanager
    def get_images(src: PathLike | str) -> Iterator[list[Image]]:
        """Opens and returns a list of images from a file path or URL."""
        if isinstance(src, str) and (src.startswith("http://") or src.startswith("https://")):
            res = requests.get(src)
            res.raise_for_status()
            if res.headers["Content-Type"] == "application/pdf":
                with ExitStack() as stack:
                    images = []
                    with closing(PdfDocument(res.content)) as doc:
                        for page in doc:
                            images.append(stack.enter_context(page.render().to_pil()))
                    yield images
            else:
                with open_image(BytesIO(res.content)) as image:
                    yield [image]
        else:
            ext = os.path.splitext(src)[1].lower()
            if ext == ".pdf":
                with ExitStack() as stack:
                    images = []
                    with closing(PdfDocument(src)) as doc:
                        for page in doc:
                            images.append(stack.enter_context(page.render().to_pil()))
                    yield images
            else:
                with open_image(src) as image:
                    yield [image]
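The branching above can be summarized in a small standalone sketch (`classify_source` is a hypothetical helper, not part of the server):

```python
import os

def classify_source(src: str) -> str:
    """Mirror get_images' dispatch: remote URLs are fetched over HTTP,
    local .pdf files are rendered page by page, and anything else is
    opened as a single image. (Illustrative only, not in the server.)"""
    if src.startswith(("http://", "https://")):
        return "url"
    if os.path.splitext(src)[1].lower() == ".pdf":
        return "pdf"
    return "image"
```

Note that for URLs the PDF decision is made from the `Content-Type` response header rather than the file extension.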
  • Enum schema defining caption prompt levels used by the Florence2 caption method.
    class CaptionLevel(StrEnum):
        NORMAL = "<CAPTION>"
        DETAILED = "<DETAILED_CAPTION>"
        MORE_DETAILED = "<MORE_DETAILED_CAPTION>"
  • Florence2 processor's caption method, bridging to the generate helper with level-based prompt.
    def caption(self, images: list[Image], level: CaptionLevel = CaptionLevel.NORMAL) -> list[str]:
        return self.generate(str(level.value), images)


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jkawamoto/mcp-florence2'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.