Glama

ocr

Extract text from image files or URLs using optical character recognition (OCR) with the Florence-2 MCP Server. PDF files are also accepted: each page is processed as a separate image, and the tool returns one text string per image or page.

Instructions

Process an image file or URL using OCR to extract text.

Input Schema

Name | Required | Description | Default
src | Yes | A file path or URL to the image file that needs to be processed. | (none)
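For illustration, an MCP client invokes this tool by sending a JSON-RPC 2.0 'tools/call' request whose 'arguments' object carries 'src'. The sketch below builds that message with the standard library only. The helper name build_ocr_call, the request id, the empty-string check, and the example URL are illustrative assumptions, not part of the server.

```python
import json
from typing import Any


def build_ocr_call(src: str, request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 'tools/call' request for the 'ocr' tool (hypothetical helper)."""
    if not src:
        # Assumption of this sketch: reject empty strings early.
        # The server's schema only says that 'src' is required.
        raise ValueError("src must be a non-empty file path or URL")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # 'name' selects the tool; 'arguments' must match the input schema.
        "params": {"name": "ocr", "arguments": {"src": src}},
    }


# Example: serialize a request for a placeholder remote image.
request = build_ocr_call("https://example.com/receipt.png")
print(json.dumps(request))
```

A local path such as "scans/page1.png" is passed the same way; the server, not the client, decides how to load it.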

Implementation Reference

  • The primary MCP tool handler for 'ocr'. It accepts src (a file path or URL), loads the images with get_images, fetches the processor from the application's lifespan context, and returns the result of the processor's ocr method: one string per image.
    @mcp.tool()
    def ocr(
        ctx: Context,
        src: PathLike | str = Field(description="A file path or URL to the image file that needs to be processed."),
    ) -> list[str]:
        """Process an image file or URL using OCR to extract text."""
        with get_images(src) as images:
            app_ctx: AppContext = ctx.request_context.lifespan_context
            return app_ctx.processor.ocr(images)
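  • Helper context manager that loads a list of PIL images from src. It supports local image and PDF files as well as HTTP(S) URLs that point to images or PDFs; each PDF page is rendered as a separate image. Remote sources are classified by the response's Content-Type header, local files by their extension.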
    @contextmanager
    def get_images(src: PathLike | str) -> Iterator[list[Image]]:
        """Opens and returns a list of images from a file path or URL."""
        if isinstance(src, str) and (src.startswith("http://") or src.startswith("https://")):
            # A timeout keeps a stalled download from hanging the tool indefinitely.
            res = requests.get(src, timeout=30)
            res.raise_for_status()
            # Compare only the media type: the header may be missing or carry parameters.
            content_type = res.headers.get("Content-Type", "").split(";")[0].strip()
            if content_type == "application/pdf":
                # Render every PDF page to a PIL image; the ExitStack closes them on exit.
                with ExitStack() as stack:
                    images = []
                    with closing(PdfDocument(res.content)) as doc:
                        for page in doc:
                            images.append(stack.enter_context(page.render().to_pil()))
                        yield images
            else:
                with open_image(BytesIO(res.content)) as image:
                    yield [image]
        else:
            # Local files are dispatched on their (case-insensitive) extension.
            ext = os.path.splitext(src)[1].lower()
            if ext == ".pdf":
                with ExitStack() as stack:
                    images = []
                    with closing(PdfDocument(src)) as doc:
                        for page in doc:
                            images.append(stack.enter_context(page.render().to_pil()))
                        yield images
            else:
                with open_image(src) as image:
                    yield [image]
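  • Core Florence-2 processor implementation: ocr() calls generate("<OCR>", images), which runs the transformers Florence-2 model on each image (beam search with 3 beams, up to 1024 new tokens), parses the output with the processor's post_process_generation, and returns one stripped string per image.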
    def ocr(self, images: list[Image]) -> list[str]:
        return self.generate("<OCR>", images)

    def caption(self, images: list[Image], level: CaptionLevel = CaptionLevel.NORMAL) -> list[str]:
        return self.generate(str(level.value), images)

    def generate(self, prompt: str, images: list[Image]) -> list[str]:
        res = []
        for img in images:
            with img.convert("RGB") as rgb_img:
                inputs = self.processor(text=prompt, images=rgb_img, return_tensors="pt").to(
                    self.device, self.torch_dtype
                )
                generated_ids = self.model.generate(
                    input_ids=inputs["input_ids"],
                    pixel_values=inputs["pixel_values"],
                    max_new_tokens=1024,
                    num_beams=3,
                    do_sample=False,
                )
                generated_text = self.processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
                parsed_answer = self.processor.post_process_generation(
                    generated_text, task=prompt, image_size=(rgb_img.width, rgb_img.height)
                )
                res.append(parsed_answer[prompt].strip())
        return res
  • Factory function that creates and configures the FastMCP server instance: it attaches a lifespan handler that initializes the processor and registers the 'ocr' and 'caption' tools with @mcp.tool() decorators.
    def server(name: str, model_id: str, subprocess: bool = True) -> FastMCP:
        """Creates a new FastMCP server instance with the specified name and model ID."""
        mcp = FastMCP(name, lifespan=partial(app_lifespan, model_id=model_id, subprocess=subprocess))

        @mcp.tool()
        def ocr(
            ctx: Context,
            src: PathLike | str = Field(description="A file path or URL to the image file that needs to be processed."),
        ) -> list[str]:
            """Process an image file or URL using OCR to extract text."""
            with get_images(src) as images:
                app_ctx: AppContext = ctx.request_context.lifespan_context
                return app_ctx.processor.ocr(images)

        @mcp.tool()
        def caption(
            ctx: Context,
            src: PathLike | str = Field(description="A file path or URL to the image file that needs to be processed."),
        ) -> list[str]:
            """Processes an image file and generates captions for the image."""
            with get_images(src) as images:
                app_ctx: AppContext = ctx.request_context.lifespan_context
                return app_ctx.processor.caption(images, CaptionLevel.MORE_DETAILED)

        return mcp
  • Input schema for the 'ocr' tool: a single required parameter, 'src' (PathLike | str), whose description is attached with pydantic's Field so that it appears in the tool's MCP input schema.
    def ocr(
        ctx: Context,
        src: PathLike | str = Field(description="A file path or URL to the image file that needs to be processed."),
    ) -> list[str]: ...
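The branching in get_images can be summarized as a pure function that decides how a given src would be loaded, without downloading or decoding anything. This is a simplified sketch, not the server's code: the names classify_source and SourceKind are hypothetical, and the function ignores any parameters after ";" in the Content-Type header.

```python
from __future__ import annotations

import os
from enum import Enum
from os import PathLike


class SourceKind(Enum):
    REMOTE_PDF = "remote_pdf"      # URL whose response is a PDF: render every page
    REMOTE_IMAGE = "remote_image"  # URL whose response is anything else: open as one image
    LOCAL_PDF = "local_pdf"        # local file ending in .pdf: render every page
    LOCAL_IMAGE = "local_image"    # any other local file: open as one image


def classify_source(src: PathLike | str, content_type: str | None = None) -> SourceKind:
    """Decide how src would be loaded (hypothetical helper mirroring get_images).

    content_type is the Content-Type header of the HTTP response; it is only
    consulted for http:// and https:// sources.
    """
    if isinstance(src, str) and src.startswith(("http://", "https://")):
        # Drop parameters such as "; charset=binary" before comparing.
        media_type = (content_type or "").split(";", 1)[0].strip().lower()
        if media_type == "application/pdf":
            return SourceKind.REMOTE_PDF
        return SourceKind.REMOTE_IMAGE
    # Local paths: dispatch on the case-insensitive file extension.
    ext = os.path.splitext(src)[1].lower()
    return SourceKind.LOCAL_PDF if ext == ".pdf" else SourceKind.LOCAL_IMAGE


print(classify_source("scan.PDF"))
print(classify_source("https://example.com/doc", "application/pdf"))
```

As in get_images, a URL's file extension is never consulted; only the response's Content-Type decides whether it is treated as a PDF.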

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jkawamoto/mcp-florence2'
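The same request can be made from Python with only the standard library. This is a sketch: the helper names server_info_request and fetch_server_info are hypothetical, and because the structure of the returned JSON is not described here, the code returns the parsed object without assuming any fields.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_info_request(owner: str, name: str) -> urllib.request.Request:
    """Build the GET request for one server entry, e.g. jkawamoto/mcp-florence2."""
    return urllib.request.Request(f"{API_BASE}/{owner}/{name}", method="GET")


def fetch_server_info(owner: str, name: str, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(server_info_request(owner, name), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network call):
# info = fetch_server_info("jkawamoto", "mcp-florence2")
```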

If you have feedback or need assistance with the MCP directory API, please join our Discord server.