ocr
Extract text from image files or URLs using optical character recognition (OCR) with the Florence-2 MCP Server. The tool returns a list of strings with one entry per image. PDF inputs are rendered page by page, so each page produces its own entry.
Instructions
Process an image file or URL using OCR to extract text.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| src | Yes | A file path or URL to the image file that needs to be processed. | |
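MCP clients invoke this tool with a JSON-RPC 2.0 `tools/call` request whose `arguments` object carries the single `src` parameter. The sketch below builds such a request body using only the standard library. The helper name `build_ocr_call`, the request `id`, and the example path are illustrative assumptions; in practice an MCP client SDK usually constructs this message for you.

```python
import json


def build_ocr_call(src: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request for the `ocr` tool.

    `build_ocr_call` is a hypothetical helper for illustration only.
    """
    if not src:
        raise ValueError("src must be a non-empty file path or URL")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "ocr",
            # `src` is the only parameter: required, no default value.
            "arguments": {"src": src},
        },
    }
    return json.dumps(message)


# Hypothetical local path; a URL such as "https://example.com/scan.pdf" works the same way.
print(build_ocr_call("/tmp/receipt.png"))
```

The client sends this message over its MCP transport. Per the `ocr` handler in the implementation reference, the tool's result is a `list[str]` with one string per image or rendered PDF page.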
Implementation Reference
- `src/mcp_florence2/__init__.py:116-124` (handler): The primary MCP tool handler for `ocr`. It accepts `src` (a path or URL), loads the images with `get_images`, and calls the processor's `ocr` method through the app's lifespan context.

  ```python
  @mcp.tool()
  def ocr(
      ctx: Context,
      src: PathLike | str = Field(description="A file path or URL to the image file that needs to be processed."),
  ) -> list[str]:
      """Process an image file or URL using OCR to extract text."""
      with get_images(src) as images:
          app_ctx: AppContext = ctx.request_context.lifespan_context
          return app_ctx.processor.ocr(images)
  ```
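- `src/mcp_florence2/__init__.py:31-63` (helper): Context manager that loads `src` as a list of PIL images. It supports local image and PDF files as well as HTTP(S) URLs pointing to images or PDFs; each PDF page is rendered to a separate image.

  ```python
  # Imports assumed from the module header (not part of the quoted lines 31-63).
  import os
  from contextlib import ExitStack, closing, contextmanager
  from io import BytesIO
  from os import PathLike
  from typing import Iterator

  import requests
  from PIL.Image import Image
  from PIL.Image import open as open_image
  from pypdfium2 import PdfDocument


  @contextmanager
  def get_images(src: PathLike | str) -> Iterator[list[Image]]:
      """Opens and returns a list of images from a file path or URL."""
      if isinstance(src, str) and (src.startswith("http://") or src.startswith("https://")):
          res = requests.get(src)
          res.raise_for_status()
          if res.headers["Content-Type"] == "application/pdf":
              # Render every PDF page; ExitStack closes all page images on exit.
              with ExitStack() as stack:
                  images = []
                  with closing(PdfDocument(res.content)) as doc:
                      for page in doc:
                          images.append(stack.enter_context(page.render().to_pil()))
                      yield images
          else:
              with open_image(BytesIO(res.content)) as image:
                  yield [image]
      else:
          # Local path: choose PDF rendering or single-image loading by extension.
          ext = os.path.splitext(src)[1].lower()
          if ext == ".pdf":
              with ExitStack() as stack:
                  images = []
                  with closing(PdfDocument(src)) as doc:
                      for page in doc:
                          images.append(stack.enter_context(page.render().to_pil()))
                      yield images
          else:
              with open_image(src) as image:
                  yield [image]
  ```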
- `src/mcp_florence2/florence2.py:50-79` (helper): Core Florence-2 processor implementation. `ocr()` calls `generate("<OCR>", images)`, which runs the transformers Florence-2 model on each image and parses the generated text into the OCR result, producing one string per image.

  ```python
  # Methods of the Florence-2 processor class (excerpt).
  def ocr(self, images: list[Image]) -> list[str]:
      return self.generate("<OCR>", images)

  def caption(self, images: list[Image], level: CaptionLevel = CaptionLevel.NORMAL) -> list[str]:
      return self.generate(str(level.value), images)

  def generate(self, prompt: str, images: list[Image]) -> list[str]:
      res = []
      for img in images:
          with img.convert("RGB") as rgb_img:
              inputs = self.processor(text=prompt, images=rgb_img, return_tensors="pt").to(
                  self.device, self.torch_dtype
              )
              generated_ids = self.model.generate(
                  input_ids=inputs["input_ids"],
                  pixel_values=inputs["pixel_values"],
                  max_new_tokens=1024,
                  num_beams=3,
                  do_sample=False,
              )
              generated_text = self.processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
              parsed_answer = self.processor.post_process_generation(
                  generated_text, task=prompt, image_size=(rgb_img.width, rgb_img.height)
              )
              res.append(parsed_answer[prompt].strip())
      return res
  ```
- `src/mcp_florence2/__init__.py:112-137` (registration): Factory function that creates and configures the `FastMCP` server instance. It sets up the lifespan that initializes the processor and registers the `ocr` and `caption` tools with `@mcp.tool()` decorators.

  ```python
  def server(name: str, model_id: str, subprocess: bool = True) -> FastMCP:
      """Creates a new FastMCP server instance with the specified name and model ID."""
      mcp = FastMCP(name, lifespan=partial(app_lifespan, model_id=model_id, subprocess=subprocess))

      @mcp.tool()
      def ocr(
          ctx: Context,
          src: PathLike | str = Field(description="A file path or URL to the image file that needs to be processed."),
      ) -> list[str]:
          """Process an image file or URL using OCR to extract text."""
          with get_images(src) as images:
              app_ctx: AppContext = ctx.request_context.lifespan_context
              return app_ctx.processor.ocr(images)

      @mcp.tool()
      def caption(
          ctx: Context,
          src: PathLike | str = Field(description="A file path or URL to the image file that needs to be processed."),
      ) -> list[str]:
          """Processes an image file and generates captions for the image."""
          with get_images(src) as images:
              app_ctx: AppContext = ctx.request_context.lifespan_context
              return app_ctx.processor.caption(images, CaptionLevel.MORE_DETAILED)

      return mcp
  ```
- Input schema for the `ocr` tool: the single parameter `src` (`PathLike | str`) is declared with a pydantic `Field`, whose description is exposed in the tool's MCP input schema.

  ```python
  def ocr(
      ctx: Context,
      src: PathLike | str = Field(description="A file path or URL to the image file that needs to be processed."),
  ) -> list[str]: ...
  ```
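The branching in `get_images` decides how `src` is loaded: an `http://` or `https://` prefix triggers a download, after which the response `Content-Type` selects PDF rendering or single-image loading; for local paths, the lower-cased file extension makes the same choice. The pure-Python sketch below isolates that decision so it can be checked without network access or model weights. `SourceKind` and `classify_source` are hypothetical names. One deliberate deviation is an assumption: the quoted helper compares the header to `application/pdf` exactly, whereas this sketch drops parameters such as `; charset=binary` and ignores case before comparing.

```python
from __future__ import annotations

import os
from enum import Enum


class SourceKind(Enum):
    REMOTE_PDF = "remote_pdf"      # URL serving a PDF: render every page
    REMOTE_IMAGE = "remote_image"  # URL serving anything else: open as one image
    LOCAL_PDF = "local_pdf"        # local .pdf file: render every page
    LOCAL_IMAGE = "local_image"    # any other local file: open as one image


def is_url(src: str) -> bool:
    # Same test as get_images: only http:// and https:// count as URLs.
    return src.startswith("http://") or src.startswith("https://")


def classify_source(src: str | os.PathLike, content_type: str | None = None) -> SourceKind:
    """Mirror get_images' branching (hypothetical helper).

    `content_type` is the HTTP Content-Type header; it is only consulted for URLs.
    """
    if isinstance(src, str) and is_url(src):
        # Assumption: strip media-type parameters and compare case-insensitively.
        media_type = (content_type or "").split(";", 1)[0].strip().lower()
        if media_type == "application/pdf":
            return SourceKind.REMOTE_PDF
        return SourceKind.REMOTE_IMAGE
    # Local path (str or PathLike): decide by the lower-cased extension.
    ext = os.path.splitext(src)[1].lower()
    return SourceKind.LOCAL_PDF if ext == ".pdf" else SourceKind.LOCAL_IMAGE


for example in ["scan.PDF", "photo.jpg"]:
    print(example, classify_source(example).value)
print(classify_source("https://example.com/doc", "application/pdf").value)
```

Keeping the decision in a side-effect-free function makes the URL, PDF, and image paths easy to unit test, while the actual downloading and rendering stay inside the context manager.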