describe_region
Analyze specific image regions by cropping to bounding boxes and generating detailed descriptions. Use after object detection to focus on particular elements.
Instructions
Crop an image to a bounding box and describe that region in detail. Use this after detect() to zoom in on specific objects.
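
As a rough illustration of the "detect, then zoom in" workflow, the sketch below turns a list of detections into `describe_region` argument objects. The `Detection` shape and the `toDescribeRegionArgs` helper are hypothetical; the actual output format of `detect()` is not shown in this document, so adapt the field names to whatever it returns.

```typescript
// Hypothetical detection shape: the real detect() output is not documented here.
interface Detection {
  label: string;
  bbox: [number, number, number, number]; // [ymin, xmin, ymax, xmax], normalized 0-1000
}

// Mirrors the describe_region input schema (image and bbox required).
interface DescribeRegionArgs {
  image: string;
  bbox: [number, number, number, number];
  prompt?: string;
  provider?: "gemini" | "openai" | "claude";
}

// Build one describe_region argument object per detection.
function toDescribeRegionArgs(
  image: string,
  detections: Detection[],
  provider?: DescribeRegionArgs["provider"]
): DescribeRegionArgs[] {
  return detections.map((d) => ({
    image,
    bbox: d.bbox,
    prompt: `Describe the ${d.label} in detail.`,
    // Omit provider entirely so the tool falls back to its default.
    ...(provider ? { provider } : {}),
  }));
}

const calls = toDescribeRegionArgs("photo.jpg", [
  { label: "dog", bbox: [120, 80, 640, 510] },
]);
console.log(JSON.stringify(calls, null, 2));
```

Each resulting object can be passed as the arguments of a `describe_region` tool call.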
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| image | Yes | Path to the image file or URL (http/https) | |
| bbox | Yes | Bounding box as [ymin, xmin, ymax, xmax] normalized 0-1000 | |
| prompt | No | Optional question or instruction for the description | |
| provider | No | Vision provider to use (gemini, openai, or claude) | gemini |
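
The `bbox` values are normalized to a 0-1000 range rather than given in pixels. The sketch below shows one plausible way to map such a box onto pixel coordinates for an image of known size. `bboxToPixels` and `PixelRect` are hypothetical names, and the rounding and range checks are assumptions; the tool's own `cropToRegion` may handle these details differently.

```typescript
type NormalizedBox = [number, number, number, number]; // [ymin, xmin, ymax, xmax]

interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Convert a 0-1000 normalized box into a pixel rectangle.
function bboxToPixels(
  bbox: NormalizedBox,
  imageWidth: number,
  imageHeight: number
): PixelRect {
  const [ymin, xmin, ymax, xmax] = bbox;
  for (const v of bbox) {
    if (!Number.isFinite(v) || v < 0 || v > 1000) {
      throw new RangeError(`bbox value out of range 0-1000: ${v}`);
    }
  }
  if (ymax <= ymin || xmax <= xmin) {
    throw new RangeError("bbox must have positive width and height");
  }
  // Note the axis order: y values scale with height, x values with width.
  const left = Math.round((xmin / 1000) * imageWidth);
  const top = Math.round((ymin / 1000) * imageHeight);
  const right = Math.round((xmax / 1000) * imageWidth);
  const bottom = Math.round((ymax / 1000) * imageHeight);
  return {
    left,
    top,
    // Guard against very thin boxes rounding down to zero pixels.
    width: Math.max(1, right - left),
    height: Math.max(1, bottom - top),
  };
}

const rect = bboxToPixels([250, 100, 750, 600], 1920, 1080);
console.log(rect); // { left: 192, top: 270, width: 960, height: 540 }
```

The y-first ordering is easy to get wrong: a box written as `[xmin, ymin, xmax, ymax]` would pass the schema's four-number check but crop a different region.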
Implementation Reference
- src/tools/describe-region.ts:47-89 (handler)
  The handler function `handleDescribeRegion` that crops the image to the specified bounding box using `cropToRegion`, encodes it to base64, and generates a detailed description using the selected vision provider (gemini, openai, or claude). Returns a structured response with bbox and description.

  ```typescript
  export async function handleDescribeRegion(args: Record<string, unknown>) {
    const image = args.image as string;
    const bbox = args.bbox as [number, number, number, number];
    const prompt = args.prompt as string | undefined;
    const provider = (args.provider as Provider) || "gemini";

    // Crop to region
    const { buffer } = await cropToRegion(image, bbox);
    const base64 = buffer.toString("base64");
    const mimeType = "image/png";

    let description: string;

    switch (provider) {
      case "gemini":
        description = await geminiDescribe(base64, mimeType, prompt, "detailed");
        break;
      case "openai":
        description = await openaiDescribe(base64, mimeType, prompt, "detailed");
        break;
      case "claude":
        description = await claudeDescribe(base64, mimeType, prompt, "detailed");
        break;
      default:
        throw new Error(`Unknown provider: ${provider}`);
    }

    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(
            {
              bbox,
              description,
            },
            null,
            2
          ),
        },
      ],
    };
  }
  ```
- src/tools/describe-region.ts:14-45 (schema)
  The tool definition `describeRegionTool` including name, description, and input schema specifying required `image` and `bbox` parameters, optional `prompt` and `provider`.

  ```typescript
  export const describeRegionTool: Tool = {
    name: "describe_region",
    description:
      "Crop an image to a bounding box and describe that region in detail. Use this after detect() to zoom in on specific objects.",
    inputSchema: {
      type: "object",
      properties: {
        image: {
          type: "string",
          description: "Path to the image file or URL (http/https)",
        },
        bbox: {
          type: "array",
          items: { type: "number" },
          minItems: 4,
          maxItems: 4,
          description: "Bounding box as [ymin, xmin, ymax, xmax] normalized 0-1000",
        },
        prompt: {
          type: "string",
          description: "Optional question or instruction for the description",
        },
        provider: {
          type: "string",
          enum: ["gemini", "openai", "claude"],
          description: "Vision provider to use (default: gemini)",
        },
      },
      required: ["image", "bbox"],
    },
  };
  ```
- src/index.ts:58-59 (registration)
  Registration of the `describe_region` tool handler in the main switch statement for tool calls.

  ```typescript
  case "describe_region":
    return await handleDescribeRegion(args);
  ```
- src/index.ts:42-42 (registration)
  Registration of the `describeRegionTool` schema in the list of available tools returned by ListToolsRequestHandler.

  ```typescript
  describeRegionTool,
  ```
- src/index.ts:21-21 (registration)
  Import of the tool schema and handler from the implementation file.

  ```typescript
  import { describeRegionTool, handleDescribeRegion } from "./tools/describe-region.js";
  ```
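
The handler returns its result as a JSON string inside a single text content item, so a caller has to parse that string to get back the `bbox` and `description` fields. The following is a minimal sketch of that step, assuming the result shape shown in the handler above. `parseDescribeRegionResult` and the interface names are hypothetical, and the sample description text is made up for illustration.

```typescript
interface ToolTextContent {
  type: "text";
  text: string;
}

interface ToolResult {
  content: ToolTextContent[];
}

interface RegionDescription {
  bbox: [number, number, number, number];
  description: string;
}

// Extract and validate the JSON payload from the first text content item.
function parseDescribeRegionResult(result: ToolResult): RegionDescription {
  const first = result.content.find((c) => c.type === "text");
  if (!first) {
    throw new Error("Tool result contains no text content");
  }
  const parsed: unknown = JSON.parse(first.text);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Tool result payload is not a JSON object");
  }
  const { bbox, description } = parsed as Record<string, unknown>;
  if (
    !Array.isArray(bbox) ||
    bbox.length !== 4 ||
    !bbox.every((n) => typeof n === "number")
  ) {
    throw new Error("Payload bbox is not an array of four numbers");
  }
  if (typeof description !== "string") {
    throw new Error("Payload description is not a string");
  }
  return { bbox: bbox as [number, number, number, number], description };
}

// Sample payload mirroring the handler's return shape; the text is a placeholder.
const sample: ToolResult = {
  content: [
    {
      type: "text",
      text: JSON.stringify(
        { bbox: [250, 100, 750, 600], description: "A placeholder description." },
        null,
        2
      ),
    },
  ],
};

const region = parseDescribeRegionResult(sample);
console.log(region.description); // A placeholder description.
```

Validating the parsed payload rather than casting it directly keeps a malformed provider response from propagating as a silently wrong type.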