Skip to main content
Glama

Image Tiler for LLM Vision

tiler

Split large images or web page captures into optimally sized tiles for LLM vision analysis, reducing token costs by fetching only non-blank, relevant tiles.

Instructions

Split images into optimally-sized tiles for LLM vision analysis, or capture web page screenshots and tile them.

MANDATORY two-phase workflow — DO NOT skip Phase 1:

Phase 1 (REQUIRED first): Provide ONLY the image source (filePath, sourceUrl, url, etc). DO NOT include preset, tileSize, or outputDir. Returns a model comparison table with token estimates and an outputDir. You MUST present this table to the user and ask which preset they prefer. DO NOT select a preset yourself — the user decides. If you must auto-select, always use the cheapest option.

Phase 2: Call again with the user's chosen preset + the outputDir from Phase 1. Re-include your original image source (filePath, sourceUrl, etc.). For captures, use screenshotPath from Phase 1 instead of url. Returns tile summary with metadata and content hints (no tile images). Use tilesDir + start/end to fetch only the tiles you need.

Stop after Phase 1 if you only need the screenshot (capture mode) or comparison data.

4 tiling presets available:

  • "claude": 1092px tiles, ~1590 tokens/tile

  • "openai": 768px tiles, ~765 tokens/tile

  • "gemini3": 1536px tiles, ~1120 tokens/tile

  • "gemini": 768px tiles, ~258 tokens/tile

Supports: local files (filePath), remote images (sourceUrl), data URLs, base64, and web page capture (url — Chrome required). Tiles saved as WebP (default) or PNG. Auto-downscales images over 10000px by default.

TOKEN COST NOTE: The get-tiles mode returns image tiles as inline base64, consuming significantly more tokens than typical text-only MCP tools. Each tile costs ~258-1590 tokens depending on preset. Use the Phase 2 summary and tile hints to fetch only non-blank, relevant tiles.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
endNoEnd tile index (0-based, inclusive). Defaults to start + 4. Max 5 tiles per batch to stay within MCP response limits.
urlNoURL of the web page to capture. Requires Chrome/Chromium installed.
pageNoTile page to return (0 = first 5, 1 = next 5, etc.). Default: 0
delayNoAdditional delay in ms after the page is loaded, before capturing (default: 3000)
modelNoDeprecated: use "preset" instead. Accepted for backward compatibility. Available: "claude" (1092px tiles, ~1590 tokens/tile), "openai" (768px tiles, ~765 tokens/tile), "gemini3" (1536px tiles, ~1120 tokens/tile), "gemini" (768px tiles, ~258 tokens/tile).
startNoStart tile index (0-based, inclusive). Used with tilesDir for pagination.
formatNoOutput format for tiles: "webp" (smaller, default) or "png" (lossless)webp
mobileNoWhether to emulate a mobile device. When true, defaults viewportWidth to 390, deviceScaleFactor to 2, and sets a mobile user agent if not explicitly provided.
presetNoDO NOT provide on Phase 1 (first call). Only specify on Phase 2 after the user has chosen from the comparison table. Available: "claude" (1092px tiles, ~1590 tokens/tile), "openai" (768px tiles, ~765 tokens/tile), "gemini3" (1536px tiles, ~1120 tokens/tile), "gemini" (768px tiles, ~258 tokens/tile). Auto-selects cheapest when omitted on Phase 2.
dataUrlNoData URL with base64-encoded image (e.g. "data:image/png;base64,...")
filePathNoAbsolute or relative path to the image file
tileSizeNoTile size in pixels. If omitted, uses the model's optimal default (Claude: 1092, OpenAI: 768, Gemini 3: 1536, Gemini: 768). Values outside the model's supported range are automatically clamped with a warning.
tilesDirNoPath to the tiles directory (returned as outputDir from a previous tiling call). When provided, returns tiles as base64 images for pagination.
outputDirNoDirectory to save tiles. Defaults to tiles/{name}_vN/ next to source for filePath; {base}/tiles/tiled_{ts}_{hex}/ for URL/base64 sources; {base}/tiles/capture_{ts}_{hex}/ for captures.
sourceUrlNoURL to download the image from (max 50MB, 30s timeout). SSRF filtering active on https:. Set TILER_DENY_HTTP_PRIVATE=1 to block private IPs on http:
userAgentNoCustom user agent string
waitUntilNoWhen to consider the page loaded: "load" (default), "networkidle", or "domcontentloaded"load
imageBase64NoRaw base64-encoded image data (no data URL prefix)
maxDimensionNoMax dimension in px (0 to disable, or 256-65536). When set, the image is resized so its longest side fits within this value before tiling. Reduces token consumption for large images. Defaults to 10000px. Set to 0 to disable auto-downscaling.
viewportWidthNoBrowser viewport width in pixels. Defaults to 1280 for desktop, 390 when mobile is true.
screenshotPathNoPath to a previously captured screenshot. When provided and accessible, skips URL capture.
skipBlankTilesNoSkip blank tiles in get-tiles mode, returning text annotations instead of images. Set to false to include all tiles. Default: true.
includeMetadataNoAnalyze each tile and return content hints (blank, low-detail, mixed, high-detail) and brightness stats. Enabled by default; set to false to skip.
deviceScaleFactorNoDevice pixel ratio (e.g., 2 for retina). Defaults to 1 for desktop, 2 when mobile is true.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate openWorldHint=true, and the description adds context like auto-downscaling, token consumption notes, and file output behavior. While it could mention more side effects, it does not contradict annotations and adds useful behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-structured with clear headings and numbered phases. Every sentence adds value, though minor trimming could improve conciseness. It front-loads the critical workflow.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (24 parameters, no output schema), the description covers the workflow, token costs, supported sources, and parameter timing comprehensively. It is complete and leaves no major gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds significant value by explaining the two-phase workflow, which parameters are for which phase, and token costs. This goes beyond the schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool splits images into tiles for LLM vision analysis or captures web page screenshots and tiles them. It provides a specific verb and resource, and the detailed workflow distinguishes it from any potential siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly outlines a mandatory two-phase workflow with step-by-step instructions, when to stop, and which parameters to use in each phase. It also provides token cost comparisons for presets, aiding the agent in selecting the appropriate mode.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/keiver/image-tiler-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server