Skip to main content
Glama

STAC MCP Server

by BnJam
0009-data-size-estimation-tool.md3.26 kB
# ADR 0009: Data Size Estimation Tool Status: Proposed Date: 2025-09-23 ## Context Users often need a fast, order‑of‑magnitude estimate of the volume of data (bytes, MB, GB) implied by a STAC query before committing to downstream processing or download steps. The baseline server (ADR 0001) did not include any resource planning utilities. The new `estimate_data_size` tool provides a value‑add planning primitive while keeping within the repository constraints (fast, offline‑testable, no mandatory heavy dependencies beyond those already selected). Key goals: * Provide quick, lazy estimation using `odc.stac` + `xarray` when available. * Offer a deterministic fallback that requires only item metadata (works without fully loading raster assets; remains testable offline). * Allow optional spatial subsetting via an AOI geometry to refine estimates to the *effective* area of interest. * Preserve performance bounds — avoid unbounded iteration (respect a `limit`). ## Decision Introduce a fifth tool `estimate_data_size` with parameters: * `collections` (optional list[str]) * `bbox` (optional list[float] — WGS84 [west, south, east, north]) * `datetime` (optional ISO 8601 interval or timestamp) * `query` (optional object for property filtering) * `aoi_geojson` (optional GeoJSON geometry; used to derive a tighter effective bounding box when present) * `limit` (int, default 100, max 500) — safeguards runtime and memory * `catalog_url` (optional override) Behavior: 1. Perform a STAC search (respecting the provided limit) and collect items. 2. If `odc.stac` is installed: * Attempt a lazy `odc.stac.load` call with `bbox` or AOI‑derived bbox and empty dask chunks to avoid pulling full data eagerly. * Derive per‑variable `nbytes` and aggregate total estimated size. 3. If the lazy load fails OR `odc.stac` is not installed: * Fallback estimation iterates assets and derives a heuristic size using (a) explicit `file:size` extra fields when present, otherwise (b) inferred size rules by media type and approximate area. 4. Return a text (current) or future JSON (ADR 0006) response summarizing item count, size in bytes/MB/GB, spatial/temporal context, variable or asset breakdown, and a message noting which method was used (lazy vs fallback). ## Consequences * Adds planning capability without modifying existing search functionality. * Requires two additional dependencies already accepted (`odc-stac`, `xarray`, `rasterio`) — tool still degrades gracefully if `odc.stac` import fails. * Slightly increases code surface; future refactors (error handling, JSON mode, nodata handling) must include this tool. * Fallback heuristics are approximate and documented as such — not a guarantee of precise on‑disk size. ## Alternatives considered * Always eager download assets to compute exact size — rejected (slow, network heavy, violates ASR 1001 fast tests). * Omit capability entirely — rejected (reduces user guidance; low effort incremental value). * Require `odc.stac` as a hard dependency — rejected (allow installations that only need baseline search tools). ## Addendums * 2025-09-23: Future enhancement planned (see ASR 1006) to account for `nodata` / `_FillValue` compression or sparsity heuristics.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/BnJam/stac-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server