extract_structured
Extract structured metadata from web pages: JSON-LD, OpenGraph, microdata. Retrieve fields like price, rating, author, and date for analysis.
Instructions
Pull JSON-LD, OpenGraph, Twitter cards, microdata, and RDFa from a web page.
Best for:
- Product pages (price, currency, availability, brand, rating).
- Article pages (author, publish date, image, headline).
- Recipe / event / video pages where rich metadata IS the answer.
- Cases where `fetch` returns prose but you need fields.
Not recommended for:
- Just reading a page -> use `fetch`.
- PDFs / DOCX -> use `read_doc`.
- Pages that don't publish schema.org metadata (most blogs) — you'll get
empty lists; fall back to `fetch`.
Returns:
- json: {url, json_ld:[], microdata:[], opengraph:[], rdfa:[]}. Twitter
card meta tags are surfaced inside the `opengraph` list.
- markdown (default): a flattened key/value view with each block printed
as a JSON code block under its syntax heading.
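A minimal sketch of consuming the `json` result, assuming it has already been parsed into a Python dict with the shape listed above. The helper names (`has_structured_data`, `find_json_ld`) and the sample payload are hypothetical and only illustrate the documented keys; real pages may nest or type their blocks differently.

```python
from typing import Any, Optional

# The four list keys documented in the json return shape.
LIST_KEYS = ("json_ld", "microdata", "opengraph", "rdfa")


def has_structured_data(result: dict[str, Any]) -> bool:
    """Return True if any of the documented lists is non-empty."""
    return any(result.get(key) for key in LIST_KEYS)


def find_json_ld(result: dict[str, Any], schema_type: str) -> Optional[dict[str, Any]]:
    """Return the first JSON-LD block whose @type matches schema_type."""
    for block in result.get("json_ld", []):
        if not isinstance(block, dict):
            continue
        block_type = block.get("@type")
        # @type may be a single string or a list of strings.
        types = block_type if isinstance(block_type, list) else [block_type]
        if schema_type in types:
            return block
    return None


# Hypothetical payload, made up for illustration only.
sample = {
    "url": "https://example.com/widget",
    "json_ld": [
        {
            "@type": "Product",
            "name": "Widget",
            "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"},
        }
    ],
    "microdata": [],
    "opengraph": [],
    "rdfa": [],
}

product = find_json_ld(sample, "Product")
if product is not None:
    offer = product.get("offers", {})
    print(offer.get("price"), offer.get("priceCurrency"))
```

When `has_structured_data` returns False, the page published no metadata and `fetch` is the better tool.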
Common mistakes:
- Calling on every URL "just in case" — most sites have no structured
data, and `fetch` is what you actually want.
Args:
url: Absolute http(s) URL.
format: "markdown" (default) or "json".
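As an illustration of the argument constraints above, the following sketch validates the two arguments before a call is made. `build_args` is a hypothetical helper and not part of the tool; how the resulting dict is actually sent depends on the client in use.

```python
from urllib.parse import urlparse

ALLOWED_FORMATS = ("markdown", "json")


def build_args(url: str, format: str = "markdown") -> dict[str, str]:
    """Build the argument dict, rejecting values the tool does not accept."""
    parsed = urlparse(url)
    # The tool expects an absolute http(s) URL: scheme and host must both be present.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")
    if format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {format!r}")
    return {"url": url, "format": format}


args = build_args("https://example.com/product/123", format="json")
print(args)
```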
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Absolute http(s) URL. | |
| format | No | Output format: "markdown" or "json". | markdown |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |