Glama

diffbot.pages.analyze

Extract structured data from any web page by auto-detecting its type (product, article, image, video) using AI-powered analysis.

Instructions

Auto-detect page type (product, article, image, video) and extract structured data from any URL using AI (Diffbot)

Input Schema

Name      Required  Description                                                   Default
url       Yes       Web page URL to auto-detect type and extract structured data  -
mode      No        Force extraction mode instead of auto-detection               -
fallback  No        Fallback extraction type if auto-detection fails              -
timeout   No        Request timeout in milliseconds (5000-30000, default 15000)   15000
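
The parameters listed above can be assembled into a tool call along the following lines. This is a minimal sketch, not the server's own client code: it assumes the tool is invoked through the standard MCP `tools/call` JSON-RPC method, the helper names `build_arguments` and `build_tool_call` are hypothetical, and the `fallback` value `"article"` is an assumption, since the schema does not list the accepted values for `mode` or `fallback`.

```python
import json

TOOL_NAME = "diffbot.pages.analyze"

# Timeout bounds and default as stated in the input schema (milliseconds).
TIMEOUT_MIN_MS = 5000
TIMEOUT_MAX_MS = 30000
TIMEOUT_DEFAULT_MS = 15000


def build_arguments(url, mode=None, fallback=None, timeout=TIMEOUT_DEFAULT_MS):
    """Build the arguments object, enforcing only what the schema states."""
    if not url:
        raise ValueError("url is required")
    if not TIMEOUT_MIN_MS <= timeout <= TIMEOUT_MAX_MS:
        raise ValueError(
            f"timeout must be between {TIMEOUT_MIN_MS} and {TIMEOUT_MAX_MS} ms"
        )
    arguments = {"url": url, "timeout": timeout}
    # Optional parameters are sent only when the caller sets them.
    # Their accepted values are not listed in the schema, so they are
    # passed through unvalidated.
    if mode is not None:
        arguments["mode"] = mode
    if fallback is not None:
        arguments["fallback"] = fallback
    return arguments


def build_tool_call(arguments, request_id=1):
    """Wrap the arguments in an MCP tools/call JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": TOOL_NAME, "arguments": arguments},
    }


# "article" is an assumed fallback value, used here for illustration only.
request = build_tool_call(
    build_arguments("https://example.com/some-page", fallback="article")
)
print(json.dumps(request, indent=2))
```

Leaving `mode` unset keeps auto-detection active; setting it forces a specific extraction mode, as the schema describes.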
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the burden of behavioral disclosure and adds context by identifying the AI provider (Diffbot) and the extraction mechanism. However, it lacks details on error handling, rate limits, caching behavior, or what constitutes 'structured data' in the output.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the primary action ('Auto-detect page type') and immediately specifies the supported content types and extraction purpose. Every word contributes to understanding the tool's core functionality without redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (4 parameters, no output schema, no annotations), the description adequately covers the core extraction functionality and supported page types. While it could benefit from mentioning the return format or error conditions given the lack of output schema, it provides sufficient context for an agent to understand when and how to invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all four parameters (url, mode, fallback, timeout), so the schema itself provides comprehensive documentation. The description references the URL and page types which align with schema fields but does not add additional semantic details or usage examples beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool 'auto-detect[s] page type' and 'extract[s] structured data from any URL,' providing specific verbs and resources. It distinguishes itself from siblings like `diffbot.articles.extract` and `diffbot.products.extract` by emphasizing its auto-detection capability across multiple content types (product, article, image, video) rather than requiring a specific type upfront.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

While the description implies usage through the term 'auto-detect,' suggesting it should be used when the page type is unknown, it does not explicitly state when to use this tool versus the specific extraction siblings. There are no explicit when-not conditions or named alternatives provided in the text.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
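
One way to make the "use X instead of Y when Z" guidance concrete is a small routing helper. The sketch below is an assumption about how an agent might choose, not guidance stated in the tool description: it prefers the sibling tools named on this page when the page type is already known and falls back to the auto-detecting tool otherwise. The function name `choose_tool` is hypothetical.

```python
# Sibling tools named on this page, keyed by the page type they handle.
SPECIALIZED_TOOLS = {
    "article": "diffbot.articles.extract",
    "product": "diffbot.products.extract",
}

# The auto-detecting tool documented on this page.
AUTO_DETECT_TOOL = "diffbot.pages.analyze"


def choose_tool(known_page_type=None):
    """Return a specialized tool when the type is known, else auto-detect."""
    return SPECIALIZED_TOOLS.get(known_page_type, AUTO_DETECT_TOOL)


print(choose_tool("article"))  # diffbot.articles.extract
print(choose_tool())           # diffbot.pages.analyze
```

Types without a named sibling on this page (image, video) also route to the auto-detecting tool under this rule.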

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/whiteknightonhorse/APIbase'
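
The same request can be issued from Python with the standard library. This is a sketch under assumptions: the split of the path into a namespace and a server slug is inferred from the URL above, the helper names `server_url` and `fetch_server` are hypothetical, and the response is assumed to be JSON, which this page does not state.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(namespace, slug):
    """Build the server endpoint URL (path layout inferred from the curl example)."""
    return f"{API_BASE}/{namespace}/{slug}"


def fetch_server(namespace, slug, timeout_s=15.0):
    """GET the server record; assumes the body is JSON (not confirmed here)."""
    request = urllib.request.Request(
        server_url(namespace, slug),
        method="GET",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access.
print(server_url("whiteknightonhorse", "APIbase"))
```

Calling `fetch_server("whiteknightonhorse", "APIbase")` performs the actual request; inspect the returned object before relying on any particular field.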

If you have feedback or need assistance with the MCP directory API, please join our Discord server.