Glama

Extract Article Text

diffbot.articles.extract
Read-only, Idempotent

Extract article text, author, date, tags, sentiment, and images from any blog or news URL. Follows multi-page articles to concatenate complete content.

Instructions

Extract article text, author, date, tags, sentiment, and images from any blog or news URL with multi-page support (Diffbot)

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | Article or blog post URL to extract text, author, and metadata | |
| paging | No | Follow multi-page articles and concatenate text (default true) | true |
| maxTags | No | Maximum number of topic tags to return (1-50, default 10) | 10 |
| timeout | No | Request timeout in milliseconds (5000-30000, default 15000) | 15000 |
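
The parameter ranges and defaults in the input schema can be checked on the client side before a call is made. The following is a minimal sketch, not the server's own validation: the helper names `build_arguments` and `build_tool_call` are hypothetical, the http(s) check on `url` is an assumption, and the envelope follows the generic MCP `tools/call` JSON-RPC request shape.

```python
import json

TOOL_NAME = "diffbot.articles.extract"


def build_arguments(url, paging=True, max_tags=10, timeout=15000):
    """Validate inputs against the documented ranges and return the arguments dict."""
    # Assumption: the tool expects an absolute http(s) URL; the schema only says "URL".
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    # Documented range for maxTags: 1-50 (default 10).
    if not 1 <= max_tags <= 50:
        raise ValueError("maxTags must be between 1 and 50")
    # Documented range for timeout: 5000-30000 milliseconds (default 15000).
    if not 5000 <= timeout <= 30000:
        raise ValueError("timeout must be between 5000 and 30000 milliseconds")
    return {
        "url": url,
        "paging": bool(paging),
        "maxTags": max_tags,
        "timeout": timeout,
    }


def build_tool_call(arguments, request_id=1):
    """Wrap the arguments in an MCP tools/call JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": TOOL_NAME, "arguments": arguments},
    }


if __name__ == "__main__":
    # example.com is a placeholder URL, not a real article.
    args = build_arguments("https://example.com/blog/post", max_tags=5)
    print(json.dumps(build_tool_call(args), indent=2))
```

Rejecting out-of-range values locally avoids spending a round trip on a request the server would likely refuse.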

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| result | No | Tool response payload. Shape varies per tool; consult the tool description and inputSchema. May be an object, array, string, or number depending on the upstream provider response. | |
| error | No | Present only when the call failed. Includes error code, message, request_id, and any provider-specific extras. | |
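
Because `result` and `error` are both optional, a caller has to check for `error` before reading `result`. The sketch below shows one way to do that. It assumes the payload arrives as a Python dict and that the error object uses the keys `code`, `message`, and `request_id`; the schema names those fields but does not confirm their exact spelling or nesting. `ToolCallError` and `unwrap` are hypothetical names.

```python
class ToolCallError(Exception):
    """Raised when the tool response carries an error object."""

    def __init__(self, code, message, request_id=None, extras=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.request_id = request_id
        # Provider-specific fields that are not part of the common set.
        self.extras = extras or {}


def unwrap(payload):
    """Return payload['result'], or raise ToolCallError if 'error' is present."""
    error = payload.get("error")
    if error is not None:
        known = {"code", "message", "request_id"}  # assumed key names
        raise ToolCallError(
            code=error.get("code"),
            message=error.get("message"),
            request_id=error.get("request_id"),
            extras={k: v for k, v in error.items() if k not in known},
        )
    # result may be an object, array, string, or number, so return it untouched.
    return payload.get("result")
```

Keeping `request_id` on the exception makes it easier to quote the failing call when reporting a problem upstream.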
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint. The description adds 'multi-page support (Diffbot)', which is a behavioral trait beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
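
One practical use of the readOnlyHint and idempotentHint annotations is deciding whether a timed-out call can be repeated. The following is an illustrative sketch of that client-side policy, not behavior prescribed by the tool or by Glama: `is_safe_to_retry` and `call_with_retry` are hypothetical helpers, and `call` stands in for whatever function actually sends the request.

```python
def is_safe_to_retry(annotations):
    """Return True when the tool's hints suggest a repeated call has no extra effect."""
    # Both hints are treated as False when absent.
    read_only = annotations.get("readOnlyHint", False)
    idempotent = annotations.get("idempotentHint", False)
    return bool(read_only or idempotent)


def call_with_retry(call, annotations, attempts=3):
    """Invoke call(); retry on TimeoutError only when the hints say it is safe."""
    max_attempts = attempts if is_safe_to_retry(annotations) else 1
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TimeoutError:
            # Give up once the allowed attempts are used.
            if attempt == max_attempts:
                raise
```

The hints are advisory, so a client that does not trust the server may still prefer to retry nothing.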

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence, perfectly sized, with all key information front-loaded. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Because an output schema exists, the description need not detail return values. It covers the extracted fields and multi-page support. For a read-only tool with a rich schema and annotations, this is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds 'multi-page support', loosely linking to the paging parameter, but does not explain timeout or maxTags. Minimal added value over schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Extract article text, author, date, tags, sentiment, and images from any blog or news URL', providing a specific verb (Extract) and resource (article content). It clearly differentiates from sibling tools like diffbot.products.extract (for products) and diffbot.pages.analyze (general pages) by focusing on articles.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use with blog or news URLs, including multi-page articles. It does not explicitly state when not to use the tool, nor does it mention alternatives, although the sibling tool names provide some context. Exclusion guidance is slightly lacking.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/whiteknightonhorse/APIbase'
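
The same request can be issued from Python with the standard library. This is a minimal sketch under assumptions: the page does not describe the response format, so treating the body as JSON and sending an `Accept` header are guesses, and `build_request` and `fetch_server_info` are hypothetical helper names.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/whiteknightonhorse/APIbase"


def build_request(url=SERVER_URL):
    """Build the GET request that mirrors the curl command."""
    # Assumption: the API returns JSON, so ask for it explicitly.
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url=SERVER_URL, timeout=15):
    """Send the request and parse the body as JSON (assumed format)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(json.dumps(fetch_server_info(), indent=2))
```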

If you have feedback or need assistance with the MCP directory API, please join our Discord server.