Web Research Assistant

by elad12390

extract_data

Extract structured data from web pages including tables, lists, fields, and JSON-LD. Automatically detect content or specify selectors for targeted extraction.

Instructions

Extract structured data from web pages.

Extracts tables, lists, or specific fields from HTML pages and returns
structured data. Much more efficient than parsing full page text.

Extract Types:
- "table": Extract HTML tables as list of dicts
- "list": Extract lists (ul/ol/dl) as structured list
- "fields": Extract specific elements using CSS selectors
- "json-ld": Extract JSON-LD structured data
- "auto": Automatically detect and extract structured content
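As an illustration of what the "json-ld" extract type refers to, JSON-LD extraction can be sketched with Python's standard library. The parser and the sample HTML below are illustrative assumptions; the server's actual implementation is not shown on this page.

```python
import json
from html.parser import HTMLParser

# Minimal sketch of extract_type="json-ld": collect the contents of
# <script type="application/ld+json"> blocks and parse each as JSON.
class JsonLdParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True

    def handle_data(self, data):
        if self.in_jsonld and data.strip():
            self.blocks.append(json.loads(data))

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

# Illustrative page snippet, not taken from the server.
html = """<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Widget", "offers": {"price": "9.99"}}
</script>
</head></html>"""

parser = JsonLdParser()
parser.feed(html)
print(parser.blocks[0]["name"])  # Widget
```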

Examples:
- extract_data("https://pypi.org/project/fastapi/", reasoning="Get package info")
- extract_data("https://github.com/user/repo/releases", reasoning="Get releases", extract_type="list")
- extract_data(
    "https://example.com/product",
    reasoning="Extract product details",
    extract_type="fields",
    selectors={"price": ".price", "title": "h1.product-name"}
  )

Input Schema

Name          Required  Description  Default
url           Yes
reasoning     Yes
extract_type  No                     auto
selectors     No
max_items     No
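Putting the schema together, a full arguments object for a single extract_data call might look like the following sketch. The field values are illustrative examples, not taken from the server.

```python
import json

# Hypothetical arguments object matching the input schema above.
arguments = {
    "url": "https://example.com/product",    # required
    "reasoning": "Extract product details",  # required
    "extract_type": "fields",                # optional, defaults to "auto"
    "selectors": {                           # optional, used with "fields"
        "price": ".price",
        "title": "h1.product-name",
    },
    "max_items": 10,                         # optional
}
print(json.dumps(arguments, indent=2))
```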

Output Schema

Name    Required  Description  Default
result  Yes

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description explains extract types and behavior via examples and bullet points. It does not cover authorization, rate limits, or error handling, but the behavioral core is well explained.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well structured, with clear sections, bullet points for the extract types, and multiple examples. No superfluous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description appropriately focuses on input behavior. It covers the main use cases but omits potential limitations such as page size or dynamic content.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates by explaining the extract_type enum values and selectors usage with examples, and it implicitly covers url and reasoning. However, max_items is not explained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Extract structured data from web pages' and specifies five extract types (table, list, fields, json-ld, auto), making the verb and resource specific. It differentiates the tool from siblings such as crawl_url and web_search by focusing on structured extraction from HTML.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description notes an efficiency advantage over parsing full page text and provides examples, implying use for structured extraction. However, it lacks explicit when-not-to-use guidance or a comparison to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/elad12390/web-research-assistant'
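The same request can be made from Python with the standard library. The endpoint is taken verbatim from the curl command above; the shape of the JSON response is an assumption, so the network call is left commented out.

```python
import json
import urllib.request

# Python equivalent of the curl call above.
url = "https://glama.ai/api/mcp/v1/servers/elad12390/web-research-assistant"
req = urllib.request.Request(url, headers={"Accept": "application/json"})

# Uncomment to perform the request (requires network access):
# with urllib.request.urlopen(req) as resp:
#     server = json.load(resp)   # response shape is an assumption
#     print(server)

print(req.full_url)
```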

If you have feedback or need assistance with the MCP directory API, please join our Discord server.