alterlab-mcp-server

alterlab_extract

Extract structured data like prices, ratings, and reviews from HTML or markdown content using pre-defined profiles or natural language prompts. Returns JSON.

Instructions

Extract product data, scrape prices, get structured data from any page content, or pull specific fields like names, emails, and ratings from HTML. Runs AlterLab's extraction pipeline on raw HTML, text, or markdown you already have — does NOT scrape a URL. For scraping + extraction in one step, use alterlab_scrape with extraction_schema instead. Profiles: 'product' (price, title, reviews), 'article' (title, author, body), 'job_posting', 'faq', 'recipe', 'event', 'ecommerce_homepage', 'directory_listing'. Returns JSON data. Use extraction_prompt for natural language extraction (LLM-powered). Use cache='only' to retrieve a previously cached result without calling the LLM.
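As an illustration of the instructions above, here is a minimal sketch of the JSON-RPC message an MCP client might send to invoke this tool on content it has already fetched. The `tools/call` envelope follows the MCP specification; the helper name `build_extract_call` and the sample HTML are hypothetical, and the shape of the server's JSON response is not documented on this page, so none is shown.

```python
import json


def build_extract_call(content, profile=None, prompt=None, cache="auto", request_id=1):
    """Build an MCP tools/call request for alterlab_extract (hypothetical helper)."""
    # Only `content` is required; `cache` defaults to 'auto' per the input schema.
    arguments = {"content": content, "cache": cache}
    if profile is not None:
        arguments["extraction_profile"] = profile
    if prompt is not None:
        # Per the schema, supplying a prompt is charged at the LLM extraction rate.
        arguments["extraction_prompt"] = prompt
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "alterlab_extract", "arguments": arguments},
    }


# Sample pre-fetched HTML (made up for illustration); the tool does not fetch URLs.
html = "<h1>Widget</h1><span class='price'>$19.99</span>"
request = build_extract_call(html, profile="product")
print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally construct and send this envelope for you; the sketch only shows which arguments end up in the call.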

Input Schema

Name	Required	Description	Default
`cache`	No	Cache control for LLM extraction results. 'auto': return cached result if available (default). 'skip': bypass cache lookup, always call LLM (result is still stored). 'only': return cached result or 404 if not cached — never calls the LLM.	auto
`content`	Yes	Raw content to extract from — HTML, text, or markdown. Bring your own pre-fetched content; this endpoint does NOT scrape a URL.
`formats`	No	Output formats for content transformation. 'json' is best for structured extraction. 'content' returns filtered/cleaned content. 'raw' returns the unprocessed response body.
`evidence`	No	Include field provenance/evidence for extracted fields (which part of the content each field came from).
`cache_ttl`	No	TTL for caching this extraction result, in seconds. Defaults to server setting (3600s). Max 86400s (24 hours).
`source_url`	No	Original URL of the content (for context only — not fetched). Helps the extractor understand the content's domain.
`content_type`	No	Type of the provided content	html
`extraction_model`	No	Per-request LLM model override in provider-specific format (e.g. 'gpt-4o', 'claude-opus-4-5-20251101', 'llama3-70b-8192'). Overrides the model saved in your BYOK key settings for this request only.
`extraction_prompt`	No	Natural language instructions for LLM extraction (e.g., 'Extract all product prices and ratings'). Charged at LLM extraction rate when provided.
`extraction_schema`	No	Custom JSON Schema for extraction. Fields are mapped from content. Overrides extraction_profile when provided.
`extraction_profile`	No	Pre-defined extraction profile. 'product' extracts price/title/reviews, 'article' extracts title/author/body, etc. 'auto' detects the page type. Mutually exclusive with extraction_template.
`extraction_provider`	No	LLM provider to use for extraction. Selects the matching BYOK key registered at /dashboard/settings/llm-keys. When omitted, the most recently used registered key is used.
`extraction_template`	No	Shorthand alias for extraction_profile — selects the same pre-built schema template. Mutually exclusive with extraction_profile.
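The constraints in the table can be checked on the client before a call is made. The following is a rough sketch, not part of the server: the function name `validate_extract_args` is hypothetical, it covers only the rules stated in the table (required `content`, the three `cache` modes, the 86400-second `cache_ttl` ceiling, and the mutual exclusivity of `extraction_profile` and `extraction_template`), and the server remains the authority on what it accepts.

```python
VALID_CACHE_MODES = {"auto", "skip", "only"}
MAX_CACHE_TTL_SECONDS = 86400  # 24 hours, per the cache_ttl row


def validate_extract_args(args: dict) -> list:
    """Return a list of problems found in alterlab_extract arguments (hypothetical helper)."""
    problems = []
    # `content` is the only required parameter.
    if not args.get("content"):
        problems.append("content is required")
    # `cache` defaults to 'auto' when omitted.
    if args.get("cache", "auto") not in VALID_CACHE_MODES:
        problems.append("cache must be one of 'auto', 'skip', 'only'")
    ttl = args.get("cache_ttl")
    if ttl is not None and ttl > MAX_CACHE_TTL_SECONDS:
        problems.append("cache_ttl exceeds the 86400s maximum")
    # The schema marks these two parameters as mutually exclusive.
    if "extraction_profile" in args and "extraction_template" in args:
        problems.append("extraction_profile and extraction_template are mutually exclusive")
    return problems


ok = validate_extract_args({"content": "<p>hello</p>", "extraction_profile": "article"})
bad = validate_extract_args({
    "content": "<p>hello</p>",
    "cache": "never",
    "cache_ttl": 100000,
    "extraction_profile": "product",
    "extraction_template": "product",
})
print(ok)
print(bad)
```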

Tool Definition Quality

A 4.6/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full disclosure burden. It discloses that the tool operates on raw HTML/text/markdown, does not fetch URLs, returns JSON, explains cache behavior (including 'only' mode never calling LLM), and mentions charging for extraction_prompt. However, it lacks details on authentication requirements, rate limits, or idempotency, which would elevate it to 5.
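The cache behavior mentioned above can be modeled locally to make the three modes concrete. This is only a simulation of the documented semantics, not AlterLab's implementation: the function `extract_with_cache`, the `CacheMiss` exception (standing in for the 404), the dictionary store, and the string cache key are all hypothetical, and storing the result on an 'auto' miss is an assumption inferred from the note that 'skip' results are "still stored".

```python
class CacheMiss(Exception):
    """Stands in for the 404 returned by cache='only' when nothing is cached."""


def extract_with_cache(key, cache_mode, store, call_llm):
    """Simulate the documented cache modes against an in-memory store."""
    if cache_mode == "only":
        # Never calls the LLM: cached result or an error.
        if key in store:
            return store[key]
        raise CacheMiss(key)
    if cache_mode == "auto" and key in store:
        # Default mode: reuse a cached result when one exists.
        return store[key]
    # 'skip' always reaches this point; 'auto' reaches it on a miss.
    result = call_llm()
    store[key] = result  # the result is stored even when the lookup was bypassed
    return result


llm_calls = []


def fake_llm():
    # Placeholder for the real LLM extraction; records each invocation.
    llm_calls.append(1)
    return {"price": "19.99"}


store = {}
extract_with_cache("page-1", "auto", store, fake_llm)  # miss, calls the LLM
extract_with_cache("page-1", "auto", store, fake_llm)  # hit, no LLM call
extract_with_cache("page-1", "skip", store, fake_llm)  # bypasses lookup, calls the LLM
extract_with_cache("page-1", "only", store, fake_llm)  # hit, no LLM call
print(len(llm_calls))  # 2
```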

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is detailed and informative, with the core purpose front-loaded. Every sentence adds value, but it could be slightly more structured (e.g., separating profile list from other details). Minor redundancy in explaining extraction_template as an alias for extraction_profile. Still, it is concise given the complexity of 13 parameters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 13 parameters, nested objects, and no output schema. The description covers the main functionality, clearly differentiates the tool from its siblings, explains key parameters, and gives usage hints. However, it could say more about the output structure (especially since there is no output schema) and about edge cases such as error handling. Even so, it is fairly complete for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant value beyond the schema. It explains the mutual exclusivity of extraction_profile and extraction_template, notes that extraction_prompt incurs extra charges, elaborates on cache options, and provides context for profiles. This helps the agent understand parameter semantics beyond simple definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Extract product data, scrape prices, get structured data from any page content, or pull specific fields like names, emails, and ratings from HTML.' It distinguishes itself from sibling alterlab_scrape by explicitly stating it does NOT scrape a URL. The list of profiles and mention of extraction_prompt add further specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool vs alternatives: 'For scraping + extraction in one step, use alterlab_scrape with extraction_schema instead.' It also advises using extraction_prompt for natural language extraction and cache='only' for retrieving cached results. This ensures the agent knows the appropriate contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/RapierCraft/alterlab-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.