Website Scraper MCP Server

by lalit9168

clean_content

Remove noise from raw HTML to extract readable plain text. Retain only main article content, headings, paragraphs, tables, and lists.

Instructions

Clean raw HTML by removing scripts, styles, navigation bars, footers, cookie banners, ads, and other noise. Returns readable plain text keeping only main article content, headings, paragraphs, tables, and lists.
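The kind of cleaning described above can be sketched with Python's standard-library `html.parser`. This is only an illustration under stated assumptions, not the server's actual implementation: the tag lists `NOISE_TAGS` and `BLOCK_TAGS`, the class `TextExtractor`, and the function `clean_html` are all hypothetical names, and real cookie-banner or ad detection would likely need class- or id-based heuristics that this sketch omits.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is dropped in this sketch (an assumed list,
# not the server's real one).
NOISE_TAGS = {"script", "style", "nav", "footer", "aside", "noscript"}
# Tags that should start a new line in the plain-text output.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr", "br", "div"}


class TextExtractor(HTMLParser):
    """Collects text outside noise subtrees, with line breaks at block tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.noise_depth = 0  # greater than 0 while inside a noise subtree

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag in BLOCK_TAGS and self.noise_depth == 0:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1
        elif tag in BLOCK_TAGS and self.noise_depth == 0:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.noise_depth == 0:
            self.parts.append(data)


def clean_html(html: str) -> str:
    """Return readable plain text, one non-empty line per block element."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

One limitation of this approach: an unclosed noise tag (for example a `<nav>` with no `</nav>`) would swallow everything after it, so malformed HTML needs extra care.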

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| url  | No       | Optional source URL (helps with relative link resolution). | |
| html | Yes      | Raw HTML string to clean. | |
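
As an illustration of the schema above, the following sketch builds the arguments for a call to `clean_content`. It assumes the server is reached through MCP's JSON-RPC `tools/call` method; the helper name `build_clean_content_call` and the `request_id` parameter are hypothetical, and transport details (stdio, HTTP) are left out.

```python
from typing import Optional


def build_clean_content_call(html: str, url: Optional[str] = None, request_id: int = 1) -> dict:
    """Build a JSON-RPC request body for the clean_content tool (hypothetical helper)."""
    # 'html' is the only required parameter in the schema.
    if not isinstance(html, str):
        raise ValueError("'html' is required and must be a string")
    arguments = {"html": html}
    # 'url' is optional, so it is only sent when provided.
    if url is not None:
        arguments["url"] = url
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "clean_content", "arguments": arguments},
    }
```

The resulting dictionary can be serialized with `json.dumps` and sent over whichever transport the MCP client uses.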
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden of behavioral disclosure. It explains the transformation (removing noise, keeping text) and return type (readable plain text). However, it does not mention edge cases like malformed HTML, performance, idempotency, or whether the input is mutated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the action 'Clean raw HTML'. Each sentence serves a purpose: the first specifies what is removed, and the second gives the return type and what is kept. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains the return value (plain text with structure). It covers both parameters and the core behavior. It omits details on error handling and potential performance implications, but it is sufficient overall for a tool this simple.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds no meaning beyond the schema: 'url' is for relative link resolution (as the schema notes) and 'html' is the input. It offers no parameter-specific elaboration.
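
To make "relative link resolution" concrete, here is a small sketch using the standard-library `urllib.parse.urljoin`. Whether the server resolves links this way is not stated in the listing; the function name `resolve_links` and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_links(base_url: str, hrefs: list) -> list:
    """Resolve each href against the page's source URL (hypothetical helper)."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical source URL, as would be passed in the optional 'url' parameter.
base = "https://example.com/blog/post.html"
links = resolve_links(base, ["../about", "img/a.png", "/contact", "https://other.example/x"])
```

Without a base URL, relative hrefs such as `img/a.png` cannot be turned into absolute links, which is presumably why the parameter exists.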

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Clean raw HTML' by removing specific elements (scripts, styles, navigation, etc.) and keeping meaningful content (headings, paragraphs, tables, lists). It distinguishes itself from siblings like scrape_website or chunk_content by focusing on cleaning/transforming HTML to plain text.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies that the tool is used after scraping HTML, but it does not explicitly state when to use it rather than alternatives like chunk_content (which splits text), or when not to use it (e.g., if the HTML is already clean). No exclusions or specific context cues are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lalit9168/web-scrapping'
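
The same request can be sketched in Python with the standard library. This assumes the endpoint returns a JSON body, which the listing does not state explicitly; the names `build_request` and `fetch_server_info` are hypothetical.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/lalit9168/web-scrapping"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the GET request without sending it."""
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})


def fetch_server_info(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_server_info()` performs the network request; `build_request()` on its own is useful for inspecting the URL and headers first.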

If you have feedback or need assistance with the MCP directory API, please join our Discord server.