web-scraping-mcp-server
Server Details
Generic URL crawl + HTML extraction — fallback for sites without dedicated MCPs.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 3.9/5 across 3 of 3 tools scored.
Each tool targets a distinct source: crawling any website, searching Google, and looking up Wikipedia articles. No overlapping purposes.
All three tools follow a clear verb_noun pattern: crawl_website, google_search, wikipedia_lookup. Consistent and predictable.
Three tools is minimal but reasonable for a focused web scraping server. Each tool serves a distinct purpose, though additional tools (e.g., single-page scrape) could be expected.
Notable gaps: missing a general single-page fetch or extract tool. While crawl_website handles multi-page crawling, there is no way to scrape a single page without crawling links.
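A single-page fetch would parse one document and return its title, text, and links without following any of them. The sketch below is hypothetical: `PageExtractor` and `extract_page` are illustrative names, not tools this server exposes. It uses only the standard-library `html.parser` on HTML you have already downloaded.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title>, visible text, and link targets of one HTML page."""

    SKIP = {"script", "style", "noscript"}  # tags whose text is not visible content

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.text_parts: list[str] = []
        self.links: list[str] = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)  # recorded, but never followed

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        else:
            self.text_parts.append(text)


def extract_page(html: str) -> dict:
    """Hypothetical single-page extract: one document in, one record out, no crawling."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "text": " ".join(parser.text_parts), "links": parser.links}
```

Links are returned rather than fetched, which is the behavioral difference from a crawler.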
Available Tools
3 tools
crawl_website (grade B, read-only)
Crawl a website and extract its content as structured data.
Args:
  url: Website URL to crawl (e.g. 'https://example.com')
  max_pages: Max pages to crawl (default 5)
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | | |
| max_pages | No | | 5 |
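The listing does not document how `crawl_website` walks a site. A minimal sketch, assuming a breadth-first crawl restricted to the starting host that stops once `max_pages` pages (default 5) have been fetched, might look like this. `crawl` and its `fetch` callback are hypothetical; the callback stands in for the HTTP request and HTML parsing a real server would perform.

```python
from collections import deque
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url: str,
          fetch: Callable[[str], tuple[str, list[str]]],
          max_pages: int = 5) -> list[dict]:
    """Breadth-first crawl that stays on start_url's host and stops after max_pages fetches.

    `fetch` is a hypothetical callback returning (page_text, raw_links) for a URL.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        text, links = fetch(url)
        pages.append({"url": url, "text": text})
        for link in links:
            # Resolve relative links and drop #fragments so /c and /c#top count once.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Injecting `fetch` keeps the traversal logic testable offline: with a dict-backed fake site and `max_pages=3`, only the first three pages reached in breadth-first order are returned, and off-host links are never queued.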
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true and openWorldHint=true. The description adds that it extracts content as structured data, but does not disclose potential behavioral traits like rate limits, handling of dynamic content, or what constitutes 'structured data.' It adds minimal value beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise with a clear title and structured Args block. It has no wasted words, though it could potentially be even shorter by integrating the parameter info into the main sentence. Still, it is well-organized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple tool with annotations, the description covers purpose and parameters. However, it lacks details on return format or behavior (e.g., what 'structured data' means). Despite no output schema, the description could better set expectations. It is adequate but not thorough.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description includes an Args section explaining both parameters (url: the URL to crawl, with an example; max_pages: the page limit, default 5). This adds meaning beyond the input schema, which only defines types and defaults. Given that schema description coverage is 0%, the description compensates well.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
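The 0% figure refers to schema properties carrying no `description` of their own. As an illustration, the Args text could be moved into the schema, and coverage could be counted as the share of properties that have a description. The property types and the coverage formula here are assumptions, not Glama's published method.

```python
def description_coverage(tool: dict) -> float:
    """Fraction of input-schema properties that carry a non-empty 'description'."""
    props = tool.get("inputSchema", {}).get("properties", {})
    if not props:
        return 1.0  # nothing to describe
    described = sum(1 for spec in props.values() if spec.get("description"))
    return described / len(props)


# Hypothetical definition: the docstring's Args text moved into the schema itself.
crawl_website_tool = {
    "name": "crawl_website",
    "description": "Crawl a website and extract its content as structured data.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {"type": "string",
                    "description": "Website URL to crawl (e.g. 'https://example.com')"},
            "max_pages": {"type": "integer", "default": 5,
                          "description": "Max pages to crawl"},
        },
        "required": ["url"],
    },
}

# The bare shape the listing implies: types and defaults only.
bare_tool = {
    "name": "crawl_website",
    "inputSchema": {
        "type": "object",
        "properties": {"url": {"type": "string"},
                       "max_pages": {"type": "integer", "default": 5}},
    },
}
```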
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action: 'Crawl a website and extract its content as structured data.' It specifies the verb (crawl) and resource (website), and the outcome. This distinctively separates it from sibling tools like google_search and wikipedia_lookup.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus siblings or when not to use it. There is no mention of prerequisites, ethical considerations, or alternatives. The agent has to infer from context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
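All three tools lack explicit routing guidance. The rule below is purely illustrative and is not documented by the server. It sends a bare http(s) URL to `crawl_website`, sends a request that starts with a hypothetical `wikipedia:` marker to `wikipedia_lookup`, and sends anything else to `google_search`. Spelling out a rule like this in each description is one way to give agents "use X instead of Y when Z" guidance.

```python
from urllib.parse import urlparse


def pick_tool(request: str) -> tuple[str, dict]:
    """Illustrative routing rule (not the server's documented behavior)."""
    text = request.strip()
    parsed = urlparse(text)
    # A concrete web address: crawl it directly instead of searching for it.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "crawl_website", {"url": text}
    # Hypothetical explicit marker for encyclopedic lookups.
    if text.lower().startswith("wikipedia:"):
        return "wikipedia_lookup", {"topic": text.split(":", 1)[1].strip()}
    # Everything else is open-ended discovery.
    return "google_search", {"query": text}
```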
google_search (grade A, read-only)
Search Google and return organic results with titles, URLs, snippets.
Args:
  query: Search query (e.g. 'best python libraries 2026')
  max_results: Max results (default 10)
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | | |
| max_results | No | | 10 |
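Like any MCP tool, `google_search` is invoked with a JSON-RPC 2.0 `tools/call` request. Its `params` carry the tool name and an `arguments` object matching the table above. The following is a minimal sketch of building that body. `tools_call` is a hypothetical helper. Because the listing does not publish the server URL, the sketch does not send anything.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# max_results is omitted, so the documented default (10) should apply server-side.
body = tools_call("google_search", {"query": "best python libraries 2026"})
```

Over Streamable HTTP, this body would be POSTed to the server's MCP endpoint after the `initialize` handshake.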
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and openWorldHint=true, covering safety and interaction with external sources. The description adds that results are organic and include specific fields, but lacks details on rate limits or pagination. Adequate but not rich.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise: two sentences for purpose and two bullet-style param descriptions. Front-loaded and no wasted words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple 2-parameter read-only search tool with annotations, the description is complete: explains return format, params, and safety. No gaps given the tool's complexity.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, but the description's docstring adds meaning: explains query purpose and max_results default. This compensates for the bare schema.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the verb ('Search'), the resource ('Google'), and the return details (organic results with titles, URLs, and snippets). It distinguishes the tool from siblings like crawl_website and wikipedia_lookup.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Description implies usage for general web search but provides no explicit when-to-use, when-not-to-use, or alternative guidance. Siblings are listed but not contrasted.
wikipedia_lookup (grade A, read-only)
Look up a Wikipedia article and return its content.
Args:
  topic: Topic to look up (e.g. 'Artificial intelligence')
| Name | Required | Description | Default |
|---|---|---|---|
| topic | Yes | | |
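The listing does not say how `wikipedia_lookup` resolves a topic. One plausible approach, assumed here for illustration only, turns the topic into a page title by replacing spaces with underscores and capitalizing the first letter. It then queries the Wikimedia REST summary endpoint. `wikipedia_summary_url` is a hypothetical helper.

```python
from urllib.parse import quote


def wikipedia_summary_url(topic: str, lang: str = "en") -> str:
    """Map a topic such as 'Artificial intelligence' to a REST summary URL (assumed approach)."""
    title = topic.strip().replace(" ", "_")
    # MediaWiki capitalizes the first character of most page titles.
    title = title[:1].upper() + title[1:]
    # safe='' also encodes '/', which the REST API expects as %2F inside a title.
    return f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{quote(title, safe='')}"
```

Titles containing a slash, such as "AC/DC", are the reason for `safe=''`. Without it, the slash would be read as a path separator.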
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true and openWorldHint=true, covering safety and openness. The description adds that it returns content but lacks details on behavior like rate limits, response format, or any side effects, so it adds limited value beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise: two sentences plus an Args line. It front-loads the purpose in the first sentence, and every part is necessary and sufficient. No wasted words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite no output schema, the description adequately notes that it 'return[s] its content'. For a simple read-only tool with one parameter, this is mostly complete. However, it could briefly mention the format (plain text, sections) or potential limitations (article size).
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description fully compensates by explaining the single 'topic' parameter with a clear example ('Artificial intelligence'), adding meaning beyond the schema's type-only definition.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb 'Look up' and resource 'Wikipedia article', clearly indicating the tool's function. It effectively distinguishes from siblings like 'google_search' (general search) and 'crawl_website' (generic crawling) by specifying its Wikipedia focus.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for retrieving Wikipedia content, providing clear context. However, it does not explicitly state when to use this tool versus alternatives like google_search or crawl_website, missing an opportunity for clearer guidance.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
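If your server serves static files, you can generate the claim file instead of writing it by hand. The following is a small sketch. It assumes a local web-root directory that is served at your domain's root. `write_glama_manifest` and its rough email check are illustrative and are not part of Glama's tooling.

```python
import json
import re
from pathlib import Path


def write_glama_manifest(web_root: Path, email: str) -> Path:
    """Write <web_root>/.well-known/glama.json with the structure shown above."""
    # Rough sanity check only; Glama verifies the address against your account.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        raise ValueError(f"not an email address: {email!r}")
    manifest = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }
    target = web_root / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return target
```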
Claiming the connector lets you:
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
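A connectivity check against a Streamable HTTP server typically starts with a JSON-RPC `initialize` POST. The following sketch builds such a request with the standard library and maps probe outcomes onto the causes above. The protocol version string, client name, placeholder URL, and status-code mapping are all assumptions, because the listing does not describe how Glama's own check works.

```python
import json
import urllib.request
from typing import Optional


def build_initialize_request(server_url: str) -> urllib.request.Request:
    """Build (but do not send) the opening POST to a Streamable HTTP MCP endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed revision; servers may negotiate another
            "capabilities": {},
            "clientInfo": {"name": "health-probe", "version": "0.1"},  # hypothetical client
        },
    }
    return urllib.request.Request(
        server_url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


def diagnose(status: Optional[int]) -> str:
    """Map a probe outcome to a likely cause (None means no HTTP response at all)."""
    if status is None:
        return "unreachable: outage or wrong URL"
    if 200 <= status < 300:
        return "healthy"
    if status in (401, 403):
        return "credentials missing or invalid"
    if status == 404:
        return "wrong URL: no MCP endpoint at this path"
    if status >= 500:
        return "server-side outage"
    return f"unexpected HTTP {status}"
```

Building the request separately from sending it keeps the probe testable without network access. A real check would pass the request to `urllib.request.urlopen` and feed the resulting status, or `None` on a connection error, into `diagnose`.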
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.