
scrapeBalanced

Extract web content using a balanced approach that trades coverage against speed, optionally capturing images and paginated data, with control over scroll attempts, timeouts, and minimum image size.

Instructions

Balanced web scraping approach with good coverage and reasonable speed

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| downloadImages | No | Whether to download images locally | false |
| imageOutput | No | Output directory for downloaded images | configured server default |
| maxImages | No | Maximum number of images to extract | 50 |
| maxScrolls | No | Maximum number of scroll attempts | 10 |
| minImageSize | No | Minimum width/height for images in pixels | 100 |
| output | No | Output directory for general results | configured server default |
| pages | No | Number of pages to scrape (if pagination is present) | 1 |
| scrapeImages | No | Whether to include images in the scrape result | false |
| scrollDelay | No | Delay between scrolls in ms | 2000 |
| timeout | No | Maximum time in ms for the scrape operation | 30000 |
| url | Yes | URL of the webpage to scrape | — |
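A call to this tool might pass arguments like the following (the URL and values below are illustrative examples, not defaults):

```typescript
// Illustrative arguments for a scrapeBalanced call.
// All values here are examples chosen for this sketch, not defaults.
const args = {
  url: "https://example.com/articles", // required: page to scrape
  maxScrolls: 5,                       // stop after 5 scroll attempts
  scrollDelay: 1500,                   // wait 1.5 s between scrolls
  pages: 2,                            // follow pagination to a second page
  scrapeImages: true,                  // include image URLs in the result
  timeout: 20000,                      // abort if scraping exceeds 20 s
};
```

Only `url` is required; everything else falls back to the defaults listed in the schema.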

Implementation Reference

  • The handler function executes balanced web scraping using the prysm library. It configures scraping options, implements timeout handling via Promise.race, limits content and images to prevent overwhelming the client, and handles errors by returning a structured error response.
    handler: async (params: ScraperBaseParams & { timeout?: number }): Promise<ScraperResponse> => {
      const { url, maxScrolls = 10, scrollDelay = 2000, pages = 1, scrapeImages = false, 
              downloadImages = false, maxImages = 50, minImageSize = 100, timeout = 30000, 
              output, imageOutput } = params;
      
      try {
        // Create options object for the scraper
        const options = {
          maxScrolls,
          scrollDelay,
          pages,
          focused: false,
          standard: true, // Use standard mode for balanced extraction
          deep: false,
          scrapeImages: scrapeImages || downloadImages,
          downloadImages,
          maxImages,
          minImageSize,
          timeout, // Add timeout option
          output: output || config.serverOptions.defaultOutputDir, // Use configured default if not provided
          imageOutput: imageOutput || config.serverOptions.defaultImageOutputDir // Use configured default if not provided
        };
        
        // Create a promise with timeout
        const scrapePromise = prysm.scrape(url, options);
        
        // Add timeout
        const timeoutPromise = new Promise<never>((_, reject) => {
          setTimeout(() => reject(new Error(`Scraping timed out after ${timeout}ms`)), timeout);
        });
        
        // Race the scraping against the timeout
        const result = await Promise.race([scrapePromise, timeoutPromise]) as ScraperResponse;
        
        // Limit content size to prevent overwhelming the MCP client
        if (result.content && result.content.length > 0) {
          // Limit the number of content sections
          if (result.content.length > 20) {
            result.content = result.content.slice(0, 20);
            result.content.push("(Content truncated due to size limitations)");
          }
          
          // Limit the size of each content section
          result.content = result.content.map(section => {
            if (section.length > 5000) {
              return section.substring(0, 5000) + "... (truncated)";
            }
            return section;
          });
        }
        
        // Limit the number of images to return
        if (result.images && result.images.length > 20) {
          result.images = result.images.slice(0, 20);
        }
        
        return result;
      } catch (error) {
        console.error(`Error scraping ${url}:`, error);
        // Return a proper error format for MCP
        return {
          title: "Scraping Error",
          content: [`Failed to scrape ${url}: ${error instanceof Error ? error.message : String(error)}`],
          images: [],
          metadata: { error: true },
          url: url,
          structureType: "error",
          paginationType: "none",
          extractionMethod: "none"
        };
      }
    }
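The Promise.race timeout pattern in the handler can be factored into a small reusable helper. This is a sketch, not part of the Prysm codebase; the name `withTimeout` is ours:

```typescript
// Sketch of the timeout pattern used by the handler above.
// `withTimeout` is a hypothetical helper, not part of prysm or this server.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Operation timed out after ${ms}ms`)),
      ms
    );
  });
  // Clear the timer once the race settles so the process can exit promptly;
  // the handler above omits this, leaving the timer alive until it fires.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Note that Promise.race does not cancel the losing promise: when the timeout wins, the underlying scrape keeps running in the background even though its result is discarded.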
  • JSON Schema defining the input parameters for the scrapeBalanced tool, including required 'url' and optional parameters for scrolling, pagination, images, timeouts, and outputs.
    parameters: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'URL of the webpage to scrape'
        },
        maxScrolls: {
          type: 'number',
          description: 'Maximum number of scroll attempts (default: 10)'
        },
        scrollDelay: {
          type: 'number',
          description: 'Delay between scrolls in ms (default: 2000)'
        },
        pages: {
          type: 'number',
          description: 'Number of pages to scrape (if pagination is present)'
        },
        scrapeImages: {
          type: 'boolean',
          description: 'Whether to include images in the scrape result'
        },
        downloadImages: {
          type: 'boolean',
          description: 'Whether to download images locally'
        },
        maxImages: {
          type: 'number',
          description: 'Maximum number of images to extract'
        },
        minImageSize: {
          type: 'number',
          description: 'Minimum width/height for images in pixels'
        },
        timeout: {
          type: 'number',
          description: 'Maximum time in ms for the scrape operation (default: 30000)'
        },
        output: {
          type: 'string',
          description: 'Output directory for general results'
        },
        imageOutput: {
          type: 'string',
          description: 'Output directory for downloaded images'
        }
      },
      required: ['url']
    },
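The defaults documented in this schema are applied by destructuring in the handler shown earlier. A minimal sketch of that defaulting step (the interface and function names here are ours, and only a subset of parameters is shown):

```typescript
// Hypothetical helper mirroring the handler's destructuring defaults.
// Default values are taken from the handler code shown above.
interface ScrapeParams {
  url: string;
  maxScrolls?: number;
  scrollDelay?: number;
  pages?: number;
  timeout?: number;
}

function applyDefaults(params: ScrapeParams) {
  const {
    url,
    maxScrolls = 10,
    scrollDelay = 2000,
    pages = 1,
    timeout = 30000,
  } = params;
  return { url, maxScrolls, scrollDelay, pages, timeout };
}
```

Callers therefore only need to supply `url`; any omitted optional parameter silently takes the documented default.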
  • src/config.ts:65-71 (registration)
    Registration of the scrapeBalanced tool in the main server configuration array.
    tools: [
      scrapeFocused,
      scrapeBalanced, 
      scrapeDeep,
      // analyzeUrl,
      formatResult
    ],
  • Export of toolDefinitions array including scrapeBalanced for use in the MCP server.
    export const toolDefinitions: ToolDefinition[] = [
      scrapeFocused,
      scrapeBalanced,
      scrapeDeep,
      // analyzeUrl,
      formatResult,
    ]; 
  • Definition and export of the scrapeBalanced ToolDefinition object.
    export const scrapeBalanced: ToolDefinition = {
      name: 'scrapeBalanced',