Enhanced Web Scraper MCP Server

by JMRMEDEV

A professional Model Context Protocol (MCP) server for web scraping, React app testing, and React Native web app inspection using Playwright. Fully backward compatible with regular websites and standard React applications.

🚀 Latest Improvements

  • 🔥 Context-Optimized Screenshots - Screenshots return only file paths and analysis text (no base64 data)

  • 📊 Enhanced Page Analysis - Detailed element counting, content structure analysis, and page state inspection

  • 🔍 Comprehensive Comparison Tools - Visual similarity analysis with layout, color, and typography detection

  • 💾 File-Based Output - All screenshots saved to /tmp/ with structured analysis data

  • 🎯 Smart Content Detection - Automatically detects empty states, loading indicators, and content availability

  • Enhanced Error Handling - Comprehensive input validation and error reporting

  • Optimized Performance - Reduced code duplication and improved efficiency

  • Standardized Timeouts - Configurable timeout constants for reliability

  • Professional Code Structure - ES6+ best practices and maintainable architecture

🔄 Backward Compatibility

This enhanced server maintains 100% compatibility with:

  • Regular websites (HTML, CSS, JavaScript)

  • Standard React applications (Create React App, Next.js, etc.)

  • Traditional web scraping workflows

  • Existing CSS selectors and interactions

Plus new enhanced support for:

  • 🆕 React Native web applications

  • 🆕 Expo web projects

  • 🆕 Mobile viewport emulation

  • 🆕 Advanced React component inspection

📋 Tools Overview

| Tool | Purpose | Best For |
| --- | --- | --- |
| take_screenshot | Context-free screenshot capture | Visual analysis, UI documentation |
| compare_screenshots | Visual UI comparison with semantic analysis | UI replication, visual regression testing |
| scrape_page | Universal web scraping | Content extraction, data collection |
| test_react_app | React app testing with mobile gestures | UI testing, interaction automation |
| get_page_info | Page analysis with React insights | Performance monitoring, framework detection |
| extract_content | Clean content extraction | Documentation, article processing |
| wait_for_element | Smart element waiting | Dynamic content, loading states |
| inspect_react_app | React component analysis | Component debugging, state inspection |
| wait_for_react_state | React state management | Hydration, navigation, data loading |
| execute_in_react_context | JavaScript execution in React context | Advanced debugging, custom scripts |
| check_expo_dev_server | Expo development server status | Development workflow, debugging |
| duckduckgo_search | DuckDuckGo search with result extraction | Research, finding relevant URLs for content extraction |

Key Features for AI Visual Analysis

🔥 Context-Free Design

  • No Base64 Data: Screenshots return only file paths and analysis text

  • Minimal Context Usage: Dramatically reduced token consumption per screenshot

  • File-Based Storage: All images saved to /tmp/ for external access

  • Structured Analysis: Rich text analysis without heavy image data

🔍 Smart Content Detection

  • Empty State Detection: Automatically identifies when pages have no meaningful content

  • Table Population Verification: Counts table rows to verify data is actually displaying

  • Loading State Recognition: Detects and waits for loading indicators to disappear

  • Content Structure Analysis: Provides detailed breakdown of page elements
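The empty-state and table-population checks above can be approximated from simple element counts. This is a minimal sketch, assuming counts shaped like the "Content Elements" report in the take_screenshot example output. The function name `assessContent` and its rules are hypothetical and are not the server's actual criteria.

```javascript
// Hypothetical sketch: decide pass/fail from element counts.
// Field names mirror the "Content Elements" report; missing fields count as 0.
function assessContent(counts) {
  const {
    visibleElements = 0,
    headings = 0,
    paragraphs = 0,
    buttons = 0,
    tables = 0,
    tableRows = 0,
  } = counts;
  // Empty state: nothing visible, or no text blocks, controls, or table rows at all.
  const isEmpty =
    visibleElements === 0 || headings + paragraphs + buttons + tableRows === 0;
  // Unpopulated table: the table shell rendered but no data rows arrived.
  const tableUnpopulated = tables > 0 && tableRows === 0;
  return { isEmpty, tableUnpopulated, pass: !isEmpty && !tableUnpopulated };
}
```

A page with a rendered table but zero rows fails the check even though it has visible content. That is exactly the "mock data isn't displaying" case the README calls out.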

📁 File-Based Output

Every visual tool provides:

  1. 📊 Analysis Text: Element counts, text content, structural analysis

  2. 📁 File Path: Saved screenshot location for external viewing

  3. 🎯 Pass/Fail Status: Built-in success criteria for automated workflows

🎯 Migration & Testing Support

Perfect for:

  • UI Migration Verification: Compare source vs target implementations

  • Mock Data Validation: Verify that mock data is actually displaying

  • Visual Regression Testing: Ensure UI changes don't break layouts

  • Component Testing: Validate React components render correctly

📊 Success Metrics Integration

  • Configurable Similarity Thresholds: Built-in pass/fail criteria for visual comparisons

  • Populated Data Requirements: Detects empty states that prevent meaningful comparison

  • Comprehensive Reporting: Detailed analysis for debugging visual differences

Available Tools

1. take_screenshot - Context-Free Screenshot Capture

Captures screenshots with comprehensive analysis while keeping context usage minimal.

{
  url: "https://example.com",
  browser: "chromium",
  device: "iPhone 12", // Optional device emulation
  fullPage: true,
  waitForSPA: true // Auto-detects and waits for React/Vue/Angular apps
}

Returns:

  • 📊 Comprehensive Analysis: Element counts, page structure, content preview

  • 📁 File Path: Screenshot saved to /tmp/screenshot-[timestamp].png

  • 🎯 Content Status: Pass/fail indicators for populated data

Example Output:

📸 Screenshot saved to: /tmp/screenshot-1234567890.png

📄 Page Analysis:
- Title: "My React App"
- Has Content: ✅
- Visible Elements: 247

📊 Content Elements:
- Headings: 3
- Paragraphs: 12
- Buttons: 8
- Tables: 1
- Table Rows: 15  ← Indicates populated data!

📝 Page Content Preview:
Welcome to our service platform. Here you can find contractors...
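Each screenshot lands in /tmp under a timestamped name. Here is a minimal sketch of that naming, assuming a millisecond timestamp from `Date.now()`; the sample value above looks like a placeholder. `screenshotPath` is a hypothetical helper, not part of the server's API.

```javascript
const path = require("path");

// Hypothetical helper: builds a timestamped PNG path under /tmp,
// matching the naming shown in the sample output.
// os.tmpdir() would be the portable choice on systems without /tmp.
function screenshotPath(prefix = "screenshot", timestamp = Date.now()) {
  return path.posix.join("/tmp", `${prefix}-${timestamp}.png`);
}
```

The same helper covers the `compare-source-…` and `compare-target-…` files by changing the prefix.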

2. compare_screenshots - Context-Free Visual Comparison

Compares two pages with comprehensive analysis while maintaining minimal context usage.

{
  urlA: "https://source-design.com", // Source/reference
  urlB: "https://your-implementation.com", // Target/implementation
  browser: "chromium",
  threshold: 0.1, // Comparison threshold (0-1); lower values are stricter
  analyzeLayout: true, // Detect alignment differences
  analyzeColors: true, // Exact color comparison
  analyzeTypography: true, // Font size/weight analysis
  waitForSPA: true // Smart SPA detection
}

Returns:

  • 📊 Visual Similarity Score: Percentage match with pass/fail status

  • 🏗️ Structural Comparison: Element counts, table rows, content structure

  • 🎨 Layout Analysis: Alignment differences, positioning issues

  • 📁 File Paths: Both screenshots saved to /tmp/ for external viewing
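The README doesn't document exactly how `threshold` feeds into the similarity score. However, the usage example that pairs `0.05` with a "High similarity requirement" suggests that lower values are stricter.

The sketch below is a simplified stand-in under that assumption, not the server's algorithm. It compares two same-size RGBA buffers and counts a pixel as different when any channel differs by more than `threshold × 255`. Image-diff libraries typically use a perceptual color metric instead. `visualSimilarity` is a hypothetical name.

```javascript
// Simplified sketch: per-channel pixel comparison of two RGBA buffers.
// Returns the percentage of pixels considered equal (0-100).
function visualSimilarity(a, b, { threshold = 0.1 } = {}) {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("Buffers must be RGBA data of the same size");
  }
  const pixels = a.length / 4;
  const tolerance = threshold * 255; // allowed per-channel difference
  let differing = 0;
  for (let i = 0; i < a.length; i += 4) {
    // A pixel differs if any of its R, G, B, A channels exceeds the tolerance.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > tolerance) {
        differing++;
        break;
      }
    }
  }
  return pixels === 0 ? 100 : 100 * (1 - differing / pixels);
}
```

The README doesn't state which similarity percentage counts as PASS, so this sketch only reports the score.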

Example Output:

📸 Screenshots saved:
- Source: /tmp/compare-source-1234567890.png
- Target: /tmp/compare-target-1234567891.png

📊 VISUAL SIMILARITY: 87.3% ✅ PASS

🏗️ Structural Comparison:
- Tables: 1 → 1
- Table Rows: 0 → 8  ← Target has populated data!
- Buttons: 12 → 12

📋 Layout Analysis:
- 2 regions with significant layout differences
- Content appears centered in source but left-aligned in target

🎨 Color Analysis:
- Minor color differences detected
- Example: rgb(229, 122, 68) → rgb(225, 118, 64)

3. scrape_page - Universal Web Scraping

Works with any website - regular HTML, React apps, or React Native web.

Regular website example:

{
  url: "https://example.com",
  selector: ".article-title", // Standard CSS selector
  screenshot: true
}

React Native web example:

{
  url: "http://localhost:8081",
  selector: "login-button", // Will try testID, aria-label fallbacks
  mobileViewport: true,
  device: "iPhone 12"
}

4. test_react_app - Universal React Testing

Works with any React application - standard React or React Native web.

Standard React app example:

{
  url: "http://localhost:3000",
  waitForHydration: false, // Optional for regular React apps
  actions: [
    { type: "click", selector: "#submit-button" },
    { type: "fill", selector: "input[name='email']", value: "test@example.com" }
  ]
}

React Native web example:

{
  url: "http://localhost:8081",
  device: "iPhone 12",
  waitForHydration: true, // Recommended for RN web
  actions: [
    { type: "tap", selector: "login-button" },
    { type: "swipe", selector: "scroll-view", value: "up" }
  ]
}
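The README says the server validates its input. Here is a minimal sketch of what validating an `actions` array could look like. It only knows the four action types that appear in the examples above (`click`, `fill`, `tap`, `swipe`). The swipe directions other than `"up"` and the function names are hypothetical.

```javascript
// Action types that appear in this README's examples; the server may support more.
const KNOWN_ACTIONS = new Set(["click", "fill", "tap", "swipe"]);
// Hypothetical: the README only shows "up"; the other directions are assumptions.
const SWIPE_DIRECTIONS = new Set(["up", "down", "left", "right"]);

// Returns a list of human-readable problems for one action (empty if valid).
function validateAction(action, index) {
  const where = `actions[${index}]`;
  const errors = [];
  if (!KNOWN_ACTIONS.has(action.type)) {
    errors.push(`${where}: unknown type "${action.type}"`);
  }
  if (typeof action.selector !== "string" || action.selector === "") {
    errors.push(`${where}: selector is required`);
  }
  if (action.type === "fill" && typeof action.value !== "string") {
    errors.push(`${where}: fill needs a string value`);
  }
  if (action.type === "swipe" && !SWIPE_DIRECTIONS.has(action.value)) {
    errors.push(`${where}: swipe needs a direction`);
  }
  return errors;
}

// Validates every action and collects all problems instead of stopping at the first.
function validateActions(actions) {
  return actions.flatMap((action, i) => validateAction(action, i));
}
```

Collecting every error up front lets a caller report all problems in one message rather than failing mid-run in the browser.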

5. get_page_info - Enhanced Page Analysis

Provides comprehensive information for any web page with React-specific insights.

{
  url: "https://any-website.com", // Works with any URL
  includePerformance: true
}
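The README doesn't list which metrics `includePerformance` returns. Here is a minimal sketch, assuming they come from the browser's Navigation Timing API; it summarizes a `PerformanceNavigationTiming` entry. The output field names and `summarizeNavigationTiming` are hypothetical.

```javascript
// Summarizes a PerformanceNavigationTiming-like entry.
// All values are milliseconds relative to startTime (0 for the navigation entry).
function summarizeNavigationTiming(entry) {
  const since = (mark) => Math.round(entry[mark] - entry.startTime);
  return {
    ttfb: since("responseStart"), // time to first byte of the response
    domContentLoaded: since("domContentLoadedEventEnd"),
    load: since("loadEventEnd"), // stays 0 until the load event has finished
  };
}
```

In Playwright, such an entry can be read with `page.evaluate(() => performance.getEntriesByType("navigation")[0].toJSON())`.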

6. extract_content - Clean Content Extraction

Extract clean, readable content from web pages without HTML/CSS clutter. Perfect for documentation, articles, and structured content consumption.

{
  url: "https://docs.example.com/api-guide",
  includeLinks: true,    // Extract and categorize hyperlinks
  format: "markdown"     // Output format: 'markdown' or 'text'
}

Output Example:

# API Documentation

## Authentication
You need to obtain an API key [1] from the developer portal [2].

### Rate Limits
See the rate limiting guide [3] for details.

---
## Links Found:
[1] https://example.com/api-keys (internal)
[2] https://developer.example.com (external) 
[3] https://example.com/docs/rate-limits (internal)

Features:

  • Clean Structure - Preserves headings, paragraphs, lists, code blocks

  • Link Extraction - Categorizes links as internal, external, anchor, or download

  • Content Filtering - Removes navigation, ads, sidebars automatically

  • Multiple Formats - Markdown or plain text output
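The four link categories can be sketched with the WHATWG `URL` class. This is an illustrative approximation, not the server's rule. The example output above labels an `example.com` link as internal for a page on `docs.example.com`, so the server's definition of "internal" is evidently broader than the strict hostname match assumed here. The download extension list and `categorizeLink` are hypothetical.

```javascript
// Hypothetical list of file extensions treated as downloads.
const DOWNLOAD_EXTENSIONS = [".pdf", ".zip", ".tar.gz", ".dmg", ".exe"];

// Classifies an href found on pageUrl as anchor, download, internal, or external.
function categorizeLink(href, pageUrl) {
  if (href.startsWith("#")) return "anchor"; // same-page fragment
  const url = new URL(href, pageUrl); // resolves relative links against the page
  const path = url.pathname.toLowerCase();
  if (DOWNLOAD_EXTENSIONS.some((ext) => path.endsWith(ext))) return "download";
  // Assumption: "internal" means the same hostname as the page.
  return url.hostname === new URL(pageUrl).hostname ? "internal" : "external";
}
```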

7. wait_for_element - Smart Element Waiting

Intelligent element waiting with automatic selector strategy fallbacks.

{
  url: "https://example.com",
  selector: ".loading-spinner", // CSS selector with RN fallbacks
  timeout: 10000
}
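The README doesn't describe how waiting is implemented. The `timeout` semantics can still be illustrated with a generic polling helper: call a check function repeatedly until it returns a truthy value or the timeout expires. `waitFor` and its option names are hypothetical.

```javascript
// Polls `check` every `intervalMs` until it returns a truthy value,
// or rejects once `timeoutMs` has elapsed.
async function waitFor(check, { timeoutMs = 10000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await check(); // check may be sync or async
    if (result) return result;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out after ${timeoutMs} ms`);
}
```

Inside a Playwright-based server, `locator.waitFor({ state: "visible", timeout })` already provides this behavior for a single selector. A helper like this is only needed when the condition spans several selectors or non-DOM state.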

React Native Web Specific Tools

8. inspect_react_app - React Component Analysis

Deep inspection of React applications (works best with React Native web).

9. wait_for_react_state - React State Management

Wait for React-specific conditions like hydration, navigation, data loading.

10. execute_in_react_context - JavaScript Execution

Execute JavaScript in React context for advanced inspection.

11. check_expo_dev_server - Expo Development Tools

Check Expo/Metro bundler status for development workflows.
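How check_expo_dev_server probes the bundler isn't documented. One common approach relies on Metro's `/status` endpoint, which responds with the text `packager-status:running` while the bundler is up; it is shown here as a sketch. `checkMetroStatus` is a hypothetical name. The fetch function is injectable so the check can be tested without a live server (the global `fetch` requires Node 18+).

```javascript
// Assumption: Metro answers GET /status with "packager-status:running".
async function checkMetroStatus(baseUrl = "http://localhost:8081", fetchImpl = globalThis.fetch) {
  try {
    const response = await fetchImpl(new URL("/status", baseUrl).toString());
    const body = await response.text();
    return {
      running: response.ok && body.includes("packager-status:running"),
      status: response.status,
    };
  } catch (error) {
    // Connection refused or DNS failure: the dev server is not reachable.
    return { running: false, error: String(error) };
  }
}
```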

Selector Strategy Priority

The server uses intelligent selector strategies:

  1. Primary: Direct CSS selector (e.g., #button, .class, input[name='email'])

  2. Fallback 1: TestID attribute ([data-testid="button"])

  3. Fallback 2: Accessibility label ([aria-label="Button"])

  4. Fallback 3: AccessibilityLabel ([accessibilityLabel="Button"])

This ensures regular CSS selectors work normally while providing React Native web compatibility.
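The priority list above can be sketched as a function that expands one selector string into ordered candidates. This is an illustrative reconstruction of the documented order, not the server's source. `selectorCandidates` and `cssString` are hypothetical names.

```javascript
// Escapes backslashes and double quotes so a value can sit inside a CSS attribute string.
function cssString(value) {
  return `"${value.replace(/["\\]/g, "\\$&")}"`;
}

// Returns selectors in the documented priority order.
function selectorCandidates(selector) {
  return [
    selector, // 1. direct CSS selector
    `[data-testid=${cssString(selector)}]`, // 2. React Native testID
    `[aria-label=${cssString(selector)}]`, // 3. accessibility label
    `[accessibilityLabel=${cssString(selector)}]`, // 4. accessibilityLabel attribute
  ];
}
```

A caller could try each candidate in turn, for example with Playwright's `page.locator(candidate).count()`, and use the first one that matches. A bare string like `login-button` is also valid CSS (it selects a `<login-button>` element), so it is tried first and simply matches nothing on most pages. A string containing spaces or punctuation may not be valid CSS, so a real implementation should catch selector syntax errors and move on to the next candidate.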

Usage Examples

Context-Free Visual Verification

// take_screenshot: verify data is actually displaying without burning context
{
  url: "http://localhost:3000/data-table",
  fullPage: true,
  waitForSPA: true
}
// Returns: File path + "Table Rows: 8" ← Confirms data is populated!

Context-Free Migration Comparison

// compare_screenshots: compare source vs. target implementation efficiently
{
  urlA: "http://localhost:3001/page", // Source
  urlB: "http://localhost:3000/page", // Target
  threshold: 0.05, // High similarity requirement
  analyzeLayout: true,
  analyzeColors: true
}
// Returns: File paths + "VISUAL SIMILARITY: 96.2% ✅ PASS"

Regular Website Scraping

// scrape_page: works exactly as before
{
  url: "https://news.ycombinator.com",
  selector: ".titleline > a",
  screenshot: false
}

Standard React App Testing

// test_react_app: standard React app (Create React App, Next.js, etc.)
{
  url: "http://localhost:3000",
  actions: [
    { type: "click", selector: "button.login" },
    { type: "fill", selector: "#username", value: "testuser" }
  ]
}

React Native Web App Testing

// test_react_app: React Native web with enhanced features
{
  url: "http://localhost:8081",
  device: "iPhone 12",
  waitForHydration: true,
  actions: [
    { type: "tap", selector: "login-button" }, // Uses testID
    { type: "swipe", selector: "scroll-view", value: "up" }
  ]
}

Clean Content Extraction

// extract_content: extract clean content from documentation
{
  url: "https://react.dev/learn",
  includeLinks: true,
  format: "markdown"
}

Installation

npm install
npx playwright install

Usage with Amazon Q Developer

# Take a context-free screenshot and analyze content
q chat "Take a screenshot of localhost:3000/data-page and analyze the content"

# Compare pages efficiently without context bloat
q chat "Compare the page between localhost:3001 and localhost:3000"

# Mock data verification with minimal context usage
q chat "Verify that the data table is populated at localhost:3000"

# Works with any website
q chat "Scrape the headlines from https://news.ycombinator.com"

# Works with React apps
q chat "Test the login flow on my React app at localhost:3000"

# Enhanced React Native web support
q chat "Inspect the React Native web app at localhost:8081"

# Extract clean content for reading
q chat "Extract the main content from https://react.dev/learn"

Benefits of Context-Free Design

🔥 Dramatically Reduced Context Usage

  • Before: 50-200KB base64 data per screenshot

  • After: Only text analysis (~1-2KB per screenshot)

  • Result: 50-100x reduction in context consumption
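The before/after figures come down to base64 overhead. Base64 encodes every 3 bytes as 4 characters, so an inline image costs about a third more than the file itself. The sketch below shows that arithmetic. The 150 KB PNG and 2 KB text report are illustrative sizes, not measurements, and context is counted in bytes rather than tokens. The function names are hypothetical.

```javascript
// Base64 encodes every 3 input bytes as 4 ASCII characters (padding included).
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Ratio of an inline base64 image payload to a text-only analysis report.
function contextReduction(imageBytes, reportBytes) {
  return base64Length(imageBytes) / reportBytes;
}
```

For example, a 150,000-byte PNG becomes 200,000 base64 characters. Replacing it with a 2,000-byte report is a 100x reduction, the top of the range quoted above.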

📁 File-Based Workflow

  • Screenshots saved to /tmp/ with timestamps

  • External tools can access images directly

  • No context pollution from image data

  • Structured analysis data remains in conversation

🎯 Better AI Workflows

  • More screenshots possible per conversation

  • Focus on analysis rather than data transfer

  • Cleaner conversation history

  • Faster response times

Troubleshooting

Error Handling

  • Input Validation - Server validates required parameters and provides clear error messages

  • Timeout Configuration - Default timeouts are optimized but can be adjusted per request

  • Browser Cleanup - Automatic resource cleanup prevents memory leaks

Regular Websites

  • Use standard CSS selectors (.class, #id, tag[attribute])

  • Set mobileViewport: false (default) for desktop sites

  • Set waitForHydration: false (default) for non-React sites

React Applications

  • Set waitForHydration: true for better reliability

  • Use semantic selectors when possible

  • Check browser console for React errors

React Native Web

  • Use testID attributes in your components

  • Enable mobileViewport or specify device

  • Set waitForHydration: true

  • Use inspect_react_app to see available elements

License

MIT


12. duckduckgo_search - Web Search Integration

Search DuckDuckGo and extract result links with titles and snippets for further content extraction.

Parameters

  • query (required): Search query string

  • maxResults (optional): Maximum results to return (1-10, default: 5)
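Here is a minimal sketch of the parameter rules listed above. It assumes out-of-range values are rejected with an error rather than silently clamped; the README doesn't say which. `normalizeSearchParams` is a hypothetical name.

```javascript
// Applies the documented defaults and ranges: query required, maxResults 1-10 (default 5).
function normalizeSearchParams({ query, maxResults = 5 } = {}) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new TypeError("query is required and must be a non-empty string");
  }
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > 10) {
    throw new RangeError("maxResults must be an integer from 1 to 10");
  }
  return { query: query.trim(), maxResults };
}
```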

Example Usage

# Search for React documentation
q chat "Search DuckDuckGo for 'React hooks documentation'"

# Get more results
q chat "Search DuckDuckGo for 'Node.js best practices' with maxResults=8"

# Combine with content extraction
q chat "Search for 'AWS Lambda tutorials' then extract content from the top 2 results"

Use Cases

  • Research: Find relevant URLs for content extraction

  • Documentation Discovery: Locate official docs and tutorials

  • Content Pipeline: Search → Extract → Analyze workflow

  • Development Research: Find code examples and solutions

Integration with Other Tools

Perfect for combining with extract_content:

  1. Use duckduckgo_search to find relevant URLs

  2. Use extract_content on the top results

  3. Get comprehensive information on any topic
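The three-step workflow above can be sketched as a small async function. `search` and `extract` are hypothetical callbacks that stand in for invoking duckduckgo_search and extract_content through whatever MCP client you use. The result shape (`{ title, url }`) is an assumption based on the tool description.

```javascript
// Search, then extract the top results concurrently.
async function researchTopic(query, { search, extract, topN = 2 }) {
  // Step 1: find candidate URLs (maxResults must stay within 1-10).
  const results = await search({ query, maxResults: topN });
  const top = results.slice(0, topN);
  // Step 2: extract each page concurrently; one failure should not abort the rest.
  const settled = await Promise.allSettled(
    top.map((result) => extract({ url: result.url, format: "markdown" }))
  );
  // Step 3: pair each result with its extracted content (or the error).
  return top.map((result, i) => ({
    title: result.title,
    url: result.url,
    content: settled[i].status === "fulfilled" ? settled[i].value : null,
    error: settled[i].status === "rejected" ? String(settled[i].reason) : null,
  }));
}
```

`Promise.allSettled` is used so that one blocked or slow page still leaves usable content from the others.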

Why DuckDuckGo?

  • Less Aggressive Bot Detection: More lenient than Google for automated requests

  • Free: No API key required

  • Privacy-Focused: Doesn't track users or requests

  • Reliable Results: High-quality search results for development topics

Note: This tool extracts public search results only. DuckDuckGo tends to be more tolerant of automated requests than Google, but rapid or high-volume querying can still be throttled or challenged.
