# MCP Webscan Server

[![smithery badge](https://smithery.ai/badge/mcp-server-webscan)](https://smithery.ai/server/mcp-server-webscan)

A Model Context Protocol (MCP) server for web content scanning and analysis. This server provides tools for fetching, analyzing, and extracting information from web pages.

<a href="https://glama.ai/mcp/servers/u0tna3hemh"><img width="380" height="200" src="https://glama.ai/mcp/servers/u0tna3hemh/badge" alt="Webscan Server MCP server" /></a>

## Features

- **Page Fetching**: Convert web pages to Markdown for easy analysis
- **Link Extraction**: Extract and analyze links from web pages
- **Site Crawling**: Recursively crawl websites to discover content
- **Link Checking**: Identify broken links on web pages
- **Pattern Matching**: Find URLs matching specific patterns
- **Sitemap Generation**: Generate XML sitemaps for websites

## Installation

### Installing via Smithery

To install Webscan for Claude Desktop automatically via [Smithery](https://smithery.ai/server/mcp-server-webscan):

```bash
npx -y @smithery/cli install mcp-server-webscan --client claude
```

### Manual Installation

```bash
# Clone the repository
git clone <repository-url>
cd mcp-server-webscan

# Install dependencies
npm install

# Build the project
npm run build
```

## Usage

### Starting the Server

```bash
npm start
```

The server runs on stdio transport, making it compatible with MCP clients like Claude Desktop.

### Available Tools

1. `fetch_page` - Fetches a web page and converts it to Markdown
   - Parameters:
     - `url` (required): URL of the page to fetch
     - `selector` (optional): CSS selector to target specific content
2. `extract_links` - Extracts all links from a web page with their text
   - Parameters:
     - `url` (required): URL of the page to analyze
     - `baseUrl` (optional): Base URL to filter links
3. `crawl_site` - Recursively crawls a website up to a specified depth
   - Parameters:
     - `url` (required): Starting URL to crawl
     - `maxDepth` (optional, default: 2): Maximum crawl depth
4. `check_links` - Checks for broken links on a page
   - Parameters:
     - `url` (required): URL to check links for
5. `find_patterns` - Finds URLs matching a specific pattern
   - Parameters:
     - `url` (required): URL to search in
     - `pattern` (required): Regex pattern to match URLs against
6. `generate_sitemap` - Generates a simple XML sitemap
   - Parameters:
     - `url` (required): Root URL for sitemap
     - `maxUrls` (optional, default: 100): Maximum number of URLs to include
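All of these tools are invoked through the standard MCP `tools/call` request, so any MCP client can drive them. As a minimal sketch, here is how a standalone script might call `fetch_page` using the official `@modelcontextprotocol/sdk` client. The server path is a placeholder, and exact import paths and constructor signatures may vary between SDK versions:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  // Spawn the webscan server as a child process over stdio.
  // The path below is a placeholder -- point it at your built dist/index.js.
  const transport = new StdioClientTransport({
    command: "node",
    args: ["path/to/mcp-server-webscan/dist/index.js"],
  });

  const client = new Client(
    { name: "webscan-example", version: "0.1.0" },
    { capabilities: {} }
  );
  await client.connect(transport);

  // Call fetch_page; `selector` is optional, per the parameter list above.
  const result = await client.callTool({
    name: "fetch_page",
    arguments: { url: "https://example.com" },
  });

  console.log(result.content); // Markdown rendering of the fetched page
  await client.close();
}

main().catch(console.error);
```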
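On the server side, invalid input is reported as a structured MCP error rather than a plain exception (see the Error Handling section below). The validator here is a hypothetical sketch of how `fetch_page` arguments could be checked, not the actual implementation; `McpError` and `ErrorCode` come from the SDK's types module:

```typescript
import { McpError, ErrorCode } from "@modelcontextprotocol/sdk/types.js";

// Hypothetical argument validator for fetch_page -- illustrative only.
function validateFetchPageArgs(args: unknown): { url: string; selector?: string } {
  const { url, selector } = (args ?? {}) as { url?: unknown; selector?: unknown };

  if (typeof url !== "string") {
    throw new McpError(ErrorCode.InvalidParams, "fetch_page requires a string `url` parameter");
  }
  try {
    new URL(url); // rejects malformed URLs before any network request is made
  } catch {
    throw new McpError(ErrorCode.InvalidParams, `Invalid URL: ${url}`);
  }
  if (selector !== undefined && typeof selector !== "string") {
    throw new McpError(ErrorCode.InvalidParams, "`selector` must be a string when provided");
  }
  return { url, selector };
}
```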
## Example Usage with Claude Desktop

1. Configure the server in your Claude Desktop settings:

   ```json
   {
     "mcpServers": {
       "webscan": {
         "command": "node",
         "args": ["path/to/mcp-server-webscan/dist/index.js"],
         "env": {
           "NODE_ENV": "development"
         }
       }
     }
   }
   ```

2. Use the tools in your conversations:

   ```
   Could you fetch the content from https://example.com and convert it to Markdown?
   ```

## Development

### Prerequisites

- Node.js >= 18
- npm

### Project Structure

```
mcp-server-webscan/
├── src/
│   └── index.ts      # Main server implementation
├── dist/             # Compiled JavaScript
├── package.json
└── tsconfig.json
```

### Building

```bash
npm run build
```

### Development Mode

```bash
npm run dev
```

## Error Handling

The server implements comprehensive error handling for:

- Invalid parameters
- Network errors
- Content parsing errors
- URL validation

All errors are properly formatted according to the MCP specification.

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

MIT License - see the LICENSE file for details