# MCP Webscan Server
[![smithery badge](https://smithery.ai/badge/mcp-server-webscan)](https://smithery.ai/server/mcp-server-webscan)
A Model Context Protocol (MCP) server for web content scanning and analysis. This server provides tools for fetching, analyzing, and extracting information from web pages.
<a href="https://glama.ai/mcp/servers/u0tna3hemh"><img width="380" height="200" src="https://glama.ai/mcp/servers/u0tna3hemh/badge" alt="Webscan Server MCP server" /></a>
## Features
- **Page Fetching**: Convert web pages to Markdown for easy analysis
- **Link Extraction**: Extract and analyze links from web pages
- **Site Crawling**: Recursively crawl websites to discover content
- **Link Checking**: Identify broken links on web pages
- **Pattern Matching**: Find URLs matching specific patterns
- **Sitemap Generation**: Generate XML sitemaps for websites
## Installation
### Installing via Smithery
To install Webscan for Claude Desktop automatically via [Smithery](https://smithery.ai/server/mcp-server-webscan):
```bash
npx -y @smithery/cli install mcp-server-webscan --client claude
```
### Manual Installation
```bash
# Clone the repository
git clone <repository-url>
cd mcp-server-webscan
# Install dependencies
npm install
# Build the project
npm run build
```
## Usage
### Starting the Server
```bash
npm start
```
The server runs on stdio transport, making it compatible with MCP clients like Claude Desktop.
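For orientation, the stdio wiring inside `src/index.ts` presumably resembles the following sketch built on the MCP TypeScript SDK; the server name and version strings here are assumptions, not values read from this repository:

```typescript
// Minimal sketch of stdio transport setup with the MCP TypeScript SDK.
// Name/version strings below are illustrative, not taken from this repo.
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

async function main() {
  const server = new Server(
    { name: "mcp-server-webscan", version: "1.0.0" },
    { capabilities: { tools: {} } }
  );

  // stdio transport: the MCP client spawns this process and exchanges
  // JSON-RPC messages over stdin/stdout
  const transport = new StdioServerTransport();
  await server.connect(transport);
}

main().catch(console.error);
```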
### Available Tools
1. `fetch_page`
   - Fetches a web page and converts it to Markdown
   - Parameters:
     - `url` (required): URL of the page to fetch
     - `selector` (optional): CSS selector to target specific content
2. `extract_links`
   - Extracts all links from a web page along with their anchor text
   - Parameters:
     - `url` (required): URL of the page to analyze
     - `baseUrl` (optional): Base URL to filter links
3. `crawl_site`
   - Recursively crawls a website up to a specified depth
   - Parameters:
     - `url` (required): Starting URL to crawl
     - `maxDepth` (optional, default: 2): Maximum crawl depth
4. `check_links`
   - Checks a page for broken links
   - Parameters:
     - `url` (required): URL of the page to check
5. `find_patterns`
   - Finds URLs on a page that match a given pattern
   - Parameters:
     - `url` (required): URL of the page to search
     - `pattern` (required): Regex pattern to match URLs against
6. `generate_sitemap`
   - Generates a simple XML sitemap
   - Parameters:
     - `url` (required): Root URL for the sitemap
     - `maxUrls` (optional, default: 100): Maximum number of URLs to include
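Beyond Claude Desktop, these tools can also be exercised programmatically. The following is a rough sketch using the MCP TypeScript SDK client; the exact shape of `callTool` may vary between SDK versions, and the server path is a placeholder:

```typescript
// Sketch: calling the fetch_page tool from a standalone MCP client.
// Assumes the MCP TypeScript SDK; API details may vary by SDK version.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  // Spawn the server over stdio (path is a placeholder)
  const transport = new StdioClientTransport({
    command: "node",
    args: ["path/to/mcp-server-webscan/dist/index.js"],
  });

  const client = new Client(
    { name: "webscan-example-client", version: "1.0.0" },
    { capabilities: {} }
  );
  await client.connect(transport);

  // Invoke fetch_page with the parameters documented above
  const result = await client.callTool({
    name: "fetch_page",
    arguments: { url: "https://example.com" },
  });
  console.log(result.content);
}

main().catch(console.error);
```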
## Example Usage with Claude Desktop
1. Configure the server in your Claude Desktop settings:
```json
{
  "mcpServers": {
    "webscan": {
      "command": "node",
      "args": ["path/to/mcp-server-webscan/dist/index.js"],
      "env": {
        "NODE_ENV": "development"
      }
    }
  }
}
```
2. Use the tools in your conversations:
```
Could you fetch the content from https://example.com and convert it to Markdown?
```
## Development
### Prerequisites
- Node.js >= 18
- npm
### Project Structure
```
mcp-server-webscan/
├── src/
│   └── index.ts      # Main server implementation
├── dist/             # Compiled JavaScript
├── package.json
└── tsconfig.json
```
### Building
```bash
npm run build
```
### Development Mode
```bash
npm run dev
```
## Error Handling
The server implements error handling for common failure modes:
- Invalid parameters
- Network errors
- Content parsing errors
- Invalid URLs
All errors are returned in the format defined by the MCP specification.
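As an illustration only (not this repository's actual code), parameter validation with MCP-formatted errors typically looks like the following, using the TypeScript SDK's `McpError` and `ErrorCode`; `validateUrl` is a hypothetical helper:

```typescript
// Illustrative sketch: raising an MCP-formatted error for a bad parameter.
// validateUrl is a hypothetical helper, not necessarily present in this repo.
import { McpError, ErrorCode } from "@modelcontextprotocol/sdk/types.js";

function validateUrl(raw: string): URL {
  try {
    return new URL(raw);
  } catch {
    // InvalidParams corresponds to JSON-RPC error code -32602
    throw new McpError(ErrorCode.InvalidParams, `Invalid URL: ${raw}`);
  }
}
```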
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
MIT License - see the LICENSE file for details