Which integrations are available for this server?

Allows extraction of content from web pages using CSS selectors for targeted content scraping. Converts web page content to well-formatted Markdown, preserving headings, links, images, and lists. Extracts structured content from web pages and returns it in XML format with separate sections for headings, paragraphs, links, images, and lists.

How do I use Open Crawler MCP Server?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Open Crawler MCP Server crawl https://example.com as markdown" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

Open Crawler MCP Server

Official

by elchika-inc

Overview Schema Related Servers Score Discussions

TypeScript

Remote

Open Crawler MCP Server

license npm version npm downloads GitHub stars

A Model Context Protocol (MCP) server for web crawling and content extraction from web pages with multiple output formats.

Features

Multiple Output Formats: Extract content as text, markdown, structured XML, or JSON
Smart Content Extraction: CSS selector support for targeted content extraction
Robots.txt Compliance: Automatic robots.txt checking and compliance
Rate Limiting: Built-in rate limiting (1 second minimum between requests)
Size Protection: Maximum page size limit (10MB) to prevent memory issues
Structured Content: Extract headings, paragraphs, links, images, and lists separately
Error Handling: Comprehensive error codes for different failure scenarios

MCP Client Configuration

Add this server to your MCP client configuration:

{
  "mcpServers": {
    "open-crawler": {
      "command": "npx",
      "args": ["@elchika-inc/open-crawler-mcp-server"]
    }
  }
}

Available Tools

crawl_page

Extracts content from a web page in multiple formats with automatic robots.txt compliance checking.

Parameters:

url (required): Target URL to crawl
selector (optional): CSS selector for specific content extraction
format (optional): Output format - text, markdown, xml, or json (default: text)
text_only (optional): Legacy parameter for text-only extraction (deprecated, use format instead)

Output Formats:

text: Clean, plain text content with whitespace normalized
markdown: Well-formatted Markdown with headings, links, images, and lists preserved
xml: Structured XML with separate sections for headings, paragraphs, links, images, and lists
json: Structured JSON object containing categorized content elements

Examples:

Basic text extraction:

{
  "name": "crawl_page",
  "arguments": {
    "url": "https://example.com",
    "format": "text"
  }
}

Markdown extraction with CSS selector:

{
  "name": "crawl_page",
  "arguments": {
    "url": "https://example.com",
    "selector": "article",
    "format": "markdown"
  }
}

Structured JSON extraction:

{
  "name": "crawl_page",
  "arguments": {
    "url": "https://example.com",
    "format": "json"
  }
}

check_robots

Validates if a URL is allowed to be crawled according to the site's robots.txt file.

Parameters:

url (required): URL to check for crawling permission

Example:

{
  "name": "check_robots",
  "arguments": {
    "url": "https://example.com/page"
  }
}

Error Handling

Common error scenarios:

Network connection issues
Invalid HTML or missing content
Robots.txt restrictions
Request timeouts or rate limits
Content size too large (>10MB)

License

MIT

This server cannot be installed

license - permissive license

quality - not tested

maintenance

How are these scores calculated?

Resources

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly
Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
OpenAI
open source

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/elchika-inc/open-crawler-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server