ScrapeGraph MCP Server

Official

MIT License

Overview InspectNew Endpoints Schema Related Servers Reviews Score

mcp_protocol.md•17.9 kB

# Model Context Protocol (MCP) Integration **Last Updated:** October 2025 ## Table of Contents - [What is MCP?](#what-is-mcp) - [MCP in ScrapeGraph](#mcp-in-scrapegraph) - [Communication Protocol](#communication-protocol) - [Tool Schema](#tool-schema) - [Error Handling](#error-handling) - [Client Integration](#client-integration) --- ## What is MCP? The **Model Context Protocol** (MCP) is an open standard that defines how AI assistants (like Claude, Cursor, etc.) can interact with external tools and services in a consistent, structured way. **Official Documentation:** [https://modelcontextprotocol.io/](https://modelcontextprotocol.io/) ### Key Concepts **1. Server** - Exposes tools that AI assistants can use - Implements MCP protocol - Runs as a separate process - **Example:** This ScrapeGraph MCP server **2. Client** - AI assistant that uses the tools - Sends tool invocation requests - Receives tool results - **Examples:** Claude Desktop, Cursor, other AI assistants **3. Transport** - Communication layer between client and server - **Types:** stdio (standard input/output), HTTP, SSE - **This server uses:** stdio **4. Tools** - Functions exposed by the server - Have typed parameters and return values - Automatically discovered by AI assistants - **Examples:** `markdownify()`, `smartscraper()` **5. Resources** - Data exposed by the server (optional) - Not used in this implementation **6. Prompts** - Pre-defined prompts exposed by the server (optional) - Not used in this implementation --- ## MCP in ScrapeGraph ### Architecture Overview ``` ┌─────────────────────────────────┐ │ AI Assistant (Client) │ │ - Claude Desktop │ │ - Cursor │ │ - Other MCP-compatible AIs │ └────────────┬────────────────────┘ │ MCP Protocol (JSON-RPC over stdio) │ - Tool discovery │ - Tool invocation │ - Result streaming ▼ ┌─────────────────────────────────┐ │ FastMCP Server │ │ - Tool registry │ │ - Parameter validation │ │ - Serialization/ │ │ deserialization │ └────────────┬────────────────────┘ │ Python function calls ▼ ┌─────────────────────────────────┐ │ ScapeGraphClient │ │ - HTTP client (httpx) │ │ - API authentication │ │ - Error handling │ └────────────┬────────────────────┘ │ HTTPS API requests ▼ ┌─────────────────────────────────┐ │ ScrapeGraphAI API │ │ https://api.scrapegraphai.com │ └─────────────────────────────────┘ ``` ### FastMCP Framework This server uses **FastMCP**, a lightweight Python framework for building MCP servers: ```python from mcp.server.fastmcp import FastMCP # Create MCP server mcp = FastMCP("ScapeGraph API MCP Server") # Define tools with decorators @mcp.tool() def markdownify(website_url: str) -> Dict[str, Any]: """Convert a webpage to markdown.""" # Implementation... return {"result": "..."} # Run the server mcp.run(transport="stdio") ``` **FastMCP Features:** - Automatic tool discovery from decorated functions - Type hint → MCP schema generation - Request/response serialization - Error handling - Stdio transport out-of-the-box --- ## Communication Protocol ### Transport: stdio **Standard Input/Output (stdio)** is used for client-server communication: - **stdin (→ Server):** Client sends JSON-RPC requests - **stdout (← Client):** Server sends JSON-RPC responses - **stderr (← Client):** Server logs (not part of MCP protocol) **Example Flow:** ``` Client → Server (stdin): {"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "markdownify", "arguments": {"website_url": "https://example.com"}}, "id": 1} Server → Client (stdout): {"jsonrpc": "2.0", "result": {"result": "# Example\n\nMarkdown content..."}, "id": 1} ``` ### JSON-RPC 2.0 MCP uses JSON-RPC 2.0 for message structure: **Request:** ```json { "jsonrpc": "2.0", "method": "tools/call", "params": { "name": "smartscraper", "arguments": { "user_prompt": "Extract product names", "website_url": "https://example.com" } }, "id": 1 } ``` **Response:** ```json { "jsonrpc": "2.0", "result": { "result": { "products": ["Product A", "Product B"] } }, "id": 1 } ``` **Error Response:** ```json { "jsonrpc": "2.0", "error": { "code": -32603, "message": "Internal error", "data": "Error 401: Unauthorized" }, "id": 1 } ``` ### MCP Methods **Tool Discovery:** ```json {"jsonrpc": "2.0", "method": "tools/list", "id": 1} Response: { "jsonrpc": "2.0", "result": { "tools": [ { "name": "markdownify", "description": "Convert a webpage into clean, formatted markdown.", "inputSchema": { "type": "object", "properties": { "website_url": {"type": "string"} }, "required": ["website_url"] } }, // ... other tools ] }, "id": 1 } ``` **Tool Invocation:** ```json {"jsonrpc": "2.0", "method": "tools/call", "params": {...}, "id": 1} ``` **Initialize:** ```json {"jsonrpc": "2.0", "method": "initialize", "params": {...}, "id": 1} ``` --- ## Tool Schema Each tool exposed by the server has a schema that defines its parameters and return type. ### Example: `markdownify` Tool **Python Definition:** ```python @mcp.tool() def markdownify(website_url: str) -> Dict[str, Any]: """ Convert a webpage into clean, formatted markdown. Args: website_url: URL of the webpage to convert Returns: Dictionary containing the markdown result """ # Implementation... ``` **Generated MCP Schema:** ```json { "name": "markdownify", "description": "Convert a webpage into clean, formatted markdown.", "inputSchema": { "type": "object", "properties": { "website_url": { "type": "string", "description": "URL of the webpage to convert" } }, "required": ["website_url"] } } ``` **Type Mapping:** - Python `str` → JSON Schema `"type": "string"` - Python `int` → JSON Schema `"type": "integer"` - Python `bool` → JSON Schema `"type": "boolean"` - Python `Dict[str, Any]` → JSON Schema `"type": "object"` - Python `Optional[str]` → JSON Schema `"type": ["string", "null"]` ### Example: `smartscraper` Tool (with optional parameters) **Python Definition:** ```python @mcp.tool() def smartscraper( user_prompt: str, website_url: str, number_of_scrolls: int = None, markdown_only: bool = None ) -> Dict[str, Any]: """Extract structured data from a webpage using AI.""" # Implementation... ``` **Generated MCP Schema:** ```json { "name": "smartscraper", "description": "Extract structured data from a webpage using AI.", "inputSchema": { "type": "object", "properties": { "user_prompt": {"type": "string"}, "website_url": {"type": "string"}, "number_of_scrolls": {"type": ["integer", "null"]}, "markdown_only": {"type": ["boolean", "null"]} }, "required": ["user_prompt", "website_url"] } } ``` --- ## Error Handling ### Error Strategy The server implements **graceful error handling** to prevent crashes and provide meaningful feedback to AI assistants. **Approach:** 1. **No exceptions to client** - All errors caught in tool functions 2. **Error dictionaries** - Return `{"error": "message"}` instead of raising 3. **Detailed messages** - Include HTTP status codes and API error messages ### Error Handling Pattern ```python @mcp.tool() def tool_name(param: str) -> Dict[str, Any]: """Tool description.""" if scrapegraph_client is None: return {"error": "ScapeGraph client not initialized. Please provide an API key."} try: return scrapegraph_client.method(param) except Exception as e: return {"error": str(e)} ``` **Why this approach?** - Prevents server crashes - Allows AI to handle errors gracefully - Enables retry logic - Provides context for user troubleshooting ### Error Types **1. Client Not Initialized:** ```json { "error": "ScapeGraph client not initialized. Please provide an API key." } ``` **Cause:** Missing `SGAI_API_KEY` environment variable **2. API Errors:** ```json { "error": "Error 401: Unauthorized" } ``` **Cause:** Invalid API key ```json { "error": "Error 402: Payment Required - Insufficient credits" } ``` **Cause:** Not enough credits ```json { "error": "Error 404: Not Found" } ``` **Cause:** Invalid URL or API endpoint **3. Network Errors:** ```json { "error": "httpx.ConnectTimeout: Connection timed out" } ``` **Cause:** Network issues or slow website **4. Validation Errors (SmartCrawler):** ```json { "error": "prompt is required when extraction_mode is 'ai'" } ``` **Cause:** Missing required parameter for AI extraction mode ### AI Assistant Error Handling When a tool returns an error, AI assistants typically: 1. **Parse the error message** 2. **Determine if retryable** (network error) or not (invalid API key) 3. **Inform the user** with actionable guidance 4. **Suggest fixes** (e.g., "Please add credits to your account") **Example AI Response:** ``` User: "Convert https://example.com to markdown" Tool result: {"error": "Error 402: Payment Required - Insufficient credits"} AI: "I wasn't able to convert the webpage because your ScrapeGraphAI account has insufficient credits. Please add credits at https://dashboard.scrapegraphai.com and try again." ``` --- ## Client Integration ### Claude Desktop Integration **Configuration File Location:** - **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json` - **Windows:** `%APPDATA%/Claude/claude_desktop_config.json` **Configuration:** ```json { "mcpServers": { "@ScrapeGraphAI-scrapegraph-mcp": { "command": "npx", "args": [ "-y", "@smithery/cli@latest", "run", "@ScrapeGraphAI/scrapegraph-mcp", "--config", "{\"scrapegraphApiKey\":\"YOUR-SGAI-API-KEY\"}" ] } } } ``` **How It Works:** 1. Claude Desktop reads the config file on startup 2. Starts the MCP server as a child process using the specified command 3. Establishes stdio communication 4. Discovers available tools via `tools/list` 5. User asks a question that requires web scraping 6. Claude calls the appropriate tool via `tools/call` 7. Server executes the tool and returns results 8. Claude incorporates results into its response **Example Interaction:** ``` User: "What are the main features of ScrapeGraphAI?" Claude (internal): 1. Determines that markdownify tool could help 2. Calls: markdownify("https://scrapegraphai.com") 3. Receives markdown content 4. Analyzes content 5. Responds to user Claude (to user): "Based on the ScrapeGraphAI website, the main features are: - AI-powered web scraping - Multiple scraping modes (SmartScraper, SearchScraper, etc.) - ... " ``` ### Cursor Integration **Setup:** 1. Open Cursor settings 2. Navigate to "MCP Servers" section 3. Click "Add MCP Server" 4. Select or configure ScrapeGraphAI MCP 5. Enter API key **Usage:** - Cursor's AI chat can automatically invoke MCP tools - Similar interaction pattern to Claude Desktop - Tool calls visible in chat interface (optional) ### Custom Client Integration To integrate with a custom MCP client: **1. Install the MCP SDK:** ```bash pip install mcp ``` **2. Create a client:** ```python import asyncio from mcp.client import ClientSession from mcp.client.stdio import stdio_client async def main(): # Start the server process server_params = StdioServerParameters( command="scrapegraph-mcp", env={"SGAI_API_KEY": "your-api-key"} ) async with stdio_client(server_params) as (read, write): async with ClientSession(read, write) as session: # Initialize await session.initialize() # List tools tools = await session.list_tools() print(f"Available tools: {[t.name for t in tools]}") # Call a tool result = await session.call_tool( "markdownify", arguments={"website_url": "https://example.com"} ) print(f"Result: {result}") asyncio.run(main()) ``` **3. Handle tool results:** ```python if "error" in result: print(f"Tool error: {result['error']}") else: print(f"Tool success: {result['result']}") ``` --- ## Advanced Topics ### Tool Versioning Currently, the server does not implement tool versioning. All tools are v1 implicitly. **Future Consideration:** - Add version to tool names: `smartscraper_v2()` - Maintain backward compatibility with deprecated tools - Use MCP metadata for version info ### Streaming Results MCP supports streaming results for long-running operations. This could be useful for SmartCrawler: **Current Approach (polling):** 1. Call `smartcrawler_initiate()` → get `request_id` 2. Repeatedly call `smartcrawler_fetch_results(request_id)` until complete **Potential Streaming Approach:** 1. Call `smartcrawler_initiate()` → server keeps connection open 2. Server streams progress updates: `{"status": "processing", "pages": 10}` 3. Server sends final result: `{"status": "completed", "results": [...]}` **Not currently implemented** due to FastMCP limitations. ### Authentication **Current Approach:** - API key passed via environment variable or config parameter - Single API key for entire server instance - No per-tool authentication **Future Consideration:** - Support multiple API keys (user-specific) - OAuth integration - JWT tokens ### Rate Limiting **Current State:** - No rate limiting in the MCP server - Rate limiting handled by ScrapeGraphAI API - Server is a simple pass-through **Future Consideration:** - Client-side rate limiting to prevent API quota exhaustion - Configurable request throttling - Request queuing --- ## Debugging MCP ### MCP Inspector The MCP Inspector is a tool for testing MCP servers: ```bash npx @modelcontextprotocol/inspector scrapegraph-mcp ``` **Features:** - Interactive tool discovery - Manual tool invocation - Request/response inspection - Error debugging ### Server Logs **FastMCP Logging:** ```python # Add logging to server.py import logging logging.basicConfig( level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' ) logger = logging.getLogger(__name__) @mcp.tool() def markdownify(website_url: str) -> Dict[str, Any]: logger.info(f"markdownify called with URL: {website_url}") # ... ``` **View Logs:** - Logs printed to stderr (not part of MCP protocol) - Visible in Claude Desktop logs: `~/Library/Logs/Claude/` (macOS) - Use MCP Inspector to see real-time logs ### Common Debugging Issues **Issue: Tools not appearing in Claude** - **Check:** Is the server running? Look in Claude logs - **Check:** Is the config file correct? Verify JSON syntax - **Check:** Does `tools/list` return the tools? Use MCP Inspector **Issue: Tool calls failing** - **Check:** Is the API key valid? Test with curl - **Check:** Are parameters correct? Review tool schema - **Check:** Network connectivity? Check firewall/proxy **Issue: Server crashes** - **Check:** Python version (≥3.10)? - **Check:** Dependencies installed? `pip list` - **Check:** Error in logs? Check stderr output --- ## Best Practices ### Tool Design **1. Clear Descriptions** - Write docstrings that explain what the tool does - Include parameter descriptions - Specify expected input/output formats **2. Type Hints** - Always use type hints for parameters and return values - FastMCP generates schemas from type hints - Helps AI understand tool contracts **3. Error Messages** - Provide actionable error messages - Include HTTP status codes - Suggest fixes when possible **4. Optional Parameters** - Use `= None` for optional parameters - Document default behavior - Don't require unnecessary inputs ### Server Design **1. Statelessness** - Each tool invocation should be independent - Don't rely on shared state between calls - Use API key from config, not global variable **2. Idempotency** - Same inputs should produce same outputs (when possible) - Helps with retries and debugging - Cache results when appropriate **3. Performance** - Keep tool invocations fast (<60s) - Use async operations for I/O (future improvement) - Consider timeouts for slow operations **4. Security** - Never log API keys - Validate all inputs - Use HTTPS for API calls - Rotate API keys regularly --- ## References - **MCP Specification:** [https://modelcontextprotocol.io/](https://modelcontextprotocol.io/) - **MCP Python SDK:** [https://github.com/modelcontextprotocol/python-sdk](https://github.com/modelcontextprotocol/python-sdk) - **FastMCP:** [https://github.com/jlowin/fastmcp](https://github.com/jlowin/fastmcp) - **JSON-RPC 2.0:** [https://www.jsonrpc.org/specification](https://www.jsonrpc.org/specification) - **ScrapeGraphAI API:** [https://api.scrapegraphai.com/docs](https://api.scrapegraphai.com/docs) --- **Made with ❤️ by [ScrapeGraphAI](https://scrapegraphai.com) Team**

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ScrapeGraphAI/scrapegraph-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server