

# Grabba MCP Server

This repository contains the Grabba Model Context Protocol (MCP) server, designed to expose Grabba API functionality as a set of callable tools. Built on `FastMCP`, the server allows AI agents, orchestrators (such as LangChain), and other applications to interact with the Grabba data extraction and management services.

## Table of Contents

1. [Features](#features)
2. [Getting Started](#getting-started)
    * [Prerequisites](#prerequisites)
    * [Installation](#installation)
        * [Via PyPI (Recommended)](#via-pypi-recommended)
        * [From Source (Development)](#from-source-development)
    * [Running the Server](#running-the-server)
        * [Locally](#locally)
        * [Docker Container](#docker-container)
        * [Public Instance](#public-instance)
3. [Configuration](#configuration)
    * [Environment Variables](#environment-variables)
    * [Command-Line Arguments](#command-line-arguments)
4. [Available Tools](#available-tools)
    * [Authentication](#authentication)
    * [Tool Details](#tool-details)
5. [Connecting to the MCP Server](#connecting-to-the-mcp-server)
    * [Python Client (LangChain Example)](#python-client-langchain-example)
    * [Streamable HTTP Transport](#streamable-http-transport)
    * [Stdio Transport (for Docker-in-Docker or specific use cases)](#stdio-transport-for-docker-in-docker-or-specific-use-cases)
6. [Development Notes](#development-notes)
    * [Project Structure](#project-structure)
    * [Running Tests](#running-tests)
7. [Links & Resources](#links--resources)
8. [License](#license)

-----

## Features

* **Grabba API Exposure:** Exposes key Grabba API functionality (data extraction, job management, statistics) as callable tools.
* **Multiple Transports:** Supports the `stdio`, `streamable-http`, and `sse` transports for different deployment and client scenarios.
* **Dependency Injection:** Leverages FastAPI's dependency injection for secure and efficient `GrabbaService` initialization (e.g., handling API keys).
* **Containerized Deployment:** Optimized for Docker for easy packaging and deployment.
* **Configurable:** Allows configuration via environment variables and command-line arguments.

-----

## Getting Started

### Prerequisites

* Python 3.10+
* Docker (for containerized deployment)
* A Grabba API key (you can get one from the [Grabba website](https://www.grabba.dev/))

### Installation

#### Via PyPI (Recommended)

The `grabba-mcp` package is available on PyPI. This is the simplest way to get started.

```bash
pip install grabba-mcp
```

#### From Source (Development)

If you plan to contribute to or modify the server, install it from source.

1. **Clone the repository:**

    ```bash
    git clone https://github.com/grabba-dev/grabba-mcp
    cd grabba-mcp
    ```

2. **Install Poetry:** If you don't have Poetry installed, follow the official installation guide or install it with `pip`:

    ```bash
    pip install poetry
    ```

3. **Install project dependencies:** Navigate to the `apps/mcp` directory, where `pyproject.toml` resides, then install:

    ```bash
    cd apps/mcp
    poetry install
    ```

### Running the Server

#### Locally

After installation (either via `pip` or from source), you can run the server.

1. **Create a `.env` file:** Put it in the `apps/mcp` directory if running from source. Otherwise, put it in the directory from which you will run the `grabba-mcp` command. Add your Grabba API key:

    ```dotenv
    API_KEY="YOUR_API_KEY_HERE"
    # Optional: configure the server port
    PORT=8283
    # Optional: configure the default transport (overridden by CLI)
    MCP_SERVER_TRANSPORT="streamable-http"
    ```

2. **Execute the server:**

    * **If installed via `pip`:**

        ```bash
        grabba-mcp
        ```

        To specify a transport on the command line:

        ```bash
        grabba-mcp streamable-http
        ```

    * **If running from source (using Poetry):**

        ```bash
        cd apps/mcp
        poetry run python src/server.py
        ```

        To specify a transport on the command line:

        ```bash
        poetry run python src/server.py stdio
        ```

With an HTTP transport, you should see output indicating that the server is starting and listening on the configured port (e.g., `http://0.0.0.0:8283`). With the `stdio` transport, the server talks over standard input/output. It runs only as long as the client process that launched it keeps that connection open, so it is unsuitable as a persistent, standalone service.

#### Docker Container

A pre-built Docker image is available on Docker Hub, which makes deployment straightforward.

1. **Pull the image:**

    ```bash
    docker pull itsobaa/grabba-mcp:latest
    ```

2. **Run the container:** For a persistent server, use the `streamable-http` transport and map the port:

    ```bash
    docker run -d \
      -p 8283:8283 \
      -e API_KEY="YOUR_API_KEY_HERE" \
      -e MCP_SERVER_TRANSPORT="streamable-http" \
      itsobaa/grabba-mcp:latest
    ```

You can also use `docker-compose` for more complex setups:

```yaml
# docker-compose.yml
version: '3.8'
services:
  grabba-mcp:
    image: itsobaa/grabba-mcp:latest
    container_name: grabba-mcp
    environment:
      API_KEY: ${API_KEY} # Read from a .env file next to docker-compose.yml
      MCP_SERVER_TRANSPORT: streamable-http
      PORT: 8283
    ports:
      - "8283:8283"
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8283/tools/openapi.json || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
```

With `docker-compose.yml` in place, create a `.env` file next to it (e.g., `API_KEY="YOUR_API_KEY_HERE"`) and run:

```bash
docker-compose up -d
```

#### Public Instance

The Grabba MCP Server is publicly accessible at:

* **URL:** `https://mcp.grabba.dev/`
* **Transports:** Supports `sse` and `streamable-http`.
* **Authentication:** Requires an `API_KEY` header with your Grabba API key.
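The public instance, and a local server started with `streamable-http`, both speak MCP over HTTP. The sketch below shows where the `API_KEY` header fits. It uses only the standard library and builds, without sending, the JSON-RPC `initialize` request a streamable-HTTP client sends first. The function name is hypothetical. The `/mcp` endpoint path is an assumption: it is FastMCP's usual default, but this README does not state the path.

```python
import json
import urllib.request

# MCP spec revision that introduced the streamable HTTP transport.
PROTOCOL_VERSION = "2025-03-26"


def build_initialize_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an MCP `initialize` request carrying the API_KEY header.

    Hypothetical helper: `endpoint` is the full MCP endpoint URL, and its path
    (e.g. "/mcp") is an assumption about the deployment.
    """
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "grabba-readme-sketch", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
            # Grabba authentication header for HTTP transports.
            "API_KEY": api_key,
        },
        method="POST",
    )


req = build_initialize_request("http://localhost:8283/mcp", "YOUR_API_KEY_HERE")
print(req.get_method(), req.full_url)
# urllib stores header names capitalized ("Api_key"); HTTP header names are case-insensitive.
print(req.get_header("Api_key"))
```

Passing `req` to `urllib.request.urlopen` would send it. A complete client would then send the `notifications/initialized` notification. It would also echo the `Mcp-Session-Id` header returned by the server on later requests. In practice, an MCP client library handles this handshake for you.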
-----

## Configuration

The server can be configured via environment variables and command-line arguments.

### Environment Variables

* **`API_KEY`** (Required): Your Grabba API key. This is required to authenticate with Grabba services.
* **`PORT`** (Optional, default: `8283`): The port on which the MCP server's HTTP transports (`streamable-http`, `sse`) listen.
* **`MCP_SERVER_TRANSPORT`** (Optional, default: `stdio`): The default transport protocol for the MCP server. Can be `stdio`, `streamable-http`, or `sse`.

### Command-Line Arguments

The server also accepts a single positional command-line argument, which overrides `MCP_SERVER_TRANSPORT`:

```bash
grabba-mcp [transport_protocol]
# or, from source:
python src/server.py [transport_protocol]
```

* `[transport_protocol]`: Can be `stdio`, `streamable-http`, or `sse`.
* Example: `grabba-mcp streamable-http`

-----

## Available Tools

The Grabba MCP Server exposes a suite of tools that wrap the Grabba Python SDK.

### Authentication

For the `streamable-http` and `sse` transports, authenticate by including an **`API_KEY`** HTTP header with your Grabba API key. Example: `API_KEY: YOUR_API_KEY_HERE`

For the `stdio` transport, set the **`API_KEY`** environment variable where the `grabba-mcp` command runs. This mode has no HTTP headers.

### Tool Details

#### `extract_data`

* **Description:** Schedules a new data extraction job with Grabba. Suitable for web search tasks.
* **Input:** `Job` object (Pydantic model) detailing the extraction tasks.
* **Output:** `tuple[str, Optional[Dict]]` - A message and the `JobResult` as a dictionary.

#### `schedule_existing_job`

* **Description:** Schedules an existing Grabba job to run immediately.
* **Input:** `job_id` (string) - The ID of the existing job.
* **Output:** `tuple[str, Optional[Dict]]` - A message and the `JobResult` as a dictionary.
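Whichever client you use, invoking one of these tools ends up as a JSON-RPC `tools/call` request, the message type the MCP specification defines for tool invocation. It is sent once the session has been initialized. The sketch below builds that request body for `schedule_existing_job` using only the standard library. The helper name, the id counter, and the placeholder job ID are illustrative. The argument name `job_id` follows the input listed for this tool.

```python
import itertools
import json
from typing import Any

# Illustrative JSON-RPC request ids; id 1 is assumed to have gone to `initialize`.
_ids = itertools.count(2)


def tools_call_body(tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request for one of the Grabba tools (hypothetical helper)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Re-run an existing job; "JOB_ID_HERE" is a placeholder, not a real ID.
body = tools_call_body("schedule_existing_job", {"job_id": "JOB_ID_HERE"})
print(body)
```

Over `streamable-http` or `sse`, this body is POSTed together with the `API_KEY` header. Over `stdio`, it is written as a single newline-terminated line to the server's standard input.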
#### `fetch_all_jobs`

* **Description:** Fetches all Grabba jobs for the current user.
* **Input:** None.
* **Output:** `tuple[str, Optional[List[Job]]]` - A message and a list of `Job` objects.

#### `fetch_specific_job`

* **Description:** Fetches details of a specific Grabba job by its ID.
* **Input:** `job_id` (string) - The ID of the job.
* **Output:** `tuple[str, Optional[Job]]` - A message and the `Job` object.

#### `delete_job`

* **Description:** Deletes a specific Grabba job.
* **Input:** `job_id` (string) - The ID of the job to delete.
* **Output:** `tuple[str, None]` - A success message.

#### `fetch_job_result`

* **Description:** Fetches the results of a completed Grabba job by its result ID.
* **Input:** `job_result_id` (string) - The ID of the job result.
* **Output:** `tuple[str, Optional[Dict]]` - A message and the job result data as a dictionary.

#### `delete_job_result`

* **Description:** Deletes the results of a completed Grabba job.
* **Input:** `job_result_id` (string) - The ID of the job result to delete.
* **Output:** `tuple[str, None]` - A success message.

#### `fetch_stats_data`

* **Description:** Fetches usage statistics and the current user's token balance for Grabba.
* **Input:** None.
* **Output:** `tuple[str, Optional[JobStats]]` - A message and the `JobStats` object.

#### `estimate_job_cost`

* **Description:** Estimates the cost of a Grabba job before creation or scheduling.
* **Input:** `Job` object (Pydantic model) detailing the extraction tasks.
* **Output:** `tuple[str, Optional[Dict]]` - A message and the estimated cost details as a dictionary.

#### `create_job`

* **Description:** Creates a new data extraction job in Grabba without immediately scheduling it for execution.
* **Input:** `Job` object (Pydantic model) detailing the extraction tasks.
* **Output:** `tuple[str, Optional[Job]]` - A message and the created `Job` object.
#### `fetch_available_regions`

* **Description:** Fetches a list of all available puppet (web agent) regions that can be used for scheduling web data extractions.
* **Input:** None.
* **Output:** `tuple[str, Optional[List[PuppetRegion]]]` - A message and a list of `PuppetRegion` objects.

-----

## Connecting to the MCP Server

Any MCP client can connect to the server. For LangChain-based agents, the `MultiServerMCPClient` class from the `langchain-mcp-adapters` package (`langchain_mcp_adapters.client`) can help. It connects to one or more MCP servers and exposes their tools as LangChain tools.

### Python Client (LangChain Example)

This example assumes a recent version of `langchain-mcp-adapters` is installed, along with `python-dotenv`. Tool calls return the content sent back over MCP (usually text), not Grabba Pydantic objects. For that reason, the example prints results instead of reading model attributes.

```python
import asyncio
import os
from typing import Any, Dict, List, Optional

from dotenv import load_dotenv  # For loading the API key from a .env file
from langchain_core.tools import BaseTool
from langchain_mcp_adapters.client import MultiServerMCPClient


async def connect_and_load_tools(connections: Dict[str, Dict[str, Any]]) -> List[BaseTool]:
    """
    Connects to the MCP server(s), discovers their tools, and returns them as LangChain tools.
    `connections` maps a server name to its connection parameters.
    """
    try:
        mcp_client = MultiServerMCPClient(connections)
        tools = await mcp_client.get_tools()
        print(f"Successfully loaded {len(tools)} tools.")
        return tools
    except Exception as e:
        print(f"Error connecting to MCP server or loading tools: {e}")
        return []


def find_tool(tools: List[BaseTool], name: str) -> Optional[BaseTool]:
    """Returns the tool with the given name, or None if the server did not expose it."""
    return next((t for t in tools if t.name == name), None)


async def call_and_print(tools: List[BaseTool], name: str, args: Dict[str, Any]) -> None:
    """Invokes a tool by name and prints whatever content the server returned."""
    tool = find_tool(tools, name)
    if tool is None:
        print(f"{name} tool not found.")
        return
    print(f"\n--- Testing {name} ---")
    try:
        result = await tool.ainvoke(args)
        print(f"{name} result: {str(result)[:500]}")  # Truncate long results
    except Exception as e:
        print(f"Error calling {name}: {e}")


async def main():
    load_dotenv()  # Load API_KEY from a client-side .env file
    api_key = os.getenv("API_KEY", "YOUR_API_KEY_HERE_IF_NOT_ENV")

    # --- Streamable HTTP transport (local or public instance) ---
    # Local: "http://localhost:8283"; public: "https://mcp.grabba.dev/"
    # (append the MCP endpoint path if your deployment serves one on a sub-path).
    http_connections = {
        "grabba-agent-http": {
            "transport": "streamable_http",
            "url": "http://localhost:8283",
            "headers": {"API_KEY": api_key},  # HTTP transports authenticate via header
        }
    }

    print("\n--- Connecting via Streamable HTTP ---")
    http_tools = await connect_and_load_tools(http_connections)
    for tool in http_tools:
        print(f"- {tool.name}: {tool.description.split('.')[0]}.")

    if http_tools:
        # Example job payload: adjust the fields to match the Grabba `Job` model.
        sample_job = {
            "url": "https://example.com/some-page",
            "type": "markdown",  # or "pdf", "html", etc.
            "parser": "text-content",
            "strategy": "auto",
            # ... other required fields for Job
        }
        await call_and_print(http_tools, "extract_data", {"extraction_data": sample_job})
        await call_and_print(http_tools, "fetch_all_jobs", {})
        await call_and_print(http_tools, "fetch_stats_data", {})

    # --- Stdio transport (the client launches the server in a Docker container) ---
    # Assumes the 'itsobaa/grabba-mcp:latest' image is available.
    # Tools loaded this way open a new session, and so start a new container, per tool call.
    stdio_connections = {
        "grabba-agent-stdio": {
            "transport": "stdio",
            "command": "docker",
            "args": [
                "run",
                "-i",             # Keep STDIN open for stdio communication
                "--rm",           # Remove the container after exit
                "-e", "API_KEY",  # Forward API_KEY from the docker CLI's environment into the container
                "itsobaa/grabba-mcp:latest",
                "grabba-mcp", "stdio",  # Run the server in stdio mode inside the container
            ],
            "env": {"API_KEY": api_key},  # Environment for the `docker` subprocess
        }
    }

    print("\n--- Connecting via Stdio (Docker container as a subprocess) ---")
    stdio_tools = await connect_and_load_tools(stdio_connections)
    for tool in stdio_tools:
        print(f"- {tool.name}: {tool.description.split('.')[0]}.")

    if stdio_tools:
        await call_and_print(stdio_tools, "fetch_available_regions", {})


if __name__ == "__main__":
    asyncio.run(main())
```

-----

## Development Notes

### Project Structure

```
your_project_root/
├── apps/
│   └── mcp/
│       ├── src/
│       │   └── server.py     # Main FastMCP server application
│       ├── .env              # Environment variables for local development
│       ├── pyproject.toml    # Poetry project configuration
│       └── poetry.lock       # Poetry dependency lock file
├── Dockerfile                # Docker build instructions for the server
├── docker-compose.yml        # Docker Compose configuration for local development/deployment
├── .dockerignore             # Files to ignore during Docker build
├── .env                      # Example .env for docker-compose (for API_KEY)
├── README.md                 # This documentation file
├── pyproject.toml            # Root pyproject.toml (if using monorepo structure)
├── poetry.lock               # Root poetry.lock (if using monorepo structure)
├── src/                      # Source code (often for the root project if it's a monorepo)
├── tests/                    # Project tests
└── ...                       # Other project files (dist, docs, tox.ini, project.json, etc.)
```

### Running Tests

To run the tests (as configured in `pyproject.toml`):

```bash
poetry run pytest
```

-----

## Links & Resources

* **Grabba Website:** [https://www.grabba.dev/](https://www.grabba.dev/)
* **Grabba MCP Server Public Instance:** [https://mcp.grabba.dev/](https://mcp.grabba.dev/)
* **GitHub Repository:** [https://github.com/grabba-dev/grabba-mcp](https://github.com/grabba-dev/grabba-mcp)
* **Docker Hub Image:** [https://hub.docker.com/r/itsobaa/grabba-mcp](https://hub.docker.com/r/itsobaa/grabba-mcp)
* **PyPI Package:** [https://pypi.org/project/grabba-mcp/](https://pypi.org/project/grabba-mcp/)

-----

## License

This project is licensed under a proprietary license. See the `LICENSE` file in the repository root for full details.

-----
