cli_add
Add URLs to an ArchiveBox web archive with options for crawling depth, tagging, updating snapshots, and customizing extraction methods.
Instructions
Execute the archivebox add command.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | List of URLs to archive | |
| tag | No | Comma-separated tags | "" |
| depth | No | Crawl depth | 0 |
| update | No | Update existing snapshots | False |
| update_all | No | Update all snapshots | False |
| index_only | No | Index without archiving | False |
| overwrite | No | Overwrite existing files | False |
| init | No | Initialize collection if needed | False |
| extractors | No | Comma-separated list of extractors to use | "" |
| parser | No | Parser type | auto |
| extra_data | No | Additional parameters as a dictionary | None |
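As a rough illustration of the schema above, the sketch below assembles an argument dictionary for a cli_add call. The defaults are copied from the handler signature shown in the implementation reference. `build_cli_add_args` and `CLI_ADD_DEFAULTS` are hypothetical names introduced only for this example and are not part of the server, and the unknown-key check is an assumption, not documented behavior.

```python
from typing import Any, Dict, List

# Defaults copied from the cli_add handler signature in the implementation
# reference. The dictionary name itself is hypothetical.
CLI_ADD_DEFAULTS: Dict[str, Any] = {
    "tag": "",
    "depth": 0,
    "update": False,
    "update_all": False,
    "index_only": False,
    "overwrite": False,
    "init": False,
    "extractors": "",
    "parser": "auto",
    "extra_data": None,
}


def build_cli_add_args(urls: List[str], **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge caller overrides onto the documented defaults."""
    # Rejecting unknown keys is an assumption made for this sketch.
    unknown = set(overrides) - set(CLI_ADD_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown cli_add arguments: {sorted(unknown)}")
    return {"urls": list(urls), **CLI_ADD_DEFAULTS, **overrides}


# Only urls is required; everything else falls back to its default.
args = build_cli_add_args(["https://example.com"], tag="docs,reference", depth=1)
print(args)
```

Note that tag and extractors are single comma-separated strings rather than lists, so several values go into one string, as in `tag="docs,reference"`.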
Implementation Reference
- archivebox_api/archivebox_mcp.py:546-625 (handler) The cli_add MCP tool handler, registered with @mcp.tool decorator. Defines input schema using Pydantic Field annotations and executes the tool logic by creating an ArchiveBox API client and calling its cli_add method to add URLs for archiving.

```python
@mcp.tool(
    exclude_args=[
        "archivebox_url",
        "username",
        "password",
        "token",
        "api_key",
        "verify",
    ],
    tags={"cli"},
)
def cli_add(
    urls: List[str] = Field(
        description="List of URLs to archive",
    ),
    tag: str = Field("", description="Comma-separated tags"),
    depth: int = Field(0, description="Crawl depth"),
    update: bool = Field(False, description="Update existing snapshots"),
    update_all: bool = Field(False, description="Update all snapshots"),
    index_only: bool = Field(False, description="Index without archiving"),
    overwrite: bool = Field(False, description="Overwrite existing files"),
    init: bool = Field(False, description="Initialize collection if needed"),
    extractors: str = Field(
        "", description="Comma-separated list of extractors to use"
    ),
    parser: str = Field("auto", description="Parser type"),
    extra_data: Optional[Dict] = Field(
        None, description="Additional parameters as a dictionary"
    ),
    archivebox_url: str = Field(
        default=os.environ.get("ARCHIVEBOX_URL", None),
        description="The URL of the ArchiveBox instance",
    ),
    username: Optional[str] = Field(
        default=os.environ.get("ARCHIVEBOX_USERNAME", None),
        description="Username for authentication",
    ),
    password: Optional[str] = Field(
        default=os.environ.get("ARCHIVEBOX_PASSWORD", None),
        description="Password for authentication",
    ),
    token: Optional[str] = Field(
        default=os.environ.get("ARCHIVEBOX_TOKEN", None),
        description="Bearer token for authentication",
    ),
    api_key: Optional[str] = Field(
        default=os.environ.get("ARCHIVEBOX_API_KEY", None),
        description="API key for authentication",
    ),
    verify: Optional[bool] = Field(
        default=to_boolean(os.environ.get("ARCHIVEBOX_VERIFY", "True")),
        description="Whether to verify SSL certificates",
    ),
) -> dict:
    """
    Execute archivebox add command.
    """
    client = Api(
        url=archivebox_url,
        username=username,
        password=password,
        token=token,
        api_key=api_key,
        verify=verify,
    )
    response = client.cli_add(
        urls=urls,
        tag=tag,
        depth=depth,
        update=update,
        update_all=update_all,
        index_only=index_only,
        overwrite=overwrite,
        init=init,
        extractors=extractors,
        parser=parser,
        extra_data=extra_data,
    )
    return response.json()
```
- Helper method in the ArchiveBox API client class that performs the actual HTTP POST request to the ArchiveBox server's /api/v1/cli/add endpoint to add URLs.

```python
def cli_add(
    self,
    urls: List[str],
    tag: str = "",
    depth: int = 0,
    update: bool = False,
    update_all: bool = False,
    index_only: bool = False,
    overwrite: bool = False,
    init: bool = False,
    extractors: str = "",
    parser: str = "auto",
    extra_data: Optional[Dict] = None,
) -> requests.Response:
    """
    Execute archivebox add command

    Args:
        urls: List of URLs to archive.
        tag: Comma-separated tags (default: "").
        depth: Crawl depth (default: 0).
        update: Update existing snapshots (default: False).
        update_all: Update all snapshots (default: False).
        index_only: Index without archiving (default: False).
        overwrite: Overwrite existing files (default: False).
        init: Initialize collection if needed (default: False).
        extractors: Comma-separated list of extractors to use (default: "").
        parser: Parser type (default: "auto").
        extra_data: Additional parameters as a dictionary (optional).

    Returns:
        Response: The response object from the POST request.

    Raises:
        ParameterError: If the provided parameters are invalid.
    """
    data = {
        "urls": urls,
        "tag": tag,
        "depth": depth,
        "update": update,
        "update_all": update_all,
        "index_only": index_only,
        "overwrite": overwrite,
        "init": init,
        "extractors": extractors,
        "parser": parser,
    }
    if extra_data:
        data.update(extra_data)
    try:
        response = self._session.post(
            url=f"{self.url}/api/v1/cli/add",
            json=data,
            headers=self.headers,
            verify=self.verify,
        )
    except ValidationError as e:
        raise ParameterError(f"Invalid parameters: {e.errors()}")
    return response
```
- MCP prompt helper that generates instructional text referencing the cli_add tool for adding URLs.

```python
@mcp.prompt
def cli_add_prompt(
    urls: List[str],
    tag: str = "",
    depth: int = 0,
) -> str:
    """
    Generates a prompt for executing archivebox add command.
    """
    return f"Add new URLs to ArchiveBox: {urls}, with tags: '{tag}', depth: {depth}. Use the cli_add tool."
```
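The client helper's payload construction and the prompt helper's output can be reproduced without a running ArchiveBox server. The sketch below mirrors that logic in two standalone functions, `build_payload` and `render_prompt`. Both names are hypothetical stand-ins for this example. Neither sends an HTTP request or depends on the MCP framework.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    urls: List[str],
    tag: str = "",
    depth: int = 0,
    update: bool = False,
    update_all: bool = False,
    index_only: bool = False,
    overwrite: bool = False,
    init: bool = False,
    extractors: str = "",
    parser: str = "auto",
    extra_data: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the JSON body built by the client helper."""
    data: Dict[str, Any] = {
        "urls": urls,
        "tag": tag,
        "depth": depth,
        "update": update,
        "update_all": update_all,
        "index_only": index_only,
        "overwrite": overwrite,
        "init": init,
        "extractors": extractors,
        "parser": parser,
    }
    # Same merge as the client helper: extra_data is applied last, so any key
    # it shares with the standard fields replaces the explicit argument.
    if extra_data:
        data.update(extra_data)
    return data


def render_prompt(urls: List[str], tag: str = "", depth: int = 0) -> str:
    """Hypothetical stand-in for cli_add_prompt, minus the MCP decorator."""
    return (
        f"Add new URLs to ArchiveBox: {urls}, with tags: '{tag}', "
        f"depth: {depth}. Use the cli_add tool."
    )


payload = build_payload(
    ["https://example.com"],
    depth=1,
    extra_data={"depth": 2, "custom": True},
)
print(payload["depth"])  # 2, because extra_data overrides the explicit depth

print(render_prompt(["https://example.com"], tag="news", depth=1))
# Add new URLs to ArchiveBox: ['https://example.com'], with tags: 'news', depth: 1. Use the cli_add tool.
```

Because the merge uses dict.update, a key in extra_data silently replaces a named argument of the same name. Callers who pass both should treat the extra_data value as the one that is sent.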