cli_update

Update ArchiveBox snapshots by archiving new URLs or re-processing existing ones with configurable filters for timestamp, status, and extractors.

Instructions

Execute archivebox update command.

Input Schema

Name             Required  Description                            Default
resume           No        Resume from timestamp
only_new         No        Update only new snapshots
index_only       No        Index without archiving
overwrite        No        Overwrite existing files
after            No        Filter snapshots after timestamp
before           No        Filter snapshots before timestamp
status           No        Filter by status                       unarchived
filter_type      No        Filter type                            substring
filter_patterns  No        List of filter patterns
extractors       No        Comma-separated list of extractors
extra_data       No        Additional parameters as a dictionary
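
As a rough illustration of how these fields fit together, the sketch below builds the JSON-RPC `tools/call` request an MCP client might send for this tool. The helper name `build_cli_update_call` and all example values are hypothetical. Treating `after` as Unix epoch seconds is an assumption based on the float type in the schema, not something this page states.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def build_cli_update_call(
    request_id: int,
    after: Optional[datetime] = None,
    status: str = "unarchived",
    filter_type: str = "substring",
    filter_patterns: Optional[List[str]] = None,
    extractors: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an MCP `tools/call` request for cli_update (hypothetical helper)."""
    arguments: Dict[str, Any] = {"status": status, "filter_type": filter_type}
    if after is not None:
        # Assumption: timestamps are Unix epoch seconds.
        arguments["after"] = after.timestamp()
    if filter_patterns:
        arguments["filter_patterns"] = filter_patterns
    if extractors:
        # The schema describes extractors as a comma-separated string.
        arguments["extractors"] = ",".join(extractors)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "cli_update", "arguments": arguments},
    }


# Illustrative values only; the pattern and extractor names are examples.
request = build_cli_update_call(
    request_id=1,
    after=datetime(2024, 1, 1, tzinfo=timezone.utc),
    filter_patterns=["example.com"],
    extractors=["title", "wget"],
)
print(json.dumps(request, indent=2))
```

Connection arguments such as `archivebox_url` or `api_key` are left out on purpose: the tool registration lists them under `exclude_args`, so they come from the server's environment and not from the caller.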

Implementation Reference

  • MCP tool registration, schema, and handler for 'cli_update'. Creates an ArchiveBox Api client and invokes its cli_update method to execute the update command.
    @mcp.tool(
        exclude_args=[
            "archivebox_url",
            "username",
            "password",
            "token",
            "api_key",
            "verify",
        ],
        tags={"cli"},
    )
    def cli_update(
        resume: Optional[float] = Field(0, description="Resume from timestamp"),
        only_new: bool = Field(True, description="Update only new snapshots"),
        index_only: bool = Field(False, description="Index without archiving"),
        overwrite: bool = Field(False, description="Overwrite existing files"),
        after: Optional[float] = Field(0, description="Filter snapshots after timestamp"),
        before: Optional[float] = Field(
            999999999999999, description="Filter snapshots before timestamp"
        ),
        status: Optional[str] = Field("unarchived", description="Filter by status"),
        filter_type: Optional[str] = Field("substring", description="Filter type"),
        filter_patterns: Optional[List[str]] = Field(
            None, description="List of filter patterns"
        ),
        extractors: Optional[str] = Field(
            "", description="Comma-separated list of extractors"
        ),
        extra_data: Optional[Dict] = Field(
            None, description="Additional parameters as a dictionary"
        ),
        archivebox_url: str = Field(
            default=os.environ.get("ARCHIVEBOX_URL", None),
            description="The URL of the ArchiveBox instance",
        ),
        username: Optional[str] = Field(
            default=os.environ.get("ARCHIVEBOX_USERNAME", None),
            description="Username for authentication",
        ),
        password: Optional[str] = Field(
            default=os.environ.get("ARCHIVEBOX_PASSWORD", None),
            description="Password for authentication",
        ),
        token: Optional[str] = Field(
            default=os.environ.get("ARCHIVEBOX_TOKEN", None),
            description="Bearer token for authentication",
        ),
        api_key: Optional[str] = Field(
            default=os.environ.get("ARCHIVEBOX_API_KEY", None),
            description="API key for authentication",
        ),
        verify: Optional[bool] = Field(
            default=to_boolean(os.environ.get("ARCHIVEBOX_VERIFY", "True")),
            description="Whether to verify SSL certificates",
        ),
    ) -> dict:
        """
        Execute archivebox update command.
        """
        client = Api(
            url=archivebox_url,
            username=username,
            password=password,
            token=token,
            api_key=api_key,
            verify=verify,
        )
        response = client.cli_update(
            resume=resume,
            only_new=only_new,
            index_only=index_only,
            overwrite=overwrite,
            after=after,
            before=before,
            status=status,
            filter_type=filter_type,
            filter_patterns=filter_patterns,
            extractors=extractors,
            extra_data=extra_data,
        )
        return response.json()
  • Helper method in the Api class that sends the HTTP POST request to the ArchiveBox /api/v1/cli/update endpoint.
    def cli_update(
        self,
        resume: Optional[float] = 0,
        only_new: bool = True,
        index_only: bool = False,
        overwrite: bool = False,
        after: Optional[float] = 0,
        before: Optional[float] = 999999999999999,
        status: Optional[str] = "unarchived",
        filter_type: Optional[str] = "substring",
        filter_patterns: Optional[List[str]] = None,
        extractors: Optional[str] = "",
        extra_data: Optional[Dict] = None,
    ) -> requests.Response:
        """
        Execute archivebox update command

        Args:
            resume: Resume from timestamp (default: 0).
            only_new: Update only new snapshots (default: True).
            index_only: Index without archiving (default: False).
            overwrite: Overwrite existing files (default: False).
            after: Filter snapshots after timestamp (default: 0).
            before: Filter snapshots before timestamp (default: 999999999999999).
            status: Filter by status (default: "unarchived").
            filter_type: Filter type (default: "substring").
            filter_patterns: List of filter patterns (default: ["https://example.com"]).
            extractors: Comma-separated list of extractors (default: "").
            extra_data: Additional parameters as a dictionary (optional).

        Returns:
            Response: The response object from the POST request.

        Raises:
            ParameterError: If the provided parameters are invalid.
        """
        data = {
            k: v
            for k, v in locals().items()
            if k != "self" and v is not None and k != "extra_data"
        }
        if filter_patterns is None:
            data["filter_patterns"] = ["https://example.com"]
        if extra_data:
            data.update(extra_data)
        try:
            response = self._session.post(
                url=f"{self.url}/api/v1/cli/update",
                json=data,
                headers=self.headers,
                verify=self.verify,
            )
        except ValidationError as e:
            raise ParameterError(f"Invalid parameters: {e.errors()}")
        return response
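
The helper's payload construction can be tried in isolation, without a running ArchiveBox instance. The following is a minimal sketch that mirrors the logic shown above. The function name `build_update_payload` is hypothetical, and it uses an explicit dictionary in place of the `locals()` lookup, so it is an approximation and not the project's code.

```python
from typing import Any, Dict, List, Optional


def build_update_payload(
    resume: Optional[float] = 0,
    only_new: bool = True,
    index_only: bool = False,
    overwrite: bool = False,
    after: Optional[float] = 0,
    before: Optional[float] = 999999999999999,
    status: Optional[str] = "unarchived",
    filter_type: Optional[str] = "substring",
    filter_patterns: Optional[List[str]] = None,
    extractors: Optional[str] = "",
    extra_data: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return the JSON body the helper would POST (hypothetical stand-in)."""
    candidates: Dict[str, Any] = {
        "resume": resume,
        "only_new": only_new,
        "index_only": index_only,
        "overwrite": overwrite,
        "after": after,
        "before": before,
        "status": status,
        "filter_type": filter_type,
        "filter_patterns": filter_patterns,
        "extractors": extractors,
    }
    # Drop parameters that were explicitly set to None.
    payload = {key: value for key, value in candidates.items() if value is not None}
    # Same fallback as the helper: a placeholder pattern when none is given.
    if filter_patterns is None:
        payload["filter_patterns"] = ["https://example.com"]
    # extra_data is merged last, so its keys override the named parameters.
    if extra_data:
        payload.update(extra_data)
    return payload


default_payload = build_update_payload()
overridden_payload = build_update_payload(status=None, extra_data={"only_new": False})
print(default_payload["filter_patterns"])
print(overridden_payload)
```

Two behaviours stand out. Omitting `filter_patterns` does not mean "no filter": the payload falls back to `["https://example.com"]`. Keys in `extra_data` are merged last, so they silently replace named parameters of the same name.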

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Knuckles-Team/archivebox-api'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.