web-speed-agent


Local browser automation + Web Speed API integration for authenticated web extraction.

Point an AI agent at any website — including ones that require login — and get back clean, structured data. Credentials stay on your machine. Only extracted HTML goes to the server.

pip install web-speed-agent
playwright install chromium

Want to use this with Claude, Gemini, or other AI clients?

Check out the MCP Server Installation Guide — it's the easiest way to let AI agents log in and extract data through natural language.


How it works

Your machine                           Web Speed server
─────────────────────────────────      ──────────────────────────
Playwright browser (local)
  ↓ navigates, logs in, clicks
  ↓ gets page HTML
  ↓ (no passwords sent)
agent.extract(html)         ────────→  Advanced extraction engine
                            ←────────  Structured JSON

Credentials never leave your machine. The server only sees HTML.


Quickstart

import asyncio
from web_speed_agent import Agent

async def main():
    agent = Agent(api_key="wsp_...")       # or set WEBSPEED_API_KEY env var

    # Public pages — no browser needed
    result = await agent.map("https://techcrunch.com/some-article/")
    print(result["article"]["sections"])

    # Authenticated pages — browser runs locally
    agent.store_credential("mysite", "me@example.com", "mypassword")

    async with agent.browser(session_name="mysite") as browser:
        page = await browser.new_page()
        await page.goto("https://mysite.com/login")

        username, password = agent.get_credential("mysite")
        await page.fill('[name="email"]', username)
        await page.fill('[name="password"]', password)
        await page.click('button[type="submit"]')
        await page.wait_for_load_state("networkidle")

        # Now on a logged-in page — extract it
        html = await page.content()
        result = await agent.extract(html, page_type="listing")
        print(result["listing"]["items"])

asyncio.run(main())

Get an API key at getwebspeed.io.


Installation

Requirements: Python 3.10+, a Web Speed API key

pip install web-speed-agent
playwright install chromium
export WEBSPEED_API_KEY="wsp_..."

Core concepts

Agent

The main class. Manages credentials, browser sessions, and API calls.

from web_speed_agent import Agent

# API key from argument
agent = Agent(api_key="wsp_...")

# API key from environment variable (recommended)
# export WEBSPEED_API_KEY="wsp_..."
agent = Agent()

# Use as async context manager (auto-closes HTTP client)
async with Agent() as agent:
    ...

Extracting public pages

No browser needed for pages that don't require login:

# Fetch + extract in one call
result = await agent.map("https://example.com/article")

# With JavaScript rendering (for heavy SPAs)
result = await agent.map("https://example.com/spa", js=True)

Extracting authenticated pages

Use a local browser session. The browser runs on your machine:

async with agent.browser(session_name="mysite") as browser:
    page = await browser.new_page()
    await page.goto("https://mysite.com/dashboard")
    html = await page.content()

result = await agent.extract(html)

The session_name persists cookies to ~/.webspeed/sessions/<name>/ so subsequent runs skip the login step.
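
Because the session persists, a common pattern is to probe an auth-gated page first and run the login flow only when the cookies have expired. A minimal sketch: the dashboard URL, the redirect check, and the form selectors are placeholders for whatever your target site uses, and it assumes credentials were stored earlier.

import asyncio
from web_speed_agent import Agent

async def ensure_logged_in() -> str:
    async with Agent() as agent:
        async with agent.browser(session_name="mysite") as browser:
            page = await browser.new_page()

            # Probe an auth-gated page; persisted cookies may still be valid
            await page.goto("https://mysite.com/dashboard")
            if "/login" in page.url:  # redirected, so cookies expired: log in again
                username, password = agent.get_credential("mysite")
                await page.fill('[name="email"]', username)
                await page.fill('[name="password"]', password)
                await page.click('button[type="submit"]')
                await page.wait_for_load_state("networkidle")

            return await page.content()

asyncio.run(ensure_logged_in())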


Credential management

Credentials are stored in your system keychain (macOS Keychain, Windows Credential Manager, Linux secret-tool). They are never sent to Web Speed servers.

# Store once
agent.store_credential("mysite", "me@example.com", "mypassword")

# Retrieve anywhere
username, password = agent.get_credential("mysite")

# Remove
agent.delete_credential("mysite")

Extraction output

The server returns page-type-aware structured data:

# Article
result = await agent.extract(html, page_type="article")
# result["page_type"]    → "article"
# result["title"]        → "Article Title"
# result["author"]       → "Jane Smith"
# result["published_date"] → "2026-05-06"
# result["article"]["sections"] → [{"heading": "...", "paragraphs": [...]}]
# result["article"]["links"]    → [{"text": "...", "url": "..."}]

# Product
result = await agent.extract(html, page_type="product")
# result["product"]["name"]         → "Wireless Headphones"
# result["product"]["price"]        → "$99.99"
# result["product"]["availability"] → "In Stock"
# result["product"]["rating"]       → "4.5"
# result["product"]["specs"]        → {"Battery": "30h", ...}

# Listing (search results, category pages)
result = await agent.extract(html, page_type="listing")
# result["listing"]["items"] → [{"title": "...", "url": "...", "price": "..."}]

# Auto-detect (default)
result = await agent.extract(html)
# result["page_type"] → "article" | "product" | "listing" | "other"

All results include engine: "advanced". The structured JSON is typically 60–85% more token-efficient than feeding the raw HTML to a model.
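
With auto-detection you can branch on the detected type. A minimal dispatch sketch using only the fields documented above (match/case needs Python 3.10+, which the SDK already requires):

result = await agent.extract(html)  # page_type defaults to "auto"

match result["page_type"]:
    case "article":
        for section in result["article"]["sections"]:
            print(section["heading"])
    case "product":
        print(result["product"]["name"], result["product"]["price"])
    case "listing":
        for item in result["listing"]["items"]:
            print(item["title"], item["url"])
    case _:
        print("No structured fields for:", result["page_type"])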


Examples

Price monitor

import asyncio
from web_speed_agent import Agent

async def check_price(url: str, site_name: str) -> str:
    async with Agent() as agent:
        agent.store_credential(site_name, "me@example.com", "password", overwrite=True)

        async with agent.browser(session_name=site_name) as browser:
            page = await browser.new_page()

            # Login
            await page.goto(f"https://{site_name}.com/login")
            user, pwd = agent.get_credential(site_name)
            await page.fill('[name="email"]', user)
            await page.fill('[name="password"]', pwd)
            await page.click('button[type="submit"]')
            await page.wait_for_load_state("networkidle")

            # Check product
            await page.goto(url)
            await page.wait_for_load_state("networkidle")
            html = await page.content()

        result = await agent.extract(html, page_type="product")
        return result.get("product", {}).get("price", "unknown")

price = asyncio.run(check_price("https://example.com/product/123", "example"))
print(f"Current price: {price}")

Read a private dashboard

import asyncio
from web_speed_agent import Agent

async def get_dashboard_data():
    async with Agent() as agent:
        async with agent.browser(session_name="analytics") as browser:
            page = await browser.new_page()

            # Login (first run only — session persists after)
            creds = agent.get_credential("analytics")
            if not creds:
                agent.store_credential("analytics", "me@company.com", "password")
                creds = agent.get_credential("analytics")

            await page.goto("https://analytics.company.com/login")
            await page.fill('[name="email"]', creds[0])
            await page.fill('[name="password"]', creds[1])
            await page.click('button[type="submit"]')
            await page.wait_for_load_state("networkidle")

            # Navigate to dashboard
            await page.goto("https://analytics.company.com/dashboard")
            await page.wait_for_selector(".metrics-table", timeout=10000)
            html = await page.content()

        result = await agent.extract(html)
        return result

asyncio.run(get_dashboard_data())

Multi-page scrape while logged in

import asyncio
from web_speed_agent import Agent

async def scrape_inbox():
    async with Agent() as agent:
        async with agent.browser(session_name="webmail") as browser:
            page = await browser.new_page()

            # Login
            await page.goto("https://mail.example.com/login")
            user, pwd = agent.get_credential("webmail")
            await page.fill('[name="username"]', user)
            await page.fill('[name="password"]', pwd)
            await page.click('[type="submit"]')
            await page.wait_for_load_state("networkidle")

            # Scrape multiple pages
            emails = []
            for page_num in range(1, 4):
                await page.goto(f"https://mail.example.com/inbox?page={page_num}")
                await page.wait_for_load_state("networkidle")
                html = await page.content()
                result = await agent.extract(html, page_type="listing")
                emails.extend(result.get("listing", {}).get("items", []))

        return emails

asyncio.run(scrape_inbox())

AI agent integration (MCP)

The included MCP server lets Claude Desktop, Gemini CLI, and any MCP-compatible agent use the SDK directly. The agent can log in, navigate, click, and extract — all through natural language.

Start the MCP server:

WEBSPEED_API_KEY="wsp_..." python3 agent_mcp_server.py

Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "web-speed-agent": {
      "command": "python3",
      "args": ["/path/to/agent_mcp_server.py"],
      "env": {
        "WEBSPEED_API_KEY": "wsp_..."
      }
    }
  }
}

Add to Gemini CLI (~/.gemini/settings.json):

{
  "mcpServers": {
    "web-speed-agent": {
      "command": "python3.11",
      "args": ["/path/to/agent_mcp_server.py"],
      "env": {
        "WEBSPEED_API_KEY": "wsp_...",
        "PYTHONPATH": "/path/to/web-speed-agent"
      }
    }
  }
}

Then tell the agent:

"Store my credentials for united — username me@example.com, password mypassword"

"Log into united.com and find me the cheapest flight from SFO to JFK next Friday"

Available MCP tools:

Tool              Description
────────────────  ─────────────────────────────────────
store_credential  Save login to system keychain
login             Open browser + sign in
navigate          Go to a URL in the active session
extract_page      Get structured data from current page
click             Click a button or link
fill_field        Type into a form field
submit_form       Submit a form
close_browser     End the browser session
account_info      Check API credit balance


API reference

Agent

Agent(
    api_key: str | None = None,
    server_url: str | None = None,
    config_dir: str = "~/.webspeed",
    headless: bool = True,
)

Parameter   Description
──────────  ─────────────────────────────────────────────────────────────
api_key     Web Speed API key. Falls back to WEBSPEED_API_KEY env var.
server_url  Override API server URL. Default: https://api.getwebspeed.io.
config_dir  Directory for config, sessions, and logs. Default: ~/.webspeed.
headless    Run browser headlessly. Default: True.


agent.browser()

agent.browser(
    session_name: str | None = None,
    headless: bool | None = None,
    proxy: str | None = None,
) -> ManagedBrowser

Returns an async context manager. Inside the block, call .new_page() to get a Playwright Page.

Parameter     Description
────────────  ─────────────────────────────────────────────────────────────────────
session_name  Persist cookies to ~/.webspeed/sessions/<name>/. None = no persistence.
headless      Override the instance-level headless setting for this session.
proxy         Proxy URL, e.g. "socks5://localhost:1080".

Session names must be alphanumeric + hyphens/underscores, max 64 chars.
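
If you derive session names from hostnames or user input, it helps to normalize them against that allowlist first. A small helper, not part of the SDK:

import re

def to_session_name(raw: str, max_len: int = 64) -> str:
    """Collapse anything outside [a-zA-Z0-9_-] to a hyphen and truncate."""
    return re.sub(r"[^a-zA-Z0-9_-]", "-", raw)[:max_len]

to_session_name("mail.example.com")  # → "mail-example-com"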


agent.extract()

await agent.extract(
    html: str,
    page_type: str = "auto",
) -> dict

Sends HTML to the Web Speed API. Costs 1 credit.

Parameter  Description
─────────  ──────────────────────────────────────────────
html       Raw HTML string (e.g. from page.content()).
page_type  "article", "product", "listing", or "auto".


agent.map()

await agent.map(
    url: str,
    js: bool = False,
) -> dict

Fetches and extracts a public URL via the server. No local browser needed. Costs 1 credit.

Parameter  Description
─────────  ────────────────────────────────────────
url        Page URL. Must be http:// or https://.
js         Render JavaScript before extracting.


agent.account()

await agent.account() -> dict

Returns: credits, tier, status, lifetime (total/hits/misses).
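
A handy pre-flight check before a batch job, assuming only the credits field listed above:

info = await agent.account()
needed = 50  # e.g. number of pages you plan to extract
if info["credits"] < needed:
    raise RuntimeError(f"Only {info['credits']} credits left; need {needed}")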


agent.store_credential()

agent.store_credential(
    site: str,
    username: str,
    password: str,
    overwrite: bool = False,
) -> None

Saves to system keychain. Raises CredentialError if credential exists and overwrite=False.
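
To keep a setup script idempotent, either pass overwrite=True or catch the error:

from web_speed_agent import CredentialError

try:
    agent.store_credential("mysite", "me@example.com", "mypassword")
except CredentialError:
    pass  # already stored; keep the existing entry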


agent.get_credential()

agent.get_credential(site: str) -> tuple[str, str] | None

Returns (username, password) or None if not found.


agent.delete_credential()

agent.delete_credential(site: str) -> None

Removes credential from keychain.


Exceptions

from web_speed_agent import (
    WebSpeedError,          # Base exception
    AuthenticationError,    # Invalid/missing API key
    InsufficientCreditsError, # No credits remaining
    APIError,               # API returned 4xx/5xx
    RateLimitError,         # 429 Too Many Requests
    CredentialError,        # Keychain error
    BrowserError,           # Playwright error
    NetworkError,           # Timeout or DNS failure
    PlaywrightNotInstalledError, # Run: playwright install chromium
)

Typical handling:

from web_speed_agent import Agent, InsufficientCreditsError, NetworkError

try:
    result = await agent.extract(html)
except InsufficientCreditsError:
    print("Out of credits — top up at getwebspeed.io")
except NetworkError as e:
    print(f"Connection failed: {e}")

Configuration

Environment variables

Variable             Description
───────────────────  ──────────────────────────────────────
WEBSPEED_API_KEY     API key (recommended over config file)
WEBSPEED_SERVER_URL  Override server URL (must be https://)

Config file

~/.webspeed/config.yaml — created automatically on first run. Permissions set to 0o600 (owner-only).

api:
  server_url: https://api.getwebspeed.io
  timeout: 30

browser:
  headless: true

Session files

Persisted browser sessions are stored in ~/.webspeed/sessions/<name>/storage.json.

  • Permissions: 0o600 (owner-only)

  • Contains: cookies, localStorage, sessionStorage

  • Safe to delete: agent will re-authenticate on next run
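
To force a fresh login programmatically, remove the session directory (same path as above):

import shutil
from pathlib import Path

# Deleting the session just forces re-authentication on the next run
shutil.rmtree(Path.home() / ".webspeed" / "sessions" / "mysite", ignore_errors=True)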


Security

What leaves your machine

When you call agent.extract(html), the page HTML is sent to the Web Speed API for processing. Everything else stays local.

Data                       Where it goes
─────────────────────────  ───────────────────────────────────────────────
Login credentials          Never leave your machine (system keychain only)
Browser cookies / session  Never leave your machine (local Playwright)
Page HTML                  Sent over HTTPS to Web Speed API for extraction
Extracted JSON             Returned to you

HTML scrubbing (on by default)

Before any HTML is transmitted, the SDK automatically scrubs it locally:

  • Inline <script> and <style> blocks removed

  • Hidden form fields with auth-related names (csrf, token, nonce, session, etc.) have their values blanked

  • Sensitive <meta> content attributes cleared

  • HTML comments removed

Visible content — text, links, tables, headings, product data — is untouched.

# Default: scrubbing is on
result = await agent.extract(html)

# Turn off only if the page has no sensitive data
result = await agent.extract(html, scrub=False)

# Or scrub manually and inspect before sending
from web_speed_agent import scrub
clean_html = scrub(raw_html)
print(clean_html)  # inspect what will be sent
result = await agent.extract(clean_html, scrub=False)

Server-side data handling

  • HTML processed in-memory only — never written to disk, never logged, never cached

  • Auth-gated pages never cached — pages requiring login are explicitly excluded from the shared registry

  • Usage logs store only: a hash of your API key, a hash of the URL (or "sdk-extract"), timestamp, and detected page type — no content

  • No raw HTML in error responses — exceptions are sanitized before any error is returned

Other protections

  • Credentials stored in system keychain, never in files, never sent to servers

  • Session files written with 0o600 permissions (owner-only read/write)

  • Config directory created with 0o700 permissions

  • TLS always verified — verify=True on all HTTP calls, cannot be disabled

  • HTTPS enforced — server_url must start with https://; plain HTTP is rejected

  • Path traversal prevention — session names validated against [a-zA-Z0-9_-] allowlist

  • No credential logging — passwords never appear in logs or error messages


License

GNU General Public License v3.0 — see LICENSE.

Web Speed API usage is subject to the Web Speed Terms of Service.
