AWT (AI Watch Tester)
Provides specialized support for testing Flutter web applications, using native CanvasKit and Semantics detection for accurate interaction.
Runs automated tests in CI/CD pipelines such as GitHub Actions, with regression testing and screenshot reporting on every commit.
Supports automated testing for Next.js applications with optimized speed modes for functional and regression testing.
Supports local AI execution for test generation and self-healing via Ollama, enabling private and free-to-run automated testing.
Leverages OpenAI's GPT-4o models to automatically scan web applications, generate test steps, and fix failing tests through a DevQA Loop.
Optimized for testing web applications built with React, featuring specific speed modes for rapid UI verification.
Integrates with Vercel for cloud-hosted testing environments and provides a dashboard for monitoring test results.
What is AWT?
AWT is a browser testing tool that writes and fixes its own tests.
You give it your web app's URL. AWT opens a real browser, figures out what's on the page (buttons, forms, links), writes test steps, runs them, and tells you what passed and what failed. If something breaks, the DevQA Loop kicks in — AI reads the error, updates the test or your code, and tries again.
No test code to write. No recording sessions. No manual updates when the UI changes.
Start in 5 Minutes
Option 1 — Cloud (no install, free)
1. Visit https://ai-watch-tester.vercel.app
2. Sign up (email or GitHub — takes 30 seconds)
3. Paste your app URL
4. Watch AWT test your site live
Option 2 — Local CLI (runs on your machine)
# Install (requires Python 3.11+)
pip install aat-devqa
playwright install chromium
# Run the visual dashboard
aat dashboard
# → Opens at http://localhost:9500
# Or test directly from the command line
aat devqa "test the login flow" --url https://your-app.com
That's it. AWT scans your page, writes a test plan, shows it to you for approval, then runs it in a real Chrome window.
How It Works
You give AWT a URL
│
▼
🔍 SCAN — AWT opens Chrome and reads every button, input, and link
│
▼
📝 GENERATE — AI writes a step-by-step test plan (you review & approve)
│
▼
▶️ RUN — AWT clicks, types, and navigates like a real user
│
├── ✅ All passed → screenshot report saved
│
└── ❌ Something failed
│
▼
🔄 DEVQA LOOP — AI reads the failure,
fixes the test (or your code),
and tries again (up to 5 times)
The DevQA Loop — AWT's Core Feature
Most testing tools stop when a test fails and wait for a human. AWT keeps going.
When a step fails, AWT:
1. Takes a screenshot of exactly what the browser shows
2. Reads the error message and the visible page content
3. Re-scans the page to check if anything moved or changed
4. Patches the specific failing step and retries
If the failure is a bug in your source code (not just a wrong selector), AWT can trace it — finding the route handler, component, or API endpoint that's misbehaving — and suggest or apply a fix.
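The retry logic above can be sketched in a few lines of Python. This is an illustrative sketch, not AWT's actual internals; the names `scan`, `run_step`, and `patch_step` are assumptions standing in for the real scan, run, and AI-patch stages:

```python
# Illustrative sketch of AWT-style self-healing, not the tool's real internals.
# scan, run_step, and patch_step are injected stand-ins for the real
# scan / run / AI-patch stages.

MAX_ATTEMPTS = 5  # matches the "up to 5 times" limit described above

def run_with_healing(step, scan, run_step, patch_step):
    """Run one test step; on failure, re-scan the page, patch the step, retry.

    Returns the attempt number that succeeded, or raises after MAX_ATTEMPTS.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        ok, error = run_step(step)
        if ok:
            return attempt
        page_snapshot = scan()                         # re-read the live page
        step = patch_step(step, error, page_snapshot)  # AI rewrites the step
    raise RuntimeError(f"step still failing after {MAX_ATTEMPTS} attempts")
```

In the real tool the patch stage is AI-backed and the run stage drives Playwright; here they are plain callables so the control flow is visible.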
# Watch the loop run live
aat devqa "checkout flow test" --url http://localhost:3000
# Or use it with your AI coding tool (Claude Code, Cursor, Copilot...)
# "Test the registration page" → AWT scans, generates, runs, fixes
Four Ways to Use AWT
| Cloud | Local CLI | Agent Skill | MCP Server
How to start | Sign up at ai-watch-tester.vercel.app | pip install aat-devqa | npx skills add ksgisang/awt-skill --skill awt -g | claude mcp add awt -- python mcp/server.py
Browser | Headless (server) | Real Chrome on your machine | Real Chrome on your machine | Real Chrome on your machine |
AI key needed | No (server-provided or BYOK) | Yes (your OpenAI / Anthropic / Ollama) | No — your AI tool is the brain | No |
Best for | Quick tests, PMs, planners | Developers, CI/CD | AI-assisted development | Claude Desktop, Cursor, Windsurf |
Price | Free (5/mo) · Pro $28.99 · Team $98.99 | Free forever (MIT) | Free forever | Free forever |
Agent Skill — Let your AI coding tool drive AWT
# One-line install
npx skills add ksgisang/awt-skill --skill awt -g
# Then ask your AI tool:
"Test the login flow on http://localhost:3000"
"Check if the signup form works"
"Run regression tests after my last commit"
# → AWT scans, generates test steps, runs them, and reports back
MCP Server — Protocol-native
# Add to Claude Code
claude mcp add awt -- python mcp/server.py
# Tools available: aat_run, aat_doctor, aat_list_scenarios, aat_validate, aat_cost
What AWT Is Great At
| Feature | Description
🤖 | Zero-code test generation | Point at a URL — AI generates complete test steps with real selectors |
🔄 | Self-healing DevQA Loop | Tests fail? AI fixes and retries automatically (up to 5 attempts) |
👁️ | Visual verification | Screenshots before/after every action — not just DOM checks |
🌐 | Real browser | Chrome with human-like mouse movement and typing speed |
📱 | Flutter support | Native CanvasKit + Semantics detection — tests Flutter web apps too |
📄 | Document-based generation | Feed a PDF/DOCX spec — AI generates tests from requirements |
⚡ | Speed modes | concise or detailed verbosity: run core actions only, or every step
📸 | Smart screenshots | all, before-after, or on-failure capture modes control how many images are saved
🔌 | Plugin architecture | Swap engines, matchers, AI providers via simple registries |
AWT vs Other Tools
vs Playwright / Cypress
Playwright and Cypress are excellent — and AWT is built on top of Playwright. The difference is who writes the tests:
| AWT | Playwright / Cypress
Who writes tests | AI (from your URL) | You (code) |
Maintenance when UI changes | AI auto-heals | You update selectors manually |
Learning curve | Zero — just paste a URL | Moderate (framework API + JS/TS) |
Flexibility | High (YAML scenarios) | Maximum (full code control) |
Use Playwright/Cypress when you want full programmatic control. Use AWT when you want tests without writing them.
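The README refers to YAML scenarios but never shows one; the file below is a hypothetical example whose field names (`action`, `selector`, `value`) are assumptions for illustration, not AWT's documented schema:

```yaml
# Hypothetical scenario file — field names are illustrative, not AWT's real schema
name: login-flow
url: https://your-app.com
steps:
  - action: navigate
    target: /login
  - action: type
    selector: "#email"
    value: user@example.com
  - action: click
    selector: "button[type=submit]"
  - action: assert
    selector: ".dashboard"
    state: visible
```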
vs testRigor
| AWT | testRigor
Test authoring | AI generates from URL — you write nothing | Plain English (you write commands) |
Self-healing | DevQA Loop (AI re-generates automatically) | Built-in auto-maintenance |
Pricing | Free (MIT, self-host) | Enterprise (~$800+/mo) |
Open source | ✅ MIT License | ❌ |
vs Applitools
Applitools specializes in visual regression (pixel-by-pixel screenshot comparison). AWT specializes in functional testing (does the login actually work?). They complement each other — run AWT for functional tests, add Applitools for pixel-perfect visual checks.
Speed & Screenshot Modes
Control the trade-off between thoroughness and speed:
# CI/CD — fastest, minimal storage
aat run --verbosity=concise --screenshots=on-failure scenarios/
# Standard QA — balanced (recommended)
aat run --verbosity=concise --screenshots=before-after scenarios/
# Full audit — every step recorded
aat run --verbosity=detailed --screenshots=all scenarios/
Mode | Steps | Screenshots | ~Time | Use For
concise + on-failure | 12–15 | 0–1 | ~1 min | CI/CD gates
concise + before-after | 12–15 | 24 | ~2 min | Daily QA
detailed + all | 60–80 | 68 | ~5 min | Compliance / audit
Supported AI Providers
Provider | Models | Cost | Setup |
OpenAI | gpt-4o, gpt-4o-mini | Pay-per-use | Set OPENAI_API_KEY
Anthropic | Claude Sonnet 4 | Pay-per-use | Set ANTHROPIC_API_KEY
Ollama | codellama, llama3, mistral | Free (local) | Install Ollama and pull a model
# aat.yaml
ai:
  provider: openai   # openai | anthropic | ollama
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}
Architecture
aat devqa / aat run / aat dashboard
│
▼
┌─────────────────────────────────────┐
│ CLI (Typer) │
├─────────────────────────────────────┤
│ Core Orchestrator │
│ Executor · Comparator · DevQALoop │
├────────────┬──────────┬─────────────┤
│ Engine │ Matcher │ AI Adapter │
│ web/desktop│ocr/cv/ai │ openai/etc. │
├────────────┴──────────┴─────────────┤
│ Pydantic v2 Models · SQLite Learn │
└─────────────────────────────────────┘
All modules follow a plugin registry pattern — add a new engine, matcher, or AI provider by implementing one base class and registering it in __init__.py.
Development
Prerequisites
Python 3.11+
Tesseract OCR:
brew install tesseract (macOS) or apt install tesseract-ocr (Linux)
Commands
Command | What it does |
make dev | Install all dependencies + Playwright + pre-commit
make lint | Check code style (ruff)
make format | Auto-fix formatting
make typecheck | Strict type checking (mypy)
make test | Run all tests (pytest)
make coverage | Tests + coverage report
git clone https://github.com/ksgisang/AI-Watch-Tester.git
cd AI-Watch-Tester
python -m venv .venv && source .venv/bin/activate
make dev
make test # verify everything works
aat dashboard # launch at http://localhost:9500
Contributing
See CONTRIBUTING.md — contributions, bug reports, and new plugins are welcome.
git checkout -b feat/my-feature
make format && make lint && make typecheck && make test
git commit -m "feat(scope): description"
FAQ
Do I need to install anything to try AWT?
No. The Cloud version at ai-watch-tester.vercel.app needs nothing — just a browser. The local CLI needs one terminal command to install.
The only thing AWT needs from you is a URL and (optionally) a description of what to test.
How does self-healing work?
When a web app changes — a button moves, a label changes, a new form field appears — traditional tests break and stay broken until someone manually updates them.
AWT's DevQA Loop re-scans the page after a failure, finds the updated element, and patches the test step automatically. You don't have to touch the test files.
How do I install AWT?
Cloud (no install): ai-watch-tester.vercel.app
Local:
pip install aat-devqa
playwright install chromium
aat dashboard # opens at http://localhost:9500
From source:
git clone https://github.com/ksgisang/AI-Watch-Tester.git
cd AI-Watch-Tester
make dev && aat dashboard
What's the difference between aat devqa and aat loop?
| aat devqa | aat loop
Starting point | Just a description + URL | Existing scenario file
Test generation | Automatic (scans and writes) | Uses your file
Failure fixing | Patches the test YAML | AI patches your source code
Best for | First run, quick testing | Iterative dev with code fixes
Use aat devqa when starting from scratch. Use aat loop when you want AWT to also fix your application code.
What do the --verbosity and --screenshots flags control?
--verbosity — how many steps run:
detailed (default): all steps, including wait/assert/screenshot
concise: core actions only (navigate, click, type) — faster
--screenshots — how many images are saved:
all (default): after every step
before-after: before + after each click/type/navigate (~70% fewer files)
on-failure: only when a step fails (great for CI/CD)
# Recommended for daily QA
aat run --verbosity=concise --screenshots=before-after scenarios/
# For CI/CD pipelines
aat run --verbosity=concise --screenshots=on-failure scenarios/
Which AI providers are supported?
Provider | Models | Cost
OpenAI | gpt-4o, gpt-4o-mini | Pay-per-use |
Anthropic | Claude Sonnet 4 | Pay-per-use |
Ollama | codellama, llama3, mistral | Free (local GPU) |
Cloud BYOK keys are encrypted at rest (Fernet/AES-128-CBC).
What does AWT Cloud cost?
Plan | Price | Tests/month
Free | $0 | 5 |
Pro | $28.99/mo | 100 |
Team | $98.99/mo | 500 |
The local CLI is free forever with no limits.
Can I run AWT in CI/CD pipelines?
Yes. For local runs, use the --screenshots=on-failure flag to keep output minimal. For cloud, the API accepts a POST request:
curl -X POST https://your-awt-server.com/api/v1/run \
-H "X-API-Key: awt_your_key" \
-H "Content-Type: application/json" \
-d '{"target_url": "https://staging.example.com"}'
See the CI/CD Guide for GitHub Actions and GitLab CI examples.
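A minimal GitHub Actions job might look like the sketch below. Only the pip, playwright, and aat commands come from this README; the workflow structure and the secret name are assumptions, so treat the CI/CD Guide as the canonical reference:

```yaml
# Hypothetical GitHub Actions job; structure is illustrative.
# Only the pip / playwright / aat commands are taken from this README.
name: awt-regression
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install aat-devqa
      - run: playwright install chromium
      - run: aat run --verbosity=concise --screenshots=on-failure scenarios/
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}  # assumed secret name
```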
How is my data handled?
All traffic encrypted via HTTPS/TLS
BYOK API keys: Fernet-encrypted (AES-128-CBC + HMAC-SHA256) at rest
Screenshots: auto-deleted after 7 days
Local mode: nothing leaves your machine
See our Privacy Policy
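The 7-day screenshot retention above amounts to a scheduled cleanup job. A minimal sketch, assuming a flat directory of PNG files (the function name and path handling are illustrative, not AWT's implementation):

```python
# Illustrative cleanup job for a 7-day screenshot retention policy.
# The function name and the flat *.png layout are assumptions.
import time
from pathlib import Path

RETENTION_SECONDS = 7 * 24 * 60 * 60  # 7 days

def purge_old_screenshots(directory, now=None):
    """Delete .png files older than the retention window; return count removed."""
    now = time.time() if now is None else now
    removed = 0
    for path in Path(directory).glob("*.png"):
        if now - path.stat().st_mtime > RETENTION_SECONDS:
            path.unlink()
            removed += 1
    return removed
```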
License
MIT — free for personal and commercial use.