Accessibility Evidence Engine (AEE)

AI-first accessibility testing that augments your existing Playwright tests.

axe tells you the alt attribute exists. AEE tells you whether the alt text is right — and writes you a better one.

Static scanners (axe-core) answer "is the attribute present?" and top out around 30–50% of real issues, because the defects that matter — a meaningless alt, an icon button named "button", a heading that doesn't describe its section, meaning carried by color alone — are not statically detectable. AEE puts an AI judgment layer on top of that deterministic floor and asks "is it correct in this context?", returning a verdict, a grounded reason, a suggested fix, and a reliability tier.

AEE is agent-native: you operate it by chat through an MCP server (run investigations, query the evidence, apply fixes), declare a page's intent in plain language to sharpen judgments, chat with the report locally, and let it open remediation PRs.

Status: implemented end to end and verified on a local model — no API key. Coverage spans Tier 1 (naming), Tier 2 vision (color-alone, focus-visible, text-in-images), Tier 3 dynamic (focus, live regions, keyboard), the Tier 4 axe-core floor, and Tier 5 advisory (never a certified PASS). The agent surfaces (MCP, triage) and remediation (fix) are real. See docs/ROADMAP.md for current state and docs/PLAN.md for the architecture.

How it works

Playwright test → observers capture EVIDENCE → AI judges quality (evidence only) → REPORT + fixes
                                                         ↑
                              AI-first surfaces: MCP server · triage UI · auto-fix/PR

The core invariant: the AI sees captured evidence only, never the live page, so judgments are grounded and reproducible, and the rule "UNKNOWN never becomes PASS" holds.
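One way to picture the "UNKNOWN never becomes PASS" rule is as a roll-up over per-check verdicts. The following is a minimal sketch, not AEE's implementation: the `Verdict` type and `combineVerdicts` function are hypothetical names introduced here for illustration.

```typescript
// Hypothetical names for illustration only; not part of AEE's API.
type Verdict = 'PASS' | 'FAIL' | 'UNKNOWN';

// Roll per-check verdicts up into one result without ever upgrading uncertainty:
// any FAIL wins, otherwise any UNKNOWN wins, and PASS requires every check to PASS.
function combineVerdicts(verdicts: readonly Verdict[]): Verdict {
  if (verdicts.length === 0) return 'UNKNOWN'; // nothing was judged, so nothing is certified
  if (verdicts.some((v) => v === 'FAIL')) return 'FAIL';
  if (verdicts.some((v) => v === 'UNKNOWN')) return 'UNKNOWN';
  return 'PASS';
}
```

Under this ordering, a single unreachable or unparseable judgment keeps the overall result at UNKNOWN, however many other checks passed.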

Packages

| Package | Role |
| --- | --- |
| @aee/core | Contracts, zod schemas, evidence/judge/report types (depends on zod only) |
| @aee/observers | Evidence grounding: DOM, a11y tree, screenshots, images, styles, network, SR |
| @aee/ai | AI judgment + conversational explain() + fix drafting (evidence only) |
| @aee/judges | Per-concern judges = axe-core floor + AI judgment |
| @aee/playwright | Driver + test fixture + checkpoint() |
| @aee/reporter | JSON report + terminal summary |
| @aee/mcp | MCP server, the agent-native surface |
| @aee/triage | Local "chat with your report" UI |
| @aee/fix | Apply suggested fixes and open PRs |

Quickstart — drop-in Playwright fixture

import { test, expect } from '@aee/playwright'; // drop-in for '@playwright/test'

test('checkout flow', async ({ page, aee }) => {
  await page.goto('/checkout');
  await aee.checkpoint('checkout-loaded', {
    intent: { purpose: 'Checkout', primaryAction: 'Pay', notes: 'cart icon opens the cart drawer' },
  });
  // ...your existing test, unchanged...
});

Talk to it — the MCP server

AEE is agent-native: a coding agent (Claude Code, Cursor, …) connects to the MCP server and investigates pages by chat. Build, then run it over stdio:

pnpm build
node packages/mcp/dist/bin.js      # or `aee-mcp` once the package is linked

Register it like any stdio MCP server — local model, no API key:

{ "mcpServers": { "aee": {
  "command": "node",
  "args": ["/abs/path/accessibility-engine/packages/mcp/dist/bin.js"],
  "env": { "AEE_LLM_PROVIDER": "local" }
} } }

Then investigate a page (→ a graded report with fixes), explain a finding from evidence, suggest_fix (→ targeted FixPlans), and apply_fix (→ patches the fixes into source). Full reference: docs/mcp-tools.md.
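For readers who have not seen what an agent sends to a stdio MCP server, the sketch below builds a JSON-RPC 2.0 `tools/call` request for the `investigate` tool. The envelope follows the MCP specification; the `url` argument is a hypothetical name chosen for illustration, and the tool's real input schema is whatever docs/mcp-tools.md defines.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Build the request envelope for a named tool and its arguments.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// `url` is a hypothetical argument name, not confirmed by the AEE docs.
const request = buildToolCall(1, 'investigate', { url: 'http://localhost:3000/checkout' });
console.log(JSON.stringify(request));
```

In practice a coding agent's MCP client builds and sends these messages for you; the sketch only shows what travels over stdio.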

Development

pnpm install
pnpm build       # tsc -b across all packages
pnpm typecheck
pnpm test        # node --test over built dist/
pnpm gen:schemas # regenerate JSON Schema in /schemas from the zod source
pnpm demo        # investigate a sample page and print the graded report

Requires Node ≥ 22 and pnpm. pnpm demo needs Chromium (pnpm exec playwright install chromium); set AEE_LLM_PROVIDER=local (with a local server running) to judge for real — otherwise AI verdicts are UNKNOWN and the axe floor still reports.

Model backends (no API key required)

The AI layer is provider-agnostic: it depends on a one-method JudgmentModel seam, not on any SDK. Pick a backend with createAIClient({ provider }) or the AEE_LLM_PROVIDER env var.
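A one-method seam of this kind might look like the sketch below. It is an illustration under stated assumptions, not the @aee/ai API: `JudgmentModelSketch`, its `judge` method, the `Judgment` shape, and `judgeAltText` are all hypothetical names.

```typescript
// Illustrative only: these names and shapes are assumptions, not the @aee/ai API.
type Verdict = 'PASS' | 'FAIL' | 'UNKNOWN';

interface Judgment {
  verdict: Verdict;
  reason: string;
}

// The whole seam is one method: captured evidence in, judgment out.
interface JudgmentModelSketch {
  judge(evidence: string): Promise<Judgment>;
}

// A stub backend in the spirit of the `stub` provider: it never guesses.
const stubModel: JudgmentModelSketch = {
  async judge(_evidence: string): Promise<Judgment> {
    return { verdict: 'UNKNOWN', reason: 'no model backend configured' };
  },
};

// Callers depend only on the interface, so backends can be swapped freely.
async function judgeAltText(model: JudgmentModelSketch, alt: string): Promise<Judgment> {
  return model.judge(JSON.stringify({ kind: 'img-alt', alt }));
}
```

Because the caller never imports a vendor SDK, a local OpenAI-compatible backend, a hosted model, or the stub can sit behind the same interface.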

| Provider | Backend | Needs |
| --- | --- | --- |
| local | A local model over the OpenAI-compatible API (Ollama, LM Studio, llama.cpp, vLLM, …) | a running local server (no key) |
| claude | Anthropic Claude (claude-opus-4-8 by default) | ANTHROPIC_API_KEY |
| stub | Always-UNKNOWN (the default when no key is set) | nothing |

Run the engine against a local model — no key, no cloud:

# Ollama (default base URL http://localhost:11434/v1)
export AEE_LLM_PROVIDER=local
export AEE_LLM_MODEL=gemma4:e4b      # any chat model you have pulled
pnpm test                            # the local live tests now exercise real judging

Point AEE_LLM_BASE_URL at LM Studio (http://localhost:1234/v1), llama.cpp, vLLM, or any OpenAI-compatible endpoint. A local judgment that can't be reached or parsed degrades to UNKNOWN — never a guessed PASS.
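The degrade-to-UNKNOWN behavior can be sketched as a defensive parser for the model's reply. This is a minimal sketch assuming the model answers with a JSON object carrying `verdict` and `reason` fields; `parseJudgment` and `Outcome` are hypothetical names, and AEE's actual reply format may differ.

```typescript
// Hypothetical helper for illustration; not an AEE export.
type Outcome = { verdict: 'PASS' | 'FAIL' | 'UNKNOWN'; reason: string };

// Turn a raw model reply into a verdict; anything malformed becomes UNKNOWN.
function parseJudgment(raw: string): Outcome {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data === 'object' && data !== null) {
      const { verdict, reason } = data as { verdict?: unknown; reason?: unknown };
      const known = verdict === 'PASS' || verdict === 'FAIL' || verdict === 'UNKNOWN';
      if (known && typeof reason === 'string') {
        return { verdict, reason };
      }
    }
    // Valid JSON, but not the expected shape: refuse to guess.
    return { verdict: 'UNKNOWN', reason: 'model reply did not match the expected shape' };
  } catch {
    // Not JSON at all (truncated output, prose, empty body).
    return { verdict: 'UNKNOWN', reason: 'model reply was not valid JSON' };
  }
}
```

A network failure would be handled the same way one level up: catch the error around the request and return an UNKNOWN outcome instead of a default.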

License

MIT
