Autonomous goal-based tests or explicit step-canvas tests
Live + historical runs with screenshots, video, logs, network
Issues with agent root-cause, evidence, and proposed fixes
How it works
01
Test definitions library
Every saved test is listed in /web/tests with its name, mode, last-run status, trigger badge, and tags. Filter by mode (autonomous / deterministic), status (passing / failing / running / never run), or trigger source. Creating a new test opens a mode selector that leads to either an autonomous-goal editor or a deterministic step canvas.
Search by name, goal, tag, or ID
Duplicate, archive, or delete from the row actions menu
Each test has Overview, Run history, and Configuration tabs
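As a rough illustration of how the library's filters and search might combine, here is a minimal sketch that assumes a hypothetical `TestDefinition` row shape and a hypothetical `filterTests` helper. Neither comes from the product; the trigger values simply reuse the manual / PR / deploy / cron options described below. Every filter that is set must match, and free-text search checks the name, goal, tags, and ID.

```typescript
// Hypothetical shape of a saved test row; field names are illustrative only.
type Mode = "autonomous" | "deterministic";
type Status = "passing" | "failing" | "running" | "never run";
type Trigger = "manual" | "pr" | "deploy" | "cron";

interface TestDefinition {
  id: string;
  name: string;
  goal?: string; // assumed: only autonomous tests carry a natural-language goal
  mode: Mode;
  status: Status;
  trigger: Trigger;
  tags: string[];
}

interface TestFilter {
  mode?: Mode;
  status?: Status;
  trigger?: Trigger;
  query?: string; // free-text search over name, goal, tag, or ID
}

// Returns the tests that match every filter that is set (AND semantics).
function filterTests(tests: TestDefinition[], f: TestFilter): TestDefinition[] {
  const q = (f.query ?? "").trim().toLowerCase();
  return tests.filter((t) => {
    if (f.mode && t.mode !== f.mode) return false;
    if (f.status && t.status !== f.status) return false;
    if (f.trigger && t.trigger !== f.trigger) return false;
    if (q === "") return true;
    // Case-insensitive substring match against any searchable field.
    const haystack = [t.id, t.name, t.goal ?? "", ...t.tags];
    return haystack.some((field) => field.toLowerCase().includes(q));
  });
}

const tests: TestDefinition[] = [
  { id: "t-1", name: "Checkout flow", goal: "Buy one item", mode: "autonomous", status: "passing", trigger: "pr", tags: ["smoke"] },
  { id: "t-2", name: "Login form", mode: "deterministic", status: "failing", trigger: "cron", tags: ["auth"] },
];

console.log(filterTests(tests, { status: "failing" }).map((t) => t.id).join(",")); // t-2
console.log(filterTests(tests, { query: "smoke" }).map((t) => t.id).join(",")); // t-1
```

Treating the filters as a conjunction mirrors how stacked list filters usually behave. A real implementation would more likely run this query server-side.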
02
Goal-based autonomous tests
Describe in natural language what the test should accomplish, then add explicit pass and fail criteria. Pick environments, set triggers (manual / PR / deploy / cron), choose which reports the runner attaches, and select the runner mode and agent speed. While you author, a live mirror sidebar shows what the Operator would see.
03
Node-based step canvas
Compose deterministic tests visually. Drag step kinds (navigate, click, type, assert, wait, branch) from a palette onto the React-Flow-style canvas, connect them to define control flow, and configure each step's fields inline. Steps reorder with framer-motion's Reorder.Group.
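Connecting steps on the canvas produces a small directed graph. The sketch below shows one way such a graph could be walked to get the order in which steps run. It assumes hypothetical `Step` and `Edge` shapes, a convention that branch steps label their outgoing edges "true" / "false", and an `evaluateBranch` callback standing in for whatever decides a branch at run time. None of these names come from the product.

```typescript
// Hypothetical step kinds matching the palette described above.
type StepKind = "navigate" | "click" | "type" | "assert" | "wait" | "branch";

interface Step {
  id: string;
  kind: StepKind;
  fields: Record<string, string>; // inline-configured fields, e.g. url or selector
}

// A directed edge; branch steps are assumed to label their two outgoing edges.
interface Edge {
  from: string;
  to: string;
  label?: "true" | "false";
}

// Walks the canvas from startId and returns the ordered step IDs that would run.
// maxSteps guards against infinite loops if the graph contains a cycle.
function executionOrder(
  steps: Step[],
  edges: Edge[],
  startId: string,
  evaluateBranch: (step: Step) => boolean,
  maxSteps = 100,
): string[] {
  const byId = new Map(steps.map((s) => [s.id, s] as const));
  const order: string[] = [];
  let currentId: string | undefined = startId;
  while (currentId !== undefined && order.length < maxSteps) {
    const step = byId.get(currentId);
    if (!step) break; // unknown start or dangling edge: stop rather than throw
    order.push(step.id);
    const out = edges.filter((e) => e.from === step.id);
    // Branches follow the edge whose label matches the outcome;
    // other steps are assumed to have at most one outgoing edge.
    const next =
      step.kind === "branch"
        ? out.find((e) => e.label === (evaluateBranch(step) ? "true" : "false"))
        : out[0];
    currentId = next?.to;
  }
  return order;
}

const steps: Step[] = [
  { id: "open", kind: "navigate", fields: { url: "https://example.com/login" } },
  { id: "check", kind: "branch", fields: { condition: "cookie banner visible" } },
  { id: "dismiss", kind: "click", fields: { selector: "#accept" } },
  { id: "login", kind: "type", fields: { selector: "#email", text: "qa@example.com" } },
];
const edges: Edge[] = [
  { from: "open", to: "check" },
  { from: "check", to: "dismiss", label: "true" },
  { from: "check", to: "login", label: "false" },
  { from: "dismiss", to: "login" },
];

console.log(executionOrder(steps, edges, "open", () => true).join(" -> "));
// open -> check -> dismiss -> login
```

When the branch evaluates to false, the same graph yields open -> check -> login. Keeping the graph as plain node and edge arrays matches how React-Flow-style editors typically model their state.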
04
Runs — live and historical
Every execution lands in /web/runs, with in-progress runs shown in a live block above the history table. Filter by trigger source (Manual / MCP / PR / Schedule / Suite Runs). Each row shows status, mode, trigger, environment, start time, and duration. Run detail pages include Overview, Timeline, Steps, Screenshots, Issues, Logs, and Video tabs, alongside an Operator chat scoped to that run.
05
Issues with agent analysis
Bugs surfaced by runs land in /web/issues with a severity (critical / high / warning / info), a status (open / pushed / resolved / dismissed), and full evidence. Each issue includes an auto-generated agent narrative, expected-vs-actual diffs, and an Operator-proposed fix you can preview before applying. Push issues to GitHub or Linear from the row actions menu.
06
Dashboard scoped to web
/web/overview renders a configurable grid of widgets: pass-rate trend, top flaky tests, severity donut, active suites, trigger bar, coverage, and daily summary. Drag widgets to reorder them, resize them within their allowed sizes, and add or remove them from the +Add widget modal. The layout is saved per workspace.
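To make "resize within allowed sizes" and per-workspace persistence concrete, here is a minimal sketch of a widget layout model. The `Size` values, the `allowedSizes` table, and the `resize`, `reorder`, and `saveLayout` helpers are all hypothetical, since the page does not state the actual size limits or storage mechanism. An in-memory `Map` stands in for real persistence.

```typescript
// Widget kinds from the list above; sizes and limits are assumptions.
type WidgetKind = "pass-rate-trend" | "top-flaky" | "severity-donut" | "active-suites" | "trigger-bar" | "coverage" | "daily-summary";
type Size = "sm" | "md" | "lg";

interface Widget {
  id: string;
  kind: WidgetKind;
  size: Size;
}

// Assumed per-kind size constraints; the real limits are not documented here.
const allowedSizes: Record<WidgetKind, Size[]> = {
  "pass-rate-trend": ["md", "lg"],
  "top-flaky": ["sm", "md"],
  "severity-donut": ["sm"],
  "active-suites": ["sm", "md"],
  "trigger-bar": ["md", "lg"],
  coverage: ["sm", "md"],
  "daily-summary": ["md", "lg"],
};

// Applies a resize only if the widget's kind allows that size; otherwise leaves it unchanged.
function resize(layout: Widget[], id: string, size: Size): Widget[] {
  return layout.map((w) => (w.id === id && allowedSizes[w.kind].includes(size) ? { ...w, size } : w));
}

// Moves the widget at index `from` to index `to` (drag to reorder).
function reorder(layout: Widget[], from: number, to: number): Widget[] {
  if (from < 0 || from >= layout.length) return layout; // ignore out-of-range drags
  const next = [...layout];
  const [moved] = next.splice(from, 1);
  next.splice(to, 0, moved);
  return next;
}

// In-memory stand-in for per-workspace persistence.
const savedLayouts = new Map<string, Widget[]>();
function saveLayout(workspaceId: string, layout: Widget[]): void {
  savedLayouts.set(workspaceId, layout);
}

let layout: Widget[] = [
  { id: "w1", kind: "pass-rate-trend", size: "md" },
  { id: "w2", kind: "severity-donut", size: "sm" },
];
layout = resize(layout, "w2", "lg"); // ignored: the donut only allows "sm" in this sketch
layout = reorder(layout, 1, 0);
saveLayout("workspace-a", layout);
console.log(layout.map((w) => `${w.id}:${w.size}`).join(", ")); // w2:sm, w1:md
```

The helpers return new arrays instead of mutating the layout in place, which fits the immutable state updates common in React UIs.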