Agent Observability

This document describes the artifact capture layer that makes the desktop app inspectable by coding agents (Codex, Claude Code, Cursor) via existing WDIO/Appium/tauri-driver tooling.

Quick Start

bash app/scripts/e2e-agent-review.sh

Each run writes its artifacts to a timestamped directory:

app/test/e2e/artifacts/<ISO-timestamp>-agent-review/
  01-welcome.png
  01-welcome.source.xml
  02-post-onboarding.png
  mock-requests-after-onboarding.json
  failure-<test>.png                     # only on failure
  meta.json                              # run metadata + checkpoint index
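
Because each run directory name starts with an ISO timestamp, an agent can locate the most recent run by sorting directory names. The following is a minimal sketch, not code from the repo: `latestRunDir` is a hypothetical helper, and it assumes every run uses the same timestamp format so that string order matches chronological order.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: returns the newest run directory under `root`,
// or undefined when no run with the given label has been recorded yet.
function latestRunDir(root: string, label = "agent-review"): string | undefined {
  if (!fs.existsSync(root)) return undefined;
  const runs = fs
    .readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() && entry.name.endsWith(`-${label}`))
    .map((entry) => entry.name)
    .sort(); // uniformly formatted ISO timestamps sort chronologically as strings
  const newest = runs[runs.length - 1];
  return newest === undefined ? undefined : path.join(root, newest);
}
```

Calling `latestRunDir("app/test/e2e/artifacts")` from the repository root would then give the directory whose screenshots, source dumps, and `meta.json` the agent should read.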

Components

Component       Path                                     Role
Helper          app/test/e2e/helpers/artifacts.ts        captureCheckpoint, saveMockRequestLog
WDIO hook       app/test/wdio.conf.ts (afterTest)        Always dumps screenshot + source on any failing test
Spec            app/test/e2e/specs/agent-review.spec.ts  Welcome → onboarding → privacy panel
Wrapper script  app/scripts/e2e-agent-review.sh          Build + run + print artifact directory
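
To illustrate the failure-dump behavior of the WDIO hook, here is a minimal sketch of the logic such a hook could delegate to. It is not the code in `wdio.conf.ts`: `dumpFailureArtifacts`, the `BrowserLike` interface, and the `failure-<test>.source.xml` file name are assumptions made for illustration. Only `saveScreenshot` and `getPageSource` are real WebdriverIO browser commands.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Minimal slice of the WebdriverIO browser object that this sketch relies on.
interface BrowserLike {
  saveScreenshot(filepath: string): Promise<unknown>;
  getPageSource(): Promise<string>;
}

// Hypothetical helper: writes a screenshot and a page-source dump for a failed test
// and returns the paths it wrote. Passing tests produce no files.
async function dumpFailureArtifacts(
  browser: BrowserLike,
  runDir: string,
  testTitle: string,
  passed: boolean,
): Promise<string[]> {
  if (passed) return [];
  fs.mkdirSync(runDir, { recursive: true });
  // Keep file names portable: collapse anything outside [A-Za-z0-9_-] into "-".
  const slug = testTitle.replace(/[^A-Za-z0-9_-]+/g, "-").replace(/^-+|-+$/g, "");
  const screenshot = path.join(runDir, `failure-${slug}.png`);
  const source = path.join(runDir, `failure-${slug}.source.xml`); // assumed name
  await browser.saveScreenshot(screenshot);
  fs.writeFileSync(source, await browser.getPageSource(), "utf8");
  return [screenshot, source];
}
```

In a WebdriverIO config using the Mocha framework, an `afterTest(test, context, { passed })` hook could call this with the global `browser`, the resolved run directory, `test.title`, and `passed`. Taking the browser as a parameter keeps the function testable without a running driver.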

Environment Overrides

Variable            Effect
E2E_ARTIFACT_DIR    Force specific run directory
E2E_ARTIFACT_ROOT   Parent directory (default: app/test/e2e/artifacts)
E2E_ARTIFACT_LABEL  Label (default: agent-review)
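
The way these overrides combine can be sketched as a small resolver. This is an illustration under stated assumptions, not the helper's actual code: `resolveArtifactDir` is a hypothetical name, `E2E_ARTIFACT_DIR` is assumed to take precedence over the other two variables, and the timestamp sanitizing (replacing `:` and `.`) is a guess at how the directory name is kept filesystem-safe.

```typescript
import * as path from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical resolver mirroring the override table above.
function resolveArtifactDir(env: Env, now: Date = new Date()): string {
  // E2E_ARTIFACT_DIR forces the run directory and bypasses root, timestamp, and label.
  if (env.E2E_ARTIFACT_DIR) return env.E2E_ARTIFACT_DIR;
  const root = env.E2E_ARTIFACT_ROOT || "app/test/e2e/artifacts";
  const label = env.E2E_ARTIFACT_LABEL || "agent-review";
  // Assumption: ":" and "." are replaced so the name is valid on every filesystem.
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  return path.join(root, `${stamp}-${label}`);
}
```

For example, `resolveArtifactDir(process.env)` with no overrides set yields a fresh directory under `app/test/e2e/artifacts`, while setting `E2E_ARTIFACT_DIR` pins the run to a fixed location that an agent can poll.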

Using the Helper in New Specs

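import { captureCheckpoint, saveMockRequestLog } from '../helpers/artifacts';
import { getRequestLog } from '../mock-server';

// Hypothetical test; drive the UI to the state you want to inspect first.
it('captures artifacts after the connect click', async () => {
  // Saves a checkpoint for the current app state.
  await captureCheckpoint('after-connect-click');
  // Saves the requests the mock server has recorded so far.
  saveMockRequestLog('after-connect-click', getRequestLog());
});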
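
For orientation, the file naming seen in the artifact directory (`01-welcome.png`, `mock-requests-after-onboarding.json`) could be produced by logic along these lines. This is a sketch only: `checkpointBaseName` and `writeMockRequestLog` are hypothetical stand-ins, the explicit `runDir` parameter is an assumption, and the real helpers in `artifacts.ts` may work differently.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical: builds the zero-padded base name shared by a checkpoint's
// screenshot and source dump, e.g. (1, "welcome") -> "01-welcome".
function checkpointBaseName(index: number, name: string): string {
  return `${String(index).padStart(2, "0")}-${name}`;
}

// Hypothetical stand-in for saveMockRequestLog with an explicit run directory.
// Writes the request log as pretty-printed JSON and returns the file path.
function writeMockRequestLog(runDir: string, name: string, requests: unknown[]): string {
  fs.mkdirSync(runDir, { recursive: true });
  const file = path.join(runDir, `mock-requests-${name}.json`);
  fs.writeFileSync(file, JSON.stringify(requests, null, 2) + "\n", "utf8");
  return file;
}
```

Pretty-printed JSON is chosen here because the log is meant to be read by agents and humans as well as parsed by tools.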

Deliberately Out of Scope

  • Visual baselines / image diff for each component state
  • Screenshots on every click (too noisy)
  • Live integrations (Gmail, Notion, Telegram); mock server only
  • New test framework / reporter

Expand to more flows only after this loop has proven viable.

Next Steps