// AI · Non-Deterministic

Testing in the age of AI agents.

Two distinct problems get lumped under "AI testing" and they are not the same problem. Pick the path that matches what you're shipping, or browse by category below.

> reviewed = 2026-05-18 · 37 guides · 8 paths · refresh quarterly


// the headline insight

When output is probabilistic, the test pyramid tilts. Unit tests at the base shrink. Evaluation rises to take their place.

[Figure: test pyramid shift. The classical upright test pyramid (faded) shifts to an AI-adapted shape with a wider evaluation layer. Legend: classical, AI-shifted.]

// path_05 · ai for test docs · 4 guides · 32m · reviewed May 2026

AI for test documentation

Test plans, acceptance criteria, traceability matrices — the documentation overhead that quietly consumes a third of every sprint. AI is uniquely well-suited here because the output is reviewed by humans before it matters, the inputs are unstructured prose, and the cost of a mediocre draft is low.

// path_03 · ai for test data · 5 guides · 47m · reviewed May 2026

AI for test data generation

Realistic test data is one of the harder problems in QA — production data is sensitive, manually crafted data misses edge cases, and synthetic data tooling has matured fast. AI is now the default approach for generating users, edge cases, adversarial inputs, and PII-safe substitutes that preserve shape without leaking real customers.
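As one way to make "preserve shape without leaking" concrete, here is a minimal sketch of a shape check plus a plain deterministic substitute. It is a rule-based baseline rather than an AI generator, and the names `shape`, `mask_preserving_shape`, and the salt value are hypothetical, introduced only for illustration. A check like `shape` could also be run against AI-generated substitutes to confirm they keep the original format.

```python
import hashlib
import random
import string


def shape(value: str) -> str:
    """Reduce a string to its shape: 9 = digit, A = uppercase, a = other letter."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isupper() else "a" if ch.isalpha() else ch
        for ch in value
    )


def mask_preserving_shape(value: str, salt: str = "test-fixture-salt") -> str:
    """Swap every letter and digit for a pseudo-random one of the same class.

    Assumption: the salt is a hypothetical fixture value. Seeding from a hash
    makes the mapping repeatable, so the same input gives the same substitute.
    This is a sketch, not a privacy guarantee.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # separators such as '@', '.', '-' carry the format
    return "".join(out)


original = "Jane.Doe42@example.com"
masked = mask_preserving_shape(original)
print(shape(original), shape(masked))
```

Because separators are kept verbatim, downstream validators (email regexes, phone formats, fixed-width fields) see the same structure, while the letters and digits no longer come from a real customer record.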

// path_04 · ai in ci/cd · 5 guides · 50m · reviewed May 2026

AI in CI/CD

The five highest-leverage places to add AI to a CI/CD pipeline: predictive test selection, flaky-test classification, failure triage, risk-based run ordering, and AI-generated tests on PR open. Each is a distinct problem with a distinct tooling landscape.
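To ground one of the five, flaky-test classification, here is a minimal rule-based sketch: a test that both passed and failed on the same commit is labelled flaky. This is a baseline that a learned classifier would need to beat, not a description of any particular tool. The `Run` record, the `classify` function, and the three labels are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Run(NamedTuple):
    """One CI execution of one test (hypothetical record shape)."""
    test_id: str
    commit: str
    passed: bool


def classify(runs: Iterable[Run]) -> Dict[str, str]:
    """Label each test as 'flaky', 'broken', or 'stable'.

    flaky:  mixed pass/fail results on at least one single commit
    broken: failed on some commit, but never mixed results on the same commit
    stable: never failed
    """
    # test_id -> commit -> set of outcomes seen on that commit
    outcomes = defaultdict(lambda: defaultdict(set))
    for run in runs:
        outcomes[run.test_id][run.commit].add(run.passed)

    labels = {}
    for test_id, by_commit in outcomes.items():
        if any(len(seen) == 2 for seen in by_commit.values()):
            labels[test_id] = "flaky"
        elif any(False in seen for seen in by_commit.values()):
            labels[test_id] = "broken"
        else:
            labels[test_id] = "stable"
    return labels


history = [
    Run("test_login", "abc123", True),
    Run("test_login", "abc123", False),   # same commit, different result
    Run("test_cart", "abc123", False),
    Run("test_cart", "def456", False),
    Run("test_search", "abc123", True),
]
print(classify(history))
```

Keying on the commit matters: a failure that follows a code change is a candidate regression, while a result that changes with no code change points at the test or its environment.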

// path_06 · ai for automation scripting · 5 guides · 51m · reviewed May 2026

AI for automation scripting

The most-cited GenAI use case in QA, named by 63% of practitioners in the World Quality Report (WQR) 2025-26. The question isn't whether to use Copilot, Cursor, or Claude Code to write Playwright tests. It's where they reliably win, where they reliably fail, and what the prompt patterns look like when you're six months into using them daily.

// path_01 · most common · 5 guides · 44m · reviewed May 2026

Testing AI features in your product

The test pyramid changes shape when output is non-deterministic. Exact-match assertions break. The evaluation layer — curated datasets, rubric scoring, LLM-as-judge — rises to fill the gap. Most teams discover this the hard way, months into a project.
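A minimal sketch of why exact-match assertions break and what rubric scoring can look like in their place. The `Criterion` type, the refund rubric, and the 0.75 threshold are hypothetical choices for illustration. The checks here are plain string predicates, where a real evaluation layer might call an LLM judge for some criteria.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Criterion:
    """One rubric line: a named predicate over the model output, with a weight."""
    name: str
    check: Callable[[str], bool]
    weight: float = 1.0


def rubric_score(output: str, rubric: List[Criterion]) -> float:
    """Weighted fraction of rubric criteria the output satisfies, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric must have positive total weight")
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total


def pass_rate(outputs: List[str], rubric: List[Criterion], threshold: float = 0.75) -> float:
    """Share of outputs in a curated dataset that score at or above the threshold."""
    if not outputs:
        raise ValueError("dataset is empty")
    passed = sum(rubric_score(o, rubric) >= threshold for o in outputs)
    return passed / len(outputs)


# Hypothetical rubric for a support bot answering a refund question.
REFUND_RUBRIC = [
    Criterion("mentions_refund_window", lambda s: "30 days" in s.lower(), weight=2.0),
    Criterion("at_most_one_apology", lambda s: s.lower().count("sorry") <= 1),
    Criterion("at_most_60_words", lambda s: len(s.split()) <= 60),
]

answer_a = "You can request a refund within 30 days of purchase."
answer_b = "Refunds are available for 30 days after you buy."

# Two acceptable paraphrases: an exact-match assertion would reject one of them.
print(answer_a == answer_b)
print(rubric_score(answer_a, REFUND_RUBRIC), rubric_score(answer_b, REFUND_RUBRIC))
```

The assertion in CI then moves from "output equals expected string" to "pass rate over the dataset stays above an agreed floor", which tolerates paraphrase while still catching regressions.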

// path_02 · testing with ai agents · 6 guides · 66m · reviewed May 2026

Using AI agents to test

An AI agent driving a real browser session is doing testing work — it decides what to click, observes what happened, and iterates toward a goal. An AI coding assistant that generates test code is helping you do testing work. These are categorically different architectures.

// path_07 · testing the model itself · 4 guides · 43m · reviewed May 2026

Testing the AI model itself

Distinct from testing AI features in your product: this path is about validating the model as an artefact, covering accuracy, bias, fairness, drift, robustness, and evaluation frameworks. The audience extends beyond traditional QA into ML engineering and red-teaming.

// path_08 · governance & compliance · 4 guides · 40m · reviewed May 2026

AI governance, compliance, and red-teaming

EU AI Act bias-monitoring obligations land 2 August 2026. NIST AI RMF adoption is accelerating. ISO 42001 audit pressure is real. This path covers the QA practitioner side of AI governance: what to test, what to document, what to defend.