// AI · Non-Deterministic
Testing in the age of AI agents.
Two distinct problems get lumped together under "AI testing". Pick the path that matches what you're shipping, or browse by category below.
> reviewed = 2026-05-18 · 37 guides · 8 paths · refresh quarterly
$ what_are_you_trying_to_do --pick-one
// the headline insight
When output is probabilistic, the test pyramid tilts. Unit tests at the base shrink. Evaluation rises to take their place.
AI for test documentation
Test plans, acceptance criteria, traceability matrices: the documentation overhead that quietly consumes a third of every sprint. AI is well suited here because humans review the output before it matters, the inputs are unstructured prose, and the cost of a mediocre draft is low.
AI for test data generation
Realistic test data is one of the harder problems in QA: production data is sensitive, and manually crafted data misses edge cases. Synthetic data tooling has matured fast, and AI is now the default approach for generating users, edge cases, adversarial inputs, and PII-safe substitutes that preserve shape without leaking real customers.
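As a rough illustration of what "preserve shape" can mean, here is a minimal sketch that swaps each letter for a random letter of the same case and each digit for a random digit, leaving punctuation and length untouched. The function name `mask_shape` is hypothetical, and this is a rule-based stand-in rather than an AI generator or a privacy guarantee (a random character can coincide with the original).

```python
import random
import string


def mask_shape(value: str, rng: random.Random) -> str:
    """Return a substitute with the same length, case pattern, and punctuation.

    Hypothetical helper for illustration only: ASCII letters and digits are
    replaced at random; every other character is copied through unchanged.
    """
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            # Separators such as '.', '@', '-' carry the shape, so keep them.
            out.append(ch)
    return "".join(out)


# A seeded generator keeps test fixtures reproducible between runs.
example = mask_shape("Jane.Doe42@example.com", random.Random(0))
```

The substitute still looks like an email address to a format validator, which is the property a downstream test usually cares about. Real tooling would also need to handle referential consistency across tables, which this sketch ignores.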
AI in CI/CD
The five highest-leverage places to add AI to a CI/CD pipeline: predictive test selection, flaky-test classification, failure triage, risk-based run ordering, and AI-generated tests on PR open. Each is a distinct problem with a distinct tooling landscape.
AI for automation scripting
The most-cited GenAI use case in QA, cited by 63% of practitioners in the World Quality Report (WQR) 2025-26. The question isn't whether to use Copilot, Cursor, or Claude Code to write Playwright. It's where they reliably win, where they reliably fail, and what the prompt patterns look like when you're six months into using them daily.
Testing AI features in your product
The test pyramid changes shape when output is non-deterministic. Exact-match assertions break. The evaluation layer — curated datasets, rubric scoring, LLM-as-judge — rises to fill the gap. Most teams discover this the hard way, months into a project.
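To make the shift from exact-match assertions to evaluation concrete, here is a minimal sketch of rubric scoring over a small curated dataset. Everything in it is an assumption for illustration: `EvalCase`, `score`, `pass_rate`, and the `fake_model` stub are hypothetical names, and the rubric is a simple keyword check where a real evaluation layer might use an LLM-as-judge.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_include: list[str]  # rubric: facts the answer has to mention
    must_avoid: list[str]    # rubric: phrases that count as a failure


def score(output: str, case: EvalCase) -> float:
    """Return a rubric score in [0, 1]; any forbidden phrase zeroes it."""
    text = output.lower()
    if any(bad.lower() in text for bad in case.must_avoid):
        return 0.0
    if not case.must_include:
        return 1.0
    hits = sum(1 for fact in case.must_include if fact.lower() in text)
    return hits / len(case.must_include)


def pass_rate(model: Callable[[str], str], cases: list[EvalCase],
              threshold: float = 0.5) -> float:
    """Fraction of cases whose rubric score meets the threshold."""
    passed = sum(1 for c in cases if score(model(c.prompt), c) >= threshold)
    return passed / len(cases)


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call; the wording could vary between runs,
    # which is exactly why an exact-match assertion would be brittle here.
    return "Refunds are accepted within 30 days with a receipt."


CASES = [
    EvalCase("What is the refund window?", ["30 days"], ["no refunds"]),
    EvalCase("What do I need for a refund?", ["receipt", "order number"], []),
]
```

With these cases, `pass_rate(fake_model, CASES)` is 1.0 at the default threshold and 0.5 at `threshold=0.75`, because the second answer covers only one of the two required facts. The assertion in CI then becomes "the pass rate stays above an agreed floor" rather than "the string equals X".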
Using AI agents to test
An AI agent driving a real browser session is doing testing work — it decides what to click, observes what happened, and iterates toward a goal. An AI coding assistant that generates test code is helping you do testing work. These are categorically different architectures.
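The decide, observe, iterate loop described above can be sketched in a few lines. This is a toy under stated assumptions: `SITE` is a hypothetical in-memory stand-in for a real browser session, and `scripted_policy` stands in for the model that would choose the next action.

```python
from typing import Callable, Optional

# Hypothetical in-memory "site": page -> {clickable label: next page}.
SITE: dict[str, dict[str, str]] = {
    "home": {"Login": "login", "Pricing": "pricing"},
    "login": {"Submit": "dashboard"},
    "pricing": {"Home": "home"},
    "dashboard": {},
}

Policy = Callable[[str, list[str]], Optional[str]]


def run_agent(decide: Policy, goal: str, start: str = "home",
              max_steps: int = 5) -> list[str]:
    """Observe the current page, let the policy decide, act, and repeat."""
    page = start
    trace = [page]
    for _ in range(max_steps):
        if page == goal:
            break
        # Observe: the options visible on this page. Decide: ask the policy.
        action = decide(page, list(SITE[page]))
        if action is None or action not in SITE[page]:
            break  # the policy gave up or chose something that is not there
        # Act: follow the click and record where it led.
        page = SITE[page][action]
        trace.append(page)
    return trace


def scripted_policy(page: str, options: list[str]) -> Optional[str]:
    # Stand-in for an LLM call that would pick from `options`.
    preferred = {"home": "Login", "login": "Submit"}
    return preferred.get(page)


print(run_agent(scripted_policy, goal="dashboard"))
# prints ['home', 'login', 'dashboard']
```

The step cap matters in practice: an agent that never reaches the goal has to stop somewhere, and the recorded trace is what a human reviews afterwards. A coding assistant, by contrast, would emit a fixed script and play no part in the run itself.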
Testing the AI model itself
Distinct from testing AI features in your product — this band is about validating the model as an artefact: accuracy, bias, fairness, drift, robustness, evaluation frameworks. The audience extends beyond traditional QA into ML engineering and red-teaming.
AI governance, compliance, and red-teaming
EU AI Act bias-monitoring obligations land 2 August 2026. NIST AI RMF adoption is accelerating. ISO 42001 audit pressure is real. This band covers the QA practitioner side of AI governance — what to test, what to document, what to defend.