Deterministic vs probabilistic testing

AI & LLM Testing

// Definition

Traditional software tests are deterministic: the same input produces the same output, and the test passes or fails. AI-backed features are probabilistic: the same input can produce different outputs, and "correctness" is a distribution rather than a binary. This isn't a small distinction; it breaks most of the assumptions baked into existing test frameworks. Exact-match assertions stop being useful, and flaky-test detection logic flags genuine model variance as a bug. The unit of measurement shifts from "this test passed" to "this prompt scored 0.87 on average across the eval set, up from 0.83 last week." Senior testers working on AI features spend more time defining what correctness means for a given feature than they do writing assertions.
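As a rough illustration of that shift, the sketch below replaces an exact-match assertion with a mean score over repeated samples, checked against a threshold. Everything in it is assumed for the example: `fake_model` is a hypothetical stand-in for a real model call, `score` is a toy grader, and the eval set, the 0.85 correctness rate, and the 0.75 threshold are made-up values, not recommendations.

```python
import random
from statistics import mean


def fake_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call: usually right, sometimes not."""
    answers = {"capital of France?": "Paris", "2 + 2?": "4"}
    # About 85% of samples return the right answer; the rest simulate variance.
    return answers[prompt] if rng.random() < 0.85 else "unsure"


def score(output: str, expected: str) -> float:
    """Toy grader: 1.0 if the expected answer appears in the output, else 0.0."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def evaluate(eval_set, samples_per_prompt: int, seed: int) -> float:
    """Sample each prompt several times and return the mean score."""
    rng = random.Random(seed)  # seeded so the eval run itself is reproducible
    scores = [
        score(fake_model(prompt, rng), expected)
        for prompt, expected in eval_set
        for _ in range(samples_per_prompt)
    ]
    return mean(scores)


# Hypothetical eval set and pass bar, chosen only for this sketch.
EVAL_SET = [("capital of France?", "Paris"), ("2 + 2?", "4")]
THRESHOLD = 0.75

avg = evaluate(EVAL_SET, samples_per_prompt=200, seed=0)

# The assertion is on the distribution, not on any single output.
assert avg >= THRESHOLD, f"mean score {avg:.2f} fell below {THRESHOLD}"
print(f"mean score: {avg:.2f}")
```

A single sample returning "unsure" does not fail this check; only a drop in the aggregate does. In practice the hard part is the grader and the threshold, which is where the work of defining correctness for a feature ends up.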

// Related terms