Why can't you use exact-match assertions when testing an LLM-powered feature?

Question

Accepted Answer

LLMs can produce different text on each call, even for the same input: with a nonzero temperature, sampling makes the output vary by design. Exact-match assertions would then fail intermittently, not because the feature is broken but because the phrasing changed. An LLM given "summarise this article" might return "The article discusses climate policy." on one run and "This piece covers environmental regulation." on the next. Both are correct summaries. An assertion like expect(output).toBe("The article discusses climate policy.") would fail on the second run, a meaningless failure that trains the team to ignore test results.

The fix is to stop thinking about what the output IS and start thinking about what it MUST satisfy: Is the output a valid JSON object with the required fields? Is the length within acceptable bounds? Does it avoid banned content (PII, profanity, competitor brand names)? Is it grounded in the source document, with no fabricated facts? These are property checks that hold regardless of which valid phrasing the model returns.
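The four property checks named in the answer could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the `Summary` shape, the `checkSummary` helper, the 300-character limit, the banned patterns (including the made-up brand "AcmeCorp"), and the number-matching grounding check are all assumptions introduced for the example. In particular, checking that every number in the summary also appears in the source is only a crude stand-in for real grounding evaluation.

```typescript
// Hypothetical output shape for the feature under test (field names are assumptions).
interface Summary {
  title: string;
  summary: string;
}

interface CheckResult {
  ok: boolean;
  failures: string[];
}

// Illustrative limits and patterns only; tune these for a real product.
const MAX_SUMMARY_CHARS = 300;
const BANNED_PATTERNS: RegExp[] = [
  /\b\d{3}-\d{2}-\d{4}\b/,   // SSN-like number (PII)
  /[\w.+-]+@[\w-]+\.[\w.]+/, // email address (PII)
  /\bAcmeCorp\b/i,           // hypothetical competitor brand name
];

function checkSummary(raw: string, source: string): CheckResult {
  const failures: string[] = [];

  // Property 1: the output is a valid JSON object with the required fields.
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, failures: ["not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, failures: ["not a JSON object"] };
  }
  const obj = parsed as Record<string, unknown>;
  for (const field of ["title", "summary"]) {
    if (typeof obj[field] !== "string") {
      failures.push("missing string field: " + field);
    }
  }
  const summaryValue = obj["summary"];
  const text = typeof summaryValue === "string" ? summaryValue : "";

  // Property 2: the length is within acceptable bounds.
  if (text.length === 0 || text.length > MAX_SUMMARY_CHARS) {
    failures.push("summary length out of bounds");
  }

  // Property 3: no banned content.
  for (const pattern of BANNED_PATTERNS) {
    if (pattern.test(text)) {
      failures.push("banned content: " + pattern.source);
    }
  }

  // Property 4 (crude grounding proxy): every number in the summary
  // must also appear somewhere in the source document.
  const numbers = text.match(/\d+(?:\.\d+)?/g) ?? [];
  for (const n of numbers) {
    if (source.indexOf(n) === -1) {
      failures.push("number not found in source: " + n);
    }
  }

  return { ok: failures.length === 0, failures };
}

// Two different phrasings of a correct summary both pass the same checks.
const source = "The article discusses climate policy and a 2030 emissions target.";
const runA = checkSummary(
  JSON.stringify({ title: "Climate", summary: "The article discusses climate policy." }),
  source
);
const runB = checkSummary(
  JSON.stringify({ title: "Climate", summary: "This piece covers environmental regulation." }),
  source
);
// A summary that introduces a figure absent from the source is rejected.
const fabricated = checkSummary(
  JSON.stringify({ title: "Climate", summary: "Emissions fell 45% last year." }),
  source
);
console.log(runA.ok, runB.ok, fabricated.ok); // true true false
```

In a Jest test, the exact-match assertion could then be replaced with something like expect(checkSummary(output, source).failures).toEqual([]), so that a failing run reports which property was violated rather than which wording changed.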

// WHAT INTERVIEWERS LOOK FOR