Tutorials · 13 June 2026 · 9 min read
How I evaluate an AI chatbot before release
A practical evaluation pass for AI chat features: hallucinations, refusals, prompt injection, and the cases with no single right answer.
ai-testing · llm · evaluation
LLMs can't reliably separate instructions from data, so user input can hijack the model. Covers direct and indirect injection, what to check for, and how to report it in a QA-safe way.
A screenshot isn't a repro when outputs vary. Capture the full assembled prompt, retrieved context, model version, and parameters so an AI bug is actually reproducible.
Concrete test cases for AI hallucination — unanswerable questions, false premises, invented entities, citations — and how to judge answers with no 'correct' value.