Safety Testing (LLM)

AI & LLM Testing

// Definition

Verifying that an LLM application refuses to generate harmful, illegal, or policy-violating content and resists adversarial attempts to elicit such content. Distinct from functional testing (does the feature work?) and performance testing. Covers: jailbreaking attempts, prompt injection payloads, outputs that violate content policies (PII leakage, instructions for illegal activity), and over-refusal (the model refusing legitimate requests to the point of being useless). A safety eval suite should run on every model upgrade and before production release.

// Related terms