Q3 of 21 · Testing AI systems
What are output property checks and how do you use them to test LLM responses?
Short answer
Property checks test invariants that must hold on every valid output regardless of phrasing: required JSON fields exist, response length is within bounds, banned content is absent, claims are backed by the source. They replace exact-match assertions for non-deterministic outputs.
Detail
Property checks are assertions about the constraints and content rules that every valid response must satisfy, rather than a comparison against one specific expected response.
Common categories:
Structural: does the response parse as valid JSON? Are required top-level fields present and the right type?
Constraint: is the length within the documented range? Does the language match the requested locale?
Safety: does the response contain PII, profanity, or competitor brand names? Use a regex or a secondary classifier.
Groundedness: for RAG features, do all factual claims in the response appear in the retrieved source documents? A grounding check can be a secondary LLM call ("does claim X appear in context Y?") or an embedding similarity check.
Instruction following: if the prompt specified "respond in bullet points" or "respond only in French," does the output comply?
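As a rough illustration of the structural, constraint, and safety categories, here is a minimal Python sketch. The field names (`answer`, `sources`), the 600-character limit, and the two regex patterns are hypothetical placeholders, not rules from any particular product; a real suite would take them from the feature's own spec.

```python
import json
import re

# Hypothetical rules for illustration only; real field names, limits, and
# patterns depend on the feature under test.
REQUIRED_FIELDS = {"answer": str, "sources": list}
MAX_ANSWER_CHARS = 600
BANNED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN-shaped numbers
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email-shaped strings
]


def check_response(raw: str) -> list[str]:
    """Return the violated properties; an empty list means the response passes."""
    # Structural: the response must parse as a JSON object.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["structural: not valid JSON"]
    if not isinstance(data, dict):
        return ["structural: top-level value is not an object"]

    failures = []
    # Structural: required fields exist and have the expected type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            failures.append(f"structural: missing field '{field}'")
        elif not isinstance(data[field], expected_type):
            failures.append(f"structural: field '{field}' has wrong type")

    answer = data.get("answer")
    if isinstance(answer, str):
        # Constraint: length within the assumed range.
        if not 1 <= len(answer) <= MAX_ANSWER_CHARS:
            failures.append("constraint: answer length out of range")
        # Safety: no banned pattern appears in the text.
        for pattern in BANNED_PATTERNS:
            if pattern.search(answer):
                failures.append(f"safety: matched banned pattern {pattern.pattern}")
    return failures
```

In a test, the assertion is simply `assert check_response(raw) == []`, which holds for any phrasing of a valid answer. Returning the full list of violations rather than a boolean makes failures easier to triage.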
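Most property checks run fast because they are plain assertions rather than model calls, so they can run on every response in a CI pipeline or on sampled live traffic in production monitoring. Checks that rely on a secondary classifier or LLM call, such as grounding checks, are the exception: they add latency and cost per response. See Evaluation methods.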
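One possible way to run checks over sampled traffic is sketched below. The function name `failure_rates`, the `sample_rate` and `seed` parameters, and the two example checks are assumptions made for illustration; a production monitor would read responses from its own logging pipeline.

```python
import random
from collections import Counter
from typing import Callable, Iterable

# A check takes the raw response text and returns True when the property holds.
Check = Callable[[str], bool]


def failure_rates(
    responses: Iterable[str],
    checks: dict[str, Check],
    sample_rate: float = 1.0,
    seed: int = 0,
) -> dict[str, float]:
    """Run every check on a random sample of responses; return failure rate per check."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    sampled = [r for r in responses if rng.random() < sample_rate]
    if not sampled:
        return {name: 0.0 for name in checks}

    failures: Counter[str] = Counter()
    for response in sampled:
        for name, check in checks.items():
            if not check(response):
                failures[name] += 1
    return {name: failures[name] / len(sampled) for name in checks}


# Hypothetical example checks.
EXAMPLE_CHECKS: dict[str, Check] = {
    "non_empty": lambda r: bool(r.strip()),
    "max_500_chars": lambda r: len(r) <= 500,
}
```

In CI, `sample_rate=1.0` checks every response and the build can fail on any non-zero rate. In production, a lower `sample_rate` keeps overhead small, and the per-check rates can feed an alert threshold.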