Q6 of 21 · Testing AI systems

How do you test the deterministic parts of an LLM-powered system separately?

Testing AI systems · Mid · testing-ai-systems, deterministic, unit-testing, mocking, architecture, rag

Short answer

Isolate the non-LLM layers (input parsing, routing, retrieval, output formatting, and error handling) and test them with standard unit and integration tests. Mock the LLM calls so that these tests are fast, deterministic, and independent of a live model.

Detail

Not every part of an AI system is non-deterministic. A RAG pipeline, for example, has several deterministic layers: the query parser, the vector similarity search, the context formatter, and the response post-processor. Testing these with standard techniques is faster, cheaper, and more precise than running everything through the LLM, because a failing test points at one layer rather than at the whole pipeline.
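As a minimal sketch of that separation, the deterministic layers can be written as plain functions, with the retriever and the model passed in as arguments so that a test can replace them with stubs. All names here (parse_query, format_context, postprocess, answer_question, fake_retrieve, fake_llm) are hypothetical and not taken from any particular framework.

```python
from typing import Callable, Dict, List

# Every name below is hypothetical and chosen for illustration only.

def parse_query(raw: str) -> str:
    """Deterministic: trim the input and collapse runs of whitespace."""
    return " ".join(raw.split())

def format_context(docs: List[str], max_docs: int = 3) -> str:
    """Deterministic: number the top documents and join them into one block."""
    return "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs[:max_docs], start=1))

def postprocess(answer: str) -> Dict[str, str]:
    """Deterministic: wrap the model text in the API response shape."""
    return {"status": "ok", "answer": answer.strip()}

def answer_question(
    raw: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> Dict[str, str]:
    """Full pipeline. The retriever and the model are injected, so tests can swap them."""
    query = parse_query(raw)
    context = format_context(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return postprocess(llm(prompt))

# Stubs that stand in for the vector store and the live model in tests.
def fake_retrieve(query: str) -> List[str]:
    return ["Paris is the capital of France."]

def fake_llm(prompt: str) -> str:
    return "  Paris  "

def test_layers_in_isolation() -> None:
    assert parse_query("  capital   of\tFrance? ") == "capital of France?"
    assert format_context(["a", "b"]) == "[1] a\n[2] b"
    assert postprocess(" Paris \n") == {"status": "ok", "answer": "Paris"}

def test_pipeline_with_stubbed_model() -> None:
    result = answer_question(" capital of France? ", fake_retrieve, fake_llm)
    assert result == {"status": "ok", "answer": "Paris"}
```

Passing the model in as a plain callable is one possible design choice; patching a client object with a mocking library would serve the same purpose. Either way, the tests above run without network access and give the same result on every run.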

In practice:

Input preprocessing: unit test that the input is sanitised, trimmed, and routed to the correct handler.

Retrieval: integration test that a given query returns the expected documents from the vector store (deterministic for the same index state).

Output formatting: unit test that the model response is parsed and wrapped in the correct API shape, and that an unexpected model format is handled gracefully.

Error handling: mock the LLM to return a timeout, a 429, or a malformed response, and verify your error handling fires correctly.
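The error-handling case can be sketched with the standard library's unittest.mock. The wrapper call_model, the client's complete method, the RateLimitError class, and the JSON response shape with an "answer" key are all assumptions made for illustration; a real provider SDK will have its own client interface and exception types.

```python
import json
from unittest.mock import Mock

class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""

def call_model(client, prompt: str) -> dict:
    """Call the (assumed) client and map each failure mode to a stable API shape."""
    try:
        raw = client.complete(prompt)
    except TimeoutError:
        return {"status": "error", "code": "timeout"}
    except RateLimitError:
        return {"status": "error", "code": "rate_limited"}
    try:
        payload = json.loads(raw)
        return {"status": "ok", "answer": payload["answer"]}
    except (json.JSONDecodeError, KeyError, TypeError):
        # Not JSON, JSON without an "answer" key, or JSON that is not an object.
        return {"status": "error", "code": "malformed_response"}

def test_timeout_is_reported() -> None:
    client = Mock()
    client.complete.side_effect = TimeoutError()  # the mock raises instead of returning
    assert call_model(client, "hi") == {"status": "error", "code": "timeout"}

def test_rate_limit_is_reported() -> None:
    client = Mock()
    client.complete.side_effect = RateLimitError("429")
    assert call_model(client, "hi") == {"status": "error", "code": "rate_limited"}

def test_malformed_response_is_handled() -> None:
    client = Mock()
    client.complete.return_value = "this is not JSON"
    assert call_model(client, "hi") == {"status": "error", "code": "malformed_response"}

def test_well_formed_response_passes_through() -> None:
    client = Mock()
    client.complete.return_value = '{"answer": "42"}'
    assert call_model(client, "hi") == {"status": "ok", "answer": "42"}
    client.complete.assert_called_once_with("hi")
```

Each test fixes the model's behaviour in a single line, so failure paths that are hard to trigger against a live model (timeouts, rate limits, broken output) become ordinary, repeatable unit tests.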

This follows the layered architecture described in "New test pyramid for AI": isolate and test each layer at the appropriate level before testing the full pipeline end-to-end.

What interviewers look for

Knowing which parts of an AI system are deterministic. Mocking LLM calls for fast, deterministic tests. Testing retrieval, input processing, output formatting, and error handling independently.