How do you test the deterministic parts of an LLM-powered system separately?

Question

Accepted Answer

Isolate the non-LLM layers (input parsing, routing, retrieval, output formatting, error handling) and test them with standard unit and integration tests. Mock the LLM calls so these tests are fast, deterministic, and independent of a live model.

Not everything in an AI system is non-deterministic. A RAG pipeline has several deterministic layers: the query parser, the vector similarity search, the context formatter, the response post-processor. Testing these with standard techniques is faster, cheaper, and more precise than running everything through the LLM.

In practice:

Input preprocessing: unit test that the input is sanitised, trimmed, and routed to the correct handler.
Retrieval: integration test that a given query returns the expected documents from the vector store (deterministic for the same index state).
Output formatting: unit test that the model response is parsed and wrapped in the correct API shape, and that an unexpected model format is handled gracefully.
Error handling
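As a rough illustration of the input preprocessing and output formatting checks described above, here is a minimal Python sketch that swaps the model for a `unittest.mock.Mock`. Every name in it (`preprocess`, `route`, `format_response`, `answer`, the `complete` method, the `billing`/`general` handlers, and the JSON response shape) is hypothetical and chosen only for the example; a real system will have its own client interface and API contract. Retrieval is left out because it needs a vector store fixture.

```python
import json
from unittest.mock import Mock


def preprocess(raw: str) -> str:
    """Trim and collapse whitespace; reject empty input."""
    cleaned = " ".join(raw.split())
    if not cleaned:
        raise ValueError("empty input")
    return cleaned


def route(query: str) -> str:
    """Pick a handler name with a simple keyword rule (hypothetical)."""
    return "billing" if "invoice" in query.lower() else "general"


def format_response(model_text: str) -> dict:
    """Wrap model output in the API shape; fall back on malformed output."""
    try:
        payload = json.loads(model_text)
        return {"status": "ok", "answer": payload["answer"]}
    except (json.JSONDecodeError, KeyError, TypeError):
        return {"status": "error", "answer": None}


def answer(raw: str, llm) -> dict:
    """Full pipeline; `llm` is any object exposing complete(prompt)."""
    query = preprocess(raw)
    handler = route(query)
    return format_response(llm.complete(f"[{handler}] {query}"))


def test_pipeline_with_mocked_llm():
    # The mock returns a canned string, so no live model is involved.
    llm = Mock()
    llm.complete.return_value = '{"answer": "42"}'
    result = answer("  where is my   invoice? ", llm)
    assert result == {"status": "ok", "answer": "42"}
    # The deterministic layers decide exactly what prompt the model sees.
    llm.complete.assert_called_once_with("[billing] where is my invoice?")


def test_unexpected_model_format_is_handled():
    llm = Mock()
    llm.complete.return_value = "not json at all"
    assert answer("hello", llm) == {"status": "error", "answer": None}
```

Both tests can be collected by a runner such as pytest or called directly. Because the model is passed in as an argument, the same `answer` function can take a real client in production and a mock in tests.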


// WHAT INTERVIEWERS LOOK FOR