How do you test an agentic system that makes tool calls and takes multi-step actions?

Question

Accepted Answer

Test each tool in isolation with unit tests, test the agent's decision logic by providing known state and verifying that it selects the right tool with the right parameters, then test full decision chains in a sandboxed replay environment. Verify error handling, escalation conditions, and termination criteria explicitly.

An agentic system requires a test strategy covering three distinct layers.

Tool testing: each tool the agent can call is ordinary code and can be tested deterministically. Test tools independently: does the code-execution tool return the right output for a given input? Does the web-search tool handle rate limiting and empty results correctly?

Decision logic testing: given a known task and known (mocked) tool responses, does the agent select the right tool with the right parameters? This tests the model's reasoning in a controlled environment. Use pre-recorded tool responses so that the inputs to each decision are deterministic.

Full chain testing: run the agent against a sandboxed environment (local API stubs, containerised services) a
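The three layers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `ToolCall`, `word_count`, `decide`, `run_agent`, and the "search" tool name are all hypothetical, and `decide` is a rule-based stand-in for the model call so that the example runs without a model. In a real suite, the same assertions would target the model-backed policy fed with pre-recorded tool responses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    name: str
    args: dict


def word_count(text: str) -> int:
    """Hypothetical tool. It is plain code, so it gets a plain unit test."""
    return len(text.split())


def decide(task: str, history: list) -> Optional[ToolCall]:
    """Stand-in for the model call: return the next tool call, or None to finish."""
    if not history:
        return ToolCall("search", {"query": task})
    last_call, last_result = history[-1]
    if last_call.name == "search" and last_result:
        return ToolCall("word_count", {"text": last_result[0]})
    return None  # empty search results, or nothing left to do


def run_agent(task, policy, tools, max_steps=5):
    """Run the loop and return (status, history) so tests can inspect the chain."""
    history = []
    for _ in range(max_steps):
        call = policy(task, history)
        if call is None:
            return "done", history
        try:
            result = tools[call.name](**call.args)
        except Exception:
            return "escalated", history  # tool failure: hand off rather than continue
        history.append((call, result))
    return "step_limit", history  # termination criterion: hard cap on steps


# Stubbed tools play the role of the sandbox.
STUBS = {"search": lambda query: ["two words"], "word_count": word_count}


def test_tool_in_isolation():
    assert word_count("a b c") == 3


def test_decision_given_recorded_response():
    recorded = [(ToolCall("search", {"query": "q"}), ["two words"])]
    assert decide("q", recorded) == ToolCall("word_count", {"text": "two words"})


def test_full_chain_in_sandbox():
    status, trace = run_agent("q", decide, STUBS)
    assert status == "done"
    assert [call.name for call, _ in trace] == ["search", "word_count"]


def test_step_limit_and_escalation():
    def looping_policy(task, history):
        return ToolCall("search", {"query": task})

    def failing_search(query):
        raise TimeoutError("rate limited")

    assert run_agent("q", looping_policy, STUBS, max_steps=3)[0] == "step_limit"
    assert run_agent("q", decide, {**STUBS, "search": failing_search})[0] == "escalated"
```

The functions follow pytest's naming convention but use only bare `assert`, so they can also be called directly. Returning the full history from `run_agent` is a deliberate choice in this sketch: it lets a chain test assert on the sequence of tool calls and not only on the final status.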

// WHAT INTERVIEWERS LOOK FOR