Q15 of 38 · Test design

How do you design tests for a feature flag system?

Test design · Mid · Tags: feature-flags, rollout, test-design, mid

Short answer

Test the flag's two states (on/off), the combinations with other flags it interacts with, the rollout mechanism (percentage, user targeting), the fallback to the default value when the flag service is unavailable, and the cleanup pathway. Don't trust the flag platform: assume it can return wrong values, and verify that the system still degrades gracefully.

Detail

Feature flags are a force multiplier for testing and a bug magnet. Each flag multiplies the number of possible system states, and naive tests miss the failure modes that flag rollouts cause.

Per-flag binary states. Test each flag in both the on and the off state. If a flag is added with default off, both states need explicit coverage before it is promoted.
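A minimal sketch of explicit coverage for both states. The function `checkout_total` and the flag name `new_pricing` are made up for illustration; the point is only that each state, plus the "flag missing" case, gets its own expectation.

```python
from typing import Mapping


def checkout_total(subtotal: float, flags: Mapping[str, bool]) -> float:
    """Hypothetical feature gated by a made-up flag called "new_pricing"."""
    if flags.get("new_pricing", False):  # default is off
        return round(subtotal * 0.9, 2)  # "on" path: 10% discount
    return subtotal                      # "off" path: legacy behaviour


def test_both_states() -> None:
    # Each state has its own explicit expected value.
    expected_by_state = {False: 100.0, True: 90.0}
    for state, expected in expected_by_state.items():
        assert checkout_total(100.0, {"new_pricing": state}) == expected
    # A missing flag must behave like the default (off).
    assert checkout_total(100.0, {}) == 100.0


test_both_states()
```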

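Flag interactions. Two flags A and B that both control parts of the same flow give 4 combinations. With n interacting flags there are 2^n; once n > 4, switch to pairwise testing, which covers every pair of flag values without running the full cross-product.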
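To make the difference between the full cross-product and pairwise coverage concrete, here is a rough sketch. The greedy selection below is one simple way to build a pairwise suite, not the only one and not guaranteed to be minimal. It enumerates all 2^n candidates, so it only suits small n, and it assumes at least two flags. The flag names are placeholders.

```python
from itertools import combinations, product
from typing import Dict, List, Sequence, Set, Tuple

Combo = Dict[str, bool]
Pair = Tuple[Tuple[str, bool], Tuple[str, bool]]


def all_combinations(flags: Sequence[str]) -> List[Combo]:
    """Full cross-product: 2**n on/off assignments for n flags."""
    return [dict(zip(flags, values))
            for values in product([False, True], repeat=len(flags))]


def covers(combo: Combo, pair: Pair) -> bool:
    """True if the assignment sets both flags in `pair` to the given values."""
    (a, va), (b, vb) = pair
    return combo[a] == va and combo[b] == vb


def pairwise_combinations(flags: Sequence[str]) -> List[Combo]:
    """Greedily pick assignments until every value pair of every two flags is covered."""
    uncovered: Set[Pair] = {
        ((a, va), (b, vb))
        for a, b in combinations(flags, 2)
        for va, vb in product([False, True], repeat=2)
    }
    candidates = all_combinations(flags)
    chosen: List[Combo] = []
    while uncovered:
        # Take the assignment that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: sum(covers(c, p) for p in uncovered))
        chosen.append(best)
        uncovered = {p for p in uncovered if not covers(best, p)}
    return chosen


flags = ["a", "b", "c", "d", "e", "f"]  # placeholder flag names
print(len(all_combinations(flags)))       # 64
print(len(pairwise_combinations(flags)))  # noticeably fewer than 64
```

Each returned dict can be fed to a parametrized test as the flag configuration for one run.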

Targeting / rollout mechanisms:

  • Percentage rollout: if user X falls in the 10% bucket, assert that they get the same value consistently across requests.
  • User targeting: specific users / cohorts get the flag; verify the targeting condition.
  • Geo / device targeting: behaviour for each segment.
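The percentage-rollout checks can be sketched as follows. This assumes a hash-based bucketing scheme, which is a common approach but not necessarily what any given platform does internally. `bucket`, `is_enabled` and the flag key `new_checkout` are illustrative names.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in 0..99 (assumed scheme)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_key: str, user_id: str, percentage: int) -> bool:
    """A user is in the rollout if their bucket is below the percentage."""
    return bucket(flag_key, user_id) < percentage


def test_rollout() -> None:
    users = [f"user-{i}" for i in range(10_000)]
    # Stickiness: the same user gets the same answer on every request.
    for user in users[:100]:
        assert len({is_enabled("new_checkout", user, 10) for _ in range(5)}) == 1
    # Boundaries: 0% enables nobody, 100% enables everybody.
    assert not any(is_enabled("new_checkout", u, 0) for u in users)
    assert all(is_enabled("new_checkout", u, 100) for u in users)
    # Ramping up never removes a user who was already in the rollout.
    for user in users:
        if is_enabled("new_checkout", user, 10):
            assert is_enabled("new_checkout", user, 50)
    # The observed share is roughly the configured one (loose tolerance).
    share = sum(is_enabled("new_checkout", u, 10) for u in users) / len(users)
    assert 0.08 <= share <= 0.12


test_rollout()
```

Against a real platform the same four assertions apply, with the platform's evaluation call in place of `is_enabled`.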

Default behaviour when the flag service is down. Critical: if your flag platform (LaunchDarkly, Split, in-house) returns an error or times out, what does the system do? Most platforms recommend "use the configured default value" — test that explicitly.
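One way to test the fallback is to wrap the platform call and inject failures. The wrapper below is a sketch under assumptions: `SafeFlagClient`, `FlagServiceError` and the `fetch` callable are hypothetical stand-ins for whatever your SDK exposes, not a real vendor API.

```python
from typing import Any, Callable, Mapping


class FlagServiceError(Exception):
    """Hypothetical error raised when the flag service fails."""


class SafeFlagClient:
    """Wraps a flag lookup and falls back to configured defaults."""

    def __init__(self, fetch: Callable[[str], Any], defaults: Mapping[str, bool]) -> None:
        self._fetch = fetch
        self._defaults = defaults

    def is_enabled(self, key: str) -> bool:
        fallback = self._defaults.get(key, False)  # unknown flags count as off
        try:
            value = self._fetch(key)
        except (FlagServiceError, TimeoutError):
            return fallback  # service down or too slow
        if not isinstance(value, bool):
            return fallback  # service answered with garbage
        return value


def failing(exc: Exception) -> Callable[[str], Any]:
    """Build a fake fetch function that always raises `exc`."""
    def fetch(key: str) -> Any:
        raise exc
    return fetch


defaults = {"new_checkout": False, "legacy_export": True}

# Service error and timeout both yield the configured default.
assert SafeFlagClient(failing(FlagServiceError()), defaults).is_enabled("legacy_export") is True
assert SafeFlagClient(failing(TimeoutError()), defaults).is_enabled("new_checkout") is False
# A malformed response is treated like an outage.
assert SafeFlagClient(lambda key: "yes", defaults).is_enabled("new_checkout") is False
# A healthy response overrides the default.
assert SafeFlagClient(lambda key: True, defaults).is_enabled("new_checkout") is True
```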

Flag transitions:

  • Flag goes from off → on while the user is mid-session: does the UI reflect the change immediately? Is there a stale-state risk?
  • Flag goes on → off (rollback): same question, plus does any data created under "on" survive cleanup?
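The transition questions can be turned into a test by flipping the flag in the middle of a simulated session. Everything here is hypothetical: an in-memory flag store, a made-up `rich_notes` flag, and a design choice (re-evaluate the flag on every request, degrade rich data to plain text on rollback) that your system may or may not share.

```python
from typing import Dict, List


class InMemoryFlags:
    """Test double for a flag store that can be flipped at any time."""

    def __init__(self) -> None:
        self._values: Dict[str, bool] = {}

    def set(self, key: str, value: bool) -> None:
        self._values[key] = value

    def is_enabled(self, key: str) -> bool:
        return self._values.get(key, False)


class NotesSession:
    """Hypothetical feature: the "rich_notes" flag enables formatted notes."""

    def __init__(self, flags: InMemoryFlags) -> None:
        self._flags = flags
        self.saved: List[Dict[str, str]] = []

    def save(self, text: str) -> None:
        # The flag is read per request, not cached at session start.
        fmt = "rich" if self._flags.is_enabled("rich_notes") else "plain"
        self.saved.append({"format": fmt, "text": text})

    def render(self) -> List[str]:
        rich_on = self._flags.is_enabled("rich_notes")
        # After a rollback, rich notes degrade to plain text instead of vanishing.
        return [f"<b>{n['text']}</b>" if rich_on and n["format"] == "rich" else n["text"]
                for n in self.saved]


flags = InMemoryFlags()
session = NotesSession(flags)
session.save("first")           # saved while the flag is off

flags.set("rich_notes", True)   # off -> on, mid-session
session.save("second")
assert session.render() == ["first", "<b>second</b>"]

flags.set("rich_notes", False)  # on -> off (rollback)
assert session.render() == ["first", "second"]
assert len(session.saved) == 2  # data created under "on" survives
```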

Cleanup / decommission. Old flags accumulate. Test that the codebase has an explicit cleanup path: when the flag is removed, the "on" code path becomes the default and the "off" branch is deleted. (Stale flags are a frequent source of flag bugs.)
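A cleanup check can be automated as a test that fails once a flag outlives its removal date. The sketch below assumes a convention that is not from any particular platform: flags are read through calls shaped like `is_enabled("name")`, and each flag has a "remove by" date in a registry you maintain. The flag names and dates are invented.

```python
import re
from datetime import date
from typing import List, Mapping, Set

# Matches calls such as flags.is_enabled("some_flag") or is_enabled('some-flag').
FLAG_CALL = re.compile(r'''is_enabled\(\s*["']([\w-]+)["']''')


def referenced_flags(source: str) -> Set[str]:
    """Flag names that still appear in the given source text."""
    return set(FLAG_CALL.findall(source))


def stale_flags(source: str, remove_by: Mapping[str, date], today: date) -> List[str]:
    """Flags still referenced in code after their removal deadline."""
    return sorted(name for name in referenced_flags(source)
                  if name in remove_by and remove_by[name] < today)


SAMPLE_SOURCE = '''
if flags.is_enabled("new_checkout"):
    pass
if flags.is_enabled('legacy_banner'):
    pass
'''

REMOVE_BY = {  # hypothetical registry of removal deadlines
    "new_checkout": date(2030, 1, 1),
    "legacy_banner": date(2020, 1, 1),
}

print(stale_flags(SAMPLE_SOURCE, REMOVE_BY, today=date(2024, 6, 1)))
# ['legacy_banner']
```

In CI the source text would come from reading the repository's files, and a non-empty result would fail the build.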

Test design moves:

  • Run the suite twice in CI, once with each value of any "currently rolling out" flag.
  • Use canary or shadow tests in production to verify that the flag's effect matches the expectation for live traffic.
  • Use audit logs to verify that each flag state change is logged with who/when/why.

The senior signal: treating feature flags as a test design dimension, not as an afterthought.

// WHAT INTERVIEWERS LOOK FOR

Awareness that flags multiply the combinatorial space, knowing to test the default-when-unavailable behaviour, and concern for the cleanup pathway.

// COMMON PITFALL

Testing only the 'intended' flag value (often 'on') and missing the off-path or the fallback when the flag service is unreachable.