Q8 of 38 · CI/CD & DevOps
What's the difference between blue-green and canary deployments from a testing perspective?
Short answer
Short answer: Blue-green flips 100% of traffic at once after smoke tests pass on green — easy rollback, but if smoke missed a bug, everyone hits it. Canary shifts traffic gradually (1% → 10% → 100%) while watching error rate and latency — small blast radius, but slower and needs strong observability.
Detail
Blue-green runs two identical environments. Blue is current production. Green is the new version, deployed and warmed but receiving zero user traffic. You run smoke tests against green; on pass, flip the load balancer to send 100% of traffic to green. Blue stays around for instant rollback.
Testing implications:
- Smoke tests on green need to be comprehensive — once you flip, every user sees it.
- Database migrations are tricky — if green needs a schema change, both versions need to coexist during the flip (expand-contract pattern).
- Rollback is fast (flip the LB back).
- One-shot risk: if smoke missed a bug or the bug only manifests under real production load, every user feels it.
Canary sends a small percentage of traffic to the new version (1% or one cell), watches error rate and latency vs. baseline for a soak period, then ramps to 10%, 50%, 100%.
Testing implications:
- Smoke tests less critical — production traffic is the test, with auto-rollback as the safety net.
- Rich observability is required: per-version metrics, automated comparison to baseline, alerting on error budget burn.
- Slower — full rollout takes hours or days.
- Catches bugs only real production triggers — DB load, third-party rate limits, real user behaviour.
- Multiple versions live simultaneously: clients, schemas, and feature flags must handle both.
The hybrid that wins in practice: blue-green for stateless services where rollback is cheap; canary for high-traffic or risky services where blast radius matters.
For a QA practitioner, the canary world demands different skills: synthetic monitoring, SLO-based gating, automated rollback design. Less "test before release," more "observe in production safely."