On this page4 sections
ReferenceAdvanced5-7 min reference

ETL Testing

ETL (Extract, Transform, Load) pipelines move and reshape data between systems — and silently corrupt reports when a transform is wrong. Unlike a one-off migration, pipelines run repeatedly, so tests must be repeatable too. This sheet covers what to check at each stage; it's close kin to Data Migration Testing (linked below).

Test at each stage

StageVerify
ExtractAll expected source rows pulled; correct filters/incremental window; source unchanged
TransformBusiness rules applied correctly; joins, aggregations, lookups, dedup
LoadTarget row counts; upserts vs inserts; no duplicates; constraints hold

Data-quality dimensions

DimensionAsks
CompletenessAll rows/fields present?
AccuracyValues match the rule/source of truth?
ConsistencySame value across tables/feeds?
UniquenessNo unintended duplicates?
ValidityTypes, ranges, formats correct?
TimelinessData fresh / within SLA?

Pipeline-specific checks

  • Idempotency / re-runs: running twice doesn't double-load.
  • Incremental loads: only new/changed rows; watermark correct.
  • Late / out-of-order data handled.
  • Failure & restart: a mid-run failure recovers without partial corruption.
  • Schema drift: a new/renamed source column is detected, not silently dropped.

Common mistakes

  • Testing only the final table, never the transform logic that produced it.
  • Comparing averages instead of row-level reconciliation (masks offsetting errors).
  • Ignoring incremental-load and re-run idempotency (double counting).
  • No alerting on data-quality failures — bad data reaches dashboards unnoticed.
  • Tiny test data that never exercises late/duplicate/null edge cases.