Q8 of 26 · SQL

How would you use SQL to compare an expected result set against what is actually in the database?

SQLJuniorsqldata-validationcomparisonexpected-vs-actualtesting

Short answer

Short answer: Create or populate an 'expected' table, then FULL OUTER JOIN it against the actual table and filter for rows where either side is NULL — those are the mismatches.

Detail

This pattern is the SQL equivalent of an assertion. It's useful after ETL pipelines, migrations, or batch processing.

Approach 1 — FULL OUTER JOIN for set comparison:

SELECT
  e.order_id        AS expected_id,
  a.order_id        AS actual_id,
  e.status          AS expected_status,
  a.status          AS actual_status
FROM   expected_orders e
FULL OUTER JOIN actual_orders a
  ON  a.order_id = e.order_id
 AND  a.status   = e.status
WHERE  e.order_id IS NULL   -- row in actual but not expected
    OR a.order_id IS NULL;  -- row in expected but not actual

Approach 2 — EXCEPT (set difference, supported in PostgreSQL, SQL Server, SQLite):

-- Rows in expected but not in actual
SELECT order_id, status FROM expected_orders
EXCEPT
SELECT order_id, status FROM actual_orders;

-- Rows in actual but not in expected
SELECT order_id, status FROM actual_orders
EXCEPT
SELECT order_id, status FROM expected_orders;

Approach 3 — NOT EXISTS for a specific key:

SELECT e.* FROM expected_orders e
WHERE NOT EXISTS (
  SELECT 1 FROM actual_orders a
  WHERE a.order_id = e.order_id AND a.status = e.status
);

In practice, most QA engineers keep an expected-results table seeded alongside test data and run one of these patterns in a post-test assertion step.

// EXAMPLE

-- Find differences between expected and actual order statuses
SELECT
  COALESCE(e.order_id, a.order_id) AS order_id,
  e.status AS expected_status,
  a.status AS actual_status
FROM   expected_orders e
FULL OUTER JOIN orders a ON a.order_id = e.order_id
WHERE  e.status IS DISTINCT FROM a.status;

// WHAT INTERVIEWERS LOOK FOR

FULL OUTER JOIN or EXCEPT as comparison mechanisms. Awareness that this is how you verify batch jobs, migrations, and data pipelines at the DB level.