Q4 of 26 · SQL
How do GROUP BY and HAVING work, and how would you use them to find duplicate records?
SQLJuniorsqlgroup-byhavingduplicatesaggregation
Short answer
Short answer: GROUP BY collapses rows into groups by one or more columns and lets you apply aggregate functions. HAVING filters those groups — like WHERE but applied after aggregation.
Detail
GROUP BY is the building block for spotting duplicates and summarising test data. HAVING is WHERE for aggregated results.
Finding duplicates — the classic pattern:
SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
This returns every email address that appears more than once — meaning duplicate user registrations slipped through.
Full duplicate rows — if you want the actual IDs involved, wrap it in a subquery or CTE:
SELECT *
FROM users
WHERE email IN (
SELECT email
FROM users
GROUP BY email
HAVING COUNT(*) > 1
);
Other QA uses of GROUP BY + HAVING:
- Find test runs where more than N tests failed
- Find products with zero inventory (HAVING COUNT = 0 or SUM = 0)
- Validate that every user has exactly one active session
// EXAMPLE
-- Emails registered more than once
SELECT email, COUNT(*) AS duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;
-- Orders with more than 10 line items (potential data anomaly)
SELECT order_id, COUNT(*) AS line_count
FROM order_items
GROUP BY order_id
HAVING COUNT(*) > 10;// WHAT INTERVIEWERS LOOK FOR
The HAVING COUNT(*) > 1 duplicate pattern is a must-know. Explain that WHERE filters before aggregation and HAVING filters after.
// COMMON PITFALL
Trying to use WHERE on an aggregate (WHERE COUNT(*) > 1) — that's a syntax error. HAVING is the right clause.