GROUP BY collapses many rows into one summary row per group. Sometimes that's exactly what you want. But sometimes you want the original rows back, with a per-group calculation attached — "every order, with a row number per user," "every test run, with the running pass rate so far," "every order, with the difference from the previous one." Window functions do that. They're the most underused feature in standard SQL, and once you've seen them, queries you couldn't write before become two-line jobs.
The defining difference vs GROUP BY
GROUP BY collapses. OVER (PARTITION BY ...) doesn't. Both compute per-group aggregates; only the latter keeps the underlying rows.
Compare:
-- GROUP BY — collapses to one row per user
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id;
-- 4 rows outvs:
-- Window function — every original row, plus the per-user count
SELECT
id,
user_id,
total,
COUNT(*) OVER (PARTITION BY user_id) AS order_count
FROM orders;
-- 5 rows out — every order, with its user's order_count attachedResult:
| id | user_id | total | order_count |
|----|---------|--------|-------------|
| 1 | 1 | 145.00 | 1 |
| 2 | 2 | 48.50 | 2 |
| 3 | 2 | 320.00 | 2 |
| 4 | 4 | 63.00 | 1 |
| 5 | 5 | 25.00 | 1 |
Five orders out (not four), each tagged with how many orders that user has. That's the value: aggregate without losing the rows.
ROW_NUMBER — assign sequence within a partition
-- Number each user's orders, newest first
SELECT
user_id,
id AS order_id,
total,
created_at,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS order_rank
FROM orders;Result:
| user_id | order_id | total | created_at | order_rank |
|---------|----------|--------|---------------------|------------|
| 1 | 1 | 145.00 | 2026-04-20 14:30:21 | 1 |
| 2 | 3 | 320.00 | 2026-05-01 12:08:50 | 1 |
| 2 | 2 | 48.50 | 2026-04-28 10:01:14 | 2 |
| 4 | 4 | 63.00 | 2026-05-03 17:42:30 | 1 |
| 5 | 5 | 25.00 | 2026-05-05 09:11:08 | 1 |
Two pieces in the OVER clause:
PARTITION BY user_id— restart the numbering for each user. Like GROUP BY, but doesn't collapse.ORDER BY created_at DESC— within each partition, the newest order is rank 1.
A killer pattern — "the most recent X per Y"
ROW_NUMBER + PARTITION + filter on rank = "the latest one per user." This is the canonical solution to a query that's very awkward without window functions:
-- Each user's most recent order — and ONLY that order
SELECT *
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
FROM orders
) ranked
WHERE rn = 1;Result:
| id | user_id | total | status | created_at | rn |
|----|---------|--------|------------|---------------------|----|
| 1 | 1 | 145.00 | completed | 2026-04-20 14:30:21 | 1 |
| 3 | 2 | 320.00 | shipped | 2026-05-01 12:08:50 | 1 |
| 4 | 4 | 63.00 | pending | 2026-05-03 17:42:30 | 1 |
| 5 | 5 | 25.00 | cancelled | 2026-05-05 09:11:08 | 1 |
One row per user, each one's most recent order, with all the columns. "Find the latest-per-group" used to require a self-join; now it's six lines.
RANK and DENSE_RANK — handle ties
ROW_NUMBER gives every row a unique number, even ties. RANK and DENSE_RANK behave differently:
| Function | If two rows tie for 2nd, the next row is… |
|---|---|
| ROW_NUMBER() | 4th (every row gets a unique number) |
| RANK() | 4th (skips numbers — like Olympic ranking) |
| DENSE_RANK() | 3rd (no skipped numbers) |
-- Rank products by price; ties get the same rank
SELECT name, price,
RANK() OVER (ORDER BY price DESC) AS price_rank,
DENSE_RANK() OVER (ORDER BY price DESC) AS price_dense_rank
FROM products;For most QA "give me the top X" work, ROW_NUMBER is the right default. RANK and DENSE_RANK matter when ties have meaning (sports rankings, percentile bucketing).
Running totals
Aggregate window functions can be cumulative if you order them. SUM with an ORDER BY in OVER gives a running total:
SELECT
id,
total,
SUM(total) OVER (ORDER BY created_at) AS running_total
FROM orders;Result:
| id | total | running_total |
|----|--------|---------------|
| 1 | 145.00 | 145.00 |
| 2 | 48.50 | 193.50 |
| 3 | 320.00 | 513.50 |
| 4 | 63.00 | 576.50 |
| 5 | 25.00 | 601.50 |
Useful for cumulative metrics — running revenue, running pass rate, running test count over time.
LAG and LEAD — peek at neighbouring rows
LAG(col) gives you the value from the previous row. LEAD(col) gives the value from the next row. Both take an OVER clause that defines the order.
-- Compare each order's total with the previous order's total
SELECT
id,
total,
LAG(total) OVER (ORDER BY created_at) AS previous_total,
total - LAG(total) OVER (ORDER BY created_at) AS diff_from_previous
FROM orders;Result:
| id | total | previous_total | diff_from_previous |
|----|--------|----------------|--------------------|
| 1 | 145.00 | NULL | NULL |
| 2 | 48.50 | 145.00 | -96.50 |
| 3 | 320.00 | 48.50 | 271.50 |
| 4 | 63.00 | 320.00 | -257.00 |
| 5 | 25.00 | 63.00 | -38.00 |
The first row's previous_total is NULL because there's no row before it.
LAG is the right tool for "did this metric improve or regress since last time?" — exactly what a test-suite trend dashboard wants.
A QA-shaped report
For each user, find their single most recent order and tag it with how many other orders they've placed:
Each user's most recent order total (window-function output)
SELECT user_id, id AS order_id, total, total_orders
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn,
COUNT(*) OVER (PARTITION BY user_id) AS total_orders
FROM orders
) ranked
WHERE rn = 1
ORDER BY total DESC;Two window functions, one query. Bob shows up with total_orders = 2; everyone else shows 1. That's exactly the "per-user, latest, with context" pattern that's hard to write any other way.
Database support
Window functions are standard SQL but arrived later than the rest of the language. Compatibility:
- PostgreSQL, SQL Server, Oracle: full support, decades old.
- MySQL: full support since 8.0 (2018).
- SQLite: full support since 3.25 (2018).
If you're on a database from the early 2010s, the syntax may not work. For everything modern, you're fine.
⚠️ Common Mistakes
- Filtering on a window-function result in WHERE.
WHERE rn = 1doesn't work in the same SELECT that definesrn— window functions run after WHERE. Wrap the SELECT in a subquery and filter on the alias outside. - Forgetting ORDER BY in OVER for cumulative aggregates. Without ORDER BY,
SUM(...) OVER (...)returns the full sum on every row, not a running total. Useful sometimes; usually not what you wanted. - Treating window functions as exotic. They're standard SQL, supported almost everywhere modern. Reach for them; the syntax becomes natural after about three uses.
🎯 Practice Task
30 minutes. Use the e-commerce database.
- Use ROW_NUMBER + PARTITION BY user_id ORDER BY total DESC to rank each user's orders by total. Read the result and identify Bob's biggest order.
- Use the rank-and-filter pattern to return each user's largest order in a single result set.
- Compute a running total of order revenue ordered by
created_at. Compare the last row toSELECT SUM(total) FROM orders— they should match. - Use LAG to compute the difference between each order's total and the previous one. Sort by
created_atso the LAG is meaningful. - Use COUNT() OVER (PARTITION BY user_id) to add a "user's total orders" column to every order row, without collapsing the data.
- Stretch: find each user's second-most-recent order. Same pattern as "most recent," but filter to
rn = 2. Some users won't have a second order — observe the empty rows in the result.
Final lesson of Chapter 6: views, stored procedures, and triggers — the database objects developers build on top of plain SQL, and what testers should know about them.