A subquery is a SELECT inside another SELECT. The inner query runs first and produces a result; the outer query uses that result. Subqueries unlock questions JOINs alone struggle with — "products more expensive than the average," "users whose total spend exceeds £500," "users who have at least one completed order." They're also a common readability win when the alternative is a long chain of joins. This lesson covers the four main shapes of subquery and when each is the right tool.
The mental model
Step 1 of 4
Inner query runs
The subquery executes first, producing a value, a row, or a list.
That's the conceptual order. In practice the database optimizer often rearranges things — inlining subqueries as joins, caching scalar results, and so on. But for reading a subquery, the inner-first model is the right way to think.
Subquery in WHERE — the most common shape
A subquery returning a list of values, used with IN:
-- Users who have placed at least one order
SELECT id, name, email
FROM users
WHERE id IN (SELECT DISTINCT user_id FROM orders);Result:
| id | name | email |
|----|-------------|-------------------|
| 1 | Alice Khan | alice@example.com |
| 2 | Bob Patel | bob@example.com |
| 4 | Dan Müller | dan@example.com |
| 5 | Esha Roy | esha+qa@test.com |
The inner query produces (1, 2, 4, 5). The outer query keeps users whose id is in that list. Carol is missing — she has no orders, so her id isn't in the inner result.
A scalar subquery (returning a single value) used with >, <, =:
-- Products priced above the average
SELECT name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);The inner query computes AVG(price) — say, £73.85 — and the outer query keeps products above that:
| name | price |
|--------------------------|--------|
| Mechanical Keyboard Pro | 120.00 |
| 27-inch Monitor | 320.00 |
| Webcam HD | 75.00 |
This is one of the few times you can compare against an aggregate inside WHERE — the subquery converts the aggregate into a single number that WHERE can handle.
Subquery in FROM — the derived table
A subquery in FROM acts like a temporary table. Useful when you want to aggregate, then aggregate again:
-- Average number of orders per user
SELECT AVG(order_count) AS avg_orders_per_user
FROM (
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
) AS user_orders;The inner query produces a small result — one row per user, with their order count:
| user_id | order_count |
|---------|-------------|
| 1 | 1 |
| 2 | 2 |
| 4 | 1 |
| 5 | 1 |
The outer query then averages those counts: (1+2+1+1) / 4 = 1.25. You can't write that in one query without the derived table — AVG(COUNT(*)) isn't legal SQL.
The AS user_orders alias is required in most databases — derived tables must be named.
Correlated subquery — references the outer row
A correlated subquery references a column from the outer query. The database runs the subquery once per outer row:
-- Users whose total spending exceeds £100
SELECT u.id, u.name, u.email
FROM users u
WHERE (SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = u.id) > 100;For each user u, the subquery evaluates SUM(total) FROM orders WHERE user_id = u.id — the user's total spend. Users whose total is > £100 are kept.
Result:
| id | name | email |
|----|------------|-------------------|
| 1 | Alice Khan | alice@example.com |
| 2 | Bob Patel | bob@example.com |
Correlated subqueries are powerful but slower — running once per outer row is more work than running once. For large tables the same logic is often better expressed as a JOIN with GROUP BY. We'll compare them at the end of the lesson.
EXISTS — "does this subquery return anything?"
EXISTS returns TRUE if the subquery returns any rows, FALSE if not. The subquery's contents don't matter — only whether it's empty or not:
-- Users with at least one completed order
SELECT u.id, u.name
FROM users u
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.user_id = u.id
AND o.status = 'completed'
);Read EXISTS as "does at least one row exist matching this condition?". Result:
| id | name |
|----|------------|
| 1 | Alice Khan |
| 2 | Bob Patel |
SELECT 1 is a convention — EXISTS doesn't care what columns the subquery returns, just whether it returns rows. SELECT *, SELECT 1, SELECT col1 all work; SELECT 1 is the idiomatic choice because it makes the intent obvious.
NOT EXISTS flips the test — find rows where the subquery returns nothing:
-- Users who have NO orders
SELECT u.id, u.name
FROM users u
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);
-- Result: Carol SinghSame answer as the LEFT JOIN + IS NULL pattern from Chapter 3. NOT EXISTS is sometimes clearer; the LEFT JOIN form is sometimes more efficient. Both work.
Subquery vs JOIN — which to use
Many subqueries can be rewritten as JOINs and vice versa. Modern optimizers usually produce the same plan for either form. So the decision is mostly about clarity:
- Use a JOIN when the relationship is the point — "every order with its user's name."
- Use a subquery when the inner query is conceptually independent — "products above the average price."
- Use EXISTS / NOT EXISTS when the question is binary — "does any row match?". This often reads more naturally than a JOIN-with-DISTINCT.
When in doubt, write whichever form makes the question obvious to the next person to read your SQL. Performance tuning rarely starts at this level.
Subqueries are recursive — they can nest
A subquery can contain another subquery:
-- Users who placed an order on the same day as the most recent order in the system
SELECT DISTINCT u.id, u.name
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE DATE(o.created_at) = (
SELECT DATE(MAX(created_at)) FROM orders
);Inner-first reading: the innermost subquery finds the latest date; the outer query keeps users who ordered on that date. Two levels of nesting; readable when you take it from the inside out.
A QA query you'll actually run
-- Products that have NEVER been ordered — your "should we delist?" report
SELECT id, name, price
FROM products
WHERE id NOT IN (
SELECT DISTINCT product_id FROM order_items WHERE product_id IS NOT NULL
);The WHERE product_id IS NOT NULL inside the subquery defends against the NOT IN gotcha from Chapter 3 — if the subquery returned NULL for any row, NOT IN would silently return zero rows. Belt-and-braces SQL costs nothing and prevents real bugs.
⚠️ Common Mistakes
- NOT IN with a subquery that might return NULL. Returns zero rows, silently. Filter NULLs in the subquery (
WHERE col IS NOT NULL) or use NOT EXISTS instead. - Forgetting the alias on a derived table.
FROM (SELECT ...)withoutAS some_nameerrors out on most databases. Always alias your derived tables. - Correlated subqueries on huge tables. Once-per-row execution kills performance on millions of rows. If a query is slow, try rewriting the correlated subquery as a JOIN with GROUP BY.
🎯 Practice Task
30 minutes. Use the e-commerce database.
- Write a WHERE-IN subquery: every product that has been ordered at least once.
- Write a scalar-comparison subquery: every order whose total is above the average order total.
- Write a derived-table query: the average number of products in stock per category. (Inner query: COUNT in-stock products per category; outer query: AVG of those counts.)
- Write a correlated subquery: users whose total spending exceeds £100. Compare your answer against a JOIN + GROUP BY + HAVING version — same result, different shape.
- Write an EXISTS query: users who have at least one shipped or completed order.
- Write a NOT EXISTS query: users with zero orders. Compare to the LEFT JOIN + IS NULL form from Chapter 3 — same Carol, different SQL.
- Stretch: find each user's most expensive single order. Two ways to do it. (Hint 1: GROUP BY user_id with MAX(total). Hint 2: a correlated subquery —
WHERE total = (SELECT MAX(total) FROM orders WHERE user_id = u.id).) Run both. Compare results and compare the SQL.
Next lesson: window functions — the one feature that lets you compute "per-group" results without losing the original rows. Once you've seen them, you'll wonder how you ever lived without them.