Subqueries and Nested SELECT

A subquery is a SELECT inside another SELECT. The inner query runs first and produces a result; the outer query uses that result. Subqueries unlock questions JOINs alone struggle with — "products more expensive than the average," "users whose total spend exceeds £500," "users who have at least one completed order." They're also a common readability win when the alternative is a long chain of joins. This lesson covers the four main shapes of subquery and when each is the right tool.

The mental model

Step 1 of 4

Inner query runs

The subquery executes first, producing a value, a row, or a list.

That's the conceptual order. In practice the database optimizer often rearranges things — inlining subqueries as joins, caching scalar results, and so on. But for reading a subquery, the inner-first model is the right way to think.

Subquery in WHERE — the most common shape

A subquery returning a list of values, used with IN:

-- Users who have placed at least one order
SELECT id, name, email
FROM users
WHERE id IN (SELECT DISTINCT user_id FROM orders);

Result:

| id | name        | email             |
|----|-------------|-------------------|
| 1  | Alice Khan  | alice@example.com |
| 2  | Bob Patel   | bob@example.com   |
| 4  | Dan Müller  | dan@example.com   |
| 5  | Esha Roy    | esha+qa@test.com  |

The inner query produces (1, 2, 4, 5). The outer query keeps users whose id is in that list. Carol is missing — she has no orders, so her id isn't in the inner result.

A scalar subquery (returning a single value) used with >, <, =:

-- Products priced above the average
SELECT name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);

The inner query computes AVG(price) — say, £73.85 — and the outer query keeps products above that:

| name                     | price  |
|--------------------------|--------|
| Mechanical Keyboard Pro  | 120.00 |
| 27-inch Monitor          | 320.00 |
| Webcam HD                |  75.00 |

This is one of the few times you can compare against an aggregate inside WHERE — the subquery converts the aggregate into a single number that WHERE can handle.

Subquery in FROM — the derived table

A subquery in FROM acts like a temporary table. Useful when you want to aggregate, then aggregate again:

-- Average number of orders per user
SELECT AVG(order_count) AS avg_orders_per_user
FROM (
    SELECT user_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY user_id
) AS user_orders;

The inner query produces a small result — one row per user, with their order count:

| user_id | order_count |
|---------|-------------|
| 1       | 1           |
| 2       | 2           |
| 4       | 1           |
| 5       | 1           |

The outer query then averages those counts: (1+2+1+1) / 4 = 1.25. You can't write that in one query without the derived table — AVG(COUNT(*)) isn't legal SQL.

The AS user_orders alias is required in most databases — derived tables must be named.

Correlated subquery — references the outer row

A correlated subquery references a column from the outer query. The database runs the subquery once per outer row:

-- Users whose total spending exceeds £100
SELECT u.id, u.name, u.email
FROM users u
WHERE (SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = u.id) > 100;

For each user u, the subquery evaluates SUM(total) FROM orders WHERE user_id = u.id — the user's total spend. Users whose total is > £100 are kept.

Result:

| id | name       | email             |
|----|------------|-------------------|
| 1  | Alice Khan | alice@example.com |
| 2  | Bob Patel  | bob@example.com   |

Correlated subqueries are powerful but slower — running once per outer row is more work than running once. For large tables the same logic is often better expressed as a JOIN with GROUP BY. We'll compare them at the end of the lesson.

EXISTS — "does this subquery return anything?"

EXISTS returns TRUE if the subquery returns any rows, FALSE if not. The subquery's contents don't matter — only whether it's empty or not:

-- Users with at least one completed order
SELECT u.id, u.name
FROM users u
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.user_id = u.id
      AND o.status = 'completed'
);

Read EXISTS as "does at least one row exist matching this condition?". Result:

| id | name       |
|----|------------|
| 1  | Alice Khan |
| 2  | Bob Patel  |

SELECT 1 is a convention — EXISTS doesn't care what columns the subquery returns, just whether it returns rows. SELECT *, SELECT 1, SELECT col1 all work; SELECT 1 is the idiomatic choice because it makes the intent obvious.

NOT EXISTS flips the test — find rows where the subquery returns nothing:

-- Users who have NO orders
SELECT u.id, u.name
FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);
-- Result: Carol Singh

Same answer as the LEFT JOIN + IS NULL pattern from Chapter 3. NOT EXISTS is sometimes clearer; the LEFT JOIN form is sometimes more efficient. Both work.

Subquery vs JOIN — which to use

Many subqueries can be rewritten as JOINs and vice versa. Modern optimizers usually produce the same plan for either form. So the decision is mostly about clarity:

Use a JOIN when the relationship is the point — "every order with its user's name."
Use a subquery when the inner query is conceptually independent — "products above the average price."
Use EXISTS / NOT EXISTS when the question is binary — "does any row match?". This often reads more naturally than a JOIN-with-DISTINCT.

When in doubt, write whichever form makes the question obvious to the next person to read your SQL. Performance tuning rarely starts at this level.

Subqueries are recursive — they can nest

A subquery can contain another subquery:

-- Users who placed an order on the same day as the most recent order in the system
SELECT DISTINCT u.id, u.name
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE DATE(o.created_at) = (
    SELECT DATE(MAX(created_at)) FROM orders
);

Inner-first reading: the innermost subquery finds the latest date; the outer query keeps users who ordered on that date. Two levels of nesting; readable when you take it from the inside out.

A QA query you'll actually run

-- Products that have NEVER been ordered — your "should we delist?" report
SELECT id, name, price
FROM products
WHERE id NOT IN (
    SELECT DISTINCT product_id FROM order_items WHERE product_id IS NOT NULL
);

The WHERE product_id IS NOT NULL inside the subquery defends against the NOT IN gotcha from Chapter 3 — if the subquery returned NULL for any row, NOT IN would silently return zero rows. Belt-and-braces SQL costs nothing and prevents real bugs.

⚠️ Common Mistakes

NOT IN with a subquery that might return NULL. Returns zero rows, silently. Filter NULLs in the subquery (WHERE col IS NOT NULL) or use NOT EXISTS instead.
Forgetting the alias on a derived table. FROM (SELECT ...) without AS some_name errors out on most databases. Always alias your derived tables.
Correlated subqueries on huge tables. Once-per-row execution kills performance on millions of rows. If a query is slow, try rewriting the correlated subquery as a JOIN with GROUP BY.

🎯 Practice Task

30 minutes. Use the e-commerce database.

Write a WHERE-IN subquery: every product that has been ordered at least once.
Write a scalar-comparison subquery: every order whose total is above the average order total.
Write a derived-table query: the average number of products in stock per category. (Inner query: COUNT in-stock products per category; outer query: AVG of those counts.)
Write a correlated subquery: users whose total spending exceeds £100. Compare your answer against a JOIN + GROUP BY + HAVING version — same result, different shape.
Write an EXISTS query: users who have at least one shipped or completed order.
Write a NOT EXISTS query: users with zero orders. Compare to the LEFT JOIN + IS NULL form from Chapter 3 — same Carol, different SQL.
Stretch: find each user's most expensive single order. Two ways to do it. (Hint 1: GROUP BY user_id with MAX(total). Hint 2: a correlated subquery — WHERE total = (SELECT MAX(total) FROM orders WHERE user_id = u.id).) Run both. Compare results and compare the SQL.

Next lesson: window functions — the one feature that lets you compute "per-group" results without losing the original rows. Once you've seen them, you'll wonder how you ever lived without them.