The test pyramid is a vibe, not a rule

qa.codes · 12 December 2025 · 8 min read

Intermediate

patterns · test-pyramid · opinion · culture

The Cohn test pyramid has been gospel since 2009. It was a useful heuristic for a 2009 monolith Java app. It's been quoted unchanged ever since — and most modern stacks don't fit its shape. Here's when the pyramid still applies, when it inverts, and when it doesn't apply at all.

The pyramid in its original context

Mike Cohn introduced the test pyramid in Succeeding with Agile (2009). The problem he was solving was specific: teams with enormous, slow Selenium suites and very few unit tests. Tests that took 8 hours to run in CI. Test suites so expensive to run that they became optional, then ignored. The pyramid was corrective advice: move coverage down to unit tests (fast, cheap, numerous) and treat end-to-end tests (slow, expensive, brittle) as a thin layer at the top.

That was the right advice for that dysfunction. Selenium in 2009 was genuinely slow, genuinely brittle, and genuinely expensive to write. Unit tests were fast and gave direct feedback. The pyramid was a corrective for teams that had the shape exactly backwards.

The shape made sense: many unit tests form the wide base, fewer integration tests sit in the middle, a small number of end-to-end tests sit at the top. (Cohn's own labels were unit, service, and UI; unit, integration, and end-to-end are the names most retellings use.) If you had a Java monolith with a Spring MVC backend, a traditional relational database, and a Selenium suite running through a browser, then yes, this shape was right.

What teams treat it as now

Sixteen years later, the pyramid appears in every QA conference talk, every ISTQB study guide, every "intro to testing" blog post. It's presented as a universal law of test architecture, applicable to any system, any stack, any scale.

Candidates get asked in interviews: "What does the test pyramid say?" They answer from memory. Engineering managers design test strategies around it without asking whether their system fits the assumptions it was built for. The pyramid became received wisdom so thoroughly that questioning it reads as iconoclasm rather than engineering judgement.

The assumption embedded in the pyramid: unit tests are fast, integration tests are slower, and end-to-end tests are slowest. The further recommendation: the slower a test is, the fewer you should have. These assumptions were correct for 2009 Selenium against a Java monolith. They don't universally hold in 2026.

When the pyramid inverts

Microservices architectures. In a system with 20 independent services, the "unit" of meaningful behaviour is an interaction between services, not a function inside one service. A unit test that verifies a function inside service A doesn't tell you whether services A and B can exchange data. Contract tests — which sit at what the pyramid calls the "integration" layer — are often the most valuable and most numerous tests in a microservices architecture. The unit layer is still useful, but the integration layer is where the real risks live. The shape flattens or sometimes genuinely inverts.
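To make the contract-test idea concrete, here is a minimal sketch in plain Python, not tied to Pact or any other tool. The contract, the field names, and the `contract_violations` helper are all hypothetical; the point is only that the consumer writes down the fields it relies on and the provider's response is checked against that list.

```python
# Hypothetical contract: the fields consumer service A relies on in
# provider service B's "get order" response. Names are illustrative only.
ORDER_CONTRACT = {
    "order_id": str,
    "total_cents": int,
    "currency": str,
}


def contract_violations(payload: dict, contract: dict) -> list:
    """Return human-readable violations; an empty list means the payload fits."""
    violations = []
    for field, expected_type in contract.items():
        if field not in payload:
            violations.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            violations.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return violations


# The provider may add fields without breaking this consumer...
ok_response = {"order_id": "o-1", "total_cents": 1250, "currency": "EUR", "note": "gift"}
# ...but renaming a field the consumer depends on is a break.
broken_response = {"order_id": "o-1", "total": "12.50", "currency": "EUR"}

print(contract_violations(ok_response, ORDER_CONTRACT))
print(contract_violations(broken_response, ORDER_CONTRACT))
```

A unit test inside service B would keep passing after the rename to `total`, because B's own functions still agree with each other. Only a check written from the consumer's point of view notices the break.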

Frontend-heavy SPAs. A React application where all the business logic lives in components is poorly served by a pyramid that emphasises unit tests. React component tests — rendering a component in isolation, simulating user interaction, asserting on output — are fast (they run in jsdom without a browser), provide high coverage, and are closer to the "integration" layer than the "unit" layer of a traditional pyramid. Teams that test React apps heavily at the component level are getting the right shape for their stack, even if it doesn't match the pyramid's proportions.

GraphQL backends. As covered in the REST vs GraphQL comparison, GraphQL's single-endpoint design means there's much less to integration-test at the HTTP layer. The resolver layer is where the logic lives, and resolver unit tests are fast and granular. The pyramid flattens because the integration surface is smaller, not because the team is cutting corners.
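As a rough illustration of what a "fast and granular" resolver test can look like, the sketch below treats a resolver as a plain function with its data source passed in. No GraphQL library is involved; `User`, `InMemoryUserRepo`, and `resolve_user_display_name` are hypothetical names invented for the example, and a real resolver's signature depends on the framework in use.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: str
    first_name: str
    last_name: str


class InMemoryUserRepo:
    """Hypothetical stand-in for whatever data source the real resolver uses."""

    def __init__(self, users):
        self._users = {u.id: u for u in users}

    def get(self, user_id):
        return self._users.get(user_id)


def resolve_user_display_name(repo, user_id):
    """Resolver logic as a plain function: look up a user, format a name."""
    user = repo.get(user_id)
    if user is None:
        return None  # unknown users resolve to null rather than raising
    return f"{user.first_name} {user.last_name}".strip()


def test_resolver_formats_name_and_handles_missing_user():
    repo = InMemoryUserRepo([User("u1", "Ada", "Lovelace")])
    assert resolve_user_display_name(repo, "u1") == "Ada Lovelace"
    assert resolve_user_display_name(repo, "missing") is None


test_resolver_formats_name_and_handles_missing_user()
```

Nothing here touches HTTP or a schema, which is why tests of this kind stay cheap enough to write in large numbers.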

When the pyramid doesn't apply at all

Data pipelines and ETL systems. A data pipeline that ingests records, transforms them, and writes them somewhere else doesn't have "unit" and "end-to-end" in the same sense as a request-response system. A unit test verifies that a transformation function produces the right output for a given input. But the system's actual failure modes — data quality issues in the source, schema drift, volume-triggered performance degradation, race conditions in parallel processing — are only visible when you run the pipeline with real data. The test shape for a data pipeline is almost entirely integration tests against real or near-real data.
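The gap between the two kinds of check can be sketched in a few lines. Everything below is hypothetical (the `to_cents` step, the column names, the `schema_drift` helper); it only illustrates that a transformation can be correct on the input it was written for while the incoming batch has quietly changed shape.

```python
from decimal import Decimal

# Columns the pipeline was written against (an assumption for this sketch).
EXPECTED_COLUMNS = {"id", "amount"}


def to_cents(record: dict) -> dict:
    """Transformation step: convert a decimal amount string to integer cents."""
    return {"id": record["id"], "amount_cents": int(Decimal(record["amount"]) * 100)}


def schema_drift(batch: list) -> dict:
    """Compare the columns actually present in a batch with the expected ones."""
    missing, unexpected = set(), set()
    for record in batch:
        keys = set(record)
        missing |= EXPECTED_COLUMNS - keys
        unexpected |= keys - EXPECTED_COLUMNS
    return {"missing": sorted(missing), "unexpected": sorted(unexpected)}


# The unit-level check passes: the function does what it says.
assert to_cents({"id": "1", "amount": "12.50"}) == {"id": "1", "amount_cents": 1250}

# A batch shaped like an upstream rename (amount -> amount_usd) is only
# caught by looking at the data itself, not at the function.
drifted_batch = [{"id": "1", "amount_usd": "12.50"}]
print(schema_drift(drifted_batch))
```

In practice the drift check would run against a sample of real or near-real source data, which is what pushes the useful tests for a pipeline toward the integration end.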

Machine learning systems. An ML model doesn't have "logic" to unit-test in the traditional sense. The model produces outputs from inputs through a learned mapping. Testing an ML system means evaluating output quality on held-out datasets, monitoring for distribution shift in production, and testing the data preprocessing pipeline for regressions. None of this maps to the pyramid.
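A held-out evaluation gate is one way this looks in code. The sketch below is deliberately toy-sized: `stub_model` stands in for a trained model, the four labelled examples and the 0.7 threshold are made up for illustration, and a real gate would load a versioned evaluation set and likely track more than accuracy.

```python
def accuracy(predict, examples) -> float:
    """Fraction of held-out examples where the prediction matches the label."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def stub_model(features: dict) -> str:
    """Hypothetical stand-in for a trained classifier."""
    return "spam" if features["links"] > 3 else "ham"


# Illustrative held-out set; a real one would be far larger and versioned.
HELD_OUT = [
    ({"links": 5}, "spam"),
    ({"links": 0}, "ham"),
    ({"links": 4}, "spam"),
    ({"links": 1}, "spam"),  # the stub gets this one wrong
]

MIN_ACCURACY = 0.7  # assumed quality bar, not a recommendation

score = accuracy(stub_model, HELD_OUT)
assert score >= MIN_ACCURACY, f"model below quality bar: {score:.2f}"
```

The assertion is about aggregate output quality, not about any single code path, which is why it has no natural home in a unit, integration, or end-to-end layer.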

Embedded firmware and hardware interaction. When your "integration test" requires a physical device on a test bench, the economics of the pyramid are irrelevant. You test at the level where testing is possible, not at the level the pyramid recommends.

The healthier framing

The insight behind the pyramid is still valid: fast tests are better than slow tests, cheap tests are better than expensive tests, and tests that fail for a single clear reason are better than tests that fail for a dozen possible reasons. These principles hold across systems.

The pyramid's specific proportional advice — many unit, fewer integration, few end-to-end — is a corollary of those principles applied to one class of system. It's not the principle itself.

The framing that holds up in 2026: build the tests that verify the things most likely to break in your system, at the speed your team will tolerate running them, in the shape that maps to how your system actually fails. The resulting distribution is the right test pyramid for your system.

That might be a pyramid. It might be a diamond (few unit, many integration, few E2E). It might be a flat layer (mostly component tests for a React app). It might not be a shape at all — it might just be "integration tests and a staging environment." Optimise for confidence and speed in your specific system, not for conformance to a sixteen-year-old heuristic.

The pyramid is a vibe: test at lower layers when you can, test fast when you can, don't make end-to-end tests do work unit tests can do. That vibe is right. The specific shape is a suggestion, not a law. Dogmatic adherence to any testing pattern is worth questioning, and the pyramid is not the only one that gets over-applied.
