The Shape of a Test Suite
Most test suites are shaped wrong. Not because the individual tests are bad, but because the overall structure creates redundancy, obscures intent, and makes the system harder to change. If your test suite feels like a burden rather than a safety net, the shape is probably the problem.
I've written before about the problems with the test pyramid as a mental model. This article is the constructive follow-up: what should you build instead?
Test Behaviors, Not Structure
Tests exist to verify that the user-facing behaviors of your application work correctly. That's it. Not the internal wiring. Not the helper functions. Not the private methods. The behaviors that matter to the people (or systems) consuming your software.
This is the same principle as coding to an interface rather than an implementation - and it applies to tests just as much as it applies to production code. Your software is built for its users, whether those users are humans clicking buttons or other developers consuming your library. Only the behaviors visible to those users matter from a testing perspective.
When you test internal behaviors, you're coupling your tests to implementation details. Every refactor becomes a test rewrite. The tests stop telling you "does this thing work?" and start telling you "does this thing work the way it used to work internally?" Those are very different questions, and only one of them is useful.
This has a direct corollary: how you organize your code is a separate concern from how you organize your tests. Take a complex component (by "component" I mean whatever unit of organization your ecosystem uses - a function, a class, a module, a package). You could implement it as a single class, split it across multiple classes in one file, or put each class in its own file. You could swap the underlying data structure or algorithm. None of that changes the behaviors the component exposes. Your tests should reflect behavior requirements, not file structure. If reorganizing your code forces you to rewrite your tests, your tests are shaped wrong.
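To make the distinction concrete, here's a minimal sketch. The `cartTotal` function and its internals are hypothetical, not from any particular codebase; the point is which surface the test touches.

```javascript
// Hypothetical cart module. The user-visible behavior is "total with
// discount applied" - how it is computed internally is not a behavior.
function cartTotal(items, discountRate = 0) {
  const subtotal = items.reduce((sum, item) => sum + item.price, 0);
  return subtotal * (1 - discountRate);
}

// Behavior-focused check: survives any internal refactor of cartTotal
// (different data structure, extracted helpers, split across files).
const total = cartTotal([{ price: 40 }, { price: 60 }], 0.1);
// An implementation-coupled test would instead assert on the intermediate
// subtotal or a private helper, and break on every reorganization.
```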
The Overlap Problem
The traditional testing layers - unit tests, integration tests, end-to-end tests - are well-intentioned. The idea is to test at different granularities to catch different classes of bugs. In practice, they tend to create overlapping coverage of the same behaviors.
Consider a concrete example. You're building an e-commerce application. The business rule is: "orders over $100 get free shipping." Here's what the traditional layered approach produces:
// Unit test
test("calculateShipping returns 0 for orders over 100", () => {
  expect(calculateShipping(150)).toBe(0);
});

// Integration test
test("POST /orders applies free shipping for orders over 100", async () => {
  const res = await request(app).post("/orders").send({ items: [{ price: 150 }] });
  expect(res.body.shipping).toBe(0);
});

// E2E test
test("user sees free shipping on checkout for orders over 100", async () => {
  await page.addToCart(itemWorth150);
  await page.goToCheckout();
  expect(await page.getShippingCost()).toBe("$0.00");
});
Three tests. One behavior. When the threshold changes from $100 to $75, you update three tests. Multiply this across every behavior in the application and you start to see the problem.
The waste is real, but the deeper damage is to clarity. The test suite stops being a clean specification of what the application does. It becomes a tangled web of overlapping assertions where the actual required behaviors are buried under layers of redundancy. A good test suite should be a source of truth - executable documentation you can read to understand the system and run to verify it.
Two Approaches to Eliminating Overlap
There are two solid strategies for building a test suite without the redundancy problem.
Approach 1: Exhaustive Component Testing with Minimal Smoke Tests
This approach leans on a systems engineering perspective. Software systems are dynamic systems, and fundamentally, they are state machines. If you've studied any dynamical systems theory, you know the general form:
x(t+1) = f(a, x(t))
The state of the system at any point in time is a function of the previous state and some action. Software components typically have a finite set of actions that can occur and a finite set of interesting states. By "interesting," I mean states that actually affect behavior. A list of length 3 versus length 4? Not interesting - no behavior difference. A list of length 0 versus length 1? Interesting - that distinction drives real behavior changes.
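Here's a sketch of what "every action across every interesting state" looks like in practice, using a hypothetical bounded queue. The component and its interesting states (empty, partially full, full) are illustrative assumptions, not from the article's e-commerce example.

```javascript
// Hypothetical component: a queue with a fixed capacity.
class BoundedQueue {
  constructor(capacity) { this.capacity = capacity; this.items = []; }
  enqueue(x) {
    if (this.items.length >= this.capacity) return false; // reject when full
    this.items.push(x);
    return true;
  }
  dequeue() { return this.items.length ? this.items.shift() : undefined; }
}

// Build each interesting state fresh, then exercise each action against it.
// Length 0 vs 1 vs capacity are interesting; 3 vs 4 (below capacity) is not.
function makeState(name) {
  const q = new BoundedQueue(2);
  if (name === "partial") q.enqueue(1);
  if (name === "full") { q.enqueue(1); q.enqueue(2); }
  return q;
}

const results = {};
for (const state of ["empty", "partial", "full"]) {
  results[state] = {
    enqueue: makeState(state).enqueue(9),
    dequeue: makeState(state).dequeue(),
  };
}
```

The cross-product of interesting states and actions is small, so exhaustive coverage of the component is cheap.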
For a given component, you can often exhaustively test all meaningful actions across all interesting states. If you use an event-driven or message-passing architecture, the connection point between components is just the ability to pass messages. Once each component is exhaustively tested in isolation, proving two components work together reduces to verifying that messages can be sent between them. The integration test surface becomes remarkably small.
The shape here is: thorough, exhaustive tests at the component level, plus a thin layer of smoke tests that verify the pieces are wired together. The smoke tests don't re-test business logic - they just confirm the plumbing works.
Component tests — every action across every interesting state:
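A sketch for the shipping rule, assuming a `calculateShipping(total)` function and a flat $5 fee below the threshold (the fee is an illustrative assumption):

```javascript
// Assumed implementation under test: $5 flat fee, free over $100.
function calculateShipping(total) {
  return total > 100 ? 0 : 5;
}

// The interesting states of this rule are around the threshold:
// below it, exactly at it, and above it.
const cases = [
  { total: 99.99,  expected: 5 }, // below threshold
  { total: 100,    expected: 5 }, // boundary: "over $100" is exclusive
  { total: 100.01, expected: 0 }, // just over
  { total: 150,    expected: 0 }, // well over
];
const failures = cases.filter(c => calculateShipping(c.total) !== c.expected);
```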
Smoke tests — just verify the wiring:
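A sketch of the smoke layer, with hypothetical names (`orderService`, `createOrderRoute`). It deliberately re-tests no business logic; it only confirms that a request flows through the wiring to the already exhaustively tested component:

```javascript
// The component, trusted from its own exhaustive tests.
function orderService(items) {
  const total = items.reduce((sum, item) => sum + item.price, 0);
  return { total, shipping: total > 100 ? 0 : 5 };
}

// The wiring: a route that delegates to the component.
function createOrderRoute(service) {
  return (req) => ({ status: 201, body: service(req.body.items) });
}

// Smoke test: any request that comes back with the right shape proves the
// plumbing works. We do not re-assert the shipping rule's values here.
const route = createOrderRoute(orderService);
const res = route({ body: { items: [{ price: 1 }] } });
```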
Approach 2: A Single Acceptance Suite with Pluggable Layers
The second approach, championed by folks like Aslak Hellesoy, takes the opposite tack. Instead of testing components in isolation, you maintain a single suite of acceptance tests that describe the behaviors of the application from the user's perspective. The key insight is that this single suite can be executed at various application layers by swapping out the test actors.
Here's what that looks like concretely. You define your behaviors once:
Scenario: Free shipping on large orders
  Given a cart with items totaling $150
  When the user checks out
  Then shipping cost is $0
Then you write multiple actor implementations that can execute this same scenario:
// Domain actor - calls business logic directly, fast
class DomainActor {
  checkout(cart) { return orderService.checkout(cart); }
}

// API actor - hits the HTTP endpoints
class ApiActor {
  checkout(cart) {
    return fetch("/orders", { method: "POST", body: JSON.stringify(cart) });
  }
}

// UI actor - drives the browser, slow but thorough
class UiActor {
  async checkout(cart) {
    await page.addToCart(cart);
    await page.clickCheckout();
  }
}
The behavior spec is written once. The actors are interchangeable. Run against the domain actor for fast feedback during development. Run against the UI actor in CI for full confidence. Same behaviors, different execution layers, zero duplication.
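A minimal sketch of how the swap works: the scenario is a function of an actor, and each actor maps the same steps onto a different layer. All names here (`freeShippingScenario`, `createCart`, the in-process `domainActor`) are illustrative assumptions.

```javascript
// One behavior spec, written once, parameterized by the actor.
async function freeShippingScenario(actor) {
  const cart = await actor.createCart([{ price: 150 }]);
  const order = await actor.checkout(cart);
  return order.shipping === 0;
}

// Fast in-process actor for development feedback. A real suite would also
// provide an ApiActor and a UiActor exposing the same interface, so the
// identical scenario can run against HTTP or the browser in CI.
const domainActor = {
  async createCart(items) { return { items }; },
  async checkout(cart) {
    const total = cart.items.reduce((sum, item) => sum + item.price, 0);
    return { total, shipping: total > 100 ? 0 : 5 };
  },
};
```

Swapping `domainActor` for a UI-driving actor changes the execution layer without touching the spec.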
When Topology Matters
There's a structural property of your system that should influence your testing strategy: the shape of component usage.
If your component graph is a straight line - a chain of components where there's ultimately one user-facing interface - you can largely get away with testing only at that interface. Conceptually, you could collapse the entire chain into a single component. The internal boundaries are organizational, not behavioral.
But if the graph branches, the calculation changes. Say you have a server API consumed by a CLI, a web app, and a desktop app. That API is now a shared interface with multiple consumers. You want tests specifically against the API, because its behavioral contract matters independently - it needs to work correctly for all its consumers. Each consumer then has its own thin layer of tests for its specific interface concerns.
The principle generalizes: wherever your system's dependency graph has a node with multiple consumers, that node's behavioral contract deserves its own tests. The test suite's shape should mirror the system's usage topology. Shared interfaces get tested. Internal pass-through layers don't.
When Is Overlap Acceptable?
I've been making a strong case against redundancy, so let me be precise about where deliberate overlap can be a reasonable tradeoff.
For critical paths where failure has severe consequences - payment processing, authentication, medical systems - you might intentionally test the same behavior at multiple layers as a defense-in-depth strategy. The cost of maintaining redundant tests is real, but for a payment flow, the cost of a missed regression is worse.
The key distinction is intentional versus accidental overlap. If you've consciously decided "this behavior is critical enough to verify at every layer," that's a defensible engineering decision. If your entire suite has triple coverage because "that's how testing works," that's a structural problem.
What About the Benefits of Traditional Layering?
The traditional approach tries to solve real problems, and those problems don't go away just because you restructure your tests.
Rapid feedback. Running a full end-to-end acceptance suite for a non-trivial application can take a long time. You want fast signals while developing. Both approaches handle this. The component-based approach gives you fast isolated tests by default. The pluggable acceptance approach lets you run the same specs against the domain layer directly, skipping the slow UI and network layers.
System debugging. When a test fails, you want to find the broken code quickly. The component-based approach handles this naturally - a failing component test points you right at the problem. The pluggable acceptance approach can be trickier here, since a failing acceptance test might not immediately tell you which component is at fault. You may need to re-run at a lower layer to narrow things down.
These are genuine benefits, but they don't require overlapping coverage. You can get fast feedback and good debuggability without testing the same behavior three times.
In Summary
A well-shaped test suite tests user-facing behaviors, not implementation details. Each behavior is specified and verified in one place. The structure reflects the usage topology of the system, not the code organization. It provides fast feedback without sacrificing confidence.
The goal isn't to have more tests or fewer tests. It's to have the right tests, organized in a way that makes the system easy to understand and easy to change. A test suite shaped well is one of the most valuable assets a codebase can have. A test suite shaped poorly is one of the most expensive liabilities.