There's a specific kind of review failure that doesn't get talked about enough: the bug that the reviewer would have caught if they'd seen it in isolation, but didn't catch because they saw it at line 340 of a 600-line diff after a full day of meetings. The reviewer is capable. The bug is not subtle. The cognitive conditions weren't right.
Null pointer dereferences and off-by-one errors are the canonical examples of this failure mode. They are individually simple — the kind of mistake a developer would catch immediately if the function were the only thing in front of them. But they reliably survive review because they require pattern recognition at a moment when the reviewer's pattern-matching capacity has been depleted. Understanding why these specific bugs slip through is the first step to catching them more consistently.
The anatomy of a null dereference that survives review
Most null dereferences that ship to production follow a predictable structure. The caller retrieves a value from a function that can return null or undefined under some condition — a database lookup that returns no result, an API call that returns an empty response, a cache miss. The value is used downstream without a null check, or with a null check that covers the obvious case but misses a second path.
Here's a simplified TypeScript example of the kind of thing that slips through:
async function getOrderSummary(orderId: string) {
  const order = await db.orders.findOne({ id: orderId });
  const user = await db.users.findOne({ id: order.userId });
  return {
    orderId: order.id,
    userEmail: user.email,
    total: order.total,
  };
}
At a glance, this looks fine. The reviewer has seen this pattern hundreds of times. What they need to catch is that both findOne calls can return null, and neither result is checked. A missing order throws a TypeError at order.userId; an order whose user has been deleted throws at user.email. Either way, the caller gets a generic runtime exception instead of a useful not-found error. In a review where this function is surrounded by 200 lines of equally unremarkable changes, the reviewer's eye slides past it.
The deeper issue is that a TypeScript configuration with strictNullChecks enabled would surface this at compile time, provided findOne is declared to return OrderRow | null: accessing order.userId on a possibly null value is a type error. But many codebases have accumulated exceptions to strict null checking, or have functions whose declared return types are looser than their real contracts (any, or a non-null type for a lookup that can miss), or have TypeScript configured without the strictness settings enabled. The type system that should catch this has been quietly disabled.
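A guarded version of the function might look like the following minimal sketch. The OrderRow and UserRow shapes, the in-memory db stub, and the NotFoundError class are all assumptions made for illustration, not part of any real data layer. The point is only that once findOne is declared to return T | null, each lookup has to be checked before use when strictNullChecks is on.

```typescript
// Hypothetical row shapes; the original example does not define them.
interface OrderRow { id: string; userId: string; total: number; }
interface UserRow { id: string; email: string; }

// Hypothetical error type so callers can tell "missing" from "broken".
class NotFoundError extends Error {
  constructor(entity: string, id: string) {
    super(`${entity} not found: ${id}`);
    this.name = "NotFoundError";
  }
}

// In-memory stand-in for a data layer (an assumption, not a real API).
// The return type states openly that a lookup can miss.
function makeTable<T extends { id: string }>(rows: T[]) {
  return {
    async findOne(query: { id: string }): Promise<T | null> {
      return rows.find((row) => row.id === query.id) ?? null;
    },
  };
}

const db = {
  orders: makeTable<OrderRow>([
    { id: "o1", userId: "u1", total: 42 },
    { id: "o2", userId: "deleted-user", total: 10 },
  ]),
  users: makeTable<UserRow>([{ id: "u1", email: "a@example.com" }]),
};

async function getOrderSummary(orderId: string) {
  const order = await db.orders.findOne({ id: orderId });
  if (order === null) {
    throw new NotFoundError("order", orderId);
  }
  // From here on, `order` is narrowed to OrderRow.
  const user = await db.users.findOne({ id: order.userId });
  if (user === null) {
    throw new NotFoundError("user", order.userId);
  }
  return { orderId: order.id, userEmail: user.email, total: order.total };
}
```

Whether a missing user should throw, or should produce a summary with a placeholder email, is a product decision. The sketch only makes the decision visible at the point where the lookup happens.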
Off-by-one: why fencepost errors persist
Off-by-one errors have a particular quality that makes them resistant to review: they are invisible at the typical reading speed of a code review. A reviewer reading at normal pace processes the logic of a loop — "iterate from 0 to length, do something to each element" — and validates the general structure. The specific boundary condition requires slowing down and mentally executing the loop at its extremes, which most reviewers only do when something else flags the code as suspicious.
Consider a pagination implementation with a subtle boundary issue:
function getPage(items: Item[], page: number, pageSize: number): Item[] {
  const start = page * pageSize;
  const end = start + pageSize;
  return items.slice(start, end);
}
This looks correct, and for most inputs it is. But consider what happens when pageSize is 0: every call becomes items.slice(0, 0), which silently returns an empty array rather than an error. More problematically, if the caller computes the total page count as Math.floor(items.length / pageSize), a pageSize of 0 does not throw. Division by zero in JavaScript yields Infinity (or NaN when the list is empty), and pagination logic built on an infinite page count will behave strangely in ways that are hard to diagnose. Even with a valid pageSize, Math.floor undercounts whenever the last page is partial: 10 items at 3 per page need 4 pages, not 3.
The fencepost logic is exactly the kind of thing a reviewer catches when they're fresh and deliberate, and misses when they're tired and moving fast through a large diff.
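One way to make the boundary assumptions explicit is to validate them up front and count pages with Math.ceil. This is a sketch, not a definitive implementation: it assumes zero-based page numbers (the original does not say which convention callers use), and pageCount is a name invented here for illustration.

```typescript
// Assumption: pages are zero-based, so page 0 is the first page.
function pageCount(totalItems: number, pageSize: number): number {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError(`pageSize must be a positive integer, got ${pageSize}`);
  }
  // Math.ceil counts the final partial page; Math.floor would drop it.
  return Math.ceil(totalItems / pageSize);
}

function getPage<T>(items: T[], page: number, pageSize: number): T[] {
  // Validates pageSize as a side effect of counting pages.
  const pages = pageCount(items.length, pageSize);
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError(`page must be a non-negative integer, got ${page}`);
  }
  if (page >= pages) {
    return []; // past the last page: empty result, not an error
  }
  const start = page * pageSize;
  // slice clamps the end index to items.length, so the last page may be short.
  return items.slice(start, start + pageSize);
}
```

With five items and a page size of 2, pageCount returns 3 and getPage(items, 2, 2) returns the single trailing item. A pageSize of 0 now fails loudly at the call site instead of producing an empty page.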
The cognitive load of large diffs
Research on code review effectiveness consistently shows that review quality declines as diff size increases. This isn't a controversial finding — it follows directly from what we know about working memory and attention. A reviewer holding the semantics of 50 lines in working memory can reason carefully about each change. A reviewer processing 800 lines is pattern-matching at a higher level of abstraction, looking for the shape of changes rather than their specific behaviors.
The practical implication is that the bugs most likely to survive large-diff review are precisely the bugs that look like the correct pattern but have a subtle deviation. A null check that exists but is on the wrong variable. A loop that runs i <= n instead of i < n. An array index that should be zero-indexed but was written as one-indexed. These all look like normal, correct code at reading speed. They only reveal themselves under deliberate, slow, line-by-line inspection that is rarely applied uniformly across a large diff.
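To make the i <= n case concrete, here is a small sketch with two hypothetical functions that differ by a single character. Both names are invented for illustration.

```typescript
// Buggy: `<=` reads one element past the end. values[values.length] is
// undefined, and adding undefined to a number yields NaN.
function sumBuggy(values: number[]): number {
  let total = 0;
  for (let i = 0; i <= values.length; i++) {
    total += values[i];
  }
  return total;
}

// Fixed: `<` stops at the last valid index, values.length - 1.
function sumFixed(values: number[]): number {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}
```

For [1, 2, 3], sumFixed returns 6 and sumBuggy returns NaN. In a diff, the two loops are nearly indistinguishable at reading speed, and the buggy one fails without throwing.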
This is why encouraging smaller PRs helps with more than just review latency. A 200-line PR gets the kind of careful reading that a 900-line PR doesn't, regardless of reviewer skill or intent.
What makes automated detection hard here
Null check detection is something static analysis tools do reasonably well in strongly-typed languages with strict null checking enabled. TypeScript's strictNullChecks mode catches many dereferences statically. Kotlin's null safety, Rust's Option type, Swift's optionals — these are language-level solutions that eliminate the entire class of null dereference errors when used correctly. Teams that want to eliminate null dereferences at scale should be investing in language-level null safety, not relying on review to catch them case by case.
Off-by-one errors are harder. They're semantic, not syntactic. A static analyzer can tell you that a loop bound is computed dynamically, but it cannot tell you whether the computed bound is correct for the intended behavior without knowing what that behavior is. This is a case where review has no automation shortcut — it requires a human who understands the intended semantics to check the boundary conditions manually.
We're not saying that automated tools can't help with these bug classes — they clearly can, in the right language and with the right tooling configuration. The point is that the tooling solution requires investment before the PR is written (choosing a language with null safety, configuring strict compilation), not during review. Review is a poor compensating control for missing type safety infrastructure.
Practical patterns that improve catch rates
A few practices make a measurable difference in how consistently these bugs are caught. The first is PR size limits as a team norm rather than a hard rule — aiming for diffs under 300-400 lines of substantive logic change (not including auto-generated code, lock files, or large mechanical renames). This is not always achievable, but it's achievable more often than teams think, and the correlation between review size and review quality is real.
The second is focusing explicit reviewer attention on high-risk patterns. If you're reviewing a function that interfaces with a database, explicitly ask: what does this do on a null return? If you're reviewing a loop, explicitly trace the boundary: what happens when the input is empty? What happens when it's exactly at the maximum expected size? This slows down review but directs the slowdown to where it matters.
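The boundary questions above can also be written down as a short table of cases rather than traced only in the reviewer's head. The following is a sketch under assumptions: chunk is a hypothetical function standing in for whatever loop is under review, and failingCases is an illustrative helper, not part of any test framework.

```typescript
// Hypothetical function under review: split `items` into groups of `size`.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError(`size must be a positive integer, got ${size}`);
  }
  const groups: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    groups.push(items.slice(start, start + size));
  }
  return groups;
}

// The cases a reviewer would trace by hand: empty input, one below the
// boundary, exactly at the boundary, and one past it.
const boundaryCases: Array<{ input: number[]; size: number; expected: number[][] }> = [
  { input: [], size: 3, expected: [] },
  { input: [1, 2], size: 3, expected: [[1, 2]] },
  { input: [1, 2, 3], size: 3, expected: [[1, 2, 3]] },
  { input: [1, 2, 3, 4], size: 3, expected: [[1, 2, 3], [4]] },
];

// Returns a description of every case whose actual output differs from
// the expected output; an empty array means all boundaries hold.
function failingCases(): string[] {
  return boundaryCases
    .filter((c) => JSON.stringify(chunk(c.input, c.size)) !== JSON.stringify(c.expected))
    .map((c) => `chunk(${JSON.stringify(c.input)}, ${c.size})`);
}
```

Four cases per loop is a small cost, and it records the reviewer's boundary trace where the next person to touch the code can see it.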
The third is treating compiler warnings and linter output as mandatory, not advisory. A codebase where strictNullChecks is disabled, or where linter warnings are routinely suppressed, has removed the automated guard that could catch most of these issues before a human reviewer ever sees them. Getting the tooling configuration right is the highest-impact intervention: it prevents the whole class of problems rather than making the review process slightly better at catching them after the fact.
The bugs that slip through review aren't a mystery. They're predictable, they follow patterns, and the conditions under which they survive are well understood. Which means they're preventable — with the right combination of language choice, tooling configuration, and review process design.