The Architecture Review You Actually Need (But Never Get)
Imagine running a post-mortem on a system that hasn’t failed yet. That’s what a good architecture review actually is. What most organisations run instead is a formatting check with a sign-off box at the bottom.
You’ve been in this room. The diagram is on the screen, clean and well-labelled. Someone walks through the components. A few questions about the message broker, the caching layer, a service boundary. Notes get taken. The review wraps. Design approved, a handful of “items to monitor.”
Six months later, something fails in a way nobody should have been surprised by. The failure mode was right there in the design. Nobody went looking.
That’s not a skills problem. It’s a structural one. The review wasn’t set up to find problems. It was set up to document that a process was followed. Those are not the same thing, and treating them as if they are is where most of the rot starts.
The Review That Arrives Already Dead
The most common reason architecture reviews don’t work has nothing to do with the questions asked or the expertise in the room. It’s timing.
By the time a design reaches a formal review, the decisions are already locked in. The team has spent weeks, sometimes months, on this. Budget is committed. The approach has been walked up the chain. The engineers presenting it built it together and they believe in it. That’s the moment most organisations choose to conduct their scrutiny.
The psychology here is brutal and predictable. Nobody in that room wants to hear that a foundational choice is wrong. Not the team, who’d have to redo months of work. Not the sponsor, who’s already told stakeholders this is on track. Not the reviewer, who knows that saying “this is broken at the foundation” blows up the timeline and makes enemies. So concerns get softened. Questions get framed diplomatically. The review produces feedback rather than findings, and feedback on a finished design gets absorbed as implementation detail, not structural challenge.
A framing I keep coming back to: if a review can’t lead to structural change, it’s not a steering mechanism. It’s a record of decisions already made. Most reviews are records. They have the shape of scrutiny without the function of it.
The fix sounds obvious and isn’t. Reviews need to happen while the design is still actually open, when a hard finding can redirect the work rather than just delay it. That means the organisation has to accept that architects will raise concerns before the design is polished, which means tolerating uncertainty in a process that’s built around appearing certain. Most organisations won’t do that. So they review late, get comfortable feedback, and call it governance.
What Gets Checked, and What Doesn’t
Set the timing problem aside for a moment and assume the review is happening early enough to matter. The next question is what it’s actually checking.
Most reviews are coherence checks. Is the diagram legible? Are approved patterns in use? Have the standard non-functional requirements been mentioned? Is there a security section? None of that is useless. Incoherent designs fail in their own ways. But coherence and survivability are not the same thing, and almost every review I’ve seen stops at the former without noticing it’s done so.
The assumptions that go unchallenged tend to fall into a few consistent categories. Naming them is useful because they’re not random gaps. They’re the same gaps, in organisation after organisation, year after year.
Load and scale projections almost never get interrogated at the source. The numbers are in the document, and nobody asks where they came from. In practice, they’re often inherited from a legacy system, copied from a competitor’s architecture post, or generated by someone who rounded up from a gut feeling to make them look precise. The gap between assumed load and actual load at the 99th percentile is where a lot of production incidents live.
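One cheap way to interrogate a load number is to pull the real distribution and compare its tail, not its average, against the figure in the design doc. A minimal sketch (the traffic samples and the planning figure here are invented for illustration, not from any real system):

```python
import statistics

def percentile(samples, p):
    """Nearest-rank percentile of a list of samples (0 < p <= 100)."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Requests per second observed over a day, one sample per minute.
# Invented numbers: a quiet baseline punctuated by a sharp daily peak.
observed_rps = [120] * 1200 + [450] * 200 + [900] * 40

assumed_peak = 500  # the number that made it into the design doc

print("mean rps:", round(statistics.mean(observed_rps)))
print("p99 rps :", percentile(observed_rps, 99))
print("design covers p99?", assumed_peak >= percentile(observed_rps, 99))
```

The mean sits comfortably under the planning number; the 99th percentile does not. That gap is exactly the one the paragraph above describes, and it only shows up if someone asks for the distribution rather than the headline figure.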
External dependencies get treated as solved problems. “We depend on Service X” appears in the diagram and then disappears from the conversation. What nobody asks: what does degraded look like for Service X, not just down? What’s the latency contract under realistic load? What’s the retry and backpressure story when Service X starts returning errors 15% of the time? Reviewers rarely challenge this because the team usually has a contractual SLA to point at. SLAs feel like answers. They’re actually just promises.
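The retry-and-backpressure question can be made concrete in a few lines. Here is a minimal circuit-breaker sketch that stops calling a dependency once its recent error rate crosses a threshold, then probes again after a cooldown. Every name, window size, and threshold is an illustrative assumption, not a recommendation:

```python
import time

class CircuitBreaker:
    """Stops calling a dependency whose recent error rate is too high.

    Illustrative sketch only: window, threshold, and cooldown values
    are assumptions chosen to mirror the 15%-errors scenario above.
    """

    def __init__(self, window=20, error_threshold=0.15, cooldown_s=30.0):
        self.window = window
        self.error_threshold = error_threshold
        self.cooldown_s = cooldown_s
        self.results = []       # recent outcomes: True = success
        self.opened_at = None   # timestamp when the breaker tripped

    def _error_rate(self):
        if not self.results:
            return 0.0
        return 1 - sum(self.results) / len(self.results)

    def allow(self):
        """Should we attempt the call right now?"""
        if self.opened_at is None:
            return True
        # Half-open after the cooldown: let a probe call through.
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            return True
        return False

    def record(self, success):
        self.results.append(success)
        self.results = self.results[-self.window:]
        if success and self.opened_at is not None:
            # Probe succeeded: close the breaker and start fresh.
            self.opened_at = None
            self.results = []
        elif (len(self.results) >= self.window
              and self._error_rate() > self.error_threshold):
            self.opened_at = time.monotonic()
```

The point of sketching it is the review question it forces: if the design has no answer to "what do we do when the breaker opens," the dependency was never actually a solved problem.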
The operational story is almost always absent. The design explains how the system works. It rarely explains how it gets run at 2am by someone who didn’t build it, six months after the original team has been reorganised. Are runbooks written to the level where an unfamiliar on-call engineer can diagnose a degraded state and act on it? Does the monitoring surface the failure modes the design is actually vulnerable to, or the ones that were easy to instrument? If you can’t tell the operational story, the design isn’t finished. That’s not a post-launch concern. It’s an architectural decision.
Team capability is where review rooms get the most politely silent. The design might be technically sound and operationally catastrophic for the people who’ll actually run it. A service mesh requires engineers who understand service meshes at 3am under pressure. An event-driven architecture requires on-call people who can reason about ordering guarantees and idempotency when they’re half-awake. Reviews that skip the question of whether the team’s real operational maturity matches what the design demands are effectively approving a system the organisation can’t execute.
And then there’s reversibility. High-inertia decisions (database engine, core service boundaries, messaging topology) get made with the same apparent confidence as low-inertia ones like configuration choices or API surface design. A review should be disproportionately hard on decisions that can’t be undone. It rarely is, because reviewers treat all decisions with roughly equal scrutiny, and teams present all decisions with roughly equal confidence.
The Half the Review Never Sees
There’s one specific thing almost no review interrogates, even the good ones: Conway’s Law.
The technical design gets examined. The team structure that will execute and maintain it barely comes up. Conway’s observation, made in 1967 and still routinely ignored, is that organisations produce systems shaped by their communication structures, not their architecture diagrams. Your diagram shows clean ownership and sensible boundaries. Your actual team topology might have three teams sharing responsibility for the same data store, or one team owning so many components they’ve lost meaningful operational context across half of them.
Ruth Malan’s version of the principle is the one worth keeping: “If the architecture of the system and the architecture of the organization are at odds, the architecture of the organization wins.” The org structure is not an implementation detail. It’s a constraint. It determines what the technical design can actually become over time, regardless of what got approved in the review room.
A review that doesn’t ask who owns each component end-to-end, what steady-state coordination this architecture requires between teams, and what happens to operational knowledge when the team that built it gets reorganised, is reviewing half the picture. The half it’s skipping is often the half that determines whether the system survives contact with the organisation that built it.
The Questions That Actually Matter
None of this is original. These are the questions that experienced engineers ask, the ones that separate a conversation about survivability from a conversation about whether the diagram makes sense. They apply whether you’re the reviewer or the team presenting.
On assumptions: where did the load numbers actually come from, and has anyone tested them against anything real? What is the design assuming about how its dependencies behave under load, as opposed to how they behave in the documentation?
On failure modes: what’s the blast radius of the most likely failure? More interesting is partial degradation: the service is up and returning responses, but serving bad data or elevated timeouts on 20% of requests. That scenario is harder to detect and usually more damaging than a clean outage. And: how do you know you’re in it? Does your monitoring tell you, or does a customer?
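The “how do you know you’re in it” question is answerable with cheap instrumentation. A liveness probe reports healthy in the scenario above; what catches it is tracking request-level outcomes over a rolling window. A minimal sketch, with the window size and threshold as illustrative assumptions:

```python
from collections import deque

class DegradationDetector:
    """Flags a service that is 'up' but failing a fraction of requests.

    A binary health check would pass here; this watches request-level
    outcomes instead. Window and threshold are illustrative assumptions.
    """

    def __init__(self, window=100, bad_fraction=0.2):
        self.outcomes = deque(maxlen=window)  # True = request served correctly
        self.bad_fraction = bad_fraction

    def observe(self, ok):
        self.outcomes.append(ok)

    def degraded(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data to judge yet
        bad = self.outcomes.count(False)
        return bad / len(self.outcomes) >= self.bad_fraction
```

Whether anything this shaped exists in the design, wired to an alert someone will actually see, is a fair review question; the answer distinguishes monitoring the failure modes the system is vulnerable to from monitoring the ones that were easy to instrument.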
On operational readiness: who’s on-call for this, and from when? What does a rollback actually take, and has it been rehearsed or just assumed to work because the runbook exists? What’s the day-two story, when the people who know the system have moved on to the next thing?
On reversibility: which choices in this design are genuinely hard to undo? Has the database selection been scrutinised the same way a config value has, or did it go through because it was familiar?
These aren’t gotcha questions. They’re what the team should be asking itself before it walks into the room. The most useful preparation I’ve seen is a pre-mortem. Assume the system failed badly six months after launch and work backwards from that. What went wrong? It’s a more honest exercise than a formal review because nobody’s defending anything. You’re just investigating a failure that hasn’t happened yet.
What Has to Change
The format of the review signals what you actually think it’s for. A review structured as a compliance gate produces compliance behaviour. Teams learn what gets flagged, optimise for it, and show up prepared for exactly that. Problems that don’t fit the checklist don’t surface, because the checklist didn’t invite them and no team is going to volunteer its own doubts.
A review structured around survivability produces different preparation. Different honesty, sometimes. But only if the conditions are there: it has to happen early enough to actually change things, the reviewer needs a genuine mandate to say hard things, and those hard things need to carry weight when they’re said. That last part is where most attempts fall apart. Reviews with no authority to block anything teach teams that the review doesn’t matter.
Neither the reviewer nor the team is usually the real problem. The process is. Most review processes were designed to generate documentation, and they do that reliably. If what you want is genuine scrutiny, you need to build a process around that goal, not bolt better questions onto an existing compliance ceremony.
So: if your last review couldn’t have blocked anything, what was it actually for?
If this resonated, I’d be curious what your experience has been on either side of the table. The failure modes tend to be consistent across organisations, but the specifics are always interesting. Drop a comment or reach out directly.