Architecture Weekly Issue #177. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now in Telegram and Substack as well.

Highlights

Trade-off Analysis, Microservices Myths and the Origin of "The Hard Parts" 👷‍♂️

Last week my interview with Neal Ford went live. We discussed where the book came from and what the hardest part of architecture is, and busted a couple of microservices myths. Grab yourself a hot drink and enjoy the conversation!

#interview #video

Shrink It to 14 kB — the TCP sweet-spot 🤟

The article argues that because TCP slow start lets a server send only about 14 kB in the first round trip, there is a strong reason to keep your webpage under this limit: going even a single kB over forces an extra round trip, which can add almost half a second. Find the technicalities inside.
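To make the round-trip arithmetic concrete, here is a toy slow-start model. It is a sketch under stated assumptions, not the article's code: an initial congestion window of 10 segments, a 1460-byte MSS, no packet loss, and a window that doubles every round trip. Real connections also spend bytes and round trips on TCP/TLS handshakes and headers.

```python
# Toy model of TCP slow start. Assumptions: initial window of 10 segments,
# 1460-byte MSS, no loss, window doubling each round trip. Handshakes and
# headers are ignored, so real pages need a bit more than this suggests.
INIT_CWND_SEGMENTS = 10
MSS_BYTES = 1460


def round_trips(page_bytes: int,
                init_cwnd: int = INIT_CWND_SEGMENTS,
                mss: int = MSS_BYTES) -> int:
    """Return how many round trips are needed to deliver page_bytes."""
    window = init_cwnd * mss  # bytes that may be in flight in the first round trip
    delivered = 0
    trips = 0
    while delivered < page_bytes:
        delivered += window
        window *= 2  # slow start doubles the window after each round trip
        trips += 1
    return trips


if __name__ == "__main__":
    print(INIT_CWND_SEGMENTS * MSS_BYTES)  # 14600 bytes, the "14 kB" budget
    for kb in (10, 14, 15, 40, 100):
        print(f"{kb} kB -> {round_trips(kb * 1024)} round trip(s)")
```

Under these assumptions a 14 kB page fits in one round trip, while 15 kB already needs two; on a high-latency link that second round trip is where the extra delay comes from.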

Why your website should be under 14kB in size | endtimes.dev

#performance

A Crowd-Sourced Reliability Glossary 🤟

Antithesis have opened an A-to-Z of distributed-systems reliability terms: concise definitions of “Byzantine fault,” “gray failure,” “coordinated omission,” “write amplification,” and dozens more. Each entry cites a canonical paper and tags the relevant failure mode, so post-mortem debates can skip the semantics and jump straight to the root cause. Even better, the repo accepts pull requests, letting ops teams contribute new jargon as it appears.

A distributed systems reliability glossary
A list of key concepts for building and testing reliable distributed systems, with basic definitions and deep references.

#distributedsystems

Wanna know how to Design Systems that deliver business value?

Business Oriented System Design Course Cohort #6 is officially open!

Looking for a way to advance your career? Feel you have outgrown mere feature development, but lack the skills to design complete systems? Want to make a business impact? 10 hours of content-packed lectures, engaging practice, and a final project you will be proud to showcase, plus a Credly (by Pearson) digital certificate proving your experience. More than 70 engineers have already completed the course with amazing feedback and advanced their careers. The new cohort starts on the 23rd of July. Find the details, feedback, and enrollment here. Only a single place left!

Follow-Up

LISTEN/NOTIFY: Postgres’ Surprise Bottleneck 👷‍♂️

Recall.ai learned the hard way that every NOTIFY grabs a global lock during commit, stalling thousands of otherwise unrelated writers. At under 1k messages/s, their meeting-recording pipeline saw P95 latency explode from 3 ms to 1.8 s. A switch to Redis Streams and a Rust sidecar restored headroom to 35k msg/s and single-digit-ms writes. Follow the technical investigation inside.
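Why does a single global lock hurt so much? A toy FIFO queue model shows the effect. This is an illustration, not a model of Postgres internals: `hold_ms`, the time each NOTIFY-ing commit holds the lock, is a hypothetical parameter. With one shared lock, throughput is capped at `1000 / hold_ms` commits per second, and once arrivals exceed that cap the queue, and with it latency, grows without bound.

```python
# Toy model: every commit must hold ONE global lock for hold_ms (hypothetical
# value), and waiting commits are served in arrival order.
from typing import List


def commit_latencies(arrivals_ms: List[float], hold_ms: float) -> List[float]:
    """Return the latency (queueing wait + lock hold) of each commit."""
    lock_free_at = 0.0
    latencies = []
    for arrival in sorted(arrivals_ms):
        start = max(arrival, lock_free_at)  # wait until the previous commit releases the lock
        lock_free_at = start + hold_ms      # commits are fully serialized
        latencies.append(lock_free_at - arrival)
    return latencies


if __name__ == "__main__":
    arrivals = [float(i) for i in range(1000)]  # 1,000 commits/s, one every 1 ms
    fast = commit_latencies(arrivals, hold_ms=0.5)  # below the cap: no queueing
    slow = commit_latencies(arrivals, hold_ms=2.0)  # above the cap: queue keeps growing
    print(max(fast), max(slow))
```

In the second run each commit waits for every earlier one, so the last commit of the second finishes roughly a full second after it arrived, even though each individual lock hold is only 2 ms.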

Postgres LISTEN/NOTIFY does not scale
Postgres LISTEN/NOTIFY can cause severe performance issues under high write concurrency due to a global lock during commit. Learn why it doesn’t scale and how to avoid outages.

#db #performance #postgres

Notion’s 200-Billion-Note Machine, Explained in 2 Minutes 👷‍♂️

Every paragraph, checkbox, or emoji in Notion is a “block” row in PostgreSQL. When that single database hit 20 billion blocks, latency spiked and index bloat set in. By sharding on workspace_id, Notion split the data into 480 logical shards spread across 96 Postgres instances, all routed with a simple hash(workspace_id) % 480. Inside you'll find key lessons about choosing a boring tech stack and the team's attitude towards observability.
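The `hash(workspace_id) % 480` routing can be sketched in a few lines. This is a hypothetical illustration, not Notion's code: SHA-256 as the hash and a contiguous shard-to-instance mapping (5 logical shards per instance) are assumptions.

```python
# Sketch of two-level routing: workspace -> logical shard -> physical instance.
# Assumptions (not Notion's actual implementation): SHA-256 as the stable hash
# and contiguous blocks of logical shards per instance.
import hashlib
import uuid

LOGICAL_SHARDS = 480
PHYSICAL_INSTANCES = 96
SHARDS_PER_INSTANCE = LOGICAL_SHARDS // PHYSICAL_INSTANCES  # 5


def logical_shard(workspace_id: uuid.UUID) -> int:
    """Map a workspace to one of the logical shards with a stable hash."""
    # Python's built-in hash() is salted per process for str/bytes, so a
    # cryptographic digest is used to get the same answer on every server.
    digest = hashlib.sha256(workspace_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % LOGICAL_SHARDS


def physical_instance(shard: int) -> int:
    """Shards 0-4 live on instance 0, shards 5-9 on instance 1, and so on."""
    return shard // SHARDS_PER_INSTANCE


if __name__ == "__main__":
    ws = uuid.uuid4()
    shard = logical_shard(ws)
    print(ws, shard, physical_instance(shard))
```

Keeping the logical shard count fixed means adding instances only changes `physical_instance`: whole logical shards move between machines, and no workspace has to be re-hashed.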

How Notion Handles 200+ BILLION Notes (Without Crashing)
How Notion uses horizontal scalability and sharding to handle 200+ billion notes without crashing.

#performance #observability

Inside the 62-Minute 1.1.1.1 Outage 👷‍♂️

There was an incident with Cloudflare DNS last week, but nobody noticed :) The cause was a dormant topology rule that accidentally grouped Cloudflare’s public DNS prefixes with an internal service. Fixes include dual control on topology edits, synthetic DNS canaries, and diffs that forbid overlap between public and DLS (Data Localization Suite) address sets. A textbook example of “latent coupling + manual push”: add it to your outage-drill deck.
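A guard that forbids overlap between address sets can be as simple as a pairwise prefix check run before a topology change ships. This is a minimal sketch using Python's standard `ipaddress` module, not Cloudflare's tooling. The public prefixes are the ones 1.1.1.1 is served from; the DLS list, including the accidental entry, is hypothetical.

```python
# Pre-deploy check: reject a config change if any public prefix overlaps a
# prefix assigned to an internal (here: DLS) service. Illustrative only.
import ipaddress
from itertools import product


def find_overlaps(public, dls):
    """Return (public, dls) prefix pairs that overlap; empty means safe to ship."""
    pub_nets = [ipaddress.ip_network(p) for p in public]
    dls_nets = [ipaddress.ip_network(p) for p in dls]
    return [
        (str(a), str(b))
        for a, b in product(pub_nets, dls_nets)
        if a.version == b.version and a.overlaps(b)  # compare IPv4 with IPv4, IPv6 with IPv6
    ]


if __name__ == "__main__":
    public = ["1.1.1.0/24", "1.0.0.0/24", "2606:4700:4700::/48"]
    dls = ["203.0.113.0/24", "1.1.1.0/24"]  # hypothetical; second entry is the mistake
    conflicts = find_overlaps(public, dls)
    if conflicts:
        raise SystemExit(f"refusing to deploy, overlapping prefixes: {conflicts}")
```

Because `overlaps` also catches partial containment (for example a /16 that swallows a public /24), the check does not depend on someone copying the exact public prefix by mistake.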

Cloudflare 1.1.1.1 Incident on July 14, 2025
On July 14th, 2025, Cloudflare made a change to our service topologies that caused an outage for 1.1.1.1 on the edge, resulting in downtime for 62 minutes for customers using the 1.1.1.1 public DNS Resolver as well as intermittent degradation of service for Gateway DNS. We’re deeply sorry for this outage. This outage was the result of an internal configuration error and not the result of an attack or a BGP hijack. In this blog post, we’re going to talk about what the failure was, why it occurred, and what we’re doing to make sure this doesn’t happen again.

#casestudy #pm

The Myth of the “Silent” Lambda Crash 👷‍♂️

A viral gist claimed that Node.js Lambdas die quietly during outbound HTTPS calls in a VPC. AJ Stuyvenberg shows that this is a feature rather than a bug: Lambda was never meant to work like EC2, and the execution environment is frozen by design once the handler returns, so work still in flight at that point simply stops. If you're not happy with this behavior, the article lists 3 options.
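AWS documents that background work still running when a Lambda handler returns is frozen and only resumes if the execution environment is reused. The simplest defence is to finish outbound work before returning. The article's case is Node.js; below is a hedged Python analogue, not one of the article's 3 options verbatim. `send_webhook` is a hypothetical stand-in for the outbound HTTPS call.

```python
# Python analogue of "await before you return" in a Lambda handler.
# send_webhook is hypothetical; asyncio.sleep stands in for network I/O.
import asyncio
import json


async def send_webhook(payload: dict) -> dict:
    """Hypothetical outbound HTTPS call."""
    await asyncio.sleep(0.05)
    return {"delivered": payload["id"]}


def handler(event, context):
    # Complete the outbound call *before* returning. A fire-and-forget task
    # started here could be frozen together with the execution environment
    # once the response is sent, which looks like a "silent crash".
    receipt = asyncio.run(send_webhook(event))
    return {"statusCode": 200, "body": json.dumps(receipt)}
```

The cost is that the caller waits for the outbound call; if that latency is unacceptable, the work has to move somewhere that outlives the invocation rather than into a background task.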

Does AWS Lambda have a silent crash in the runtime?
Understanding what’s happening in the “AWS Lambda Silent Crash” blog post, what went wrong, and how to fix it

#lambda #serverless

The Four Building Blocks of an Agentic App 👷‍♂️

Ever wondered how products like Lovable or Bolt.new work? Beam shares the 4 critical components of such apps: Model Client, Model, Sandboxed Exec and Frontend. Follow the article for details.
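A toy wiring of the four components helps show how they hand off to each other. Everything here is a hypothetical sketch, not Beam's code: the "model" is a stub returning canned code, and a child interpreter process with a timeout stands in for sandboxed execution (a subprocess is not a security boundary; real products isolate untrusted code in containers or microVMs).

```python
# Toy agentic app: frontend request -> model client -> model -> sandboxed exec.
import subprocess
import sys


def stub_model(prompt: str) -> str:
    """Model: a hypothetical stub; a real app would call an LLM."""
    return "print(sum(range(1, 11)))"


class ModelClient:
    """Model client: owns prompting, retries, and parsing of model output."""

    def generate_code(self, prompt: str) -> str:
        return stub_model(f"Write Python for: {prompt}").strip()


def run_in_sandbox(code: str, timeout_s: float = 5.0) -> str:
    """Sandboxed exec stand-in: isolated-mode interpreter with a timeout."""
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return result.stdout if result.returncode == 0 else result.stderr


def handle_frontend_request(prompt: str) -> dict:
    """Frontend-facing entry point: prompt in, generated code and output back."""
    code = ModelClient().generate_code(prompt)
    return {"code": code, "output": run_in_sandbox(code)}


if __name__ == "__main__":
    print(handle_frontend_request("sum the numbers 1 to 10"))
```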

The Cloud for AI Products

#ai #architecture

Time, Angle & Depth — A 3-D Lens on Coupling 👷‍♂️

Another piece on data modelling and context segregation. Discover how looking at the models from different angles forces you to create different classes and apply business-logic checks. It starts with a very realistic story :)
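One way to picture the "angle" idea: the same order seen by sales and by shipping turns into two classes, each guarding its own business rules. This is our own hypothetical illustration, not the article's example; the class names and rules are made up.

```python
# Same real-world order, two angles, two models with their own invariants.
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class SalesOrder:
    """Sales angle: what was bought and for how much."""
    order_id: str
    total: Decimal

    def __post_init__(self):
        if self.total <= 0:
            raise ValueError("sales order total must be positive")


@dataclass(frozen=True)
class ShipmentOrder:
    """Shipping angle: where and by when it goes; price is irrelevant here."""
    order_id: str
    address: str
    ship_by: date

    def __post_init__(self):
        if not self.address.strip():
            raise ValueError("shipment needs a delivery address")
```

Neither class needs the other's fields, so a rule change in shipping (say, a new address format) does not ripple into the sales model.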

Time, angle and depth: dimensions in software design
Can we use physical qualities while reasoning about systems?

#ddd

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon!