Architecture Weekly #166

Architecture Weekly Issue #166. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now in Telegram and on Substack as well.

System Design Course Cohort #5 is closed, but you can join the Cohort #6 waiting list here.

Highlights

Understanding Distributed Consensus with Paxos 🤟

Paxos is a consensus algorithm. It may appear highly complex, but at its core it's just two methods, get and set, implemented for a group of 3 nodes. This awesome post guides you step by step through the whole problem of getting and setting a value in a distributed system in the presence of timeouts and network partitions. An absolute must-read.
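To make the get/set framing concrete, here is a minimal single-decree Paxos sketch. It assumes in-memory acceptors, integer ballot numbers that are unique per proposer, and a `reachable` list that stands in for a network partition. `Acceptor` and `propose` are hypothetical names, and this is not the linked post's code. A real implementation also needs durable acceptor state, retries and actual messaging.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = 0                           # highest ballot this acceptor promised
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) it last accepted

    def prepare(self, ballot: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        """Phase 1b: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2b: accept unless a higher ballot has been promised meanwhile."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str,
            reachable: Optional[List[int]] = None) -> Optional[str]:
    """One single-decree Paxos round; returns the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    targets = acceptors if reachable is None else [acceptors[i] for i in reachable]

    # Phase 1 ("get"): collect promises and any previously accepted values.
    replies = [a.prepare(ballot) for a in targets]
    granted = [prev for ok, prev in replies if ok]
    if len(granted) < majority:
        return None
    prior = [p for p in granted if p is not None]
    if prior:
        value = max(prior)[1]  # must re-propose the highest-ballot accepted value

    # Phase 2 ("set"): ask the same acceptors to accept that value.
    acks = sum(1 for a in targets if a.accept(ballot, value))
    return value if acks >= majority else None


cluster = [Acceptor() for _ in range(3)]
print(propose(cluster, 1, "A", reachable=[0, 1]))  # A
print(propose(cluster, 2, "B", reachable=[1, 2]))  # A: the earlier choice wins
print(propose(cluster, 1, "C"))                    # None: stale ballot
```

The second proposer asks to set "B" but instead learns "A" and re-commits it. Phase 1 doubles as the "get", and that is what keeps a chosen value stable when the majority on the other side of a partition changes.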

Distributed consensus

#distributedsystems

Choose Boring Technology 🍼

Technology is indeed fun, even if it's not art anymore. However, technology should primarily solve business problems. If you stick to a narrow set of well-understood technologies, you take on less risk and spend more time building the product instead of fighting the tech. So, choose boring technology!

#philosophy

Understanding transaction visibility in PostgreSQL clusters with read replicas 🤟

See why your read replica might disagree with the primary! 🔍 AWS's latest post breaks down the “Long Fork” quirk in PostgreSQL clusters, where replicas can show commits in a different order than the primary, and explains why it doesn't risk data loss. You'll get a plain-English tour of snapshot vs. WAL timing, learn how future Commit Sequence Numbers (CSN) aim to fix it, and pick up quick tips to keep your apps safe until the patch lands.
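Here's a toy model of the anomaly. It is not PostgreSQL code. Following the post's description, it assumes a replica exposes commits in WAL order, while the primary can expose two concurrent commits in a different order. The transaction names and both orderings are made up for illustration.

```python
def observable_states(visibility_order):
    """Return every set of committed transactions a new snapshot could see, in order."""
    states, seen = [], set()
    for txn in visibility_order:
        seen.add(txn)
        states.append(frozenset(seen))
    return states


wal_order = ["T1", "T2"]      # order of commit records in the WAL (what a replica replays)
primary_order = ["T2", "T1"]  # hypothetical order in which the commits became visible on the primary

primary_states = observable_states(primary_order)
replica_states = observable_states(wal_order)

# A primary snapshot can see T2 without T1; a replica snapshot never can.
print(frozenset({"T2"}) in primary_states, frozenset({"T2"}) in replica_states)  # True False
```

Both nodes end in the same final state, {T1, T2}, so no data is lost. Only the intermediate states that readers can observe differ.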

#db

Follow-Up

How to avoid Single Point of Failure? 🍼

If a component goes down and the system stops functioning, that component is a single point of failure (SPoF). Such points are a big risk to availability, and system architects should avoid them. This post is a good starting point for understanding what SPoFs are all about.
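A quick way to see why a SPoF hurts is to compare availabilities. Assuming failures are independent, components in series multiply their availabilities. By contrast, n redundant replicas fail only if all of them fail, which gives 1 - (1 - a)^n. The function names and the 99% figures below are illustrative.

```python
from math import prod


def serial_availability(availabilities):
    """Every component is a SPoF: the system is up only if all of them are up."""
    return prod(availabilities)


def redundant_availability(a, replicas):
    """n independent replicas of availability `a`: down only if all replicas are down."""
    return 1 - (1 - a) ** replicas


# Hypothetical 3-tier system (web, app, db), each tier 99% available.
single_db = serial_availability([0.99, 0.99, 0.99])
replicated_db = serial_availability([0.99, 0.99, redundant_availability(0.99, 2)])
print(round(single_db, 4))      # 0.9703
print(round(replicated_db, 4))  # 0.98
```

Replicating the database removes it as a SPoF and gains almost a full percentage point. The web and app tiers then dominate the remaining downtime. Correlated failures, such as both replicas sharing a rack, break the independence assumption and erode that gain.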

#reliability

Postgres as a Graph Database 👷‍♂️

Turn your everyday Postgres into a mini graph powerhouse! Supabase's new post shows how the pgRouting extension lets you run classic graph algorithms (shortest paths, critical-path scheduling, even smart server-to-server routing) without leaving SQL. It's a quick, code-packed tour that proves you don't need Neo4j to think in nodes and edges.
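For intuition on what a shortest-path query computes, here is a plain-Python Dijkstra over an edge list. The list uses the same `(source, target, cost)` shape that pgRouting's `pgr_dijkstra` reads from its edges query. This sketch illustrates the algorithm only and is not pgRouting's implementation. It treats edges as directed, and the server graph in the usage is made up.

```python
import heapq
from collections import defaultdict


def dijkstra(edges, start, end):
    """Shortest path over directed (source, target, cost) edges; returns (path, cost)."""
    graph = defaultdict(list)
    for source, target, cost in edges:
        graph[source].append((target, cost))

    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == end:
            break  # the first time the target is popped, its distance is final
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for nxt, cost in graph[node]:
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))

    if end not in dist:
        return None, float("inf")
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[end]


# Hypothetical server-to-server links with latency costs.
edges = [("lb", "app1", 1), ("lb", "app2", 4), ("app1", "db", 5),
         ("app2", "db", 1), ("app1", "app2", 1)]
print(dijkstra(edges, "lb", "db"))  # (['lb', 'app1', 'app2', 'db'], 3)
```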

#postgresql #db

Monarch: Google’s Planet-Scale In-Memory Time Series Database 👷‍♂️

Borgmon was the initial system at Google responsible for monitoring the behavior of internal applications and infrastructure. Each team had to deploy and maintain its own Borgmon instance, which required specialized knowledge of the tool. In 2010 Google moved from Borgmon to Monarch, an in-memory time series database that now handles all internal monitoring across the globe. Read the paper to understand its distributed architecture and scale.

#observability #distributedsystems

How Cursor Works 👷‍♂️

AI IDEs have truly blown up, and it's even more interesting to see how they work under the hood. Do they really just send a piece of code and a prompt to an LLM? This post shows what actually happens.

#ai

How Forethought saves over 66% in costs for generative AI models 👷‍♂️

Forethought's engineering team shows how moving their fleet of fine-tuned, customer-specific generative AI models from EKS to Amazon SageMaker multi-model endpoints cut hosting costs by 66%, while SageMaker's smart model loading keeps latency under a second. The article walks through the old and new stacks, shares real $/hour numbers, and offers tips for anyone wrangling lots of small LLMs on shared GPUs.
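The core idea behind serving many small models from shared hardware can be sketched as a cache of loaded models with least-recently-used eviction. This is a toy under stated assumptions. `ModelCache` and its `loader` callable are hypothetical, and the sketch does not model SageMaker's actual loading and unloading policy.

```python
from collections import OrderedDict


class ModelCache:
    """Toy multi-model host: keeps at most `capacity` models loaded, evicting the LRU one."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader         # hypothetical callable: model_id -> loaded model
        self.loaded = OrderedDict()  # model_id -> model, least recently used first
        self.loads = 0               # number of cold loads (cache misses)

    def get(self, model_id):
        if model_id in self.loaded:
            self.loaded.move_to_end(model_id)  # mark as most recently used
            return self.loaded[model_id]
        if len(self.loaded) >= self.capacity:
            self.loaded.popitem(last=False)    # evict the least recently used model
        model = self.loader(model_id)
        self.loads += 1
        self.loaded[model_id] = model
        return model


cache = ModelCache(capacity=2, loader=lambda model_id: f"weights-{model_id}")
for model_id in ["a", "b", "a", "c", "a", "b"]:
    cache.get(model_id)
print(cache.loads, list(cache.loaded))  # 4 ['a', 'b']
```

Cost savings come from many customer models sharing one host. The cost is a cold-load penalty on misses, so traffic skewed toward a few hot models makes this trade-off favorable.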

#ai #sagemaker

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon!