Architecture Weekly #136

Architecture Weekly Issue #136. Articles, books, and playlists on architecture and related topics. Split into sections and marked by complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram and Substack as well.

Highlights

System Design Interview: Designing Google Docs 👷‍♂️

Designing Google Docs is not a simple task due to the collaborative nature of the product. Alexander Barmin, a Senior Engineer at Wise, shows how to do that in a mock system design interview. Watch it here!

#video #casestudy #mockinterview

How to learn Rust in 2024 👷‍♂️

I used to write C/C++ code at the beginning of my career, and honestly this article made me think: what if I abandoned everything and started again with Rust? Anyway, it is a great read by Vitaly Bragilevsky, who explains Rust's memory model, learning paths, and toolchains.

How to Learn Rust in 2024: A Complete Beginner’s Guide to Mastering Rust Programming | The RustRover Blog
So, you’re thinking about choosing Rust as your next programming language to learn. You already know what it means to write code and have some experience with at least one programming language, probab

#rust #programminglanguage

Postgres Partition Pains - LockManager Waits 🤟

Postgres is praised for its performance, flexibility, number of extensions, and of course for being open source. It is so well rounded that it just works... until it does not. At that moment you will find yourself banging your head against the wall, as Kyle Hailey did in his case of unfortunate partitioning caused by LockManager behavior. A database detective story, just the way we love it.

Postgres Partition Pains - LockManager Waits
1000 Fetch errors a second, 500 sessions waiting on LockManager, 150,000 locks. In above graphs, our production system hit the wall by a blind siding pile up on LockManager waits causing the application to start hitting 1000 errors a second. I managed to mitigate the pileup by detaching 7 partitions for the core table. Let’s blend the mundanity of database administration with a dash of fictional narrative for palatability. Consider it akin to lacing my son’s morning ‘green’ drink, brimming with deh

#postgres #db #casestudy

Follow-Up

Kubernetes Configuration in 2024 👷‍♂️

A great overview of popular (read: adopted) open-source tools for working with Kubernetes. It starts with the obvious Helm charts and Kustomize, followed by Kompose, Kapitan, and the Terraform Kubernetes provider.

Kubernetes Configuration in 2024
What are the most popular Kubernetes configuration tools now, what has changed since 2017, and what friction do users encounter with Helm?

#kubernetes

What is BGP? 👷‍♂️

Last week I got curious about what exactly happened during the Facebook outage on October 4, 2021. To remind you, the servers went offline and even the staff were locked out of their offices (1-minute version of this story). Apparently, BGP was involved, so find the explanation of what it is and why it is important in the Cloudflare glossary.

#foundation #network

System Design Course Cohort #3 is open!

Typical system design courses teach technical skills but often overlook the connection to business problems. This course fills that gap, emphasizing the importance of recognizing and addressing business priorities with technical approaches. Learn to go beyond load balancing options and performance tactics by focusing on solving real business challenges. More than 30 engineers have already completed the course and given it great feedback!

SIGN UP HERE

The problem with Generic Observability Tools 🍼

Generic observability tools suffer from the same 4 problems: offering too much functionality, not offering enough at the same time, becoming a cost center (remember, Coinbase paid $65 million to Datadog in a single year), and not providing the insights needed for proper action. PerfectScale has an article on those problems, explaining how we ended up in this situation and what to do about it.

The Problem with Generic Observability Tools: Why We’re Paying Too Much | PerfectScale
Stop paying the high price for generic observability tools - take control of your infrastructure costs today!

#observability

Scaling: The State of Play in AI 🍼

Large Language Models, but how large are they? How much money does it take to train a GPT-3-like model? Ethan Mollick explains the size of LLMs, the generations of models, the cost of training, and how larger general models always beat smaller specialized ones.

Scaling: The State of Play in AI
A brief intergenerational pause…

#ai

Write Buffering to Reduce Raft Consensus Latency in YugabyteDB 🤟

YugabyteDB claims to be highly fault-tolerant. It achieves this through synchronous writes to the Raft quorum, which are batched by write buffering. In this post, Franck Pachot explains how this concept works and how it improves resiliency and performance.

Write Buffering to Reduce Raft Consensus Latency in YugabyteDB
YugabyteDB is an open-source distributed SQL database that is compatible with PostgreSQL. It…

#db #performance #raft

Queues and topics 🍼

People typically understand well how to draw containers in the C4 model. When it comes to dealing with queues, though, intuition tells us to draw one rectangle for the whole Kafka deployment and point all the arrows at it. That's not the optimal way! Simon Brown explains how to create better diagrams in such cases.

Queues and topics
The C4 model for visualising software architecture

#documentation #c4model

WARNING 🇺🇦

The brutal and unjustified war against Ukraine has been going on for 2 years already. If you want to help Ukraine directly, visit this fund.

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter. They receive early access to articles and videos, influence the content, and participate in a closed group where we discuss architecture problems. Join them at Patreon or Boosty!