Architecture Weekly Issue #151. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now in Telegram and Substack as well.

System Design Course

Looking to advance your system design skills further? I've got a Business Oriented System Design Course to help you! Cohort #3 is running now, so you can sign up for the next one, starting at the end of January. Follow this page: https://vvsevolodovich.dev/business-oriented-system-design-course/

Highlights

Books to read in 2025 as Software Architect

Books are the best source of knowledge. A new year means a new reading plan: I picked the 12 books I personally decided to read in 2025. See if you find the list useful!

Books to read in 2025 as Software Architect
Books are an essential driver for one’s growth. Never stop reading! If you’re in software architecture and distributed systems, here are 12 books I would like to read myself. Go through the list and tell me if you want to add or replace anything! 1. Distributed Systems (3rd Edition, 2017) Authors: Maarten

#reading

Snapshot Isolation vs Serializability

Isolation levels in databases are crucial for avoiding problems such as reading uncommitted data. But do you understand the exact difference between the Snapshot Isolation and Serializability levels? A magnificent explanation from Marc Brooker.
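
To make the difference concrete, here is a minimal sketch of write skew, the classic anomaly that snapshot isolation allows and serializability forbids. It is a toy in-memory model, not code from Marc's post: the `go_off_call` rule, the doctor names, and the dict standing in for a database are all assumptions made for illustration.

```python
def go_off_call(snapshot, doctor):
    """Return the writes a transaction would make, judging only by its snapshot."""
    on_call = [name for name, is_on in snapshot.items() if is_on]
    if len(on_call) >= 2:      # invariant: at least one doctor must stay on call
        return {doctor: False}
    return {}                  # refuse, otherwise nobody would be on call


def run_snapshot_isolation(db):
    # Both transactions read the same snapshot taken before either commits.
    writes_a = go_off_call(dict(db), "alice")
    writes_b = go_off_call(dict(db), "bob")
    # Snapshot isolation aborts only on write-write conflicts.
    # The two write sets touch different rows, so both commits succeed.
    if writes_a.keys() & writes_b.keys():
        writes_b = {}          # second committer would be aborted
    result = dict(db)
    result.update(writes_a)
    result.update(writes_b)
    return result


def run_serial(db):
    # Serializable behaviour is equivalent to running one after the other.
    result = dict(db)
    result.update(go_off_call(dict(result), "alice"))
    result.update(go_off_call(dict(result), "bob"))
    return result


initial = {"alice": True, "bob": True}
si_result = run_snapshot_isolation(initial)
serial_result = run_serial(initial)
print("snapshot isolation:", si_result)
print("serializable:      ", serial_result)
```

Under the snapshot model both doctors end up off call, because neither transaction sees the other's write; in the serial run the second transaction sees the first one's write and leaves Bob on call.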

Snapshot Isolation vs Serializability - Marc’s Blog

#db #performance

Visualizing SQL Plan Execution Time

Executing a SQL query plan is much like executing an ordinary program, so you can apply flamegraphs to SQL plans too, which is especially useful for long plans. Tanel Poder wrote the original piece in 2018 and later published a follow-up article, linked from the update at the top of the post.
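
As a rough illustration of the idea, the sketch below turns a nested plan into the "folded stack" lines (`parent;child value`) that flame graph tools take as input. The plan dictionary, the operator names, and the `self_ms` timings are made-up sample data, and this is not Tanel's script; it only shows how a plan tree maps onto stack frames.

```python
def fold_plan(node, path=()):
    """Yield one 'root;child;leaf value' line per plan operator."""
    stack = path + (node["op"],)
    # The value is the time spent in this operator itself, not in its children.
    yield ";".join(stack) + " " + str(node["self_ms"])
    for child in node.get("children", []):
        yield from fold_plan(child, stack)


# Hypothetical execution plan with per-operator self time in milliseconds.
plan = {
    "op": "SELECT STATEMENT", "self_ms": 1,
    "children": [{
        "op": "HASH JOIN", "self_ms": 40,
        "children": [
            {"op": "TABLE ACCESS FULL ORDERS", "self_ms": 250},
            {"op": "INDEX RANGE SCAN CUSTOMERS_PK", "self_ms": 15},
        ],
    }],
}

folded = list(fold_plan(plan))
print("\n".join(folded))
```

Each plan operator becomes a frame and its parent chain becomes the stack, so the widest box in the rendered graph is the operator where most of the time went.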

Visualizing SQL Plan Execution Time With FlameGraphs - Tanel Poder Consulting
Update: I wrote a follow-up article about adding Loop Counters and Row Counts to SQL Plan FlameGraphs. Check it out after reading this one first. Introduction Brendan Gregg invented and popularized a way to profile & visualize program response time by sampling stack traces and using his FlameGraph concept & tools. This technique is a great way for visualizing metrics in nested hierarchies, what stack-based program execution uses under the hood for invoking and tracking function calls. - Linux, Oracle, SQL performance tuning and troubleshooting - consulting & training.

#db #performance

Follow-Up

Alerts with Clickhouse for a Startup

I am a fan of the frugal approach. Clickhouse is frequently used for time series data, which is what all observability data is, yet proper alerting tooling for it is hard to come by. This article shows how to build your own tools for MVP alerting with Clickhouse.
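
For a feel of what pull-based alerting boils down to, here is a minimal sketch: every rule is a query that returns one number plus a threshold, and a periodic job pulls the value and compares. The `AlertRule` type, the `logs` table with its columns, and the thresholds are all hypothetical, and the database call is replaced by a canned lookup so the example runs without a server. It is not the implementation from the article.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AlertRule:
    name: str
    query: str        # SQL expected to return a single number
    threshold: float


def evaluate(rules: List[AlertRule], run_query: Callable[[str], float]) -> List[str]:
    """Pull the current value for every rule and return the names that fire."""
    return [r.name for r in rules if run_query(r.query) > r.threshold]


# Hypothetical rules; the `logs` table and its columns are made up.
rules = [
    AlertRule("too_many_errors",
              "SELECT count() FROM logs WHERE level = 'error' "
              "AND ts > now() - INTERVAL 5 MINUTE", 100),
    AlertRule("slow_requests",
              "SELECT quantile(0.95)(duration_ms) FROM logs "
              "WHERE ts > now() - INTERVAL 5 MINUTE", 800),
]

# Stand-in for a real database call so the sketch runs without a server.
canned = {rules[0].query: 250, rules[1].query: 420}
firing = evaluate(rules, canned.__getitem__)
print(firing)
```

In a real setup `run_query` would execute the SQL against the database and `evaluate` would run on a schedule, with the firing names sent to whatever notification channel you use.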

Building Alerts w/ Clickhouse. The MVP Approach.
Follow my journey as I design and build near real-time, pull-based alerts using Clickhouse. The anti-over engineered approach.

#observability

How to actually migrate complex systems in infrastructure

There is no shortage of ideas on how to migrate complex systems, such as going from a monolithic application to microservices in Kubernetes or from Python 2 to Python 3. Kyle argues that they are all terrible; however, there is a sane way to do it. Read it here.

Kyle Cascade - How to Actually Migrate Complex Systems in Infrastructure

#migration

Cognitive load is what matters

Have you ever wondered why the Separation of Concerns principle is so profound? The answer is rooted deep inside human biology. It turns out that we humans cannot hold more than 5-7 items in our working memory at the same time. Software is thousands of times more complex than that, and the only way we can still build it is through abstraction and separation of concerns. Read how you can reduce cognitive load in software, with examples.
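
One small way to see the working-memory limit in code is a dense condition versus the same logic with named intermediate values. The sketch below is an illustration under assumed names (`can_edit`, the `user` and `doc` fields); it is not taken from the linked article.

```python
def can_edit_dense(user, doc):
    # The reader has to keep every sub-condition in their head at once.
    return (user["role"] == "admin" or user["id"] == doc["owner_id"]) \
        and not doc["locked"] and user["active"]


def can_edit(user, doc):
    # Same logic, but each fact gets a name and can be forgotten once read.
    is_admin = user["role"] == "admin"
    is_owner = user["id"] == doc["owner_id"]
    has_rights = is_admin or is_owner
    doc_is_editable = not doc["locked"]
    return user["active"] and has_rights and doc_is_editable


# Hypothetical sample data.
owner = {"id": 1, "role": "member", "active": True}
stranger = {"id": 2, "role": "member", "active": True}
doc = {"owner_id": 1, "locked": False}
print(can_edit(owner, doc), can_edit(stranger, doc))
```

Both functions return the same answers; the second one just asks the reader to hold fewer things in mind at any given line.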

Cognitive load is what matters
There are so many buzzwords and best practices out there, but let’s focus on something more fundamental. What matters is the amount of confusion developers feel when going through the code.

#systemdesign

Revisiting Compute Scaling

Yelp used Clusterman to scale its Kubernetes cluster capacity up and down, but found it was not flexible enough and decided to migrate to Karpenter instead. Find out how they performed the migration!

Revisiting Compute Scaling
Revisiting Compute Scaling Ilkin Mammadzada and Ankit Tripathi, Site Reliability Engineers Dec 13, 2024 As mentioned in our earlier blog post Fine-tuning AWS ASGs with Attribute Based Instance Selection, we…

#kubernetes #scalability

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon! If you like the newsletter, feel free to support it there, for example with one-time support!