Architecture Weekly Issue #185. Articles, books, and playlists on architecture and related topics. Split into sections and marked by complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram and Substack as well.

Highlights

What caused the large AWS outage? 👷‍♂️

The main topic of last week was the AWS outage, and boy, the internet went crazy. Rather than writing hysterical posts, it's always better to dig into the details. Gergely Orosz did a great job piecing together the reasons behind the perfect storm that hit AWS. Find out why DynamoDB, EC2 and Route53 were down.

What caused the large AWS outage?

On Monday, a major AWS outage hit thousands of sites & apps, and even a Premier League soccer game. An overview of what caused this high-profile, global outage

The Pragmatic Engineer, Gergely Orosz

#reliability

1 million node Kubernetes cluster 🤟

What does it practically take to build a Kubernetes cluster with 1 million nodes? The answer comes down to three major areas: networking, managing state, and scaling the scheduler. This article goes over those areas and suggests applicable solutions. Check them out!

k8s-1m Overview

#kubernetes #scalability

Why I code as a CTO 🍼

John Wang is a CTO, just like me. He shares an experience I can definitely relate to: the excitement about AI tools, the need to fix a bug or build a new PoC, and working on a new feature to stay hands-on can all be a great addition to the CTO skillset. I still have a doubt, though: is that the best use of your time? Let me know what you think in the comments!

Why I code as a CTO

Assembled CTO John Wang on why coding makes him a better leader—and how AI tools are redefining what it means to build at scale.

John Wang, Co-Founder and CTO

Business Oriented System Design Course Cohort #7 is closed, but the waiting list is open for the next one!

I have an entire course to help you design software solutions and eventually pass interviews. It offers 10 hours of content-packed lectures, engaging practice, and a final project you will be proud to showcase, as well as a digital certificate issued through Credly (by Pearson) proving your experience. More than 100 engineers have already passed the course, left amazing feedback, and advanced their careers. People report a 15%+ increase in salary after passing the course. The new cohort starts in January.

Follow-Up

How Netflix optimized its petabyte-scale logging system with ClickHouse 👷‍♂️

Logging at Netflix is mental: 5 PB of data is generated each day, and the system serves up to 1000 requests per second coming from both engineers and monitoring solutions. A robust architecture is required for grouping log entries, serializing, and querying, with the last bit served by ClickHouse.

How Netflix optimized its petabyte-scale logging system with ClickHouse

“To make our logging system work, we had to make a lot of choices. The key is how you simplify things in order to do the least amount of work.” Daniel Muino, Software Engineer

ClickHouse, The ClickHouse Team

#observability

Solving the wrong problem 🍼

LLMs and AI agents can now generate software in minutes, not only in the hands of engineers but in the hands of total tech newbies too. Uwe nails the ultimate problem, though: the more powerful the tools we get, the more responsibility we should take on. Follow the thought process in this article.

Solving the wrong problem

The nagging feeling that something does not fit

Uwe Friedrichsen

#ai

Why Self-host? 🍼

This is not an article on on-premise vs cloud; rather, it is an argument for why you might want to abandon the Google/Apple/Philips ecosystems and host calendar, mail, location and smart home services on your own. Sounds impossible for regular people, but for engineers, not so much. Take a read.

Why Self-host?

Let me make the argument why you should start self-hosting more of your personal services.

Roman Zipp

#privacy

Our journey to affordable logging 👷‍♂️

One of the biggest problems in observability is cost. At CloudKitchens, log storage accounted for 20% of their total infrastructure cost, so they decided to move off the managed service and ended up building their own solution, designed on the principles of horizontal scalability, effective durability, and cost efficiency, with performance provided by a Rust-based engine.

Our journey to affordable logging

The architecture of our in-house Rust based logging engine

CloudKitchens

#observability #cost

GraphQL Myths 👷‍♂️

It looks like GraphQL didn't find its place in the industry: there is no massive adoption, and I have personally evaluated it several times and it never fit. However, maybe we're just using it incorrectly? Jovi De Croock shares a set of myths about GraphQL that should be debunked.

GraphQL Myths

Common misconceptions about GraphQL and how persisted operations address them.

Jovi De Croock

#graphql

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon!