Architecture Weekly Issue #185. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 - is an introduction to the topic or an overview. Now on Telegram and Substack as well.

Highlights

What caused the large AWS outage? 👷‍♂️

Indeed, the main topic of last week was the AWS outage, and boy, the internet went crazy. Rather than writing hysterical posts, it's always better to dig into the details. Gergely Orosz did a great job piecing together the causes of the perfect storm that hit AWS. Find out why DynamoDB, EC2 and Route53 were down.

What caused the large AWS outage?
On Monday, a major AWS outage hit thousands of sites & apps, and even a Premier League soccer game. An overview of what caused this high-profile, global outage

#reliability

1 million nodes Kubernetes Cluster 🤟

What does it practically take to build a cluster of 1 million nodes? The answer comes down to 3 major areas: networking, managing state and scaling the scheduler. This article goes over those areas and suggests applicable solutions. Check them out!

k8s-1m Overview

#kubernetes #scalability

Why I code as a CTO 🍼

John Wang is a CTO, just like me. He shares his experience, which I can definitely relate to: the excitement about AI tools, the need to fix a bug or introduce a new PoC, working on a new feature to stay hands-on. All of that can be a great addition to the CTO skillset. I still have a doubt, though: is that the best use of your time? Let me know what you think in the comments!

Why I code as a CTO
Assembled CTO John Wang on why coding makes him a better leader, and how AI tools are redefining what it means to build at scale.


Business Oriented System Design Course Cohort #7
is closed, but the waiting list is open for the next one!

I have an entire course to help you design software solutions and eventually pass interviews. It offers 10 hours of content-packed lectures, engaging practice and a final work you will be proud to showcase, as well as a Credly (by Pearson) digital certificate proving your experience. More than 100 engineers have already passed the course with amazing feedback and advanced their careers. People report a 15%+ increase in salary after passing the course. The new cohort starts in January.

Follow-Up

How Netflix optimized its petabyte-scale logging system with ClickHouse 👷‍♂️

Logging at Netflix is mental: 5 PB of data is generated each day, and the system serves up to 1000 requests per second coming from both engineers and monitoring solutions. A robust architecture is required for grouping log entries, serializing and querying, with the last bit served by ClickHouse.

How Netflix optimized its petabyte-scale logging system with ClickHouse
“To make our logging system work, we had to make a lot of choices. The key is how you simplify things in order to do the least amount of work.” Daniel Muino, Software Engineer

#observability

Solving the wrong problem 🍼

LLMs and AI agents can now generate software in minutes, not only in the hands of engineers but in the hands of total tech newbies too. Uwe nails the ultimate problem though: the more powerful the tools we get, the more responsibility we have to take on. Follow the thought process in this article.

Solving the wrong problem
The nagging feeling that something does not fit

#ai

Why Self-host? 🍼

This is not an article on on-premise vs cloud; it is rather an argument for why you might want to abandon the Google/Apple/Philips ecosystem and host calendar, mail, location and smart home services on your own. Sounds impossible for regular people, but for engineers - not so much. Take a read.

Why Self-host?
Let me make the argument why you should start self-hosting more of your personal services.

#privacy

Our journey to affordable logging 👷‍♂️

One of the biggest problems in observability is cost. At CloudKitchens, log storage accounted for 20% of their total infrastructure cost, so they decided to move off the managed service and ended up with their own solution, designed on the principles of horizontal scalability, effective durability, cost efficiency and the performance granted by a Rust-based engine.

Our journey to affordable logging
The architecture of our in-house Rust based logging engine

#observability #cost

GraphQL Myths 👷‍♂️

Looks like GraphQL didn't find its place in the industry: there is no massive adoption, and I have personally evaluated it several times and it never fit. However, maybe we're just using it incorrectly? Jovi De Croock shares a set of myths about GraphQL that should be debunked.

GraphQL Myths
Common misconceptions about GraphQL and how persisted operations address them.

#graphql

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon!