Architecture Weekly #185
Architecture Weekly Issue #185. Articles, books, and playlists on architecture and related topics. Split into sections and marked by complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram and Substack as well.
Highlights
What caused the large AWS outage? 👷‍♂️
The main topic of last week was indeed the AWS outage, and boy, the internet went crazy. Rather than writing hysterical posts, it's always better to dig into the details. Gergely Orosz did a great job piecing together the causes of the perfect storm that hit AWS. Find out why DynamoDB, EC2 and Route53 were down.
#reliability
1 million nodes Kubernetes Cluster 🤟
What does it practically take to build a cluster of 1 million nodes? The answer comes down to 3 major areas: networking, managing state, and scaling the scheduler. This article goes over those areas and suggests applicable solutions. Check them out!
#kubernetes #scalability
Why I code as a CTO 🍼
John Wang is a CTO, just like me. He shares his experience, which I can definitely relate to: the excitement about AI tools, the need to fix a bug or introduce a new PoC, and working on a new feature to stay hands-on can all be a great addition to the CTO skillset. I still have a doubt, though: is that the best use of your time? Let me know what you think in the comments!
Business Oriented System Design Course Cohort #7 is closed, but the waiting list is open for the next one!
I have an entire course to help you design software solutions and eventually pass interviews: 10 hours of content-packed lectures, engaging practice, and a final work you will be proud to showcase, as well as a Credly (by Pearson) digital certificate proving your experience. More than 100 engineers have already passed the course, left amazing feedback, and advanced their careers. People report a 15%+ increase in salary after passing the course. The new cohort starts in January.
Follow-Up
How Netflix optimized its petabyte-scale logging system with ClickHouse 👷‍♂️
Logging at Netflix is mental: 5 PB of data is generated each day, and the system serves up to 1000 requests per second coming from both engineers and monitoring solutions. A robust architecture is required for grouping log entries, serializing, and querying, with the last part served by ClickHouse.
#observability
Solving the wrong problem 🍼
LLMs and AI agents can now generate software in minutes, not only in the hands of engineers but for total tech newbies too. Uwe nails the ultimate problem, though: the more powerful the tools we get, the more responsibility we should take on. Follow the thought process in this article.
#ai
Why Self-host? 🍼
This is not an article on on-premise vs cloud; it is rather an argument for why you might want to abandon the Google/Apple/Philips ecosystem and host calendar, mail, location, and smart home services on your own. That sounds impossible for regular people, but for engineers, not so much. Take a read.
#privacy
Our journey to affordable logging 👷‍♂️
One of the biggest problems in observability is cost. At CloudKitchens, log storage accounted for 20% of total infrastructure cost, so they decided to move off the managed service and ended up with their own solution, designed on the principles of horizontal scalability, effective durability, and cost efficiency, with the performance granted by a Rust-based engine.
#observability #cost
GraphQL Myths 👷‍♂️
It looks like GraphQL didn't find its place in the industry: there is no massive adoption, and I have personally evaluated it several times and it never fit. However, maybe we're just using it incorrectly? Jovi De Croock shares a set of myths about GraphQL that should be debunked.
#graphql
Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon!