Architecture Weekly #176

Architecture Weekly Issue #176. Articles, books, and playlists on architecture and related topics. Split into sections and marked by complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram and Substack as well.

Highlights

Figma's $300k Daily AWS Bill Isn't the Scandal You Think It Is 🍼

Burning $300k a day may sound scary, but every number should be put into perspective. With a 91% profit margin and almost 50% YoY growth, spending 11% of revenue on infrastructure is a deal to take any day of the week. Follow the analysis further in this post by Corey Quinn.

Figma’s $300k Daily AWS Bill Isn’t the Scandal You Think It Is
Well, the internet did what the internet does best this week: it collectively lost its mind over a number in an S-1 filing. Figma disclosed they signed a ~$550 million contract with AWS, someone used arithmetic (the secret weapon of Cloud Finance) to determine that this was roughly $300,000 per day on AWS, and suddenly everyone with a social media account became a cloud economics expert.
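
The arithmetic from the excerpt is easy to reproduce. A minimal sketch, assuming the ~$550 million commitment is consumed evenly over a five-year term (the term length is an assumption here, it is not stated in the excerpt):

```python
# Back-of-the-envelope check of the "$300k per day" figure.
# Assumption: the commitment is spread evenly over a five-year term.
CONTRACT_TOTAL_USD = 550_000_000
CONTRACT_YEARS = 5  # assumed term, not stated in the excerpt


def daily_spend(total_usd: float, years: int, days_per_year: int = 365) -> float:
    """Average spend per day if the commitment is consumed evenly."""
    return total_usd / (years * days_per_year)


if __name__ == "__main__":
    per_day = daily_spend(CONTRACT_TOTAL_USD, CONTRACT_YEARS)
    print(f"~${per_day:,.0f} per day")  # prints "~$301,370 per day"
```

A committed minimum is not the same as the actual bill, so treat the result as an order-of-magnitude figure only.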

#finops #cloud

The Real Failure Rate of EBS 👷‍♂️

Elastic Block Store promises to deliver at least 90 percent of provisioned IOPS performance 99 percent of the time. PlanetScale has the unique experience of running millions of EBS volumes, which gave it a real understanding of these delivery guarantees and how they translate into the real failure rate.
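
To get a feel for what "99 percent of the time" means for a fleet, here is a rough sketch. It assumes each volume is out of spec 1% of the time and that volumes degrade independently of each other. Both are simplifying assumptions for illustration, not PlanetScale's measured numbers.

```python
# What a "99 percent of the time" guarantee allows, per volume and per fleet.
# Assumptions: every volume uses its full 1% out-of-spec allowance, and
# volumes degrade independently of each other.
SECONDS_PER_DAY = 24 * 60 * 60


def degraded_minutes_per_day(time_in_spec: float = 0.99) -> float:
    """Minutes per day a single volume may run below its guaranteed IOPS."""
    return (1.0 - time_in_spec) * SECONDS_PER_DAY / 60


def prob_any_degraded(volumes: int, time_in_spec: float = 0.99) -> float:
    """Chance that at least one of `volumes` is out of spec at a given moment."""
    return 1.0 - time_in_spec ** volumes


if __name__ == "__main__":
    print(f"{degraded_minutes_per_day():.1f} minutes per day per volume")
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} volumes: {prob_any_degraded(n):.1%} chance of a degraded volume")
```

Under these assumptions a single volume gets about 14 minutes of slack per day, while a fleet of a hundred volumes has a degraded member more often than not.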

The Real Failure Rate of EBS — PlanetScale
Our experience running AWS EBS at scale for critical workloads

#reliability

Good Performance for Bad Days 👷‍♂️

Measuring performance for happy cases is relatively straightforward. Metastable failures are still a thing, yet engineers rarely measure performance at peak capacity and in overload situations. Marc argues that it's not optional: it's a must.

Good Performance for Bad Days - Marc’s Blog

#performance

Wanna know how to Design Systems that deliver business value?

Business Oriented System Design Course Cohort #6 is officially open!

Looking for a way to advance your career? Feel you have outgrown mere feature development but lack the skills to design complete systems? Want to make a business impact? 10 hours of content-packed lectures, engaging practice, and a final project you will be proud to showcase, as well as a Credly (by Pearson) digital certificate proving your experience. More than 70 engineers have already passed the course with amazing feedback and advanced their careers. The new cohort starts on the 23rd of July. Find the details, feedback, and enrollment for the course here. Only 3 places left!

Follow-Up

Distributed Async Await 👷‍♂️

Concurrent and distributed systems introduce the problems of partial order and partial failure. Current programming models were designed before the wide adoption of distributed systems, so is it time for a new mental model? Dominik Tornow shows how much cleaner distributed concurrent code can become with distributed async await.

Distributed Async Await | Introduction
Solve complex problems with simple code - Enjoy peace of mind

#distributedsystems

Getting started with LLM Inference 👷‍♂️

LLM usage is all over the place, yet often without deep understanding. Designing systems means knowing guarantees and limitations. This practical guide explains the foundations of how LLMs work and shares the calculations you need to make for compute and memory so that your inference runs smoothly!
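
As a taste of the memory side, here is a common back-of-the-envelope estimate for model weights and the KV cache. It is a sketch under stated assumptions, not a formula quoted from the handbook: the model shape below (8B parameters, 32 layers, 8 KV heads, head dimension 128, 16-bit values) is purely illustrative.

```python
# Rough GPU memory estimate for LLM inference: weights plus KV cache.
# All model dimensions below are illustrative assumptions.
GIB = 1024 ** 3


def weights_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Memory for the weights, e.g. 2 bytes per parameter for FP16/BF16."""
    return num_params * bytes_per_param


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Memory for cached keys and values (the leading 2 covers K and V)."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


if __name__ == "__main__":
    w = weights_bytes(8_000_000_000)
    kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)
    print(f"weights:  {w / GIB:.1f} GiB")
    print(f"KV cache: {kv / GIB:.1f} GiB per 8192-token sequence")
```

The KV cache grows linearly with both sequence length and batch size, so it, rather than the weights, often decides how many concurrent requests fit on one GPU.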

Getting started | LLM Inference Handbook

#llm #performance

How Amazon maintains accurate totals at scale with Amazon DynamoDB 👷‍♂️

For finance-related data your typical pick will be a relational database, but DynamoDB, despite being a key-value store, can do a good job too. Find an example of it powering tiered tax withholding at Amazon itself.

How Amazon maintains accurate totals at scale with Amazon DynamoDB | Amazon Web Services
Amazon’s Finance Technologies Tax team (FinTech Tax) manages mission-critical services for tax computation, deduction, remittance, and reporting across global jurisdictions. The Application processes billions of transactions annually across multiple international marketplaces. In this post, we show how the team implemented tiered tax withholding using Amazon DynamoDB transactions and conditional writes.
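
To make "conditional writes" concrete, here is a minimal sketch of an optimistic update to a running total. The table name, key, and attribute names are hypothetical and are not the FinTech Tax team's schema. The function only builds the request for the low-level DynamoDB `UpdateItem` call, so it runs without AWS credentials.

```python
# Sketch: build an UpdateItem request that adds to a running total only if
# nobody else changed the item since we read it (optimistic locking).
# Table, key, and attribute names are hypothetical.
from typing import Any, Dict


def build_conditional_add(table: str, account_id: str,
                          amount_cents: int, expected_version: int) -> Dict[str, Any]:
    """Return keyword arguments for the low-level client's update_item call."""
    return {
        "TableName": table,
        "Key": {"account_id": {"S": account_id}},
        # Bump the total and the version in one atomic update.
        "UpdateExpression": "SET #total = #total + :amount, #version = :next",
        # Reject the write if the version we read is already stale.
        "ConditionExpression": "#version = :expected",
        "ExpressionAttributeNames": {
            "#total": "running_total_cents",
            "#version": "version",
        },
        "ExpressionAttributeValues": {
            ":amount": {"N": str(amount_cents)},
            ":next": {"N": str(expected_version + 1)},
            ":expected": {"N": str(expected_version)},
        },
    }


if __name__ == "__main__":
    request = build_conditional_add("withholding_totals", "acct-42", 2500, 7)
    print(request["ConditionExpression"])
```

With boto3 the request could be sent as `boto3.client("dynamodb").update_item(**request)`. A failed condition surfaces as `ConditionalCheckFailedException`, at which point the caller re-reads the item and retries.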

#casestudy

Unbearable complexity of Helm charts 👷‍♂️

Helm charts are meant to be installers but end up as a configuration interface. But here's the kicker: the inputs to Helm charts are more complex than their output! Brian Grant analyzes the Traefik Helm chart to prove this statement.

#kubernetes #helm

Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage Engines 🤟

Can't leave you without a paper! This one looks at how traditional databases handle saving and recovering data after a crash. The old method (ARIES) is reliable but slow and uses too much storage. The authors suggest a new approach that saves changes bit by bit, even while the database is running, and fixes only the parts that need repair after a crash. This makes recovery much faster and uses less disk space—great for modern high-speed systems.

#db #paper

P.S.

My interview with Neal Ford is available here, only for patrons. Consider supporting the newsletter while getting early access to such content.

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon!