Architecture Weekly Issue #165. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram and Substack as well.

System Design Course Cohort #5 is open!

Typical system design courses teach technical skills but often overlook the connection to business problems. This course fills that gap, emphasizing the importance of recognizing and addressing business priorities with technical approaches. Learn to go beyond load balancing options and performance tactics by focusing on solving real business challenges. Now accompanied by AI tools! The course has already been completed by 50+ engineers, with great feedback!

SIGN UP HERE: ONLY a single slot left!

Highlights

You won't be surprised to see agentic AI in software architecture. What is interesting, though, are cell-based architecture and socio-technical engineering. If you are interested in the latter, feel free to watch the relevant interview with Yevgen Nebesov, an expert in this field.

#architecture

So you want to use Object Storage 👷‍♂️

Object storage is the de facto standard: many databases use it without any additional layers in between, many SaaS products work with S3 directly, etc. At scale, though, tail latency will bite you. The post covers 3 mitigation strategies and research results showing that waiting just 250 ms before sending a duplicate of the same request pretty much solves the problem.
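The "wait, then resend" idea is usually called a hedged request. Below is a minimal asyncio sketch, assuming `fetch` is any coroutine function wrapping a single object-store GET (hypothetical here, e.g. a thin wrapper around your S3 client); the 250 ms default mirrors the figure above, and the post's exact strategy may differ in details such as cancellation or how many duplicates are allowed.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def hedged_request(
    fetch: Callable[[], Awaitable[T]], hedge_after: float = 0.25
) -> T:
    """Start fetch(); if it has not finished after `hedge_after` seconds,
    fire one duplicate and return whichever copy finishes first."""
    primary = asyncio.ensure_future(fetch())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done:  # fast path: the request beat the deadline, no duplicate sent
        return primary.result()

    backup = asyncio.ensure_future(fetch())  # the hedged (duplicate) request
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:  # drop the slower copy
        task.cancel()
    return done.pop().result()
```

Because a duplicate is only sent for requests still running after the threshold, the extra load is limited to the slow tail, while the caller waits roughly min(primary latency, threshold + backup latency). Keep in mind that every duplicate GET is still a billed request.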

So you want to use Object Storage
Tips and lessons learned from building systems directly against object stores

#performance

Striping Postgres data volumes 👷‍♂️

Striping here means combining several AWS EBS gp3 volumes into a RAID-0 / LVM stripe set, where each consecutive 8–32 kB chunk of your Postgres data lands on a different volume. This lets you multiply throughput and IOPS with the same cheap disks: a genuine free lunch if you don’t mind RAID-0’s fragility.
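To see why a stripe set multiplies throughput, it helps to model where each byte lands. Below is a simplified sketch, assuming plain round-robin RAID-0 striping with a fixed stripe (chunk) size; real mdadm/LVM layouts add metadata offsets, and `stripe_location` / `volumes_touched` are hypothetical helpers, not part of any tool.

```python
from __future__ import annotations


def stripe_location(offset: int, stripe_size: int, n_volumes: int) -> tuple[int, int]:
    """Map a logical byte offset on a RAID-0 / LVM stripe set to
    (volume index, byte offset on that volume), assuming round-robin chunks."""
    chunk = offset // stripe_size  # global index of the chunk holding this byte
    volume = chunk % n_volumes     # chunks rotate across volumes
    row = chunk // n_volumes       # full rounds of chunks before this one
    return volume, row * stripe_size + offset % stripe_size


def volumes_touched(offset: int, length: int, stripe_size: int, n_volumes: int) -> set[int]:
    """Return the set of volumes a contiguous read of `length` bytes hits."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return {c % n_volumes for c in range(first, last + 1)}


# Postgres pages are 8 kB by default; with an 8 kB stripe and 4 volumes,
# consecutive pages land on volumes 0, 1, 2, 3, 0, 1, ...
PAGE = 8192
print([stripe_location(i * PAGE, PAGE, 4)[0] for i in range(6)])  # [0, 1, 2, 3, 0, 1]
```

For example, a 1 MiB sequential read spans 128 such chunks and hits all four volumes, so their per-volume throughput adds up. The flip side is that losing any one volume loses the whole stripe set.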

Striping Postgres data volumes - a free lunch?
A small follow up on my previous post on various Postgres scale-up avenues in my preferred order. The post received quite a bit of interest - meaning people at least still “wish” performant databases :) And - the ones who are willing to put in a bit of effort with…

#db #performance

Follow-Up

Generating 1 Million PDFs in 10 minutes 👷‍♂️

Generating tons of PDF files is a relatively common task: say you need to regenerate old invoices (as happened at Bolt). AWS Lambda combined with SQS makes this task relatively simple, and you will find a working design in the post.
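The post has the actual design; as a back-of-the-envelope companion, here is a minimal sketch that sizes such a fan-out with Little's law (in-flight jobs = throughput × time per job) and splits the work into SQS-sized batches. The per-PDF render time is an assumption you would replace with a measured value, and `required_concurrency` / `sqs_batches` are hypothetical helpers; the only real constraint used is that SQS `SendMessageBatch` accepts at most 10 messages per call.

```python
import math
from typing import Iterator, Sequence


def required_concurrency(total_jobs: int, window_s: float, seconds_per_job: float) -> int:
    """Little's law: in-flight jobs = arrival rate x time each job spends in the system."""
    throughput = total_jobs / window_s  # jobs per second we must sustain
    return math.ceil(throughput * seconds_per_job)


def sqs_batches(invoice_ids: Sequence[str], size: int = 10) -> Iterator[Sequence[str]]:
    """Chunk ids for SendMessageBatch, which takes up to 10 messages per call."""
    for start in range(0, len(invoice_ids), size):
        yield invoice_ids[start:start + size]


# 1M PDFs in 10 minutes, assuming ~1 s to render one PDF and one PDF per invocation:
print(required_concurrency(1_000_000, 600, 1.0))  # 1667
```

Doubling the render time doubles the required concurrency, so this number is worth comparing against your account's Lambda concurrency quota before kicking off the job.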

Generating 1 Million PDFs in 10 Minutes
How to build a modern and scalable PDF rendering service using AWS Lambda.

#serverless

Process millions of observability events 👷‍♂️

Apache Flink is a cloud-native engine that treats streams as the default data model, offering exactly-once stateful processing, unified batch execution, and a rich SQL/stream API palette. That makes it the go-to choice whenever you need analytics or ETL on data that never stops coming. With the new Flink Prometheus connector, you can preprocess your observability data (enrichment, filtering, etc.) and write it directly to Prometheus.
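Managing cardinality largely boils down to keeping only the labels you will actually query before the data reaches Prometheus. The sketch below shows that preprocessing step in plain Python rather than the Flink or connector API (which it does not attempt to reproduce): it drops assumed high-cardinality labels such as `request_id` and aggregates the remaining label sets into counters, roughly what a keyed, windowed Flink job would emit per window. The label names and `reduce_cardinality` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical labels that would explode the number of Prometheus time series.
HIGH_CARDINALITY = frozenset({"request_id", "user_id"})

LabelSet = tuple[tuple[str, str], ...]


def reduce_cardinality(
    events: Iterable[Mapping[str, str]],
    drop: frozenset[str] = HIGH_CARDINALITY,
) -> Counter:
    """Count events per label set after removing high-cardinality labels.
    Each key of the result corresponds to one time series."""
    series: Counter = Counter()
    for event in events:
        labels: LabelSet = tuple(sorted((k, v) for k, v in event.items() if k not in drop))
        series[labels] += 1
    return series


events = [
    {"service": "checkout", "status": "500", "request_id": "a1"},
    {"service": "checkout", "status": "500", "request_id": "b2"},
    {"service": "search", "status": "200", "request_id": "c3"},
]
for labels, count in reduce_cardinality(events).items():
    print(dict(labels), count)
```

Three distinct raw label sets collapse into two series here; with real traffic, where every request has a unique id, the reduction is far larger.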

Process millions of observability events with Apache Flink and write directly to Prometheus | Amazon Web Services
In this post, we explain how the new connector works. We also show how you can manage your Prometheus metrics data cardinality by preprocessing raw data with Flink to build real-time observability with Amazon Managed Service for Prometheus and Amazon Managed Grafana.

#observability

Lessons on Timeouts, Retries and Idempotency from Sam Newman 👷‍♂️

A short refresher on the pillars of distributed systems: timeouts, retries and idempotency. Remind yourself that timeouts should be balanced, retries should be limited, and idempotency is what makes it safe to retry requests.
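The three ideas work together: a timeout turns a hung call into an error, a bounded retry handles that error, and an idempotency key makes the retry harmless if the first attempt actually went through. Below is a minimal sketch of that interplay; `TransientError`, `FakePaymentService` and the retry parameters are hypothetical, and the backoff uses exponential delays with full jitter, a common choice rather than necessarily the one from the talk.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Hypothetical: raised when a call times out or returns a retryable error."""


def call_with_retries(send, payload, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Bounded retries with exponential backoff + full jitter.
    The SAME idempotency key is reused on every attempt."""
    idempotency_key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload, idempotency_key)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: retries must be bounded
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


class FakePaymentService:
    """Hypothetical downstream that dedupes on the idempotency key.
    It processes the first request but 'loses' the response (simulated timeout)."""

    def __init__(self):
        self.results = {}
        self.charges = 0
        self.lose_next_response = True

    def charge(self, amount, key):
        if key not in self.results:  # only the first request with a key does work
            self.charges += 1
            self.results[key] = f"charged {amount}"
        if self.lose_next_response:
            self.lose_next_response = False
            raise TransientError("timed out waiting for response")
        return self.results[key]


service = FakePaymentService()
print(call_with_retries(service.charge, 42, sleep=lambda s: None))  # charged 42
print(service.charges)  # 1: the retry did not charge twice
```

Without the idempotency key, the retry after the lost response would have charged the customer a second time.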

Lessons on How to Get Timeouts, Retries and Idempotency Right from Sam Newman at QCon London
At QCon London, Sam Newman, the architect credited with coining the term microservices, went back to the basics to underline the three critical things to get right when working with distributed systems: timeouts, retries and idempotency. Through the talk, he provided mechanisms allowing distributed systems to be more robust.

#distributedsystems

Five years of React Native at Shopify 🍼

In 2018 I got a taste of React Native while it was in its infancy. Later, big players bet on RN too. Choosing a mobile technology is a frequent task for software architects, and now you can see from the experience of whales like Shopify that RN can be a viable choice. Don't forget that native expertise is important too!

Five years of React Native at Shopify (2025) - Shopify
Five years ago, we announced that React Native (RN) is the future of mobile at Shopify. Today, we are excited to share the progress we’ve made, lessons learned, and what the future holds.

To recap, we decided to switch to RN for 3 main reasons:
Write it once - Stop building the same features twice, once on iOS and once on Android
Talent portability - Enable devs to work fluently across iOS, Android, and Web
Ship more value - Spend more time delivering value to users instead of chasing feature parity

We’re happy to share that our transition has been quite successful:
Not having to build the same features twice has given us a step change in productivity
Engineers are able to work across web and mobile allowing teams to do more with the same number of people and unlocked new growth opportunities
Maintaining feature parity between iOS and Android has become a non-issue, freeing up capacity to ship a lot more value
Our apps are blazing fast (<500ms screen loads) and stable (>99.9% crash-free sessions)
We continue to leverage native wherever it is the best tool for the job, giving us the best of both worlds

Over the past 5 years, we have migrated all our apps to React Native. Instead of using a one-size-fits-all approach to do so, each team chose when and how to migrate their app. This allowed them to continue shipping features while also aligning with our strategy of leveraging RN.

What did we learn?

React Native apps are fast

We care very deeply about performance at Shopify. As our CEO Tobi Lutke says, “not all fast software is great, but all great software is fast”. The biggest question we had while switching to RN and the main reason we didn’t do it sooner was whether we’d be able to achieve our performance goals with it. Before making the decision to switch, we did extensive prototyping which led to promising results. We also saw all the work that Meta…

#mobile

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon!