Architecture Weekly #175

Architecture Weekly Issue #175. Articles, books, and playlists on architecture and related topics. Split into sections and marked by complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram and Substack as well.

Highlights

Building a dynamic inventory optimization system 👷‍♂️

At the core of every eCommerce application lies the warehouse and replenishment optimization problem. Zalando shows how modern data engineering and machine learning approaches help solve it at scale.

Zalando Engineering Blog - Building a dynamic inventory optimisation system: A deep dive
This technical blog outlines how we built a scalable inventory optimization system to help partners maintain a profitable inventory.
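Zalando's system is far more involved, but the basic trade-off behind replenishment (stock too little and lose sales, stock too much and eat the cost of leftovers) can be illustrated with the textbook newsvendor model. This is a minimal sketch, not Zalando's method: it assumes normally distributed demand over a single selling period, and the function name and parameters are hypothetical. It stocks up to the demand quantile given by the critical ratio of underage cost to total cost.

```python
from statistics import NormalDist


def newsvendor_order_quantity(
    mean_demand: float,
    std_demand: float,
    unit_price: float,
    unit_cost: float,
    salvage_value: float = 0.0,
) -> float:
    """Order quantity maximizing expected profit for one selling period.

    Textbook illustration only: assumes normally distributed demand and is
    not Zalando's production approach.
    """
    underage_cost = unit_price - unit_cost    # margin lost per unit of unmet demand
    overage_cost = unit_cost - salvage_value  # loss per unit left unsold
    if underage_cost <= 0 or overage_cost <= 0:
        raise ValueError("expected unit_price > unit_cost > salvage_value")
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    # Stock up to the demand quantile that matches the critical ratio.
    return NormalDist(mean_demand, std_demand).inv_cdf(critical_ratio)


if __name__ == "__main__":
    # High-margin item: the model recommends stocking above mean demand.
    print(round(newsvendor_order_quantity(100, 20, unit_price=100, unit_cost=30, salvage_value=10), 1))
```

High margins push the order above mean demand and cheap leftovers push it further; a thin margin with a costly overstock pulls it below the mean.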

#casestudy

How Convex Took Down T3 Chat: PM 👷‍♂️

Postmortems are often a great source of production experience, and the Convex PM is no exception. Convex is a Backend-as-a-Service company: they provide a database and functions for their customers to build backends on. Last month they had an incident that severely degraded service for one of their customers, T3 Chat. The timeline and lessons learned are inside.

How Convex Took Down T3 Chat: June 1, 2025 Postmortem
I used to co-host a podcast called The Downtime Project. Each episode, my old friend Tom Kleinpeter and I walked through a public tech postmortem, extracted lessons, and related our own stories about outages of our past projects. Back then, in 2021, Convex was just a prototype. Once or twice,

#pm

Avoiding Safety Violations in Distributed Systems 🤟

Dominik Tornow highlights the importance of systems thinking. TigerBeetle, a database for financial transactions, recently worked with Jepsen to test whether its promised guarantees hold. The testing uncovered an interesting case where an internal execution error was not surfaced to the client. Find out in this post whether it should have been in the first place.

Jepsen & TigerBeetle
Avoiding Safety Violations in Distributed Systems with Systems Engineering

#distributedsystems

Wanna know how to design systems that deliver business value?

Business Oriented System Design Course Cohort #6 is officially open!

Looking for a way to advance your career? Feel you've outgrown mere feature development, but lack the skills to design complete systems? Want to make a business impact? 10 hours of content-packed lectures, engaging practice, and a final project you will be proud to showcase, plus a Credly (by Pearson) digital certificate proving your experience. More than 70 engineers have already completed the course with amazing feedback and advanced their careers. The new cohort starts on the 23rd of July. Find the details, feedback, and enrollment for the course here. Only 3 places left!

Follow-Up

Running a DB on preemptible machines 👷‍♂️

One way to build a highly cost-efficient cloud database is to separate compute from storage and run on preemptible instances. Unsurprisingly, this decision brings significant challenges, and Thor Hansen from Polar Signals explains them here.

Best Laid Plans
What we learned from building a database

#db #performance #cloud

Most Valuable When Least Visible 🍼

A short note on attitudes toward security: it's true that developers and PMs focus on features first, and security is almost always an afterthought. But what if security is your passion? How do you navigate that challenge? Grab some advice here.

Most Valuable When Least Visible | The Security Paradox | Danielle’s Blog
Security -- The Most Important Work You’ll Never See in a Sprint Demo

#security

How AI is changing software engineering at Shopify with Farhan Thawar 🍼

Most companies ban AI tools during interviews, but Shopify embraces them. Yes, you read that right. Farhan Thawar, Head of Engineering at Shopify, shares stories about Shopify's experience with AI, its engineering model, and why Shopify places no limit on AI token spending.

#interview #video

Distributed Locking: A Practical Guide 👷‍♂️

How do you make sure you don't overwrite data that someone else is writing at the same time? On a single node it's easy, but in a distributed system it's much harder: you need an external coordinator. Oskar wrote a beautiful piece explaining distributed locking and sharing practical ways to implement it.

Distributed Locking: A Practical Guide
If you’re wondering how and when distributed locking can be useful, here’s the practical guide. I explained why distributed locking is needed in real-world scenarios. Explored how popular tools and systems implement locks (Redis, ZooKeeper, databases, Kubernetes single-instance setups, etc.). Discussed potential pitfalls—like deadlocks, lock expiration, and single points of failure—and how to address them. By the end, you should have a decent grasp of distributed locks, enough to make informed decisions about whether (and how) to use them in your architecture.
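To make the lock-expiration pitfall concrete, here is a minimal sketch, not taken from Oskar's article, of a lease-based lock combined with fencing tokens, a well-known technique described by Martin Kleppmann. `LeaseLockCoordinator` and `FencedStore` are hypothetical in-memory stand-ins for Redis, ZooKeeper, or a database and for the protected storage; a real coordinator must itself be replicated, and the injectable clock exists only to make the example testable.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Lease:
    owner: str
    token: int         # fencing token, strictly increasing per grant
    expires_at: float  # lease end time according to the coordinator's clock


class LeaseLockCoordinator:
    """Hypothetical in-memory coordinator: locks are time-limited leases, and
    every grant carries a fencing token that storage can check."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._mutex = threading.Lock()
        self._leases: Dict[str, Lease] = {}
        self._last_token = 0

    def acquire(self, resource: str, owner: str, ttl: float) -> Optional[int]:
        """Return a fencing token if the lease was granted, else None."""
        with self._mutex:
            now = self._clock()
            current = self._leases.get(resource)
            if current is not None and current.expires_at > now and current.owner != owner:
                return None  # someone else holds a live lease
            self._last_token += 1
            self._leases[resource] = Lease(owner, self._last_token, now + ttl)
            return self._last_token

    def release(self, resource: str, owner: str, token: int) -> bool:
        """Release only if the caller still holds this exact lease."""
        with self._mutex:
            current = self._leases.get(resource)
            if current is not None and current.owner == owner and current.token == token:
                del self._leases[resource]
                return True
            return False


class FencedStore:
    """Storage that refuses writes carrying a token older than one it has seen."""

    def __init__(self) -> None:
        self.value: Optional[str] = None
        self._highest_token = 0

    def write(self, value: str, token: int) -> bool:
        if token < self._highest_token:
            return False  # the writer's lease was superseded by a newer one
        self._highest_token = token
        self.value = value
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    locks = LeaseLockCoordinator(clock=lambda: fake_now[0])
    store = FencedStore()

    token_a = locks.acquire("invoice-42", "worker-a", ttl=10)
    fake_now[0] = 15.0  # worker-a stalls (e.g. a long GC pause) past its lease
    token_b = locks.acquire("invoice-42", "worker-b", ttl=10)

    print(store.write("written by b", token_b))  # True
    print(store.write("written by a", token_a))  # False: stale token rejected
```

The TTL bounds how long a crashed holder can block everyone else, while the fencing token covers the opposite failure: a holder that wakes up after its lease has expired. The lock alone cannot stop that late write; only the storage layer checking tokens can.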

#distributedsystems

A Survey and Evaluation of Database Management System Extensibility 🤟

Many databases, such as PostgreSQL, DuckDB, SQLite, and Redis, allow extensions to be installed. Extensions can add procedures, custom data types, parser changes, and much more. The question, though, is how well different extensions work together. This paper by Andrew Pavlo and his colleagues finds that only 16% of extensions are compatible with each other. Read the paper for the details.
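A toy example shows one simple way two extensions can clash. It is an illustration, not a summary of the paper's findings: both hypothetical "extensions" below register a SQL function with the same name and arity through Python's standard `sqlite3.Connection.create_function`, and whichever is installed last silently replaces the other.

```python
import sqlite3


def install_slug_dash(conn: sqlite3.Connection) -> None:
    """Hypothetical extension A: slug(text) lowercases and joins words with '-'."""
    conn.create_function("slug", 1, lambda s: "-".join(s.lower().split()))


def install_slug_underscore(conn: sqlite3.Connection) -> None:
    """Hypothetical extension B: its own slug(text), joining words with '_'."""
    conn.create_function("slug", 1, lambda s: "_".join(s.lower().split()))


def slug_with(installers) -> str:
    """Install the given extensions in order, then call slug() from SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        for install in installers:
            # A later registration with the same name and arity replaces the earlier one.
            install(conn)
        return conn.execute("SELECT slug('Hello Extensible World')").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(slug_with([install_slug_dash]))
    print(slug_with([install_slug_dash, install_slug_underscore]))
```

Neither extension is broken on its own; the behavior of existing queries changes only because of load order, which is exactly the kind of interaction that is hard to spot without testing extensions together.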

#paper #db

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon!