Architecture Weekly Issue #157. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram and Substack as well.

A blueprint to building a scalable authorization system 👷‍♂️

Authorization can make or break your application’s security and scalability. From managing dynamic permissions to implementing fine-grained access controls, the challenges grow as your requirements and users scale. This ebook is based on insights from 500+ interviews with engineers and IAM leads. It explores over 20 technologies and approaches, providing practical guidance to design a future-proof authorization system. Learn how to create a solution that evolves with your business needs while avoiding technical debt.

Building a scalable authorization system
A practical guide to setting a strong foundation for your application’s authorization layer. Pulling from our founders’ experience and interviews with over 500 developers, we share the six key requirements that all authorization layers have to include to avoid technical debt, and how we satisfied them while building our authorization layer.

#authorization #security

Highlights

Versioning versus Coordination 👷‍♂️

While two concurrent transactions are running, we expect each of them to see consistent data. One way to achieve that is blocking, which requires coordination and hurts performance. Another way is to store the versions of the values used in the transactions. Which versions should a read use, though? This is where physical clocks help. Follow Marc's explanations, great as always.
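To make the versioning idea concrete, here is a minimal sketch of a multi-version store in Python. It is an illustration only, not the design from Marc's post: the class and method names are hypothetical, and plain integers stand in for the clock readings. Each write keeps its commit timestamp, and a read picks the newest version committed at or before its snapshot time, so readers never block writers.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy multi-version store (hypothetical names, integers as timestamps)."""

    def __init__(self):
        # key -> parallel lists of commit timestamps and values,
        # both kept sorted by timestamp
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, key, value, commit_ts):
        # Insert in timestamp order so that reads can binary-search.
        idx = bisect.bisect_right(self._timestamps[key], commit_ts)
        self._timestamps[key].insert(idx, commit_ts)
        self._values[key].insert(idx, value)

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before snapshot_ts,
        # or None if the key had no version at that time.
        timestamps = self._timestamps.get(key, [])
        idx = bisect.bisect_right(timestamps, snapshot_ts)
        return self._values[key][idx - 1] if idx else None


store = VersionedStore()
store.write("balance", 100, commit_ts=10)
store.write("balance", 70, commit_ts=20)

# A transaction that took its snapshot at time 15 keeps seeing the old value,
# even though a newer version has been committed since.
old_view = store.read("balance", snapshot_ts=15)
new_view = store.read("balance", snapshot_ts=25)
```

The open question from the blurb is how `snapshot_ts` gets chosen across machines, which is exactly where the clocks come in.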

Versioning versus Coordination - Marc’s Blog

#db #distributedsystems

Tolerating full cloud outages with Monzo Stand-in 👷‍♂️

Huge production systems come with a matching cloud bill, and keeping a full copy for availability adds to it. So Monzo went with a hybrid solution: they did introduce redundancy, but limited it to the absolute minimum, replicating only 1% of the services. Read on to find out why they decided to go this way.

Tolerating full cloud outages with Monzo Stand-in

#casestudy #reliability

Distributed Systems Programming Has Stalled 🍼

Despite the clickbait title, this article lays out the three paradigms underlying distributed systems programming and argues (not forgetting to mention LLMs, of course) that we need a new programming model, since LLMs work best when all the knowledge is colocated. Details inside.

Distributed Systems Programming Has Stalled
Over the last decade, we’ve seen great advancements in distributed systems, but the way we program them has seen few fundamental improvements. While we can sometimes abstract away distribution (Spark, Redis, etc.), developers still struggle with challenges like concurrency, fault tolerance, and versioning. There are lots of people (and startups) working on this. But nearly all focus on tooling to help analyze distributed systems written in classic (sequential) programming languages. Tools like J…

#distributedsystems

Follow-Up

I sent 500 million HTTP requests to 2.5 million hosts 👷‍♂️

It can sound like an easy task, but doing it sequentially would take almost 8 years, so you need good parallelisation and an understanding of what's going on under the hood. DNS, TLS handshakes, connection reuse, Go libraries, and much more here.

I sent 500 million HTTP requests to 2.5 million hosts
How I sent 500 million HTTP requests to 2.5 million hosts in a couple of hours. Deep dive into HTTP/1.1 and Go.
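A quick back-of-the-envelope check of those numbers, sketched in Python. The half-second average per request is an assumption, chosen only because it roughly reproduces the "almost 8 years" figure, and two hours stands in for "a couple of hours"; neither value is taken from the article itself.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600

TOTAL_REQUESTS = 500_000_000
SECONDS_PER_REQUEST = 0.5      # assumption, not a figure from the article
DEADLINE_SECONDS = 2 * 3600    # assumption: "a couple of hours" = 2 hours


def sequential_years(total_requests, seconds_per_request):
    """Time needed when requests are sent strictly one after another."""
    return total_requests * seconds_per_request / SECONDS_PER_YEAR


def required_concurrency(total_requests, seconds_per_request, deadline_seconds):
    """Average number of in-flight requests needed to meet the deadline."""
    total_work_seconds = total_requests * seconds_per_request
    return math.ceil(total_work_seconds / deadline_seconds)


years = sequential_years(TOTAL_REQUESTS, SECONDS_PER_REQUEST)
in_flight = required_concurrency(
    TOTAL_REQUESTS, SECONDS_PER_REQUEST, DEADLINE_SECONDS
)
print(f"sequential: {years:.1f} years")        # sequential: 7.9 years
print(f"needed in flight: {in_flight}")        # needed in flight: 34723
```

Under these assumptions, tens of thousands of requests have to be in flight at any moment, which is why DNS lookups, TLS handshakes, and connection reuse end up dominating the design.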

#performance

Why We Use Apache Kafka for Real-Time Data at Scale 👷‍♂️

This article explains how companies use Apache Kafka to handle large amounts of data in real time. This helps them quickly detect security threats by analyzing data within milliseconds instead of relying on slower traditional methods. The faster processing improves cybersecurity by allowing organizations to react to threats immediately.

Why We Use Apache Kafka for Real-Time Data at Scale
Discover how cybersecurity vendor SecurityScorecard leverages data streaming to enhance its business capabilities.

#security

When Imperfect Systems are Good 👷‍♂️

A great example of business requirements shaping software design. The celebrity problem is common in social networks: distributing the posts of people with millions of followers can pose significant performance challenges. However, do we actually need to be superfast here? Bluesky knows the answer.

When Imperfect Systems are Good, Actually: Bluesky’s Lossy Timelines
By examining the limits of reasonable user behavior and embracing imperfection for users who go beyond it, we can continue to provide service that meets the expectations of users without sacrificing scalability of the system.
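One possible reading of that summary, as a minimal Python sketch: define a limit of "reasonable" behavior and deliberately drop a share of timeline writes for users who exceed it. Everything here is an assumption for illustration (the limit value, the function names, and the use of the number of followed accounts as the measure); the post describes what Bluesky actually does.

```python
import random

# Hypothetical threshold for "reasonable" behavior; not a value from the post.
REASONABLE_FOLLOW_LIMIT = 2_000


def keep_probability(num_follows, limit=REASONABLE_FOLLOW_LIMIT):
    """Share of timeline writes to keep for a user following num_follows accounts."""
    if num_follows <= 0:
        return 1.0
    return min(limit / num_follows, 1.0)


def fan_out(post_id, follow_counts, timelines, rng=random.random):
    """Append post_id to each follower's timeline, lossily for heavy followers.

    follow_counts maps a follower to the number of accounts they follow.
    rng is injectable so that the behavior can be made deterministic in tests.
    """
    for follower, num_follows in follow_counts.items():
        if rng() < keep_probability(num_follows):
            timelines.setdefault(follower, []).append(post_id)


timelines = {}
# With a fixed "random" value of 0.75, the user following 100 accounts keeps
# the post (keep probability 1.0), while the user following 4,000 accounts
# loses it (keep probability 0.5).
fan_out("post-1", {"alice": 100, "bob": 4_000}, timelines, rng=lambda: 0.75)
```

Users within the limit see every post, while the extra write load caused by outliers stays bounded at the cost of an incomplete timeline for them.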

#performance #casestudy

Every pod eviction in Kubernetes, explained 👷‍♂️

Pod eviction is a crucial mechanism for allocating limited compute resources to applications in K8s. That's reason enough to understand how Kubernetes evicts Pods: which components can do it and under which policies.

Every pod eviction in Kubernetes, explained
Anyone who is running Kubernetes in a large-scale production setting cares about having a predictable Pod lifecycle. Having unknown actors that can terminate your Pods is a scary thought, especially when you’re running stateful workloads or care…

#k8s #kubernetes

3FS: Distributed File System for LLMs 🤟

DeepSeek, the Chinese lab behind the LLM of the same name, has open-sourced its file system. 3FS is a modern distributed file system optimized for high-speed, low-latency AI workloads, whereas traditional file systems are designed for single-machine or basic network storage and do not scale as efficiently in distributed environments. Follow the design notes for more details.

3FS/docs/design_notes.md at main · deepseek-ai/3FS
A high-performance distributed file system designed to address the challenges of AI training and inference workloads. - deepseek-ai/3FS

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon! If you like the newsletter, feel free to support it there - with one-time support for example!