Architecture Weekly Issue #139. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away,  🍼 - is an introduction to the topic or an overview. Now in telegram and Substack as well.

Sponsored

Depot Managed GitHub Actions runners offer caching that's 10x faster than GitHub's own solution, and at half the cost. It’s the secret to boosting your CI/CD pipelines without breaking the bank. Find out how it works!

Highlights

Leader Election with S3 Conditional Writes 🤟

Some leader election protocols rely on obtaining a lock of some sort. With the recent release of conditional writes on S3 - like creating a file only if it does not yet exist - you can actually implement a leader election protocol using this lock. Follow Gunnar Morling's post for details.

Leader Election With S3 Conditional Writes
Update Aug 30: This article is discussed on Hacker News and lobste.rs. In distributed systems, for instance when scaling out some workload to multiple compute nodes, it is a common requirement to select a leader for performing a given task: only one of the nodes should process the records from a Kafka topic partition, write to a file system, call a remote API, etc. Otherwise, multiple workers may end up doing the same task twice, overwriting each other’s data, and worse.

#distributedsystems

Common Deployment Strategies 👷‍♂️

Monoliths used to be deployed as a single unit universally for all the users at once. Bearing the risks of faulty deployment and shipping unoperational app, this strategy is subpar. That's why we have blue-green deployments, canary releases and rolling updates. Read about them in Fractional Architect newsletter!

#30 Worth To Know: Common Deployment Strategies
Choosing the right deployment strategy is critical to efficient software delivery. In this post, I describe four battle-tested approaches: basic, blue-green, canary, and rolling deployments.

#sre

Estimated Time of Arrival Reliability at Lyft 👷‍♂️

You open a ride-hailing app and see that the taxi can be at your place in 5 minutes. But how this estimation ends up in the app? Surprisingly, it's more complex than a factor of number of drivers and distance. Lyft is solving this problem through a tree-based classification model. Find out how.

ETA (Estimated Time of Arrival) Reliability at Lyft
The ETA Conundrum: Speed vs Accuracy

#casestudy

Business Oriented System Design Course

"The quality of documentation and design skills has improved a lot for my engineer after he passed this course" - This is what Engineering Managers tell me as a feedback for the engineers who passed my Business Oriented System Design course. Do you want your EM tell the same about you? Join group #3 and bring your software architecture and system design skills to a new level. Engineers from Semrush, Bolt, Clyx, OpSec, Qase and other already advanced their careers. Now it's your turn. Only a week left as start on October, 18!Use your learning budget as invoices will be provided. Sign up here: https://vvsevolodovich.dev/business-oriented-system-design-course/

Follow-Up

6 API Gateway Configuration You Should Set 👷‍♂️

API Gateway can centralize authentication, help with rate limit, increase observability and improve performance through caching... if you set them right. Find the 6 configuration you need to take a look at while using AWS API Gateway.

6 API Gateway Configurations You Should Set
Essential Settings for Optimising Your AWS API Gateway

#aws #api

Improving Application Availability: Redundancy and Persistence 🍼

Some tactics for availability include redundant instances, which require load balancing to operate. There are nice explanations of underlying problems and mechanics to apply to have that implemented smoothly!

Improving Application Availability: Redundancy and Persistence
Continuing on our road through application availability, let’s expand on what we started in the first part of this series. In our previous…

#avilability

Infrastructure Drift Detection 🍼

You got you IaC solution nice and clear, however a production incident happen and you modify some structure right in the production. Now the prod and the configuration in terraform differ - that is called infrastrure drift. What to do about it? Grab an article!

Infrastructure Drift Detection: Why Do You Need It?
When you provision infrastructure at any scale, how do you manage it long-term? What safeguards are there to ensure that the infrastructure…

#iac #cloud

Data Modelling example at ManyChat 🍼

At some point you will need an analytics platform for your business, and OLTP queries just won't cut it, so you will search for a DWH and seek how to put your data there. This is the moment of modelling your data for analytics - and ManyChat blog has a nice piece showing one of the approaches to do so with modified Data Vault modelling approach.

Data Modeling today: launching cost-effective analytics for ManyChat
In this article, I’d like to consider a rather big practical case — launching an analytical system for a rather complex business.

#dwh

Do not use secrets in environment variables and here's how to do it better 👷‍♂️

.env file frequently contains not only the urls and configuration parameters and guess what - api tokens as well. Grab a long read explaining why storing secrets in env variables is a bad practice for almost a dozen of reasons and how to do it better.

Do not use secrets in environment variables and here’s how to do it better
Stop storing secrets in environment variables. It’s a bad practice and only fits hobby or side projects with no real business impact. Here are all the reasons why you should never store secrets in environment variables and how to do it better.

#security

Using HBase Quotas to Share Resources at Scale 👷‍♂️

Even if you're not using HBase per se, this post from HubSpot engineering blog can showcase how throttling and different strategies of rate limiting can help you building a resilient solution in the presense of sudden load spikes.

Using HBase Quotas to Share Resources at Scale
At HubSpot, thousands of applications generate workloads on our HBase clusters. In this post we will describe our approach for fair resource sharing between all of our end users, how it ensures reliable performance for our most critical systems, and how we proactively prevent buggy systems from wreaking havoc.

#performance #casestudy

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter!