Architecture Weekly #127

Architecture Weekly Issue #127. Articles, books, and playlists on architecture and related topics. Split by sections and marked by complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram and Substack as well.

This Friday the world stopped. At least the part of the world running Windows with CrowdStrike security software. A critical error prevented Windows from booting, and a manual procedure through recovery mode was required. So the highlights of this issue are reliability and security.

Highlights

Safety-Critical Software: Things Every Developer Should Know 👷‍♂️

There is casual software: mobile applications, websites, games. And there is safety-critical software: airplane firmware, medical device software, aerospace software, and more. The standards for it are completely different from what you're used to. I am sharing the 15 things you should know before even remotely touching it.

Safety-Critical Software: 15 things every developer should know
This post explains what safety-critical software is, how it’s supposed to be constructed, how it’s actually constructed, and where the field is heading.

#security #reliability

Engineering Principles for Building Financial Systems 👷‍♂️

Financial systems are not safety-critical, but the financial impact of mistakes is high. The right principles will help you build a solid foundation for anything that works with money, be it a banking engine or billing for a ride-hailing company.

Engineering Principles for Building Financial Systems
Best practices and principles to create accurate and reliable software based financial systems.

#bestpractices

Improving push processing on GitHub 👷‍♂️

Many tasks run when code is pushed to a GitHub repo: sending notifications, triggering merge checks, running security scans, and many more. Previously, one huge job did all of that. Business metrics like reliability and performance, along with technical ones like maintainability, were suffering. So GitHub refactored it.

How we improved push processing on GitHub
Pushing code to GitHub is one of the most fundamental interactions that developers have with GitHub every day. Read how we have significantly improved the ability of our monolith to correctly and fully process pushes from our users.
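
To make the problem with "one huge job" a bit more tangible, here is a minimal sketch of the opposite approach: every task runs as an independent handler, so a failure in one does not block the rest. This is an illustration only, not GitHub's actual implementation. The handler names, the event shape, and the `process_push` function are all hypothetical.

```python
from typing import Callable, Dict

PushEvent = Dict[str, str]
Handler = Callable[[PushEvent], None]


def process_push(event: PushEvent, handlers: Dict[str, Handler]) -> Dict[str, str]:
    """Run every handler independently and report a status per handler."""
    results: Dict[str, str] = {}
    for name, handler in handlers.items():
        try:
            handler(event)
            results[name] = "ok"
        except Exception as exc:
            # One failing task must not prevent the others from running.
            results[name] = f"failed: {exc}"
    return results


# Hypothetical handlers standing in for the tasks mentioned above.
def send_notifications(event: PushEvent) -> None:
    pass  # placeholder: would notify subscribers


def trigger_merge_checks(event: PushEvent) -> None:
    raise RuntimeError("merge check service unavailable")  # simulated outage


def run_security_scan(event: PushEvent) -> None:
    pass  # placeholder: would enqueue a scan


HANDLERS: Dict[str, Handler] = {
    "notifications": send_notifications,
    "merge_checks": trigger_merge_checks,
    "security_scan": run_security_scan,
}

results = process_push({"repo": "example/repo", "ref": "refs/heads/main"}, HANDLERS)
print(results)
```

In a real system each handler would more likely be its own background job with its own retries and monitoring. The loop here only shows the isolation idea.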

#casestudy #reliability

Follow-Up

Managing 80 Developers as VP of Engineering 🍼

An Engineering Manager leads a team of 6-7 developers. A Senior Engineering Manager runs 5-6 teams. A VP of Engineering sits another 2 levels above, running 80-100 engineers. That's who I got from Flo Health: Greg Stewart. We talk about technical debt, OKRs, mentoring EMs, and plenty of other stuff.

#interview #video #management

Feature flags are ruining your codebase 👷‍♂️

Rolling out a feature to millions of users can cause a disaster, as we just learned. Feature flags can mitigate potential problems by letting you disable the troublesome functionality. But they become a liability by increasing the complexity of your solution. Read here about the types of feature flags and when and how you should delete them.

Feature flags are ruining your codebase
The dangers of letting PMs control them
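
One way to keep flags from piling up is to give every flag a type and, for short-lived ones, a removal date. Below is a minimal sketch of that idea. The `Flag` class, the two kinds ("release" and "ops"), and the flag names are assumptions made for illustration and are not taken from the linked article.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Flag:
    name: str
    kind: str                        # hypothetical kinds: "release" or "ops"
    enabled: bool
    remove_by: Optional[date] = None  # None means a long-lived flag


# Hypothetical registry; a real one would live in config or a flag service.
FLAGS: Dict[str, Flag] = {
    "new_checkout": Flag("new_checkout", "release", True, date(2024, 8, 1)),
    "search_kill_switch": Flag("search_kill_switch", "ops", False),
}


def is_enabled(name: str) -> bool:
    """Unknown flags default to off, the safe choice during a rollout."""
    flag = FLAGS.get(name)
    return flag.enabled if flag is not None else False


def stale_flags(today: date) -> List[str]:
    """Names of flags whose removal date has already passed."""
    return sorted(
        name
        for name, flag in FLAGS.items()
        if flag.remove_by is not None and flag.remove_by < today
    )


print(is_enabled("new_checkout"))     # True
print(stale_flags(date(2024, 9, 1)))  # ['new_checkout']
```

A check like `stale_flags` could run in CI and fail the build once a release flag outlives its date, which turns "we should delete it someday" into an actual task.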

#configurability

Non-repeatable read anomaly 🤟

Databases provide different isolation levels (MySQL supports 4). Non-repeatable read is one of the anomalies that stricter isolation levels eliminate. Let's understand what the anomaly is about and how it is prevented.

A beginner’s guide to Non-Repeatable Read anomaly - Vlad Mihalcea
Non-Repeatable Read is a data integrity anomaly that can occur when one transaction observes two successive versions of the same database record.
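
The anomaly is easy to reproduce in a toy model. The sketch below is not how any real database works internally. `Store`, `ReadCommittedTx`, and `SnapshotTx` are hypothetical names, and the snapshot approach stands in for just one way (MVCC-style) of preventing the anomaly; locking is another.

```python
from typing import Callable, Dict, Tuple


class Store:
    """Holds only committed data."""

    def __init__(self, data: Dict[str, int]) -> None:
        self.committed = dict(data)


class ReadCommittedTx:
    """Every read sees the latest committed value."""

    def __init__(self, store: Store) -> None:
        self.store = store

    def read(self, key: str) -> int:
        return self.store.committed[key]


class SnapshotTx:
    """Every read sees the data as it was when the transaction started."""

    def __init__(self, store: Store) -> None:
        self.snapshot = dict(store.committed)

    def read(self, key: str) -> int:
        return self.snapshot[key]


def read_twice(tx_factory: Callable[[Store], object]) -> Tuple[int, int]:
    store = Store({"balance": 100})
    tx = tx_factory(store)
    first = tx.read("balance")
    # A concurrent transaction commits an update between the two reads.
    store.committed["balance"] = 50
    second = tx.read("balance")
    return first, second


print(read_twice(ReadCommittedTx))  # (100, 50): non-repeatable read
print(read_twice(SnapshotTx))       # (100, 100): the read is repeatable
```

The first transaction observes two successive versions of the same record, which is exactly the definition quoted above.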

#databases

Cassandra Storage Engine Explained 🤟

Every database storage engine has its own approach to storing data files and indexes. Cassandra is a wide-column database, and it's interesting to know how it manages its files and indexes. Luckily, there is a good article on it touching on SSTables, in-memory storage, and related topics.
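
For a rough feel of the idea, here is a toy sketch of the write path: writes land in an in-memory table, which is flushed into an immutable sorted table once it grows past a limit, and reads check the newest data first. It is a heavy simplification under my own assumptions. `ToyStore` and `memtable_limit` are made-up names, and real Cassandra adds a commit log, indexes, bloom filters, and compaction on top.

```python
from typing import Dict, List, Optional, Tuple


class ToyStore:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, str] = {}                # in-memory, mutable
        self.sstables: List[List[Tuple[str, str]]] = []   # flushed, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as a sorted, never-modified table."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Newest data wins: memtable first, then tables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:  # linear scan; real engines use indexes
                if k == key:
                    return v
        return None


store = ToyStore(memtable_limit=2)
store.put("a", "1")
store.put("b", "2")   # limit reached, the memtable is flushed
store.put("a", "3")   # the newer value stays in the memtable
print(store.get("a"))  # 3
print(store.get("b"))  # 2
```

Note that the old value of "a" still sits in the flushed table. Cleaning up such shadowed entries is a separate background concern.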

#databases

Anycast as Load Balancing Technique 🤟

And let's finish with a paper on load balancing. A typical setup relies on a DNS switch, but TTLs can cause minutes of downtime. In this paper you will find out how the Anycast network capability is used for more responsive load balancing.

#performance #reliability

WARNING 🇺🇦

The brutal and unjustified war against Ukraine has been going on for more than 2 years. If you want to help Ukraine directly, visit this fund.

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss the architecture problems. Join them at Patreon or Boosty!