Architecture Weekly Issue #142. Articles, books, and playlists on architecture and related topics. Split by sections, highlighted with complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram and Substack as well.

Saving 80% of cloud costs by moving from AWS to Cloudflare 👷‍♂️

Baselime is an observability solution recently acquired by Cloudflare, so it was a natural choice to move its workloads from AWS Lambda to Cloudflare Workers. And you won't believe it: cloud costs dropped by more than 80%. The article walks through the architecture before and after the migration, along with many technical details.

Moving Baselime from AWS to Cloudflare: simpler architecture, improved performance, over 80% lower cloud costs
Post-acquisition, we migrated Baselime from AWS to the Cloudflare Developer Platform and in the process, we improved query times, simplified data ingestion, and now handle far more events, all while cutting costs. Here’s how we built a modern, high-performing observability platform on Cloudflare’s network.

#serverless

Why do I need CDC? 👷‍♂️

Do you know what Change Data Capture (CDC) is? Have you heard of Debezium and similar tools? Sure, but do you know why you need CDC in the first place? Here is a refresher on the reasons.

Why Do I Need CDC?
This technical blog post explores the importance of Change Data Capture (CDC) for developers. It covers the fundamentals of CDC, its common use cases, and the advantages of log-based CDC compared to other approaches. Understand how CDC can improve operational performance, enable real-time analytics, and streamline data workflows in your applications.

#db
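Log-based CDC means reading the database's own ordered change log (for example, the PostgreSQL WAL or the MySQL binlog) instead of polling tables. A toy, in-memory sketch of the consumer side is below, assuming each change carries a monotonically increasing log position; `ChangeEvent`, `CdcConsumer`, and the `lsn` field are hypothetical names, not Debezium's event format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                          # hypothetical: position in the source's change log
    op: str                           # "insert", "update" or "delete"
    key: str                          # primary key of the changed row
    row: Optional[Dict[str, Any]]     # new row image; None for deletes


class CdcConsumer:
    """Applies change events to a downstream copy and remembers the last position applied."""

    def __init__(self) -> None:
        self.replica: Dict[str, Dict[str, Any]] = {}
        self.last_lsn = 0

    def apply(self, events: List[ChangeEvent]) -> None:
        for ev in sorted(events, key=lambda e: e.lsn):
            if ev.lsn <= self.last_lsn:
                continue  # already applied, so redelivered events are harmless
            if ev.op == "delete":
                self.replica.pop(ev.key, None)
            else:
                self.replica[ev.key] = dict(ev.row or {})
            self.last_lsn = ev.lsn


log = [
    ChangeEvent(1, "insert", "u1", {"name": "Ada"}),
    ChangeEvent(2, "update", "u1", {"name": "Ada L."}),
    ChangeEvent(3, "insert", "u2", {"name": "Grace"}),
    ChangeEvent(4, "delete", "u2", None),
]
consumer = CdcConsumer()
consumer.apply(log)
consumer.apply(log[2:])  # simulated redelivery after a crash is a no-op
print(consumer.replica)  # {'u1': {'name': 'Ada L.'}}
```

Tracking the last applied position is what lets a consumer restart and resume from the log without double-applying changes.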

Fault Injection Service for AWS Lambda 🍼

Chaos testing originated at Netflix and was quickly adopted by mature software organizations. Chaos testing for serverless, though, has been underdeveloped, but not anymore! With Fault Injection Service now supporting Lambda, you can find out how your system behaves when something breaks there.

AWS Lambda now supports AWS Fault Injection Service (FIS) actions - AWS
Discover more about what’s new at AWS with AWS Lambda now supports AWS Fault Injection Service (FIS) actions

#serverless #reliability
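The idea behind fault injection can be shown with a hand-rolled wrapper around a Lambda-style `handler(event, context)` that adds latency or raises errors at a configurable rate. This is only an illustration of the concept, not the FIS API or how FIS works internally; `inject_faults`, `InjectedFault`, and the rate parameters are hypothetical.

```python
import functools
import random
import time
from typing import Any, Callable, Optional

Handler = Callable[[Any, Any], Any]


class InjectedFault(RuntimeError):
    """Raised instead of running the handler when an error is injected."""


def inject_faults(error_rate: float = 0.0, delay_rate: float = 0.0,
                  delay_s: float = 0.0, rng: Optional[random.Random] = None):
    """Hypothetical decorator: with the given probabilities, delay and/or fail an invocation."""
    rng = rng or random.Random()

    def decorator(handler: Handler) -> Handler:
        @functools.wraps(handler)
        def wrapper(event: Any, context: Any) -> Any:
            if rng.random() < delay_rate:
                time.sleep(delay_s)  # simulate a slow dependency
            if rng.random() < error_rate:
                raise InjectedFault("injected fault")
            return handler(event, context)
        return wrapper
    return decorator


@inject_faults(error_rate=0.3, rng=random.Random(42))
def handler(event: Any, context: Any) -> dict:
    return {"statusCode": 200, "body": event.get("name", "world")}


failures = 0
for _ in range(1000):
    try:
        handler({"name": "Tallinn"}, None)
    except InjectedFault:
        failures += 1
print(f"{failures} of 1000 invocations failed")  # expect roughly 30%
```

The point of such an experiment is to check that callers (retries, dead-letter queues, timeouts) behave as intended when the function misbehaves.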

Follow-Up

Scanning documents with Claude 3 Sonnet and serverless 👷‍♂️

Do you remember ABBYY laying off engineers recently? No wonder, as multimodal LLMs can now do the job faster, better, and cheaper. At supplied.eu, we moved from ABBYY Vantage to the ChatGPT API and run it in the cloud. This article shows how to do the same with Claude. In my opinion, a queue and a second Lambda overcomplicate things here, but it is an option worth considering.

Multimodal Bill Scan System with Claude 3 Sonnet, AWS CDK, Bedrock, DynamoDB
Scanning documents and extracting key information can now be accomplished with high accuracy using multimodal models like Claude 3 Sonnet…

#llm #serverless

Vector Databases Are the Wrong Abstraction 👷‍♂️

If you're building an application with RAG, you've probably faced the problem of keeping the source data and the embeddings in the vector database in sync. This article argues that the problem is the wrong abstraction: embeddings are derived data and should be stored right next to the original data, essentially treated like indexes. Read on for the proposed solution.

Vector Databases Are the Wrong Abstraction
Today’s vector databases disconnect embeddings from their source data. We should treat embeddings more like database indexes—here’s how.

#db #ai
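The "embeddings as an index" idea can be illustrated with a small in-memory sketch in which the embedding is recomputed in the same write path as the source row, so the two cannot drift apart. This is not the article's implementation: `DocumentTable` and `toy_embed` are hypothetical, and the hashed bag-of-words function only stands in for a real embedding model.

```python
from __future__ import annotations

import math
import zlib
from typing import Callable, Dict, List, Tuple

DIM = 16  # hypothetical embedding size for the toy model


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class DocumentTable:
    """Keeps source rows and their embeddings together, like a table and its index."""

    def __init__(self, embed: Callable[[str], List[float]] = toy_embed) -> None:
        self._embed = embed
        self.rows: Dict[int, str] = {}
        self.vectors: Dict[int, List[float]] = {}

    def upsert(self, doc_id: int, text: str) -> None:
        # The derived vector is updated in the same operation as the source row,
        # the way a database maintains a B-tree index alongside its table.
        self.rows[doc_id] = text
        self.vectors[doc_id] = self._embed(text)

    def delete(self, doc_id: int) -> None:
        self.rows.pop(doc_id, None)
        self.vectors.pop(doc_id, None)

    def search(self, query: str, k: int = 3) -> List[Tuple[int, float]]:
        q = self._embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(i, sum(a * b for a, b in zip(q, v))) for i, v in self.vectors.items()]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


table = DocumentTable()
table.upsert(1, "cats purr softly")
table.upsert(2, "dogs bark loudly at night")
table.upsert(1, "cats sleep all day")  # the vector is refreshed together with the text
table.delete(2)                         # and removed together with it
print(table.search("cats sleep", k=1))
```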

Decision-Making Pitfalls for Technical Leaders 🍼

I frequently see engineers make suboptimal technical decisions for multiple reasons: focusing too much on past experience rather than on the problem at hand, failing to account for risks, and others. That's why I recommend this article: it does a good job of explaining how to think about decisions so that you make better ones.

Decision-Making Pitfalls for Technical Leaders
Tech’s favorite party trick is promoting programmers into leadership roles with zero transition coaching, or even a briefing on what the role entails. The programmer accepts the promotion bec…

Replication in Distributed Systems - Part 1 🍼

Once your load outgrows a single machine, or your system has to survive machine failures, you typically look at replication. This concise article explains how replication works and what types of replication mechanisms exist.

Replication in Distributed Systems - Part 1
Welcome, fellow nerds, to the 1st part of a blog series on replication. We will be discussing why we even need to distribute a database across multiple machines, what are leaders and followers, how to handle the failure of leaders and followers, etc. It will set it up nicely for our future blog in this series. Why distribute a database across multiple machines? Scalability: if your data volume, read load, or write load grows larger than what a single machine can handle, you can spread the load across multiple machines. High availability: if your application needs to continue to serve even if one or more machines go down, you can use multiple machines to give you redundancy/fault tolerance.

#distributedsystems
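To make the leader/follower model concrete, here is a toy sketch of asynchronous single-leader replication: the leader accepts all writes and appends them to a replication log, and each follower applies the log in order from its own position. The `Leader` and `Follower` classes are hypothetical and ignore networking, failures, and durability.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Leader:
    data: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, str]] = field(default_factory=list)

    def write(self, key: str, value: str) -> None:
        # Only the leader accepts writes; each one is appended to the replication log.
        self.data[key] = value
        self.log.append((key, value))


@dataclass
class Follower:
    name: str
    data: Dict[str, str] = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def catch_up(self, log: List[Tuple[str, str]]) -> None:
        # Apply entries in log order, starting where this follower left off.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


leader = Leader()
f1, f2 = Follower("f1"), Follower("f2")
leader.write("x", "1")
leader.write("y", "2")
f1.catch_up(leader.log)
leader.write("x", "3")
f1.catch_up(leader.log)
print(f1.data)  # {'x': '3', 'y': '2'}
print(f2.data)  # {} : f2 has not pulled yet, i.e. replication lag
```

Because followers apply the same log in the same order, they converge to the leader's state; the lag between `f1` and `f2` is what makes reads from followers potentially stale under asynchronous replication.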

The Engineering behind Booking.com's Ranking Platform 👷‍♂️

What happens when you type "Hotel in Tallinn" into Booking.com? Surprisingly, a lot: the request travels from micro-frontends through an API gateway to complex ML infrastructure that has to account for your personal preferences. Booking.com explains the components of its search and how they connect at global scale.

The Engineering Behind Booking.com’s Ranking Platform | A System Overview
A peek into the system architecture of Booking.com’s Ranking Platform that provides personalized ranking at scale.

#casestudy

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon!