🎄Architecture Weekly #149

Architecture Weekly Issue #149. Articles, books, and playlists on architecture and related topics. Split by sections and marked by complexity: 🤟 means hardcore, 👷‍♂️ is technically applicable right away, 🍼 is an introduction to the topic or an overview. Now on Telegram and Substack as well.

This is the last issue of 2024. I plan to publish a post with the year summary, so make sure you're subscribed!

System Design Course

Looking to advance your system design skills further? I've got a Business Oriented System Design Course to help you! Cohort #3 is running now, and you can sign up for the next one, which starts at the end of January. Follow this page: https://vvsevolodovich.dev/business-oriented-system-design-course/

Highlights

Building MCP for AI tools on Cloudflare Workers 👷‍♂️

Model Context Protocol is a new, open, unified way of connecting LLM apps to data sources and applications. The whole concept is exciting: letting the AI app grab the data it lacks and take actions on your behalf... Follow the article by Cloudflare and make sure to visit every link to understand MCP better. Trust me, it's gonna be a popular topic in 2025.

Hi Claude, build an MCP server on Cloudflare Workers
Want Claude to interact with your app directly? Build an MCP server on Workers. That will enable you to connect your service directly, allowing Claude to understand and run tasks on your behalf.

#llm

OpenAI Outage Postmortem 👷‍♂️

Everyone has downtime: Slack, Google, and now OpenAI. Typically you build telemetry services to improve observability and reduce downtime, but this time the telemetry service caused the downtime in the first place. Read about the reasons behind the incident and the lessons learned in OpenAI's incident report.

API, ChatGPT & Sora Facing Issues
OpenAI’s Status Page - API, ChatGPT & Sora Facing Issues.

#observability

The secret life of DNS packets: investigating complex networks 👷‍♂️

Another example of an issue that required a deep analysis of what was happening with DNS traffic at Stripe, with the root cause in AWS VPC limitations. A detective read with tons of technical details!

The secret life of DNS packets: investigating complex networks

#casestudy #networking

Follow-Up

Building an Open, Multi-Engine Data Lakehouse with S3 and Python 👷‍♂️

Grab an amazing 6-part series of posts on building a Data Lakehouse based on the Apache Iceberg table format. It explains how the Iceberg table format works, why an open multi-engine option is needed nowadays, and how to build one in the first place. A great read for anyone working with analytics in any form!

Building an Open, Multi-Engine Data Lakehouse with S3 and Python - Tower
Run Python data apps reliably in production

#dwh #dataengineering

Consistent hashing and rendezvous hashing explained 👷‍♂️

When sharding a database, you need to figure out how to allocate data ranges to different shards. The naive approach is simple mod-based hashing, but it triggers reallocation of almost all the data whenever nodes are added or removed. Hence consistent hashing. But it's not perfect either, and this article introduces rendezvous hashing as well. Check it out.

Consistent hashing and rendezvous hashing explained.
A guide to hashing in distributed systems, including consistent hashing and rendezvous hashing.
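
To make the difference tangible, here is a minimal Python sketch (not taken from the article) that compares mod-based placement with rendezvous hashing when a node is added. The node names, key names, and helper functions such as `stable_hash` and `moved_fraction` are all hypothetical and chosen just for illustration; a consistent hashing ring is left out to keep it short.

```python
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (the built-in hash() is salted per process)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def mod_shard(key: str, nodes: list[str]) -> str:
    """Naive placement: hash the key and take it modulo the node count."""
    return nodes[stable_hash(key) % len(nodes)]


def rendezvous_shard(key: str, nodes: list[str]) -> str:
    """Rendezvous (highest-random-weight) placement: score every node, pick the max."""
    return max(nodes, key=lambda node: stable_hash(f"{node}:{key}"))


def moved_fraction(place, keys, before, after) -> float:
    """Share of keys whose owner changes when the node list changes."""
    moved = sum(1 for k in keys if place(k, before) != place(k, after))
    return moved / len(keys)


if __name__ == "__main__":
    # Hypothetical cluster: four nodes, then a fifth one is added.
    keys = [f"user-{i}" for i in range(10_000)]
    before = ["node-a", "node-b", "node-c", "node-d"]
    after = before + ["node-e"]
    print("mod-based :", moved_fraction(mod_shard, keys, before, after))
    print("rendezvous:", moved_fraction(rendezvous_shard, keys, before, after))
```

Going from four to five nodes, mod-based placement is expected to move about 4/5 of the keys, while rendezvous hashing moves only about 1/5, and every moved key lands on the new node. The trade-off in this simple form is that each lookup scores every node, so it costs O(number of nodes).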

Preferring throwaway code over design docs 🍼

We are used to writing docs before the implementation; however, there are a couple of problems with them. First, they are based on our current understanding rather than on experience dealing with the problem. Second, they get outdated fast. Doug Turnbull argues that throwaway code is a way to solve both problems.

Preferring throwaway code over design docs
If you have discipline to throw away your first idea, draft, throwaway PRs often drives more progress than a design doc.

#designdocs #process

Shrinking a Postgres Table 👷‍♂️

A database running out of disk space is a pretty common problem. It's not a big deal if it runs on a VM under your control, where you can simply enlarge the disk. But what if it's a managed database? Then you will need to shrink the database tables. Find an example here.

Shrinking a Postgres Table
Ok folks, this is kind of a weird one. I’m going to put it in the “you won’t ever need this, but if you do, you are going to be glad I wrote this up for ya” category. As you may or may not know, I recently acquired fireside.fm,

#db #postgresql

7 techniques for API Gateway Scaling

API Gateways are a crucial element in modern software systems. I explained why we need them and how to set them up here. Now grab a piece with a whole list of techniques used to scale them!

API Gateway Scaling: 7 Techniques for High Availability
Explore effective techniques for scaling your API gateway to ensure high availability, including load balancing, caching, and cloud-based solutions.

#api

Big thanks to Nikita, Constantin, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia, Daria, Dzmitry, Mikhail, Nikita, Dmytro, Denis and Mikhail for supporting the newsletter on Patreon! If you like the newsletter, feel free to support it there, for example with a one-time contribution!