Books are an essential driver of personal growth. Never stop reading! If you work in software architecture and distributed systems, here are 12 books I would like to read myself. Go through the list and tell me if you would add or replace anything!

1. Distributed Systems (3rd Edition, 2017)

Authors: Maarten van Steen, Andrew S. Tanenbaum

Key Themes: Fundamentals of distributed computing, communication, synchronization, consistency, fault tolerance.

Why Read It: This academic classic provides a robust theoretical foundation. It covers the major challenges in designing large, fault-tolerant systems and examines the underlying mechanisms of modern distributed architectures.
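One of the synchronization mechanisms covered in this area is the logical clock. As a taste, here is a minimal Python sketch (not taken from the book) of a Lamport clock, which orders events across processes without synchronized wall clocks. The class and method names are illustrative assumptions.

```python
class LamportClock:
    """Logical clock: a counter that respects the happened-before relation."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Any local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is a local event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Jump past the sender's timestamp, then count the receive event itself.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
stamp = a.send()          # a's clock becomes 1
b.receive(stamp)          # b's clock becomes max(0, 1) + 1 = 2
```

The guarantee is one-directional: if event x happened before y, x gets the smaller timestamp, but two timestamps alone cannot tell you whether events were concurrent (vector clocks address that).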

2. Implementing Domain-Driven Design (2013)

Author: Vaughn Vernon

Key Themes: Domain-driven design (DDD), strategic design, bounded contexts, aggregates, ubiquitous language.

Why Read It: A modern deep dive into Eric Evans’ original DDD concepts. Demonstrates practical techniques for modeling complex software systems, which is crucial for scalable architectures—distributed or otherwise.
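To make "aggregates" a bit more concrete, here is a minimal Python sketch (not taken from the book) of an aggregate root that guards its own invariants. The `Order` and `OrderLine` types and the rule that a submitted order is immutable are hypothetical examples.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderLine:
    """Value object: defined by its attributes and never modified in place."""
    sku: str
    quantity: int


@dataclass
class Order:
    """Aggregate root: the only entry point for changing the order's lines."""
    order_id: str
    lines: list[OrderLine] = field(default_factory=list)
    submitted: bool = False

    def add_line(self, sku: str, quantity: int) -> None:
        # Invariants are enforced inside the aggregate boundary, not by callers.
        if self.submitted:
            raise ValueError("cannot modify a submitted order")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines.append(OrderLine(sku, quantity))

    def submit(self) -> None:
        if not self.lines:
            raise ValueError("cannot submit an empty order")
        self.submitted = True

    def total_items(self) -> int:
        return sum(line.quantity for line in self.lines)
```

Because outside code only goes through `add_line` and `submit`, the aggregate is also a natural unit of consistency and of persistence: one transaction per aggregate.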

3. Balancing Coupling in Software Design (2024)

Author: Vlad Khononov

Key Themes: Managing coupling and cohesion in software, understanding dependencies, trade-offs in modularity, decoupling strategies.

Why Read It: Offers a fresh perspective on balancing coupling and cohesion to build maintainable, scalable systems. Practical guidance helps navigate trade-offs in real-world software design, emphasizing long-term adaptability and robustness.

4. Chaos Engineering: System Resiliency in Practice (2020)

Authors: Casey Rosenthal, Nora Jones

Key Themes: Resilience testing, failure injection, risk management, continuous verification of distributed systems.

Why Read It: Explores how intentionally “breaking things” in controlled experiments can reveal weaknesses and help build bulletproof systems. Ideal if you’re interested in resilience and fault tolerance beyond typical high-level theory.
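The core loop of a chaos experiment (define a steady-state metric, inject a fault, check whether the metric holds) can be sketched in a few lines of Python. This is a toy, not the book's tooling: `get_price`, `CACHED_PRICE`, and the fallback behaviour are hypothetical, and a seeded `random.Random` keeps the experiment reproducible.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")

CACHED_PRICE = 9  # hypothetical stale-but-usable fallback value


class InjectedFault(Exception):
    """Raised deliberately to simulate a failing dependency."""


def with_fault_injection(call: Callable[[], T], failure_rate: float, rng: random.Random) -> T:
    """Run `call`, but raise InjectedFault with probability `failure_rate`."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be in [0, 1]")
    if rng.random() < failure_rate:
        raise InjectedFault("chaos experiment: injected failure")
    return call()


def get_price(rng: random.Random, failure_rate: float, use_fallback: bool) -> int:
    """Service under test: optionally degrades to a cached price."""
    try:
        return with_fault_injection(lambda: 10, failure_rate, rng)
    except InjectedFault:
        if use_fallback:
            return CACHED_PRICE
        raise


def answered_ratio(trials: int, failure_rate: float, use_fallback: bool, seed: int = 42) -> float:
    """Steady-state metric: fraction of requests that still got a price."""
    rng = random.Random(seed)
    answered = 0
    for _ in range(trials):
        try:
            get_price(rng, failure_rate, use_fallback)
            answered += 1
        except InjectedFault:
            pass  # a user-visible failure
    return answered / trials
```

The hypothesis "the service stays available when the price dependency fails 30% of the time" holds with `use_fallback=True` and is falsified without it, which is exactly the kind of weakness such experiments are meant to surface.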

5. Systems Performance: Enterprise and the Cloud (2nd Edition, 2020)

Author: Brendan Gregg

Key Themes: Performance tuning, benchmarking, Linux kernel internals, observability, distributed tracing.

Why Read It: An authoritative resource on diagnosing performance bottlenecks in modern systems. Gregg’s methodologies and instrumentation practices are invaluable when scaling distributed services.

6. Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services (2018)

Author: Brendan Burns

Key Themes: Container-based architectures, cluster scheduling, sharding, service discovery, load balancing.

Why Read It: Provides patterns drawn from real-world usage, particularly around Kubernetes, but goes beyond microservices. Focuses on core distributed concepts—like replication, queue-based load leveling, and orchestrating containerized workloads.
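Sharding boils down to a function that maps a request key to exactly one shard. Here is a minimal Python sketch of that idea, assuming simple hash-mod-N placement; `shard_for` and `ShardRouter` are hypothetical names, and plain dicts stand in for the shard servers.

```python
import hashlib
from typing import Optional


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Python's built-in hash() is salted per process, so use a stable digest instead.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardRouter:
    """Routing layer that forwards every read and write for a key to one shard."""

    def __init__(self, shards: list[dict[str, str]]) -> None:
        self.shards = shards

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

The trade-off of mod-N placement is that changing the number of shards remaps most keys; consistent hashing is the usual remedy when shards are added or removed often.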

7. Release It! (2nd Edition, 2018)

Author: Michael T. Nygard

Key Themes: Production readiness, circuit breakers, bulkheads, observability, risk management.

Why Read It: A must-read on building resilient applications that can handle real-world failures. Introduces “stability patterns” and “anti-patterns” gleaned from extensive in-the-trenches experience.
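The circuit breaker is probably the best-known of these stability patterns. Below is a minimal, single-threaded Python sketch of the idea (not Nygard's code): after `failure_threshold` consecutive failures the breaker opens and fails fast, and once `reset_timeout` seconds pass it lets a trial call through. All names are illustrative, and the injectable `clock` is an assumption added to make the sketch testable.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: allow a trial call
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("failing fast; dependency presumed down")
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Any success, including the half-open trial call, closes the circuit.
        self.failures = 0
        self.opened_at = None
        return result

    def _record_failure(self) -> None:
        if self.state == "half-open":
            self.opened_at = self.clock()  # trial call failed: reopen, restart timer
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

Failing fast protects both sides: callers stop tying up threads on a dead dependency, and the dependency gets breathing room to recover. A production version would also need thread safety and limits on concurrent half-open trials.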

8. Enterprise Integration Patterns (2003)

Authors: Gregor Hohpe, Bobby Woolf

Key Themes: Messaging systems, routing, transformations, asynchronous communication, event-driven architecture.

Why Read It: The classic reference for designing robust message-based systems. Although it’s older, the patterns remain integral to modern distributed architectures—especially for event-oriented solutions that don’t revolve solely around microservices.
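As a flavour of the catalogue, here is a minimal Python sketch of a Content-Based Router: it inspects each message and forwards it to the first matching channel, with unmatched messages going to an invalid-message channel. In-memory lists stand in for real message channels, and the `type` header and channel names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    headers: dict[str, str]
    body: str


class ContentBasedRouter:
    """Sends each message to the first channel whose predicate matches it."""

    def __init__(self, default_channel: str = "invalid") -> None:
        self.routes: list[tuple[Callable[[Message], bool], str]] = []
        self.default_channel = default_channel
        # In-memory lists stand in for real message channels (queues/topics).
        self.channels: dict[str, list[Message]] = defaultdict(list)

    def when(self, predicate: Callable[[Message], bool], channel: str) -> "ContentBasedRouter":
        self.routes.append((predicate, channel))
        return self  # allow chaining route definitions

    def route(self, msg: Message) -> str:
        target = self.default_channel
        for predicate, channel in self.routes:
            if predicate(msg):
                target = channel
                break
        self.channels[target].append(msg)
        return target


router = (
    ContentBasedRouter()
    .when(lambda m: m.headers.get("type") == "order", "orders")
    .when(lambda m: m.headers.get("type") == "payment", "payments")
)
```

The sender never needs to know which consumer handles which message type; only the router does, so consumers can be added or split without touching producers.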

9. The Art of Scalability (2nd Edition, 2015)

Authors: Martin L. Abbott, Michael T. Fisher

Key Themes: Scaling technology, scaling organizations, the “Scale Cube,” performance architecture.

Why Read It: Addresses both the technical and managerial challenges of operating large systems. Offers a structured approach to analyzing and remediating scale bottlenecks at different layers of the stack.

10. Building Multi-tenant SaaS Architectures: Principles and Best Practices (O’Reilly, 2023/2024)

Author: Tod Golding

Key Themes: Multi-tenant fundamentals (silo vs. pooled models), data isolation and security, cost optimization, automated tenant onboarding, monitoring and observability, scaling strategies, and compliance requirements.

Why Read It: Offers clear guidance on how to design, build, and operate shared-infrastructure SaaS platforms that support multiple customers efficiently. Covers everything from database partitioning and identity management to DevOps workflows and cost management—ensuring a secure, compliant, and scalable multi-tenant environment.
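To illustrate the pooled model mentioned above, here is a minimal Python sketch (not from the book) using the standard-library `sqlite3` module: all tenants share one table, and a tenant-scoped repository adds the `tenant_id` filter to every query. The `orders` schema and `TenantScopedOrders` class are hypothetical; in a real system the tenant ID would come from the authenticated request context.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # One shared table for every tenant (pooled model), partitioned by tenant_id.
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, item TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_orders_tenant ON orders (tenant_id)")


class TenantScopedOrders:
    """Repository that can only see and write one tenant's rows."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self.conn = conn
        self.tenant_id = tenant_id  # assumed to come from the caller's auth token

    def add(self, item: str) -> None:
        self.conn.execute(
            "INSERT INTO orders (tenant_id, item) VALUES (?, ?)",
            (self.tenant_id, item),
        )

    def list_items(self) -> list[str]:
        rows = self.conn.execute(
            "SELECT item FROM orders WHERE tenant_id = ? ORDER BY id",
            (self.tenant_id,),
        )
        return [row[0] for row in rows]
```

In the silo model each tenant would instead get its own database or schema; pooling is cheaper to run but makes isolation a code-level responsibility, which is why databases such as PostgreSQL offer row-level security as an extra safeguard.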

11. Building Secure & Reliable Systems (2020)

Authors: Heather Adkins, Betsy Beyer, Paul Blankinship, the Google Security & Reliability Teams

Key Themes: Security best practices, reliability engineering, risk management, secure-by-design principles.

Why Read It: A follow-up of sorts to Google’s SRE-related works, focusing specifically on designing resilient systems that are also hardened against security threats. Ties in nicely with production-readiness and system reliability.

12. Scalability Rules: 50 Principles for Scaling Web Sites and Applications (2011)

Authors: Martin L. Abbott, Michael T. Fisher

Key Themes: Pragmatic scaling principles, capacity planning, caching strategies, parallelism, concurrency.

Why Read It: Written by the same authors as The Art of Scalability, but distills their advice into actionable guidelines in an easy-to-reference format. Each “rule” serves as a best-practice template for tackling growth challenges.

Let me know if you would like to add anything here!

System Design Course

Looking to advance your system design skills further? I've got a Business-Oriented System Design Course to help you! Cohort #3 is running now, so you can sign up for the next one, starting at the end of January. Follow this page: https://vvsevolodovich.dev/business-oriented-system-design-course/