System Design
  • Introduction
  • Glossary of System Design
    • System Design Basics
    • Key Characteristics of Distributed Systems
    • Scalability - Harvard lecture
      • Scalability for Dummies - Part 1: Clones
      • Scalability for Dummies - Part 2: Database
      • Scalability for Dummies - Part 3: Cache
      • Scalability for Dummies - Part 4: Asynchronism
    • Trade-off
      • CAP Theorem
      • Performance vs scalability
      • Latency vs throughput
      • Availability vs consistency
    • Load Balancing
      • Load balancer
    • Proxies
      • Reverse proxy
    • Cache
      • Caching
    • Asynchronism
    • Processing guarantee in Kafka
    • Database
      • Relational database management system (RDBMS)
      • Redundancy and Replication
      • Data Partitioning
      • Indexes
      • NoSQL
      • SQL vs. NoSQL
      • Consistent Hashing
    • Application layer
    • DNS
    • CDN
    • Communication
      • Long-Polling vs WebSockets vs Server-Sent Events
    • Security
    • Lambda Architecture
  • OOD design
    • Concepts
      • Object-Oriented Basics
      • OO Analysis and Design
      • What is UML?
      • Use Case Diagrams
    • Design a parking lot
  • System Design Cases
    • Overview
    • Design a system that scales to millions of users on AWS
    • Designing a URL Shortening service like TinyURL
      • Design Unique ID Generator
      • Designing Pastebin
      • Design Pastebin.com (or Bit.ly)
    • Design notification system (scott)
      • Designing notification service
    • Designing Chat System
      • Designing Slack
      • Designing Facebook Messenger
    • Design Top K System
    • Designing Instagram
    • Design a newsfeed system
      • Designing Facebook’s Newsfeed
      • Design the data structures for a social network
    • Designing Twitter
      • Design the Twitter timeline and search
      • Designing Twitter Search
    • Design Youtube - Scott
      • Design live commenting
      • Designing Youtube or Netflix
    • Designing a Web Crawler
      • Designing a distributed job scheduler
      • Designing a Web Crawler/Archive (scott)
      • Design a web crawler
    • Designing Dropbox
    • Design Google Doc
    • Design Metrics Aggregation System
      • Design Ads Logging System
    • Design Instacart
    • Design a payment system
      • Airbnb - Avoiding Double Payments in a Distributed Payments System
    • Design Distributed Message Queue
      • Cherami: Uber Engineering’s Durable and Scalable Task Queue in Go
    • Design Distributed Cache
      • Design a key-value cache to save the results of the most recent web server queries
    • Design a scalable file distribution system
    • Design Amazon's sales ranking by category feature
    • Design Mint.com
    • Design Autocomplete System
      • Designing Typeahead Suggestion
    • Designing an API Rate Limiter
      • Designing Rate Limiter
    • Design Google Map
      • Designing Yelp or Nearby Friends
      • Designing Uber backend
    • Designing Ticketmaster
      • Design 12306 - Scott
    • Design AirBnB or a Hotel Booking System
  • Paper Reading
    • MapReduce
  • Other Questions
    • What happened after you input the url in the browser?
Powered by GitBook
On this page
  • Layer 4 load balancing
  • Layer 7 load balancing
  • Horizontal scaling
  • Disadvantage(s): horizontal scaling
  • Disadvantage(s): load balancer

Was this helpful?

  1. Glossary of System Design
  2. Load Balancing

Load balancer

PreviousLoad BalancingNextProxies

Last updated 4 years ago

Was this helpful?

Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:

  • Preventing requests from going to unhealthy servers

  • Preventing overloading resources

  • Helping eliminate single points of failure

Load balancers can be implemented with hardware (expensive) or with software such as HAProxy.

Additional benefits include:

  • SSL termination - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations.

eg. Removes the need to install X.509 certificates on each server

  • Session persistence - Issue cookies and route a specific client's requests to same instance if the web apps do not keep track of sessions

To protect against failures, it's common to set up multiple load balancers, either in active-passive or active-active mode.

Load balancers can route traffic based on various metrics, including:

  • Random

  • Least loaded

  • Session/cookies

  • Layer 4

  • Layer 7

Layer 4 load balancing

When the Layer 4 load balancer receives a request and makes the load balancing decision, it also performs Network Address Translation (NAT) on the request packet, changing the recorded destination IP address from its own to that of the content server it has chosen on the internal network. Similarly, before forwarding server responses to clients, the load balancer changes the source address recorded in the packet header from the server’s IP address to its own. (The destination and source TCP port numbers recorded in the packets are sometimes also changed in a similar way.)

Layer 7 load balancing

Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminates network traffic, reads the message, makes a load-balancing decision, then opens a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.

At the cost of flexibility, layer 4 load balancing requires less time and computing resources than Layer 7, although the performance impact can be minimal on modern commodity hardware.

Horizontal scaling

Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost efficient and results in higher availability than scaling up a single server on more expensive hardware, called Vertical Scaling. It is also easier to hire for talent working on commodity hardware than it is for specialized enterprise systems.

Disadvantage(s): horizontal scaling

  • Scaling horizontally introduces complexity and involves cloning servers

  • Servers should be stateless: they should not contain any user-related data like sessions or profile pictures

  • Sessions can be stored in a centralized data store such as a database (SQL, NoSQL) or a persistent cache (Redis, Memcached)

  • Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out

Disadvantage(s): load balancer

  • The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.

  • Introducing a load balancer to help eliminate single points of failure results in increased complexity.

  • A single load balancer is a single point of failure, configuring multiple load balancers further increases complexity.

Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source, destination IP addresses, and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing .

Round robin or weighted round robin
Network Address Translation (NAT)