System Design
  • Introduction
  • Glossary of System Design
    • System Design Basics
    • Key Characteristics of Distributed Systems
    • Scalability - Harvard lecture
      • Scalability for Dummies - Part 1: Clones
      • Scalability for Dummies - Part 2: Database
      • Scalability for Dummies - Part 3: Cache
      • Scalability for Dummies - Part 4: Asynchronism
    • Trade-off
      • CAP Theorem
      • Performance vs scalability
      • Latency vs throughput
      • Availability vs consistency
    • Load Balancing
      • Load balancer
    • Proxies
      • Reverse proxy
    • Cache
      • Caching
    • Asynchronism
    • Processing guarantee in Kafka
    • Database
      • Relational database management system (RDBMS)
      • Redundancy and Replication
      • Data Partitioning
      • Indexes
      • NoSQL
      • SQL vs. NoSQL
      • Consistent Hashing
    • Application layer
    • DNS
    • CDN
    • Communication
      • Long-Polling vs WebSockets vs Server-Sent Events
    • Security
    • Lambda Architecture
  • OOD design
    • Concepts
      • Object-Oriented Basics
      • OO Analysis and Design
      • What is UML?
      • Use Case Diagrams
    • Design a parking lot
  • System Design Cases
    • Overview
    • Design a system that scales to millions of users on AWS
    • Designing a URL Shortening service like TinyURL
      • Design Unique ID Generator
      • Designing Pastebin
      • Design Pastebin.com (or Bit.ly)
    • Design notification system (scott)
      • Designing notification service
    • Designing Chat System
      • Designing Slack
      • Designing Facebook Messenger
    • Design Top K System
    • Designing Instagram
    • Design a newsfeed system
      • Designing Facebook’s Newsfeed
      • Design the data structures for a social network
    • Designing Twitter
      • Design the Twitter timeline and search
      • Designing Twitter Search
    • Design Youtube - Scott
      • Design live commenting
      • Designing Youtube or Netflix
    • Designing a Web Crawler
      • Designing a distributed job scheduler
      • Designing a Web Crawler/Archive (scott)
      • Design a web crawler
    • Designing Dropbox
    • Design Google Doc
    • Design Metrics Aggregation System
      • Design Ads Logging System
    • Design Instacart
    • Design a payment system
      • Airbnb - Avoiding Double Payments in a Distributed Payments System
    • Design Distributed Message Queue
      • Cherami: Uber Engineering’s Durable and Scalable Task Queue in Go
    • Design Distributed Cache
      • Design a key-value cache to save the results of the most recent web server queries
    • Design a scalable file distribution system
    • Design Amazon's sales ranking by category feature
    • Design Mint.com
    • Design Autocomplete System
      • Designing Typeahead Suggestion
    • Designing an API Rate Limiter
      • Designing Rate Limiter
    • Design Google Map
      • Designing Yelp or Nearby Friends
      • Designing Uber backend
    • Designing Ticketmaster
      • Design 12306 - Scott
    • Design AirBnB or a Hotel Booking System
  • Paper Reading
    • MapReduce
  • Other Questions
    • What happened after you input the url in the browser?
Powered by GitBook
On this page
  • Design Distributed Message Queue
  • Problem Statement
  • synchronous communication
  • asynchronous communication - add a queue
  • Requirements
  • High-level Architecture
  • VIP and Load Balancer
  • FrontEnd Service
  • Metadata Service
  • Backend Service
  • What else is important?
  • Queue creation and deletion
  • Message deletion
  • Message replication
  • Message delivery semantics
  • Push vs. pull
  • FIFO
  • Security
  • Monitoring
  • Final look
  • Is it scalable?
  • Is it highly available?
  • Is it highly performant?
  • Is it durable?

Was this helpful?

  1. System Design Cases

Design Distributed Message Queue

PreviousAirbnb - Avoiding Double Payments in a Distributed Payments SystemNextCherami: Uber Engineering’s Durable and Scalable Task Queue in Go

Last updated 3 years ago

Was this helpful?

Design Distributed Message Queue

Ref:

Problem Statement

synchronous communication

  • Pros:

    • easier and faster to implement

  • Cons:

    • harder to deal with consumer service failures

asynchronous communication - add a queue


Requirements


High-level Architecture


VIP and Load Balancer

What happens if LB goes down?

  • LB uses primary and secondry nodes.

  • The primary node accepts connections and serves requests while the secondary node monitors the primary. If primary node doesn't respond, the secondary node takes over.

What if traffic increases and reaches LB's limit?

  • As for scalablity concerns, a concept fo multiple VIPs can be utilized.

  • By spreading load balancers across several data centers, we improve both avalibility and performance.


FrontEnd Service


Metadata Service


Backend Service

Option 1: Leader-Follower relationship

Option 2: Small cluster of independent hosts


What else is important?

Queue creation and deletion

  • API is a good option to control over queue configuration parameters. Delete queue is a little bit controversial as it may cause a lot of harm and must be executed with caution.

Message deletion

  • One option is not to delete a message right after it was consumed. Messages can be deleted several days later by a job. This is used by Kafka.

    • We need to maintain some kind of an order for messages in the queue and keep track of offset, which is the position of a message within a queue.

  • Another option is used by Amazon SQS. Messages are also not deleted immediately, but marked as invisible. So other consumers may not get already retrieved message.

    • Consumer that retrieved the message, needs to then call delete message API to delete the messages from a backend host.

    • And if the message was not explicitly deleted by a consumer, message becomes visible and may be delivered and processed twice.

Message replication

  • Synchronously replication: when backend host receives new message, it waits until data is replicated to other hosts. And only if replication is fully completed, successful response is returned to a producer.

    • higher durability but with a cost of higher latency for send message operation.

  • Asynchronous replication: response is returned back to a producer as soon as a message is stored on a single backend host. Message is later replicated to other hosts.

    • more performant, but not guarantee that message will survive backend host failure.

Message delivery semantics

  • At most once: when messages may be lost but are never redelivered.

  • At least once: when messages are never lost but maybe redelivered.

  • Exactly once: when each message is delivered once and only once.

Push vs. pull

  • Pull model: consumer constantly sends retrieve message requests and when new message is available in the queue, it is sent back to a consumer.

  • Push model: consumer is not constantly bombarding FrontEnd service with receive calls. Instead, consumer is notified as soon as new message arrives to the queue.

  • From producer side, pull is easy to implement compared to push. But from a consumer perspective, we need to do more work if we pull.

FIFO

  • It's hard to maintain the order. Some system either not guarantee strict order or have limitations around throughput.

Security

  • Encryption using SSL over HTTPS helps to protect messages in transit.

  • We may also encrypt messages while storing them on backend hosts.

Monitoring

  • We need to monitor health of our distributed queue system and give customers ability to track state of their queues.

  • Each service we built has to emit metrics and write log data.


Final look

Is it scalable?

  • Yes. Every component is scalable. When load increases, we just add more load balancers, more FrontEnd hosts, more Metadata service cache shards, more backend clusters and hosts.

Is it highly available?

  • Yes. No single point of failure. Each component is deployed across several data centers.

Is it highly performant?

  • Yes. Each individual microservice needs to be fast.

Is it durable?

  • Yes. We replicate data while storing and ensure messages are not lost during the transfer from a producer to a consumer.

original video