Designing a notification service

Ref: https://www.youtube.com/watch?v=bBTPZ9NdSk8

Problem Statement

  • Difficulty:

    • hard to scale as the number of messages grows

    • hard to extend the solution to support different types of subscribers

Requirements

  • Ask the interviewer about functional and non-functional requirements

    • functional:

      • what functions do we need to support?

      • how many APIs do we need to define?

    • non-functional:

      • reliability, scalability, consistency, etc.

High-level architecture

Why have a metadata service?

  • separation of concerns: it gives a well-defined interface for accessing the database

  • it acts as a data layer, preventing traffic from hitting the database directly

FrontEnd Service

Reverse Proxy

  • SSL termination: HTTPS requests are decrypted and passed on in unencrypted form; responses are encrypted before being sent back to clients

  • compresses responses to clients, saving network bandwidth

Local Disk

  • service health

  • emitting metrics

  • audit logs

Metadata Service

  • hash the topic name and topic owner to decide which host stores that topic's metadata

  • first option: hosts constantly send heartbeats to a configuration service; every time we scale hosts up or down, the configuration service becomes aware of the change and re-maps hash values to hosts;

  • second option: don't use a coordinator; a frontend host can obtain metadata about all hosts via a gossip protocol (each host randomly picks a peer and shares data with it)
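
The hashing step above can be sketched as a small hash ring. This is only an illustration: the notes say "hash", and consistent hashing with virtual nodes is assumed here because it limits how many topics move when hosts are added or removed. The class name `HashRing`, the `vnodes` parameter, and the host names are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer (md5 is used only to spread keys)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical ring mapping (topic owner, topic name) to a Metadata Service host."""

    def __init__(self, hosts, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point on the ring, host)
        for host in hosts:
            self.add_host(host)

    def add_host(self, host):
        # Each host owns several points so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{host}#{i}"), host))

    def remove_host(self, host):
        self._ring = [(p, h) for p, h in self._ring if h != host]

    def host_for(self, owner, topic):
        if not self._ring:
            raise LookupError("no hosts registered")
        point = _hash(f"{owner}/{topic}")
        # First ring point at or after the key's point, wrapping around the end.
        idx = bisect.bisect_left(self._ring, (point, ""))
        return self._ring[idx % len(self._ring)][1]
```

For example, `HashRing(["meta-1", "meta-2", "meta-3"]).host_for("alice", "orders")` always returns the same host until the host set changes. In the first option the configuration service would call `add_host` / `remove_host`; in the second, each frontend host would update its own ring from gossip.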

Temporary Storage

  • each message is expected to stay in temporary storage for only a short period of time

Database

Can we use a database?

Yes.

SQL or NoSQL?

NoSQL.

  • compare the differences between SQL and NoSQL databases

  • we don't need ACID transactions; don't need to run complex dynamic queries; don't use it for analytics or data warehousing

  • it is expected to scale easily for both reads and writes

What type of NoSQL?

  • messages are small: exclude document data stores;

  • data items have no relations with each other: exclude graph data stores;

  • so we choose a column or key-value NoSQL database. (Name a few big names!) Apache Cassandra and Amazon DynamoDB

In-memory storage

How about a distributed in-memory caching system?

Sure. As long as persistence is supported.

Any specific names?

Redis.

Message Queues

Any specific names?

Apache Kafka, Amazon SQS.

Stream-processing Platform

Any specific names?

Apache Kafka, Amazon Kinesis.

Sender

  • we can use multiple threads to retrieve messages from Temporary Storage;

  • with too few threads, performance may degrade; with too many, some threads will sit idle; it is better to adjust the number of running threads dynamically

How do we dynamically control the number of running threads?

Use a semaphore that limits how many threads may run at once: a thread must hold a permit while it retrieves a message. When a thread finishes retrieving its message, it releases the permit and the semaphore lets another thread from the pool start. By changing the number of permits, we can tune the number of running threads dynamically.
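
A minimal sketch of that idea, under stated assumptions: Python's `threading.Semaphore` has no method for changing its permit count, so the example builds a small resizable semaphore on `threading.Condition`. The names `ResizableSemaphore`, `set_permits`, `retrieve_all`, and the `fetch` callable (standing in for a read from Temporary Storage) are hypothetical.

```python
import threading


class ResizableSemaphore:
    """Counting semaphore whose permit limit can be changed at runtime."""

    def __init__(self, permits):
        self._limit = permits
        self._in_use = 0
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            # Block while all permits are taken.
            while self._in_use >= self._limit:
                self._cond.wait()
            self._in_use += 1

    def release(self):
        with self._cond:
            self._in_use -= 1
            self._cond.notify()

    def set_permits(self, permits):
        # Raising the limit wakes waiting threads; lowering it does not
        # interrupt running ones, it only delays new acquisitions.
        with self._cond:
            self._limit = permits
            self._cond.notify_all()


def retrieve_all(message_ids, fetch, limiter):
    """Fetch every message, with at most `limiter` permits' worth running at once."""
    results = []
    results_lock = threading.Lock()

    def worker(message_id):
        limiter.acquire()
        try:
            value = fetch(message_id)
        finally:
            limiter.release()
        with results_lock:
            results.append(value)

    threads = [threading.Thread(target=worker, args=(m,)) for m in message_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A monitoring loop (not shown) could call `limiter.set_permits(n)` based on observed latency or queue depth; how `n` is chosen is outside the scope of these notes.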

Why do we call the metadata service here instead of passing subscriber info along with the message itself from the frontend server?

Because the list of subscribers can be really big, e.g. thousands of HTTP endpoints or email addresses. Carrying it with each message may require us to use a document database for temporary storage.

How do we send to all subscribers? Can we iterate over the list of subscribers and make a remote call to each of them?

No. Some deliveries in the list may fail. A hot subscriber could be slow and delay delivery to the other subscribers.

Better solution: split message delivery into tasks. Each task is responsible for delivering to a single subscriber. In this way, we can deliver all messages in parallel and isolate "bad" subscribers.

How can we implement this?

  • Create a pool of threads in which each thread is responsible for executing a task (ThreadPoolExecutor)

  • similarly, use a semaphore to keep track of available sender threads.
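
One way to sketch the task-per-subscriber approach, using Python's `concurrent.futures.ThreadPoolExecutor`. The function name `deliver_to_all` and the `send(subscriber, message)` callable are hypothetical; `send` is assumed to raise an exception when delivery fails. Here the pool's `max_workers` plays the role of the semaphore by capping how many sender threads run at once.

```python
from concurrent.futures import ThreadPoolExecutor


def deliver_to_all(message, subscribers, send, max_workers=8):
    """Deliver `message` to each subscriber as an independent task.

    Returns a dict mapping each subscriber to None on success,
    or to the exception raised by its delivery attempt.
    """
    outcomes = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per subscriber, so a failure stays confined to that task.
        futures = {sub: pool.submit(send, sub, message) for sub in subscribers}
        for sub, future in futures.items():
            # exception() waits for the task and returns None if it succeeded.
            outcomes[sub] = future.exception()
    return outcomes
```

Failed entries in the returned dict can be handed to whatever retry policy is chosen. A slow subscriber still occupies one worker while its call runs, so `send` should apply its own timeout.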

Some other questions

How can we make sure it will not spam customers?

When subscribers are registered, the owner of each HTTP endpoint or email address gets a notification from the service and needs to confirm the request.
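
The confirmation step could look roughly like this. It is a sketch only: the notes do not say how confirmation is implemented, so the one-time token, the class name `SubscriptionRegistry`, and its methods are all assumptions.

```python
import secrets


class SubscriptionRegistry:
    """Hypothetical registry: a subscription is inactive until its owner confirms it."""

    def __init__(self):
        self._pending = {}       # token -> (topic, endpoint)
        self._confirmed = set()  # {(topic, endpoint)}

    def request(self, topic, endpoint):
        # The token would be sent to the endpoint or email address itself,
        # so only its real owner can confirm.
        token = secrets.token_urlsafe(16)
        self._pending[token] = (topic, endpoint)
        return token

    def confirm(self, token):
        subscription = self._pending.pop(token, None)
        if subscription is None:
            return False  # unknown or already used token
        self._confirmed.add(subscription)
        return True

    def is_confirmed(self, topic, endpoint):
        return (topic, endpoint) in self._confirmed
```

The Sender would then deliver only to subscribers for which `is_confirmed` returns true.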

How to handle duplicate messages?

On the frontend service side, we need to dedupe identical requests. However, when network issues happen or the service retries, there will still be duplicate messages. Subscribers should handle these.
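
On the subscriber side, handling duplicates usually means making the receiver idempotent. The sketch below assumes every message carries a unique `message_id`, which the notes do not state; `IdempotentReceiver` and `max_remembered` are hypothetical names.

```python
from collections import OrderedDict


class IdempotentReceiver:
    """Runs the handler at most once per message id (not thread-safe)."""

    def __init__(self, handler, max_remembered=10_000):
        self._handler = handler
        self._seen = OrderedDict()  # message ids, oldest first
        self._max = max_remembered

    def receive(self, message_id, payload):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)
            return False
        self._handler(payload)
        self._seen[message_id] = True
        # Bound memory by forgetting the oldest id once the limit is exceeded.
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)
        return True
```

Because only a bounded number of ids is remembered, a duplicate that arrives after its id has been evicted would be processed again; the window size is a trade-off between memory and how late a retry may arrive.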

Retrying delivery attempts?

  • we can keep retrying for hours or days until delivery succeeds

  • or we can set a maximum number of retries

  • or we can notify customers about undelivered messages; customers can decide how to proceed
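
The options above can be combined in one small retry loop. This is a sketch under assumptions: exponential backoff between attempts is a common choice but is not specified in these notes, and `deliver_with_retry`, `send`, and `on_give_up` are hypothetical names.

```python
import time


def deliver_with_retry(send, message, max_attempts=5, base_delay=1.0,
                       max_delay=3600.0, on_give_up=None, sleep=time.sleep):
    """Call send(message) until it succeeds or max_attempts is reached.

    Waits base_delay, then 2x, 4x, ... (capped at max_delay) between attempts.
    Returns True on success; otherwise calls on_give_up(message) and returns False.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            return True
        except Exception:
            if attempt == max_attempts:
                break  # out of attempts, do not sleep again
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    if on_give_up is not None:
        # e.g. notify the customer about the undelivered message
        on_give_up(message)
    return False
```

Retrying "for hours or days" corresponds to a large `max_attempts` with a large `max_delay`; the `sleep` parameter exists only so the delay can be replaced in tests.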

How do we support message ordering?

This solution doesn't maintain message order.

Security

  • only authenticated publishers can publish

  • only registered subscribers can receive messages

  • encrypting traffic with SSL over HTTP (HTTPS) helps protect messages in transit

  • encrypt messages while they are stored

Monitoring

  • num of messages waiting for delivery

  • num of messages that failed to be delivered

  • num of messages delivered, etc.
