Designing a notification service
Problem Statement
Difficulties:
hard to scale as the number of messages grows
hard to extend the solution to support different types of subscribers
Requirements
Ask the interviewer about functional and non-functional requirements
functional:
what operations must the service support?
which APIs do we need to define?
non-functional:
reliability, scalability, consistency, etc.
High-level architecture
Why have a Metadata Service?
separation of concerns: it exposes a well-defined interface for accessing the database
it acts as a data layer, so the database is not hit directly by request traffic
FrontEnd Service
Reverse Proxy
SSL termination: HTTPS requests are decrypted and passed on in unencrypted form; responses are encrypted before being sent back to clients
compresses responses to clients, saving network bandwidth
Local Disk
service health
emitting metrics
auditing logs
Metadata Service
hash the topic name and topic owner to decide which host stores the topic's metadata
first option: hosts constantly send heartbeats to a configuration service; whenever hosts are added or removed, the configuration service notices the change and re-maps hash values to the current set of hosts
second option: no coordinator; frontend hosts learn about all Metadata Service hosts via a gossip protocol (each host randomly picks a peer and shares its data)
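The notes do not say which hashing scheme maps topics to hosts. A minimal sketch, assuming consistent hashing with virtual nodes (a common choice when hosts are added and removed), could look like this. `HashRing`, `host_for`, and the host names are hypothetical, and the host list would come from either the configuration service or gossip.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Maps a (topic owner, topic name) pair to one Metadata Service host."""

    def __init__(self, hosts, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> host name
        for host in hosts:
            self.add_host(host)

    def add_host(self, host):
        # Each host owns several positions so load spreads more evenly.
        for i in range(self._vnodes):
            point = _hash(f"{host}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = host

    def remove_host(self, host):
        self._points = [p for p in self._points if self._owner[p] != host]
        self._owner = {p: h for p, h in self._owner.items() if h != host}

    def host_for(self, topic_owner, topic_name):
        point = _hash(f"{topic_owner}/{topic_name}")
        # First ring position clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._points, point) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["metadata-1", "metadata-2", "metadata-3"])
host = ring.host_for("alice", "order-events")
```

With this scheme, removing a host only moves the topics that host owned; every other topic keeps its current host.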
Temporary Storage
each message is expected to stay in temporary storage for only a short period of time
Database
Can we use a database?
Yes.
SQL or NoSQL?
NoSQL.
compare the differences between SQL and NoSQL databases
we don't need ACID transactions, don't need to run complex dynamic queries, and don't use it for analytics or data warehousing
it is expected to scale easily for both reads and writes
Which type of NoSQL?
messages are small: rules out a document data store
data has no relations with each other: rules out a graph data store
so we choose a column or key-value NoSQL database (name a few big names!): Apache Cassandra and Amazon DynamoDB
In-memory storage
How about a distributed in-memory caching system?
Sure, as long as it supports persistence.
Any specific names?
Redis.
Message Queues
Any specific names?
Apache Kafka, Amazon SQS.
Stream-processing Platform
Any specific names?
Apache Kafka, Amazon Kinesis.
Sender
we can use multiple threads to retrieve messages from Temporary Storage
too few threads and throughput suffers; too many and some threads sit idle; better to adjust the number of running threads dynamically
How do we dynamically control the number of running threads?
Use a semaphore whose permits cap the number of running threads. A thread acquires a permit before retrieving a message and releases it when it finishes, which lets another thread in the pool start. Changing the number of permits tunes the number of running threads at runtime.
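The semaphore idea can be sketched as follows. This is only an illustration: `ReaderLimiter`, `read_all`, and the `fetch` callback are hypothetical names, and `fetch` stands in for whatever client reads a message from Temporary Storage. Python's `threading.Semaphore` has no resize call, so the sketch grows the limit by releasing extra permits and shrinks it by acquiring permits and holding them.

```python
import threading


class ReaderLimiter:
    """Caps how many reader threads run at once; the cap can change at runtime."""

    def __init__(self, permits):
        self._sem = threading.Semaphore(permits)

    def __enter__(self):
        self._sem.acquire()  # wait for a free permit
        return self

    def __exit__(self, *exc):
        self._sem.release()  # hand the permit to the next waiting thread
        return False

    def grow(self, n=1):
        # Add permits so more threads may run concurrently.
        for _ in range(n):
            self._sem.release()

    def shrink(self, n=1):
        # Take permits out of circulation; blocks until readers release them.
        for _ in range(n):
            self._sem.acquire()


def read_all(message_ids, limiter, fetch):
    """Start one thread per message; the limiter decides how many fetch at once."""
    results = {}
    lock = threading.Lock()

    def worker(message_id):
        with limiter:
            body = fetch(message_id)
        with lock:
            results[message_id] = body

    threads = [threading.Thread(target=worker, args=(m,)) for m in message_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A monitoring loop could call `grow` when the backlog builds up and `shrink` when threads sit idle; what triggers the adjustment is left open here.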
Why does the Sender call the Metadata Service here instead of having the frontend pass subscriber info along with the message?
Because the list of subscribers can be really big, e.g. thousands of HTTP endpoints or email addresses. Carrying it with each message might require us to use a document database for temporary storage.
How do we send to all subscribers? Can we iterate over the list of subscribers and make a remote call to each of them?
No. Some deliveries in the list may fail, and a slow subscriber would delay delivery to the others.
Better solution: split message delivery into tasks. Each task is responsible for delivering to a single subscriber. In this way, we can deliver all messages in parallel and isolate "bad" subscribers.
How can we implement this?
Create a pool of threads where each thread is responsible for executing a task (e.g. ThreadPoolExecutor). As with message retrieval, use a semaphore to keep track of available sender threads.
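One task per subscriber can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. The function name `deliver_to_all` and the `send(subscriber, message)` callback are assumptions for illustration; `send` stands in for the real HTTP or email call. Here `max_workers` bounds the number of sender threads, which covers the role the notes give to the semaphore.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def deliver_to_all(message, subscribers, send, max_workers=8):
    """Run one delivery task per subscriber; return (delivered, failed)."""
    delivered = []
    failed = {}  # subscriber -> exception raised by send()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each task delivers the message to exactly one subscriber.
        futures = {pool.submit(send, sub, message): sub for sub in subscribers}
        for future in as_completed(futures):
            sub = futures[future]
            try:
                future.result()
                delivered.append(sub)
            except Exception as exc:
                # A failing subscriber does not affect the other tasks.
                failed[sub] = exc
    return delivered, failed
```

The `failed` map is what a retry step would pick up, so a single bad subscriber never blocks the rest of the list.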
Some other questions
How can we make sure the service does not spam customers?
When a subscriber is registered, the owner of the HTTP endpoint or email address gets a notification from the service and needs to confirm the request.
How to handle duplicate messages?
On the frontend service side, we need to dedupe identical requests. However, network issues or service retries can still produce duplicate messages, so subscribers should be prepared to handle them.
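On the subscriber side, one way to tolerate duplicates is to remember recently seen message IDs. This sketch assumes every message carries a unique `message_id`, which the notes do not specify; `DedupingSubscriber` and `max_remembered` are hypothetical names.

```python
from collections import OrderedDict


class DedupingSubscriber:
    """Wraps a handler so a repeated message_id is processed only once."""

    def __init__(self, handler, max_remembered=10_000):
        self._handler = handler
        self._seen = OrderedDict()  # message_id -> True, oldest first
        self._max = max_remembered

    def on_message(self, message_id, body):
        if message_id in self._seen:
            return False  # duplicate delivery, ignore it
        self._handler(body)
        self._seen[message_id] = True
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # forget the oldest id
        return True
```

The memory bound means a duplicate that arrives after its ID was evicted is processed again, so the window should be sized to cover the service's retry period.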
How do we retry failed delivery attempts?
we can keep retrying for hours or days until delivery succeeds
or we can set a maximum number of retries
or we can notify customers about undelivered messages and let them decide how to proceed
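The maximum-retries option might look like the sketch below. The exponential backoff between attempts is an assumption added here, not something the notes prescribe, and `deliver_with_retry`, `send`, and `base_delay` are hypothetical names.

```python
import time


def deliver_with_retry(send, message, max_attempts=5, base_delay=1.0,
                       sleep=time.sleep):
    """Try send(message) up to max_attempts times; return True on success."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            return True
        except Exception:
            if attempt == max_attempts:
                # Out of retries: the caller can notify the customer.
                return False
            # Assumed policy: wait 1x, 2x, 4x, ... base_delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    return False
```

A `False` result is the point where the third option applies: report the undelivered message to the customer and let them decide what to do.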
How do we support message ordering?
This solution doesn't guarantee message order.
Security
only authenticated publishers can publish
only registered subscribers can receive messages
encrypting with SSL over HTTP (HTTPS) protects messages in transit
encrypt messages while they are stored
Monitoring
number of messages waiting for delivery
number of messages that failed delivery
number of messages delivered, etc.