Design notification system (scott)
Problem
Design a notification system for your company. It should support the following notification types:
- In-app notifications, like the built-in notifications on Apple / Android devices 
- Email notification 
- Phone notification 
- SMS notification 
Integrate with third-party services such as SendGrid and Twilio.
Support more than one delivery mode:
- At least once 
- At most once 
- Exactly once [if possible] 
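The difference between the first two modes comes down to when a message is acknowledged (removed from the queue) relative to the send attempt. The following is a minimal sketch under that assumption; `drain_at_most_once`, `drain_at_least_once`, and the in-memory `deque` are hypothetical names used only for illustration, not part of the design above.

```python
from collections import deque


def drain_at_most_once(queue, send):
    """Acknowledge before sending: a failed send loses the message."""
    while queue:
        msg = queue.popleft()  # ack first, so the message is gone either way
        try:
            send(msg)
        except Exception:
            pass  # dropped, never retried


def drain_at_least_once(queue, send, max_attempts=5):
    """Acknowledge only after a successful send: failures are retried."""
    delivered = 0
    while queue:
        msg = queue[0]  # peek without acknowledging
        for _ in range(max_attempts):
            try:
                send(msg)
            except Exception:
                continue  # retry the same message
            queue.popleft()  # ack after success
            delivered += 1
            break
        else:
            break  # still failing: leave the message queued for a later run
    return delivered
```

With at-least-once, a send that succeeds but whose acknowledgement is lost would be retried, which is why the receiver may see duplicates.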
Provide a unified interface for other services to use the system, plus a real-time dashboard that shows processing status and how many notifications have been sent, are in progress, or are queued.
Business Use Case
MVP
- Delivery of notifications to various receivers (apps, email, phone, SMS) 
- Delivery supports different modes (at least once, at most once) 
- Push / pull model for active subscribers / idle subscribers 
Bonus
- Delivery support for exactly once (2PC, transactional) 
- Recurring notification 
- Scheduled notification 
- Images/videos 
Non Goal
- Latency for delivering the notifications 
- Maintain the order of notifications 
Constraints
- High Availability 
- High Scalability 
- Flexibility 
Traffic Estimation
Data points: Facebook has about 200M daily active users; assume 5 notifications per user per day.
- DAU: 200 M 
- QPS: 200M × 5 / 86,400 s ≈ 11,600 notifications per second on average 
- Peak: 
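The estimate can be worked through from the two data points above. This is a back-of-the-envelope sketch; the peak multiplier `PEAK_FACTOR = 3` is an assumption chosen for illustration and does not come from the notes.

```python
DAU = 200_000_000
NOTIFICATIONS_PER_USER_PER_DAY = 5
SECONDS_PER_DAY = 24 * 60 * 60
PEAK_FACTOR = 3  # assumption: peak traffic is about 3x the daily average

daily_total = DAU * NOTIFICATIONS_PER_USER_PER_DAY
average_qps = daily_total / SECONDS_PER_DAY
peak_qps = average_qps * PEAK_FACTOR

print(f"{daily_total:,} notifications per day")
print(f"average ~{average_qps:,.0f}/s, peak ~{peak_qps:,.0f}/s")
```

That is 1 billion notifications per day, roughly 11,574 per second on average, and roughly 34,722 per second at peak under the assumed multiplier.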
High-level design


- Kafka/Flink can be used to build the monitoring system. Note: do not build the real-time dashboard from logs, because logs can lose data. 
 
API Design
createTopic(TopicName, ServiceType, Metadata)
- example data: Ads_campian_1234, In_app, Priority, SecurityMetadata 
- Topic - Topic ID, Topic Name, Service Type, Topic MetaData, Messages 
send(TopicID, SEND_MODE)
- SEND_MODE: at_least_once, at_most_once, exactly_once 
subscribe(TopicID, SUB_MODE)
- SUB_MODE, Priority 
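The three calls above could be sketched as a small in-memory interface. Everything here is illustrative: the class and field names are hypothetical, the `message` argument to `send` follows the life-cycle section rather than the signature above, and mapping `SUB_MODE` to push/pull is an assumption based on the MVP list.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class SendMode(Enum):
    AT_LEAST_ONCE = "at_least_once"
    AT_MOST_ONCE = "at_most_once"
    EXACTLY_ONCE = "exactly_once"


class SubMode(Enum):  # assumption: SUB_MODE selects the push or pull model
    PUSH = "push"
    PULL = "pull"


@dataclass
class Topic:
    topic_id: str
    topic_name: str
    service_type: str
    metadata: dict
    messages: list = field(default_factory=list)
    subscribers: list = field(default_factory=list)


class NotificationService:
    """In-memory stand-in for the unified interface."""

    def __init__(self):
        self.topics = {}

    def create_topic(self, topic_name, service_type, metadata):
        topic_id = uuid.uuid4().hex
        self.topics[topic_id] = Topic(topic_id, topic_name, service_type, metadata)
        return topic_id

    def send(self, topic_id, send_mode, message):
        topic = self.topics[topic_id]
        topic.messages.append((SendMode(send_mode), message))
        return len(topic.messages)  # hypothetical receipt: position in the topic

    def subscribe(self, topic_id, sub_mode, priority=0):
        self.topics[topic_id].subscribers.append((SubMode(sub_mode), priority))
```

A caller would create a topic once, keep the returned topic ID, and pass it to `send` and `subscribe`.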
Database Design
Message Storage Table - NoSQL
- DynamoDB 
- Cassandra - write heavy - Cassandra's log-structured merge tree is well suited to write-heavy workloads. It also has a multi-master architecture and partitions data across all nodes. 
Message Storage Table (DynamoDB)
- Example row: abc_123, 897987686, 223, "hello word", 112 
Metadata Table
- Example row: abc_123, PENDING, AT_LEAST_ONCE, 112, 24253535 
Detailed Design
Message Status

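- Message status: PENDING, SENDING, DELIVERED/FAILED (then CLICK or UNSUBSCRIBE) 
- Once the data from the publisher has been stored in the database, we can tell the publisher that we received the notification. 
- The advantage is high availability: as soon as the message is saved we acknowledge the publisher, and afterwards an async thread reads the PENDING records from the database. 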
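The store-then-acknowledge flow can be sketched with an in-memory table. This is only an illustration: `MessageStore`, `accept`, and `process_pending` are hypothetical names, a dictionary stands in for the database, and `process_pending` represents one pass of the async thread.

```python
import uuid
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    SENDING = "SENDING"
    DELIVERED = "DELIVERED"
    FAILED = "FAILED"


class MessageStore:
    """In-memory stand-in for the message table."""

    def __init__(self):
        self.rows = {}

    def save(self, body):
        message_id = uuid.uuid4().hex
        self.rows[message_id] = {"body": body, "status": Status.PENDING}
        return message_id

    def pending_ids(self):
        return [mid for mid, row in self.rows.items()
                if row["status"] is Status.PENDING]


def accept(store, body):
    """Persist first, then acknowledge the publisher with a receipt."""
    return store.save(body)


def process_pending(store, deliver):
    """One pass of the async worker over PENDING records."""
    for message_id in store.pending_ids():
        row = store.rows[message_id]
        row["status"] = Status.SENDING
        try:
            deliver(row["body"])
            row["status"] = Status.DELIVERED
        except Exception:
            row["status"] = Status.FAILED
```

The publisher only ever waits for `accept`, so a slow or failing downstream provider does not affect the publisher-facing call.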
Life Cycle(Service_A send SMS to User_1)
- Call the API with metadata and msg: send(topicID, at_least_once, message). The message status is set to PENDING. (We cannot respond to the client yet, because the server might crash. Only once the message has been stored in the DB, in the fourth step, can we confirm receipt to the client.) 
- LB routes the msg to Kafka/Flink for monitoring 
- Call the Metadata Service to get the topic object (json) // new topic including topic storage 
- Store the message (update the msg status to SENDING/FAILED) -> return a msg receipt to the client; the client can poll with the receipt to check status. (Only after the msg is safely stored can we tell the customer it was received!) 
- Sender sends the msg 
- If the Sender hits a timeout or exception -> retry queue (DLQ) -> (publish an event to a Kafka topic for the monitoring system, update the msg status to SENDING) 
- Send to SMS/Email/Phone -> send back Ack 
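How do we prevent data loss?
We save a notification log in the database. After a worker takes data from the queue, it also saves a notification log.
Will the user receive a notification only once?
- We cannot guarantee that. In practice the user may well receive the same notification several times, so we also need a dedupe mechanism on the client side. 
- We can dedupe by notificationID. The server side can add filtering too, though that is mainly to stop spam emails from sending the same reminder many times. 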
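Client-side dedupe by notificationID can be as simple as remembering which IDs have already been shown. A minimal sketch, with the hypothetical class name `DedupingReceiver`; a real client would also bound or expire the set of seen IDs, which is omitted here.

```python
class DedupingReceiver:
    """Shows each notification at most once, keyed by notification ID."""

    def __init__(self):
        self.seen_ids = set()
        self.shown = []

    def on_notification(self, notification_id, body):
        if notification_id in self.seen_ids:
            return False  # duplicate delivery, ignore it
        self.seen_ids.add(notification_id)
        self.shown.append(body)
        return True
```

This makes at-least-once delivery look like exactly-once from the user's point of view, as long as the sender reuses the same ID on every retry.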
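Use templates to speed things up: Notification Template
- Many emails are nearly identical and differ only in the date and the name. Offer letters and rejection letters, for example, are ready-made content, so we only need to fill the personal details into a template. 
- Formatting errors are less likely, and sending is faster. 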
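Template filling can be illustrated with Python's standard `string.Template`. The template text and the field names `name`, `role`, and `start_date` are made up for this example.

```python
from string import Template

# Hypothetical offer-letter template; only the $fields change per recipient.
OFFER_TEMPLATE = Template(
    "Dear $name,\n"
    "We are pleased to offer you the $role position, starting on $start_date."
)


def render(template, **fields):
    """Fill the template; raises KeyError if a required field is missing."""
    return template.substitute(**fields)
```

Because `substitute` raises on a missing field, an incomplete notification fails before it is sent rather than reaching the user with a blank placeholder.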
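Retry when sending fails

- Problems in downstream dependencies are normal. For example, Firebase goes down and the message is not sent. The task is put back on the queue. Suppose we retry 3 times (a configured max retry number) and it still fails: then we need to tell the producer (the sender), and on-call gets paged to investigate and fix it. 
- Backoff retry mechanism (as in SNS retries): retry quickly at first, then gradually increase the interval between retries, wait, retry again, increase again, and so on. 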
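The retry policy described above can be sketched as exponential backoff with a retry cap. The defaults (3 retries, 0.5 s base delay, doubling each time) are assumptions for illustration and are not taken from SNS or from the notes; `send_with_retry` is a hypothetical helper.

```python
import time


def send_with_retry(send, message, max_retries=3, base_delay=0.5,
                    factor=2.0, sleep=time.sleep):
    """Try once, then retry up to max_retries times with growing delays.

    Returns True on success. Returns False when every attempt failed, at
    which point the caller would notify the producer and alert on-call.
    """
    for attempt in range(max_retries + 1):
        try:
            send(message)
            return True
        except Exception:
            if attempt < max_retries:
                # delays grow as base_delay * factor**attempt: 0.5, 1.0, 2.0, ...
                sleep(base_delay * factor ** attempt)
    return False
```

The `sleep` parameter is injectable so the delays can be inspected in tests without actually waiting.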
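Do we guarantee that messages are delivered in order?
- No. This is the same problem as delivering exactly once: the network may fail, the user's phone may fail to receive, and once retries are involved we cannot guarantee the order of messages. 

- We can set up several queues and use hashing so that messages for the same user land in the same queue as far as possible. That keeps messages in order on a best-effort basis. 
- In addition, each queue is accessed by a single worker. If several workers access the same queue at the same time, it is easy to mess things up, for example by producing duplicate messages. 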
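Routing a user's messages to a fixed queue can be done by hashing the user ID. A minimal sketch with the hypothetical helper `queue_for_user`; it uses SHA-256 rather than Python's built-in `hash`, because the built-in string hash is randomized per process and would route the same user differently on different workers.

```python
import hashlib


def queue_for_user(user_id, num_queues):
    """Map a user ID to a stable queue index in [0, num_queues)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_queues
```

Note that changing `num_queues` remaps most users, so ordering is only preserved while the number of queues stays fixed.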
 
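How should message priority be designed?
We can add a module in front of the queue to prioritize messages:
- Top priority: OTP (one-time password). Without it the user cannot log in to the game! 
- Transaction notifications: "Hello, your parcel has arrived, please sign for it", or "The table at Little Sheep you queued 2 hours for is finally ready". 
- Promotion messages: "Congratulations, our clothes are 3% off this month!" 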
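The prioritizing module could be sketched as a heap keyed by priority class, with a sequence number so that messages of equal priority keep their arrival order. The names `Priority` and `PriorityDispatcher` are hypothetical, and the three levels mirror the list above.

```python
import heapq
from enum import IntEnum
from itertools import count


class Priority(IntEnum):
    OTP = 0          # lowest number is served first
    TRANSACTION = 1
    PROMOTION = 2


class PriorityDispatcher:
    """Hands out the most urgent message first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def put(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._seq), message))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A strict priority order like this can starve promotion messages while higher-priority traffic keeps arriving; whether that is acceptable is a product decision the notes do not settle.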