message_type is one of [direct message, group message]
create_timestamp, update_timestamp, expired_time, and STATE are appended by the channel service.
STATE is one of [received, sent, viewed, notification_sent, deleted]. A multimedia message may need processing, so there can be a delay between received and sent.
permission_group: used to manage who can upload files
file_descriptor: a file descriptor (FD, less frequently fildes) is a unique identifier (handle) for a file or other input/output resource, such as a pipe or network socket.
LookupEntity(user_id)
search locally for the given entity
use a trie to build a local search tree
joinGroup(user_id, group_id)
getDirectMessages(user_id, friend_id)
getGroupMessages(user_id, group_id)
CRUD for group info: getGroupNotice / getGroupMembers / getGroupDetail / GetGroupSharedFiles / update...
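The local trie search mentioned under LookupEntity could look roughly like the sketch below. The `Trie` class, its method names, and the case-insensitive matching are assumptions made for illustration; the notes only say that a trie is used for local search.

```python
class Trie:
    """Prefix tree over entity names (hypothetical helper for LookupEntity)."""

    def __init__(self):
        self.root = {}

    def insert(self, name):
        node = self.root
        for ch in name.lower():
            node = node.setdefault(ch, {})
        # The empty-string key marks the end of a name; it can never collide
        # with a real character key, which always has length 1.
        node[""] = name

    def search_prefix(self, prefix):
        node = self.root
        for ch in prefix.lower():
            if ch not in node:
                return []
            node = node[ch]
        # Depth-first walk collecting every name stored below the prefix node.
        results, stack = [], [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key == "":
                    results.append(child)
                else:
                    stack.append(child)
        return sorted(results)


trie = Trie()
for entity in ("yang_123", "yan_9", "bob_456"):
    trie.insert(entity)
matches = trie.search_prefix("ya")
```

Lookup cost depends on the prefix length plus the number of matches rather than on the total number of entities, which is why a trie suits type-ahead search on the client.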
status should be stored here. If this field is modified often, it can be pulled out into a separate table.
Friend Table (SQL)
Store each friend relationship twice (once per direction); otherwise lookups become cumbersome.
This way, sharding on user_id_1 is sufficient.
| user_id_1 | user_id_2 | connected_date | last_view_date |
| --- | --- | --- | --- |
| yang_123 | bob_456 | 123455 | 56891874 |
| bob_456 | yang_123 | 123455 | 29642145 |
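A minimal sketch of why the relationship is stored in both directions: with one row per direction, a user's friend list lives entirely on the shard chosen by user_id_1. The shard count, the MD5-based routing, and the in-memory lists standing in for SQL shards are all assumptions for illustration.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count
shards = [[] for _ in range(NUM_SHARDS)]  # in-memory stand-in for SQL shards


def shard_for(user_id):
    # Stable hash so the same user always routes to the same shard.
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def add_friendship(user_a, user_b, connected_date):
    # Write one row per direction, each on the shard of its user_id_1.
    for owner, friend in ((user_a, user_b), (user_b, user_a)):
        shards[shard_for(owner)].append({
            "user_id_1": owner,
            "user_id_2": friend,
            "connected_date": connected_date,
            "last_view_date": None,
        })


def list_friends(user_id):
    # Single-shard read: no scatter-gather across shards is needed.
    rows = shards[shard_for(user_id)]
    return [row["user_id_2"] for row in rows if row["user_id_1"] == user_id]


add_friendship("yang_123", "bob_456", 123455)
```

The trade-off is double the storage and two writes per friendship, in exchange for friend-list reads that touch one shard only.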
Group Membership Table (SQL)
| group_id | user_id (shard key) | join_date | role | last_view_date |
| --- | --- | --- | --- | --- |
| uuid | yang_123 | 1235678 | Admin | 1256564878 |
| uuid | bob_456 | 12345679 | user | 1238168745 |
Channel Table (Redis)
Message Storage Table (DynamoDB)
| PartitionKey | SortKey | msg_id | sender_id | message | creation_timestamp |
| --- | --- | --- | --- | --- | --- |
| containerId | timestamp-msg_id | uuid | bob_456 | "hello world!" | 568749321 |
Group chat: containerId is the group_id.
Direct message: containerId is the sorted pair {user_id_1, user_id_2}.
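The key scheme could be derived as in the sketch below. The `:` separator, the function names, and the zero-padding of the timestamp are assumptions; the notes only specify that a direct-message containerId is the sorted pair of user IDs and that the sort key has the form timestamp-msg_id.

```python
def container_id(message_type, sender_id, target_id):
    """Partition key: group_id for group chats, sorted user pair for DMs."""
    if message_type == "group message":
        return target_id  # target_id is the group_id here
    if message_type == "direct message":
        # Sorting makes the key identical regardless of who sends.
        return ":".join(sorted([sender_id, target_id]))
    raise ValueError(f"unknown message_type: {message_type}")


def sort_key(creation_timestamp, msg_id):
    # Zero-padding (assumed width 13) keeps lexicographic order equal to
    # numeric order when the sort key is stored as a string.
    return f"{creation_timestamp:013d}-{msg_id}"


dm_key_a = container_id("direct message", "yang_123", "bob_456")
dm_key_b = container_id("direct message", "bob_456", "yang_123")
```

Because both directions of a conversation map to the same partition, reading a chat history is a single range query ordered by the sort key.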
Emoji Table (redis)
This table stores emoji reactions added to a message.
Nice to have (if time permits): also list the group tables, etc.
Bottleneck & scale
Infinite scale: sharding, periodic cleanup
Throttling is needed for "bad" users
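One common way to throttle an abusive user is a token bucket per user. The notes do not name a specific algorithm, so the token bucket, the class name, and the rate and capacity values below are assumptions chosen for illustration.

```python
import time


class TokenBucket:
    """Per-user rate limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        # `now` can be injected for deterministic tests; defaults to a monotonic clock.
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would reject or delay the message


bucket = TokenBucket(rate=1, capacity=2, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(3)]  # third request exceeds the burst
after_refill = bucket.allow(now=1.0)               # one token refilled after 1 second
```

The capacity bounds how large a burst is tolerated, while the rate bounds the sustained message frequency.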
Design deep dive
Network Protocols
basic relationship:
For a chat service, the choice of network protocol is important.
Polling
Cons:
expensive: consumes server resources answering requests that mostly return an empty response
Long Polling
the server holds the connection open until new messages are available or a timeout is reached
Cons:
the sender and receiver may not be connected to the same chat server. HTTP-based servers are usually stateless, so the server that receives a message might not hold the long-polling connection to the client who should receive it.
inefficient if a user doesn't chat much
the server has no good way to tell whether a client has disconnected
WebSocket (Recommended)
WebSocket is the most common solution for sending asynchronous updates from server to client.
It starts as an HTTP connection and can be “upgraded” to a WebSocket connection through a well-defined handshake.
Pros:
connection is bi-directional and persistent
simplifies the design and makes implementation on both client and server more straightforward
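To make the upgrade handshake concrete: in RFC 6455 the client sends a `Sec-WebSocket-Key` header, and the server proves it understood the upgrade by returning a `Sec-WebSocket-Accept` value derived from that key and a fixed GUID. The sketch below computes only that value; the function name is an assumption, and a real server would also validate the other upgrade headers.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key):
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the example handshake in RFC 6455.
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

After the server replies with status 101 and this header, the same TCP connection carries WebSocket frames in both directions.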
Stateless vs. Stateful
some services are stateless, like authentication service, service discovery, etc.
some services are stateful, like chat services; all users talk through the chat servers
scalability
stateless hosts can easily scale out
what about stateful ones?
chat server: send/receive msg
presence server: manage online/offline status
API server: user login, signup, profile service, etc.
notification server: send/push notification
kv store: store chat history
DB selection
generic data, like profile, settings, friend list: relational DB
chat history data: key-value store
easy horizontal scaling
low latency to access data
when indexes grow large, random access in a relational DB is expensive
Facebook Messenger uses HBase; Discord uses Cassandra
Data models
message table (1 to 1)
primary key: message_id
| message_id | message_from | message_to | content | created_at |
| --- | --- | --- | --- | --- |
| uuid | big_int | big_int | text | timestamp |
message table for group chat
primary key: (channel_id, message_id)
| channel_id (partition key) | message_id | user_id | content | created_at |
| --- | --- | --- | --- | --- |
| 899876 | 809707908098 | 123 | "hello world" | 24353546 |
message id
must be unique and sortable
ticket server
Twitter Snowflake is a good approach
local sequence number generator: easier to implement; sufficient when IDs only need to be unique and ordered within a single 1-on-1 or group chat
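A Snowflake-style generator could be sketched as follows. It uses the commonly described 64-bit layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence); the custom epoch, the class name, and the choice to borrow the next millisecond instead of waiting when the sequence overflows are assumptions, not Twitter's actual implementation.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # assumed custom epoch (any fixed past instant works)


class SnowflakeLikeGenerator:
    def __init__(self, machine_id, clock=None):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                now = self._last_ms  # clock moved backwards: stay monotonic
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:
                    # 4096 IDs used in this millisecond: borrow the next one
                    # (a simplification; a production generator might wait instead).
                    now = self._last_ms + 1
            else:
                self._seq = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self._seq


generator = SnowflakeLikeGenerator(machine_id=1, clock=lambda: EPOCH_MS + 5)
ids = [generator.next_id() for _ in range(5000)]
```

Because the timestamp occupies the high bits, IDs from one generator sort by creation time, which satisfies the "unique and sortable" requirement without a central coordinator.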
Service Discovery
recommends the best chat server for a client
use ZooKeeper
User A tries to log in to the app.
The load balancer sends the login request to API servers.
After the backend authenticates the user, service discovery finds the best chat server for User A. In this example, server 2 is chosen and its server info is returned to User A.
User A connects to chat server 2 through WebSocket
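The selection step can be illustrated with an in-memory registry standing in for ZooKeeper. Defining "best" as the server with the fewest active connections is an assumption (real criteria might include geography or capacity), and the class and server names are hypothetical.

```python
class ChatServerRegistry:
    """In-memory stand-in for the service-discovery registry."""

    def __init__(self):
        self._connections = {}  # server name -> active connection count

    def register(self, server, connections=0):
        self._connections[server] = connections

    def deregister(self, server):
        self._connections.pop(server, None)

    def pick_server(self):
        if not self._connections:
            raise LookupError("no chat servers registered")
        # Fewest connections wins; the name breaks ties deterministically.
        server = min(self._connections, key=lambda s: (self._connections[s], s))
        self._connections[server] += 1  # account for the client about to connect
        return server


registry = ChatServerRegistry()
registry.register("chat-server-1", connections=10)
registry.register("chat-server-2", connections=3)
registry.register("chat-server-3", connections=7)
chosen = registry.pick_server()
```

In a real deployment the chat servers would register themselves and report load to the coordination service, and the API server would run the selection after authenticating the user.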
Message Workflow
1-on-1 chat
User A sends a chat message to Chat server 1.
Chat server 1 obtains a message ID from the ID generator.
Chat server 1 sends the message to the message sync queue.
The message is stored in a key-value store.
If User B is online, the message is forwarded to Chat server 2 where User B is connected.
If User B is offline, a push notification is sent from push notification (PN) servers.
Chat server 2 forwards the message to User B. There is a persistent WebSocket connection between User B and Chat server 2.
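The 1-on-1 flow above can be traced with a small in-memory simulation. Every component here (the counter used as ID generator, the deque used as sync queue, the dict used as key-value store) is a simplified stand-in, and all names are hypothetical.

```python
import itertools
from collections import deque


class ChatSystem:
    def __init__(self):
        self._ids = itertools.count(1)  # stand-in for the ID generator
        self.sync_queue = deque()       # message sync queue
        self.kv_store = {}              # message_id -> message
        self.online = {}                # user -> chat server holding their WebSocket
        self.delivered = []             # (chat server, user, message_id)
        self.push_notifications = []    # (user, message_id)

    def send(self, sender, recipient, content):
        # The sender's chat server obtains an ID and enqueues the message.
        message_id = next(self._ids)
        self.sync_queue.append(
            {"id": message_id, "from": sender, "to": recipient, "content": content}
        )
        return message_id

    def process_queue(self):
        while self.sync_queue:
            message = self.sync_queue.popleft()
            self.kv_store[message["id"]] = message  # persist first
            server = self.online.get(message["to"])
            if server is not None:
                # Online: forward via the recipient's chat server.
                self.delivered.append((server, message["to"], message["id"]))
            else:
                # Offline: hand over to the push notification servers.
                self.push_notifications.append((message["to"], message["id"]))


system = ChatSystem()
system.online["user_b"] = "chat-server-2"
first = system.send("user_a", "user_b", "hi")
second = system.send("user_a", "user_c", "hello")
system.process_queue()
```

Persisting before delivery means an offline recipient can still fetch the message from the store when they reconnect.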
Message synchronization across multiple devices
each device keeps a local cur_max_message_id that tracks the latest message it has seen
to sync, read from the DB the messages whose message_id is larger than cur_max_message_id
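A minimal sketch of the per-device sync, assuming message IDs are sortable integers and the store returns them in ascending order. The `Device` class and `fetch_new_messages` helper are hypothetical names; the sorted list stands in for a range query on the key-value store.

```python
import bisect


def fetch_new_messages(stored_ids, cur_max_message_id):
    """Return IDs strictly greater than cur_max_message_id (stored_ids is sorted)."""
    start = bisect.bisect_right(stored_ids, cur_max_message_id)
    return stored_ids[start:]


class Device:
    def __init__(self):
        self.cur_max_message_id = 0  # nothing seen yet

    def sync(self, stored_ids):
        new_ids = fetch_new_messages(stored_ids, self.cur_max_message_id)
        if new_ids:
            self.cur_max_message_id = new_ids[-1]
        return new_ids


store = [101, 102, 105]
phone, laptop = Device(), Device()
phone_first = phone.sync(store)    # phone catches up on everything
store.append(110)
phone_second = phone.sync(store)   # phone only needs the newest message
laptop_first = laptop.sync(store)  # laptop was never synced, so it gets all four
```

Each device tracks its own cursor, so devices that were offline for different periods catch up independently.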
Group chat
sending message
a message from User A is copied to each group member's message sync queue
each client (B, C) only needs to check its own inbox to get new messages
when the group is small, storing a copy in each client's inbox is not expensive. WeChat uses a similar approach and limits group size to 500 members
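The copy-per-member approach (fan-out on write) could be sketched as follows. The function name and in-memory inboxes are assumptions; the 500-member cap simply mirrors the WeChat limit mentioned above.

```python
from collections import defaultdict, deque

MAX_GROUP_SIZE = 500  # mirrors the group-size limit cited in the notes


def fan_out(sender, members, message, inboxes):
    """Copy one message into every other member's sync queue."""
    if len(members) > MAX_GROUP_SIZE:
        raise ValueError("group too large for fan-out on write")
    for member in members:
        if member != sender:
            inboxes[member].append(message)


inboxes = defaultdict(deque)  # user -> message sync queue (inbox)
fan_out("user_a", ["user_a", "user_b", "user_c"], "hello group", inboxes)
```

Write cost grows linearly with group size, which is why this approach is usually paired with a cap on membership; reads stay cheap because each client checks a single inbox.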