Design 12306 - Scott

Problem Description

Design a 12306 ticket booking system that supports high load. Peak QPS is expected to be around 1M and could potentially grow to 10M or higher. There are T train routes, and each route can be further divided into segments between numerous stations.

A customer booking tickets from city 1 to city 2 may need to book several legs on different routes (e.g. Beijing to Hong Kong could be split into: route 1, Beijing to Hangzhou; route 2, Hangzhou to Guangzhou; route 3, Guangzhou to Hong Kong). Customers can cancel a booking at any time.


Traffic Estimation

Data points

  • Total seats per train: around 1,000

  • 2018 total traffic: 3.37 billion trips

  • Annual growth rate: 9.4%

  • Highest single-day record: 13M, spring of 2019

QPS Estimation

  • The system should be able to handle 5 billion passengers per year, i.e. ~14 million per day.

  • At 14 million passengers per day, an assumed 85% occupancy rate, and 1,000 seats per train, that is ~16.5k trains per day.

  • At peak, passenger volume doubles to ~28 million per day; the train count is assumed to double to ~33k, running at 100% occupancy.

Why compute the QPS per train car? Because, as the later sections show, once the DB is sharded the real bottleneck becomes the QPS of each individual train car. The total QPS can be arbitrarily large, since it can be spread across the servers responsible for each range.
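The estimates above can be checked with a quick back-of-envelope calculation (the inputs are the figures from this section; the rounding matches the notes):

```python
# Back-of-envelope check of the traffic numbers above.
passengers_per_year = 5_000_000_000
passengers_per_day = passengers_per_year // 365        # ~13.7M, rounded to 14M above

seats_per_train = 1000
occupancy = 0.85
trains_per_day = passengers_per_day / (seats_per_train * occupancy)   # ~16.1k, i.e. ~16.5k

# Peak: passenger volume doubles, train count doubles, occupancy -> 100%.
peak_passengers_per_day = 2 * passengers_per_day       # ~28M
peak_trains_per_day = 2 * trains_per_day               # ~32k, i.e. ~33k

# After sharding by train, the per-train successful-write rate is bounded by
# the seat count (~1000 per train), no matter how large the total QPS is.
```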


High-level Analysis

  • The bottleneck is the database. In particular, when many people rush for the same train, read-modify-write conflicts are possible: A and B both check train 1 and find one empty seat available, but when both try to book it they conflict and only one gets the seat.

read-modify-write: two users read the same record while also modifying it, either modifying an existing record or adding an entirely new one.
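A toy illustration of the conflict (not the actual DB layer): both readers see the last seat, both decrement, and one update is silently lost unless the check and the write are a single atomic step.

```python
# Lost-update demo: A and B both read "1 seat left" before either writes.
seats = {"G1234": 1}

a_saw = seats["G1234"]          # A reads: 1 seat left
b_saw = seats["G1234"]          # B reads: 1 seat left
if a_saw > 0:
    seats["G1234"] = a_saw - 1  # A books the seat
if b_saw > 0:
    seats["G1234"] = b_saw - 1  # B books the same seat: a lost update

both_think_they_won = (a_saw > 0) and (b_saw > 0)

# Fix: make the check and the decrement one atomic step, e.g. in SQL:
#   UPDATE seats SET remaining = remaining - 1 WHERE train = ? AND remaining > 0
def try_book(store, train):
    # Stands in for a single atomic conditional update.
    if store[train] > 0:
        store[train] -= 1
        return True
    return False

seats = {"G1234": 1}
outcomes = [try_book(seats, "G1234"), try_book(seats, "G1234")]
```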

  • A relational database is used to guarantee strong transactional semantics.

  • Databases have to be sharded in order to handle the traffic. A natural shard key is train + travel time (e.g. Beijing to Shanghai, G1234, departing at 12:30pm).

    • This narrows the traffic down to one train number at one departure time.

    • If the QPS is still too high, consider further sharding by train car number.

  • The partitioning of train seats is pre-determined and pre-allocated. Pre-allocation is required because some offline agencies still sell tickets in person and need a reserved allotment. Additionally, the traffic pattern is relatively predictable, so a few rounds of optimizing the ticket distribution should bring throughput close to the maximum.

    • Each train reserves some tickets based on historical data, because some people buy tickets at the station while others buy online.

    • Beijing - Hong Kong is split into Beijing - Hangzhou, Hangzhou - Guangzhou, and Guangzhou - Hong Kong. The tickets for each segment are also pre-allocated based on historical data, so a Beijing - Guangzhou booking never ends up competing for tickets across different segments.

    • Separate booking from seat selection. This guarantees booking efficiency: first secure the ticket, then take time to pick the seat.
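The pre-split segment quotas described above can be sketched as a simple lookup; all route names and numbers here are made-up examples, not real allocations:

```python
# Hypothetical pre-allocated quotas per segment, as if derived from historical
# data. Because quotas are pre-split, segments never compete with each other.
segment_quota = {
    ("Beijing", "Hangzhou"): 400,
    ("Hangzhou", "Guangzhou"): 350,
    ("Guangzhou", "HongKong"): 250,
}

def book_trip(legs, quota):
    """Book a multi-leg trip only if every leg still has quota; otherwise
    change nothing."""
    if any(quota.get(leg, 0) <= 0 for leg in legs):
        return False
    for leg in legs:
        quota[leg] -= 1
    return True

trip = [("Beijing", "Hangzhou"), ("Hangzhou", "Guangzhou"), ("Guangzhou", "HongKong")]
ok = book_trip(trip, segment_quota)
```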


High-level architecture

The central idea is separating booking (competition for a concurrent resource) from seat allocation (compute intensive, sometimes requiring dynamic programming).

The booking DB is focusing on grabbing the seat and showing how many seats remain.

Booking life-cycle

  1. The admin pre-determines and pre-allocates the train schedules: how many seats from Beijing to Shanghai directly, how many from Beijing to Shijiazhuang, etc. The data is then stored in a configuration database (Redis is fine).

    • Based on historical data, the admin splits the segment tickets in advance and loads them into this DB.

    • The booking system later draws tickets from this DB.

  2. There are many ways to shard the booking database, here a simple shard based on Hash(Beijing To Shanghai G1234, departs on June 08, 2020 12:30pm) is fine.
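Such a hash-based shard routing step might look like the sketch below; the shard count and function name are assumptions. A stable hash is used rather than Python's built-in `hash()`, which is salted per process and would route the same key differently on each server.

```python
import hashlib

def shard_for(train_key: str, num_shards: int = 16) -> int:
    """Map a train + departure key to a DB shard with a stable hash."""
    digest = hashlib.sha256(train_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

key = "Beijing To Shanghai G1234, departs on June 08, 2020 12:30pm"
shard = shard_for(key)   # every server routes this key to the same shard
```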

  3. The database uses snapshot isolation for concurrency control, so reads do not block writes and vice versa.

    • snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.

  4. Proposed booking table schema:

  5. State could be [EMPTY, BOOKED, CANCELLED]; EMPTY means either that the booking is not yet decided or that it is in a waiting-list state.

  6. The booking table is indexed on passenger id, request id, and route; the primary key is an auto-increment id, which is used to decide the winner.

    • Using the auto-increment id to establish arrival order is quite clever!
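A minimal sketch of such a booking table, with SQLite standing in for the real sharded SQL database; the exact column names and types are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE booking (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,  -- arrival order, decides the winner
        passenger_id TEXT NOT NULL,
        request_id   TEXT NOT NULL,
        route        TEXT NOT NULL,
        state        TEXT NOT NULL DEFAULT 'EMPTY'       -- EMPTY | BOOKED | CANCELLED
    )
""")
conn.execute("CREATE INDEX idx_booking ON booking (passenger_id, request_id, route)")

# Each booking is a single appended row; the returned id orders the requests.
ids = []
for passenger, request in [("alice", "req-1"), ("bob", "req-2")]:
    cur = conn.execute(
        "INSERT INTO booking (passenger_id, request_id, route) VALUES (?, ?, ?)",
        (passenger, request, "G1234 2020-06-08 12:30"),
    )
    ids.append(cur.lastrowid)
```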

  7. Each booking is therefore reduced to appending one row to the booking table. The read/write ratio could be around 10:1, and the estimated read + write QPS for one train is 100k at most, which one SQL shard can easily handle.

  8. When a user books a ticket, the request inserts a row into the booking table and returns the auto-increment id and a request id. Clients can then poll the booking status asynchronously (not too frequently). Since reads and writes do not block each other, these follow-up reads do not slow down booking (writes).

  9. Displaying the number of remaining tickets is simply a read, so it does not block writes and is not a bottleneck.

  10. The tricky part is that some users may cancel their bookings, so users on the waiting list can be promoted.

    • For example: ids 1 - 1000 are guaranteed a ticket; 1000 - 1200 are on the waiting list; beyond 1200 there is no chance of getting a ticket.

    • If the user is certainly within the threshold, return success and move on to seat allocation.

    • If the user is beyond the waiting-list capacity, return failure and mark the booking cancelled.

    • If the user is on the waiting list, wait for a certain period (1 - 5 minutes); if an earlier user cancels, return success. Return failure on timeout.

    • If the waiting list clears and there are still empty seats, open the train up for booking again.
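The threshold rules above reduce to a pure function of the booking's auto-increment id; the capacities below are the example figures (1000 seats, 200 waitlist slots), and earlier cancellations promote everyone behind them:

```python
SEAT_CAPACITY = 1000
WAITLIST_CAPACITY = 200

def booking_outcome(booking_id: int, cancellations_before: int = 0) -> str:
    """Classify a booking by its effective rank in arrival order."""
    effective_rank = booking_id - cancellations_before
    if effective_rank <= SEAT_CAPACITY:
        return "SUCCESS"
    if effective_rank <= SEAT_CAPACITY + WAITLIST_CAPACITY:
        return "WAITLIST"
    return "FAILED"
```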

  11. A background process can read the rows in the table periodically (e.g. every 10 seconds) and decide who claimed a seat, who is on the waitlist, and who is out.

  12. A fully booked train could be marked unavailable in a configuration server.

    • Sold-out trains are displayed as unavailable and no longer accept bookings.

  13. From a technical perspective, dynamic allocation is possible: a greedy algorithm or simple dynamic programming could allocate seats on the fly, though a longer time window may be needed.

  14. Booking a trip across several trains inserts rows into all the relevant tables; if one insert fails, the other legs of the trip can be automatically deleted. There should be an upper bound on how many legs a request may contain (max = 2 or 3).
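The all-or-nothing multi-leg step can be sketched as insert-with-compensation; the insert/delete callables below stand in for per-shard DB operations and are illustrative names, not part of the design:

```python
MAX_LEGS = 3

def book_multi_leg(legs, insert_leg, delete_leg):
    """Insert one row per leg; if any leg fails, undo the earlier inserts."""
    if len(legs) > MAX_LEGS:
        raise ValueError(f"at most {MAX_LEGS} legs per request")
    inserted = []
    try:
        for leg in legs:
            inserted.append(insert_leg(leg))
    except Exception:
        for row_id in reversed(inserted):
            delete_leg(row_id)   # compensate: undo the legs already booked
        return None
    return inserted

# Toy stand-ins: the second leg fails, so the first is rolled back.
rows = {}
def fake_insert(leg):
    if leg == "sold-out":
        raise RuntimeError("no quota")
    rows[len(rows) + 1] = leg
    return len(rows)
def fake_delete(row_id):
    del rows[row_id]

result = book_multi_leg(["ok-leg", "sold-out"], fake_insert, fake_delete)
```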

  15. Data for departed trains can be moved to a historical log DB to keep the main DB lean.

Seat allocation life-cycle

  1. Seat allocation is the compute-intensive part and may require some heavy lifting.

  2. Seat allocation can be done sequentially: since users already know they have a seat, they can wait a few more minutes for the exact assignment.

  3. Could use either DP or greedy algorithms to allocate seats.

  4. Once a seat is allocated, write to a seat allocation table.

  5. If the seat allocation server crashes, it restarts by reading the seat allocation table to find out who has already been handled, and the booking table to find out who still needs a seat.
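A minimal greedy sketch of the allocation step: process the winning bookings in id order (arrival order) and hand out the lowest-numbered free seat. A real allocator might prefer adjacent seats for groups, which is where the DP variant comes in.

```python
def allocate_seats(winning_booking_ids, total_seats):
    """Greedily assign seat numbers to bookings, earliest id first."""
    free = list(range(1, total_seats + 1))
    assignment = {}
    for booking_id in sorted(winning_booking_ids):
        if not free:
            break
        assignment[booking_id] = free.pop(0)
    return assignment

plan = allocate_seats([42, 7, 19], total_seats=2)
```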

Since the number of seats per train is fixed, per-shard load is bounded, and this design can scale out almost without limit.
