Design Google Doc

Design Google Doc

Conclave doc 写得不错。

Google Doc official blog 讲了 OT resolve conflict 的流程。写得不错。

Ref: Tech Dummies video 讲得非常非常一般。


Requirements

Functional requirements

  • Users can edit / view the document. They can upload image, photos, videos

  • Size of image or videos can range from 2MB-2GB

  • 100 writers and edit at same time, 10,000 readers can see at same time.

Non-functional requirements

  • Every update should be available to other person within 3 secs.


Traffic Estimation

  • DAU: 50 M

  • Storage:

    • daily: 50 M * 1000 bytes = 50 TB

    • 5 years: 50 TB * 365 * 5 = 90000 TB

  • bandwidth:

    • 50 TB / (24 * 3600) = 600 MB / s


High-level architecture


Detailed Design

Text editing

  • text editing can be split into several operations

    • insert("c", 0)

    • delete(0)

    • update()

    • retain()

Real-time communication

  • we introduce a central relay server to facilitate the communication


Option 1: Operational Transformation (OT)

  • Google Doc use OT

  • if user 1 made changes and sent to user 2, OT will perform transformation to user2 to make it valid, vice versa

  • Con: hard to implement


Option 2: Conflict-Free Replicated Data Type (CRDT)

  • CRDT is a version to strengthen and simplify OT

Globally Unique Characters

  • Each character object must be globally unique

  • This is achieved by assigning Site ID and Site Counter properties whenever a new character is inserted. Since the Site Counter at each site increments whenever inserting or deleting a character, we ensure the global uniqueness of all characters.

Globally Ordered Characters

  • We need all the characters to be globally ordered and consistent.

  • That means that when a user inserts a character, it will be placed in the same position on every user’s copy of the shared document.

  • We can use fractional indices as opposed to numerical indices.


Option 3: Differential Synchronization

  • just like use git diff to do patch

  • 具体读 paper,放在了这个 folder 下面


Limitation of a Central Relay Server

Limitations

  • high latency between users with long distance

  • costly to scale

  • client-server model requires that users trust the server and anyone that has access to it with their data

  • reliance on a central server creates a single point-of-failure

Solution: Peer-to-Peer Architecture

  • This means that instead of relying on a server to relay operations, we can have our users perform that work for free (at least in terms of money).

  • In other words, our users will be responsible for relaying operations to other users they’re connected to.

Protocol for peer-to-peer communication - WebRTC

  • WebRTC is a protocol that was designed for real-time communication over peer-to-peer connections.

  • While WebRTC enables our users to talk directly to one another, a small server is required to initiate those peer-to-peer connections in a process called “signaling”.

Version Vector for solving out-of-order message delivery

  • Let’s say we have 3 peers collaborating on a document. Two of the peers are next to each other while the third is far away. Peer1 types an “A” and sends the operation out to both peers. Since Peer2 is nearby, it quickly receives the operation but decides it doesn’t like it and promptly deletes it.

  • Solution: Version Vector.

    • Whenever an operation is sent out, in addition to the character object and whether it’s an insertion or deletion, we also include the character’s Site ID and Site Counter value. The SiteID indicates who originally sent the operation, and the Counter indicates which operation number it is from that particular user.

    • When a peer receives a delete operation, it’s immediately placed in a Deletion Buffer. If it were an insert, we could just apply it immediately. But with deletes, we have to make sure the character has been inserted first.

    • After every operation (insert or delete), the deletion buffer is “processed” to check if the characters have been inserted yet. In this example, the character has a SiteID of 1 and Counter of 24.

    • After some more time, the insert operation finally arrives at Peer3, and its version vector is updated to reflect that it’s seen 24 operations from Peer1. Since we’ve received a new operation, we again process the deletion buffer. This time, when the deletion operation’s character is compared to the version vector, the delete operation can be removed from the buffer and applied.

Last updated