System Design
  • Introduction
  • Glossary of System Design
    • System Design Basics
    • Key Characteristics of Distributed Systems
    • Scalability - Harvard lecture
      • Scalability for Dummies - Part 1: Clones
      • Scalability for Dummies - Part 2: Database
      • Scalability for Dummies - Part 3: Cache
      • Scalability for Dummies - Part 4: Asynchronism
    • Trade-off
      • CAP Theorem
      • Performance vs scalability
      • Latency vs throughput
      • Availability vs consistency
    • Load Balancing
      • Load balancer
    • Proxies
      • Reverse proxy
    • Cache
      • Caching
    • Asynchronism
    • Processing guarantee in Kafka
    • Database
      • Relational database management system (RDBMS)
      • Redundancy and Replication
      • Data Partitioning
      • Indexes
      • NoSQL
      • SQL vs. NoSQL
      • Consistent Hashing
    • Application layer
    • DNS
    • CDN
    • Communication
      • Long-Polling vs WebSockets vs Server-Sent Events
    • Security
    • Lambda Architecture
  • OOD design
    • Concepts
      • Object-Oriented Basics
      • OO Analysis and Design
      • What is UML?
      • Use Case Diagrams
    • Design a parking lot
  • System Design Cases
    • Overview
    • Design a system that scales to millions of users on AWS
    • Designing a URL Shortening service like TinyURL
      • Design Unique ID Generator
      • Designing Pastebin
      • Design Pastebin.com (or Bit.ly)
    • Design notification system (scott)
      • Designing notification service
    • Designing Chat System
      • Designing Slack
      • Designing Facebook Messenger
    • Design Top K System
    • Designing Instagram
    • Design a newsfeed system
      • Designing Facebook’s Newsfeed
      • Design the data structures for a social network
    • Designing Twitter
      • Design the Twitter timeline and search
      • Designing Twitter Search
    • Design Youtube - Scott
      • Design live commenting
      • Designing Youtube or Netflix
    • Designing a Web Crawler
      • Designing a distributed job scheduler
      • Designing a Web Crawler/Archive (scott)
      • Design a web crawler
    • Designing Dropbox
    • Design Google Doc
    • Design Metrics Aggregation System
      • Design Ads Logging System
    • Design Instacart
    • Design a payment system
      • Airbnb - Avoiding Double Payments in a Distributed Payments System
    • Design Distributed Message Queue
      • Cherami: Uber Engineering’s Durable and Scalable Task Queue in Go
    • Design Distributed Cache
      • Design a key-value cache to save the results of the most recent web server queries
    • Design a scalable file distribution system
    • Design Amazon's sales ranking by category feature
    • Design Mint.com
    • Design Autocomplete System
      • Designing Typeahead Suggestion
    • Designing an API Rate Limiter
      • Designing Rate Limiter
    • Design Google Map
      • Designing Yelp or Nearby Friends
      • Designing Uber backend
    • Designing Ticketmaster
      • Design 12306 - Scott
    • Design AirBnB or a Hotel Booking System
  • Paper Reading
    • MapReduce
  • Other Questions
    • What happened after you input the url in the browser?
Powered by GitBook
On this page
  • Design Google Map
  • Requirements
  • High-level design
  • Traffic
  • Database
  • Data schema
  • Database selection
  • Algorithms
  • Quadtrees
  • Geohashes("Z" shape)
  • Hilbert Curves("U" shape)
  • Sharding
  • Based on the business model with region-based sharding
  • Consistent hashing use LocationID
  • Questions

Was this helpful?

  1. System Design Cases

Design Google Map

PreviousDesigning Rate LimiterNextDesigning Yelp or Nearby Friends

Last updated 3 years ago

Was this helpful?

Design Google Map

Ref:

Ref:

Ref: 老印的这个视频讲解了 quad tree 和 Hilbert Curve 的一些原理:


Requirements

  • Design a google map service

  • Support display maps for certain cities(San Francisco, Seattle etc)

  • Support tracking the user movement(GPS, telematics data)

  • Support routing from place A to place B

  • Support the place look up service

  • (Bonus) Tracking of the traffic

  • (Bonus) tracking of traffic light etc


High-level design

  • location service: get location related info that we collect from users

  • navigation service: communicate with user for navigation

  • Spark Streaming cluster: All location pings coming into Kafka will be read by a spark streaming service, which can then use this information to add unidentified roads in our data, identify hot spots, identify change in avg speed etc.

  • graph processing service: processing and rendering map graph to user

  • area search service: covert locations to coordinates; search given locations


Traffic

  • Google Map: 1 billion monthly user

  • Read/Write Ratio:

    • Location service: 1 : 1

    • Navigation service: 1 : 10^5

    • Graph processing service: 10^5 : 1

    • area search service: 10^5 : 1


Database

Data schema

  • Uber uses a Hexagon to index each of the spatial data

    • storing 4 points of (latitude, longitude), area, road length, locale, etc.

  • sample data:

City: Seattle,
Id: 245324,
Name: "McDonald's",
Address: "1520 Eastlake St"
Location: (latitude, longitude)

Database selection

  • use Graph Database: like neo4j

  • Cassandra also has GeoSpatial support

    • use BinTree to store geo data(Store)


Algorithms

Quadtrees

  • Quadtrees are a very straightforward spatial indexing technique.

  • Each node represents a bounding box covering some part of the space being indexed, with the root node covering the entire area.

  • To query a quadtree, starting at the root, examine each child node, and check if it intersects the area being queried for. If it does, recurse into that child node. Whenever you encounter a leaf node, examine each entry to see if it intersects with the query area, and return it if it does.

Geohashes("Z" shape)

  • At this point, you can actually throw out the quadtree itself - the node number, or geohash, contains all the information we need about its location in the tree.

  • Each leaf node in a full-height tree is a complete geohash, and each internal node is represented by the range from its smallest leaf node to its largest one.

  • Range Query: Thus, you can efficiently locate all the points under any internal node by indexing on the geohash by performing a query for everything within the numeric range covered by the desired node.

Hilbert Curves("U" shape)

  • Hilbert Curves are part of a class of one-dimensional fractals known as space-filling curves, so named because they are one dimensional lines that nevertheless fill all available space in a fixed area.


Sharding

Based on the business model with region-based sharding

One problem with region-based sharding is that the server for a region that is more popular can become overloaded with requests in comparison to servers of regions that aren’t as popular.

Consistent hashing use LocationID

  • new POI's LocationID will be put through a hash function

  • query proximity places will use an aggregation server to aggregate geohashing server data. (Use Yelp's design graph as an example below)


Questions

Support tracking the user movement(GPS, telematics data)

  • use navigation service, constantly upload location data(latitude, longitude)

Support routing from place A to place B

  • It has been discussed in neo4j's design: https://neo4j.com/blog/building-spatial-search-algorithms-neo4j/

  • Dijkstra algorithm, which considers weights. As such, Dijkstra will always find the shortest path based on weights. However, it doesn’t go very fast because it’s searching in all directions. It starts from one side and just searches in all directions until it finally finds the other node.

  • A* search is more clever. You give it two functions, one for the weights and the other one for a preferred direction Or it’s an additional cross-function, which in our case implies a preferred direction. A Star tends to be much faster.

Support the place look up service

  • Elasticsearch

Tracking of the traffic

  • Collect user's upload data for analysis

Ref this

Read this , explaining how to do location based search, route finding, etc.

Ref this doc:

Original Video
系统设计系列讲解 Design Point of Interest (POI) or Nearby Interests
Food delivery algorithms: Designing a location database
doc
neo4j spatial search doc
Damn Cool Algorithms: Spatial indexing with Quadtrees and Hilbert Curves