Design a key-value cache to save the results of the most recent web server queries
Step 1: Outline use cases and constraints
Use cases
User sends a search request resulting in a cache hit
User sends a search request resulting in a cache miss
Service has high availability
Constraints and assumptions
State assumptions
Traffic is not evenly distributed
Popular queries should almost always be in the cache
Need to determine how to expire/refresh
Serving from cache requires fast lookups
Low latency between machines
Limited memory in cache
Need to determine what to keep/remove
Need to cache millions of queries
10 million users
10 billion queries per month
Calculate usage
Cache stores ordered list of key: query, value: results
query - 50 bytes
title - 20 bytes
snippet - 200 bytes
Total: 270 bytes
2.7 TB of cache data per month if all 10 billion queries are unique and all are stored
270 bytes per search * 10 billion searches per month
Assumptions state limited memory, need to determine how to expire contents
4,000 requests per second (a quick sanity check of these figures follows the conversion guide below)
Handy conversion guide:
2.5 million seconds per month
1 request per second = 2.5 million requests per month
40 requests per second = 100 million requests per month
400 requests per second = 1 billion requests per month
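As a quick sanity check, a minimal sketch of the arithmetic behind the figures above, using the stated 270-byte entry size and 10 billion monthly searches:

```python
# Back-of-envelope sanity check for the cache sizing and traffic estimates
BYTES_PER_ENTRY = 50 + 20 + 200        # query + title + snippet = 270 bytes
QUERIES_PER_MONTH = 10 * 10**9         # 10 billion searches per month
SECONDS_PER_MONTH = 2.5 * 10**6        # handy conversion: ~2.5 million seconds per month

cache_size_tb = BYTES_PER_ENTRY * QUERIES_PER_MONTH / 10**12
requests_per_second = QUERIES_PER_MONTH / SECONDS_PER_MONTH

print(f'{cache_size_tb:.1f} TB of cache data per month')    # ~2.7 TB
print(f'{requests_per_second:,.0f} requests per second')    # ~4,000
```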
Step 2: Create a high level design
Step 3: Design core components
Use case: User sends a request resulting in a cache hit
Popular queries can be served from a Memory Cache such as Redis or Memcached to reduce read latency and to avoid overloading the Reverse Index Service and Document Service. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.
Since the cache has limited capacity, we'll use a least recently used (LRU) approach to expire older entries.
The Client sends a request to the Web Server, running as a reverse proxy
The Web Server forwards the request to the Query API server
The Query API server does the following:
Parses the query
Removes markup
Breaks up the text into terms
Fixes typos
Normalizes capitalization
Converts the query to use boolean operations
Checks the Memory Cache for the content matching the query
If there's a hit in the Memory Cache, the Memory Cache does the following:
Updates the cached entry's position to the front of the LRU list
Returns the cached contents
Else, the Query API does the following:
Uses the Reverse Index Service to find documents matching the query
The Reverse Index Service ranks the matching results and returns the top ones
Uses the Document Service to return titles and snippets
Updates the Memory Cache with the contents, placing the entry at the front of the LRU list
Cache implementation
The cache can use a doubly-linked list: new items will be added to the head while items to expire will be removed from the tail. We'll use a hash table for fast lookups to each linked list node.
Query API Server implementation:
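A minimal sketch of the Query API server, assuming a MemoryCache and a ReverseIndexService with the interfaces described above (the class and method names are illustrative, and parse_query is left as a stub):

```python
class QueryApi(object):
    """Parses queries and serves results from the cache or the Reverse Index Service."""

    def __init__(self, memory_cache, reverse_index_service):
        self.memory_cache = memory_cache
        self.reverse_index_service = reverse_index_service

    def parse_query(self, query):
        """Remove markup, break the text into terms, fix typos,
        normalize capitalization, and convert to boolean operations.
        """
        ...

    def process_query(self, query):
        query = self.parse_query(query)
        results = self.memory_cache.get(query)
        if results is None:
            # Cache miss: ask the Reverse Index Service, then populate the cache
            results = self.reverse_index_service.process_search(query)
            self.memory_cache.set(query, results)
        return results
```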
Node implementation:
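A minimal sketch of a linked list node holding a cached query and its results (field names are illustrative):

```python
class Node(object):
    """Doubly-linked list node holding a cached query and its results."""

    def __init__(self, query, results):
        self.query = query
        self.results = results
        self.prev = None
        self.next = None
```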
LinkedList implementation:
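A minimal sketch of the doubly-linked list that tracks LRU order, with the head holding the most recently used entry and the tail the least recently used (method names are illustrative):

```python
class LinkedList(object):
    """Doubly-linked list for LRU ordering: head = most recent, tail = oldest."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append_to_front(self, node):
        """Insert a node at the head of the list."""
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def move_to_front(self, node):
        """Unlink a node and reinsert it at the head of the list."""
        if node is self.head:
            return
        # Unlink the node from its current position
        if node.prev is not None:
            node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev
        if node is self.tail:
            self.tail = node.prev
        # Reinsert at the head
        self.append_to_front(node)

    def remove_from_tail(self):
        """Remove and return the least recently used node, or None if empty."""
        node = self.tail
        if node is None:
            return None
        self.tail = node.prev
        if self.tail is not None:
            self.tail.next = None
        else:
            self.head = None
        node.prev = None
        node.next = None
        return node
```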
Cache implementation:
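A minimal sketch of the LRU cache itself, combining the hash table for O(1) lookups with the linked list above (max_size is an illustrative capacity parameter):

```python
class Cache(object):
    """LRU cache: hash table for fast lookups plus a linked list for LRU order."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.size = 0
        self.lookup = {}                   # key: query, value: Node
        self.linked_list = LinkedList()

    def get(self, query):
        """Return the cached results for a query, or None on a miss.

        Accessing an entry moves it to the front of the LRU list.
        """
        node = self.lookup.get(query)
        if node is None:
            return None
        self.linked_list.move_to_front(node)
        return node.results

    def set(self, query, results):
        """Store the results for a query.

        Updating an existing entry moves it to the front of the LRU list.
        If the entry is new and the cache is full, the least recently used
        entry is evicted first.
        """
        node = self.lookup.get(query)
        if node is not None:
            # Key exists: update the value and mark it as most recently used
            node.results = results
            self.linked_list.move_to_front(node)
            return
        if self.size == self.max_size:
            # Evict the least recently used entry
            evicted = self.linked_list.remove_from_tail()
            if evicted is not None:
                self.lookup.pop(evicted.query, None)
                self.size -= 1
        # Add the new entry at the front of the LRU list
        new_node = Node(query, results)
        self.linked_list.append_to_front(new_node)
        self.lookup[query] = new_node
        self.size += 1
```

In this sketch, get returning None on a miss lines up with the process_query flow above, which falls back to the Reverse Index Service and then calls set.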
When to update the cache
The cache should be updated when:
The page contents change
The page is removed or a new page is added
The page rank changes
The most straightforward way to handle these cases is to simply set a max time that a cached entry can stay in the cache before it is updated, usually referred to as time to live (TTL).
Refer to When to update the cache for tradeoffs and alternatives. The approach above describes cache-aside.
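As one concrete illustration of cache-aside with a TTL, here is a minimal sketch using Redis via redis-py; the key prefix, one-hour TTL, and search_index helper are assumptions made for the example:

```python
import json

import redis

r = redis.Redis(host='localhost', port=6379)
TTL_SECONDS = 60 * 60  # assumed one-hour expiry; tune to the freshness requirements


def search_index(query):
    """Placeholder for a call into the Reverse Index Service."""
    ...


def get_results(query):
    key = 'query:' + query
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)                       # cache hit
    results = search_index(query)                       # cache miss: compute the results
    r.setex(key, TTL_SECONDS, json.dumps(results))      # cache-aside write with a TTL
    return results
```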
Step 4: Scale the design
Important: Do not simply jump right into the final design from the initial design!
State you would 1) Benchmark/Load Test, 2) Profile for bottlenecks, 3) Address bottlenecks while evaluating alternatives and trade-offs, and 4) Repeat. See Design a system that scales to millions of users on AWS as a sample on how to iteratively scale the initial design.
It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a Load Balancer with multiple Web Servers? CDN? Master-Slave Replicas? What are the alternatives and Trade-Offs for each?
We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
To avoid repeating discussions, refer to the following system design topics for main talking points, tradeoffs, and alternatives:
Expanding the Memory Cache to many machines
To handle the heavy request load and the large amount of memory needed, we'll scale horizontally. We have three main options on how to store the data on our Memory Cache cluster:
Each machine in the cache cluster has its own cache - Simple, although it will likely result in a low cache hit rate.
Each machine in the cache cluster has a copy of the cache - Simple, although it is an inefficient use of memory.
The cache is sharded across all machines in the cache cluster - More complex, although it is likely the best option. We could use hashing to determine which machine could have the cached results of a query using machine = hash(query). We'll likely want to use consistent hashing.
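A minimal sketch of consistent hashing for picking a cache machine, using a sorted ring of hash values with virtual nodes (the machine names and replica count are illustrative):

```python
import bisect
import hashlib


class ConsistentHashRing(object):
    """Maps queries to cache machines so that adding or removing a machine
    only remaps a small fraction of keys."""

    def __init__(self, machines, replicas=100):
        self.replicas = replicas       # virtual nodes per machine to smooth the distribution
        self.ring = []                 # sorted list of (hash, machine) tuples
        for machine in machines:
            self.add_machine(machine)

    def _hash(self, key):
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

    def add_machine(self, machine):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash('%s:%d' % (machine, i)), machine))

    def get_machine(self, query):
        """Return the machine responsible for the given query."""
        index = bisect.bisect(self.ring, (self._hash(query), '')) % len(self.ring)
        return self.ring[index][1]


ring = ConsistentHashRing(['cache-1', 'cache-2', 'cache-3'])
machine = ring.get_machine('how to design a cache')   # machine = hash(query), consistently
```

With virtual nodes, removing one machine only remaps the keys that hashed to its slices of the ring rather than reshuffling the entire cache.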
Additional talking points
Additional topics to dive into, depending on the problem scope and time remaining.
SQL scaling patterns
NoSQL
Caching
When to update the cache
Asynchronism and microservices
Communications
Discuss tradeoffs:
External communication with clients - HTTP APIs following REST
Internal communications - RPC
Security
Refer to the security section.
Latency numbers
See Latency numbers every programmer should know.
Ongoing
Continue benchmarking and monitoring your system to address bottlenecks as they come up
Scaling is an iterative process