Design Ads Logging System

Requirements

Ads Table

adId (PK)   userID   description   created_time
123485      4541     "new ad"      456487

We can use a lambda architecture to count events for each ad.
The real-time processor is the speed layer - approximate counts over short windows (the latest 1 min and 5 min results)
The batch processor is the batch layer - accurate counts over long windows (the most recent 1 day, 1 month, and 1 year results)
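
As a rough illustration of the two layers, here is a minimal Python sketch. It assumes timestamps are in seconds and that events are dictionaries shaped like the example event in these notes; the names SpeedLayerCounter and batch_count are made up for this example. The speed layer keeps per-minute buckets, so its windows are aligned to minute boundaries and only approximate, while the batch layer recounts from the full log.

```python
from collections import defaultdict


class SpeedLayerCounter:
    """Speed layer: approximate counts kept in per-minute buckets."""

    def __init__(self):
        # ad id -> minute bucket -> number of events seen in that minute
        self.buckets = defaultdict(lambda: defaultdict(int))

    def record(self, ad_id, timestamp):
        # Assumption: timestamp is in seconds, so // 60 gives the minute bucket.
        self.buckets[ad_id][timestamp // 60] += 1

    def count_last(self, ad_id, now, minutes):
        # Sum the buckets for the current minute and the (minutes - 1) before it.
        current = now // 60
        per_minute = self.buckets.get(ad_id, {})
        return sum(per_minute.get(m, 0)
                   for m in range(current - minutes + 1, current + 1))


def batch_count(events, ad_id, start, end):
    """Batch layer: exact count recomputed from the complete event log."""
    return sum(1 for e in events
               if e["eventId"] == ad_id and start <= e["timestamp"] < end)
```

A query for the last 1 or 5 minutes would go to count_last, while daily or monthly reports would call batch_count over the stored log.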

{
    eventId: 1254, // specify a unique ad
    advertiserId: 15464,
    userId: 676467,
    eventType: "click",
    timestamp: 45613854,
}
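
One possible way to represent this event inside a processor is a small typed record. The sketch below is an assumption, not part of the original design: the AdEvent class, its snake_case field names, and the set of allowed event types ("click" from the example above, "impression" from the notes) are chosen for illustration only.

```python
from dataclasses import dataclass

# Assumption: only these two event types exist in this sketch.
ALLOWED_EVENT_TYPES = {"click", "impression"}


@dataclass(frozen=True)
class AdEvent:
    event_id: int
    advertiser_id: int
    user_id: int
    event_type: str
    timestamp: int

    @classmethod
    def from_dict(cls, raw):
        """Build an AdEvent from a decoded message, rejecting unknown types."""
        if raw["eventType"] not in ALLOWED_EVENT_TYPES:
            raise ValueError(f"unknown eventType: {raw['eventType']!r}")
        return cls(
            event_id=int(raw["eventId"]),
            advertiser_id=int(raw["advertiserId"]),
            user_id=int(raw["userId"]),
            event_type=raw["eventType"],
            timestamp=int(raw["timestamp"]),
        )
```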

The stream processor runs multiple consumers that poll messages from the queue.
Impressions are counted by the stream processor; impression + 1 if it is the user
Events are sharded by eventId or advertiserId.
When sharding by eventId, all events for the same eventId go to the same consumer.
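
The sharding idea can be sketched with simple hash partitioning. This is a minimal example under assumptions: the helper names partition_for and route are hypothetical, and CRC32 is used only because it is a stable hash available in the Python standard library; a real queue would apply its own partitioner.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a shard key (eventId or advertiserId) to a partition index."""
    # crc32 gives the same value in every process, unlike hash() on str.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def route(events, num_partitions, shard_key="eventId"):
    """Group events by the partition (and so the consumer) that receives them."""
    assignments = defaultdict(list)
    for event in events:
        index = partition_for(event[shard_key], num_partitions)
        assignments[index].append(event)
    return dict(assignments)
```

Because the partition depends only on the key, every event with the same eventId lands on the same consumer, which lets that consumer keep the count for that key locally.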
The workload is write-heavy, so we can use Cassandra (append-only) or Elasticsearch to store the results.
In the processor, we can also build an in-memory hash table.
- use an LSM (log-structured merge-tree) as the data structure
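
To make the in-memory hash table and LSM idea concrete, here is a toy LSM-style counter in Python. It is a sketch under assumptions, not how Cassandra is implemented: the class name TinyLsmCounter and the memtable_limit parameter are invented for this example, and the "segments" are plain in-memory sorted lists rather than files on disk. Writes only touch the in-memory table; when it grows past the limit it is flushed as an immutable sorted segment, and reads add up the deltas from the memtable and every segment.

```python
import bisect


class TinyLsmCounter:
    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # max distinct keys held in memory
        self.memtable = {}                    # key -> pending count delta
        self.segments = []                    # immutable sorted (key, delta) lists

    def increment(self, key, delta=1):
        # Writes only update the in-memory hash table.
        self.memtable[key] = self.memtable.get(key, 0) + delta
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, append-only segment.
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # A read merges the memtable with every segment via binary search.
        total = self.memtable.get(key, 0)
        for segment in self.segments:
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                total += segment[i][1]
        return total

    def compact(self):
        # Merge all segments into one so reads touch fewer lists.
        merged = {}
        for segment in self.segments:
            for key, delta in segment:
                merged[key] = merged.get(key, 0) + delta
        self.segments = [sorted(merged.items())] if merged else []
```

The design choice this illustrates is that writes stay cheap (a dictionary update plus an occasional sequential flush), at the cost of reads that must consult several segments until compaction merges them.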

