Scalability for Dummies - Part 1: Clones

Ref: https://www.lecloud.net/post/7295452622/scalability-for-dummies-part-1-clones

I was recently asked what it would take to make a web service massively scalable. My answer was lengthy, and it may be interesting to other people as well, so I am sharing it here on my blog, split into parts to make it easier to read. New parts are released on a regular basis. Have fun, and your comments are always welcome!

You can (soon) find the other parts of the “Scalability for Dummies” series here.

Part 1 - Clones

Public servers of a scalable web service are hidden behind a load balancer. This load balancer evenly distributes load (requests from your users) across your group/cluster of application servers. That means that if, for example, user Steve interacts with your service, his first request may be served by server 2, his second by server 9, and his third perhaps by server 2 again.
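To make the Steve example concrete, here is a minimal sketch of one simple distribution strategy, round-robin. Real load balancers offer several strategies and this is not meant to describe any particular product; the class name `RoundRobinBalancer` and the server names are made up for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, one per incoming request."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list forever: 1, 2, 3, 1, 2, 3, ...
        self._rotation = cycle(list(servers))

    def pick(self):
        # Each call advances the rotation by one server.
        return next(self._rotation)


# Hypothetical cluster of three application servers.
balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])

# Four consecutive requests from Steve land on different servers.
steve_requests = [balancer.pick() for _ in range(4)]
```

Because Steve's requests are spread across servers like this, no single server can be relied on to remember anything about him.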

Steve should always get the same results back, regardless of which server he “landed on”. That leads to the first golden rule for scalability: every server contains exactly the same codebase and does not store any user-related data, like sessions or profile pictures, on local disk or in memory.

Every server must be configured identically, because a user can hit any of the servers and should get the same result from each one.

Sessions need to be stored in a centralized data store which is accessible to all your application servers. It can be an external database or an external persistent cache, like Redis. An external persistent cache will usually perform better than an external database here, because it serves data from memory. By external I mean that the data store does not reside on the application servers. Instead, it is somewhere in or near the data center of your application servers.
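The idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: `SessionStore` is a plain in-process dictionary standing in for an external store such as Redis, and `AppServer`, `login` and `whoami` are hypothetical names, not part of any framework.

```python
import uuid


class SessionStore:
    """Stand-in for an external store such as Redis; here it is just a dict."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, session):
        # Store a copy so callers cannot mutate the stored session by accident.
        self._data[session_id] = dict(session)

    def load(self, session_id):
        # Unknown session ids yield an empty session.
        return dict(self._data.get(session_id, {}))


class AppServer:
    """Keeps no user state of its own; everything goes through the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, username):
        session_id = uuid.uuid4().hex
        self.store.save(session_id, {"user": username})
        return session_id

    def whoami(self, session_id):
        return self.store.load(session_id).get("user")


store = SessionStore()
server_2 = AppServer("server-2", store)
server_9 = AppServer("server-9", store)

# Steve logs in on server 2, and his next request lands on server 9.
session_id = server_2.login("steve")
user_seen_by_server_9 = server_9.whoami(session_id)
```

Server 9 recognizes Steve even though it never handled his login, because neither server holds the session itself.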

But what about deployment? How can you make sure that a code change is rolled out to all your servers without one server still serving old code? This tricky problem is fortunately already solved by the great tool Capistrano. It requires some learning, especially if you are not into Ruby on Rails, but it is definitely worth the effort.
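One building block behind this kind of deployment is switching a server from the old code to the new code in a single step. A common approach, which Capistrano's directory layout also follows, is to upload each release into its own directory and then repoint a `current` symlink. The sketch below shows only that per-server switch, on a POSIX system; the function name `activate_release` and the release names are assumptions for illustration, and coordinating the switch across all servers is left to the deployment tool.

```python
import os
import tempfile


def activate_release(deploy_root, release_name):
    """Point deploy_root/current at releases/<release_name> in one step."""
    target = os.path.join("releases", release_name)
    if not os.path.isdir(os.path.join(deploy_root, target)):
        raise FileNotFoundError(target)

    current = os.path.join(deploy_root, "current")
    staging = current + ".tmp"

    # Clean up a leftover staging link from an interrupted run.
    if os.path.lexists(staging):
        os.remove(staging)

    # Build the new link next to the old one (relative to deploy_root) ...
    os.symlink(target, staging)
    # ... then rename it over the old link. On POSIX the rename is atomic,
    # so a request sees either the old release or the new one, never a mix.
    os.replace(staging, current)


# Throwaway directory with two hypothetical releases.
deploy_root = tempfile.mkdtemp()
for name in ("20240101", "20240102"):
    os.makedirs(os.path.join(deploy_root, "releases", name))

activate_release(deploy_root, "20240101")
activate_release(deploy_root, "20240102")
active = os.readlink(os.path.join(deploy_root, "current"))
```

Keeping older release directories around also makes a rollback cheap: it is the same switch, pointed at the previous release.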

After “outsourcing” your sessions and serving the same codebase from all your servers, you can now create an image file from one of these servers (AWS calls this an AMI, or Amazon Machine Image). Use this AMI as a “super-clone” that all your new instances are based upon. Whenever you start a new instance/clone, just do an initial deployment of your latest code and you are ready!
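The clone workflow can be modeled roughly as follows. This is a toy simulation, not a call into any cloud API: `MachineImage`, `Instance`, `launch_clone` and all identifiers are hypothetical, and the assignment to `release` merely stands in for a real deployment run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineImage:
    image_id: str        # hypothetical identifier of the "super-clone" image
    baked_release: str   # code version present when the image was created


@dataclass
class Instance:
    name: str
    image_id: str
    release: str


def launch_clone(name, image, latest_release):
    """Boot from the image, then do an initial deployment of the latest code."""
    instance = Instance(name=name, image_id=image.image_id,
                        release=image.baked_release)
    if instance.release != latest_release:
        # Stands in for the real deployment step run against the new server.
        instance.release = latest_release
    return instance


image = MachineImage(image_id="image-001", baked_release="v1")
fleet = [launch_clone(f"server-{n}", image, latest_release="v3")
         for n in range(1, 4)]
releases_in_fleet = {instance.release for instance in fleet}
```

The image may lag behind the codebase, but because every new clone gets an initial deployment, the whole fleet ends up on the same release.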

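A deployment has to update every server, and every server has to end up on the same instance; otherwise the environments on different servers would diverge. Many tools solve this problem. The idea is similar to Amazon's VFI concept: everything is packaged into a special instance identified by an id, so that servers can be mapped to different code changes.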


