System Design 学习上手

总有一些犀利的公司试图用System Design来考察new grad的即战力,还是得准备一个。先以设计Twitter为例:

问清楚需求再开始设计,常识。问题例如:

设计API,抽象软件功能。例如:

postTweet(user_id, tweet_data, tweet_location, user_location, timestamp, )

generateTimeline(user_id, current_time, user_location, )

估算scale of the system, 思考scaling, load balancing, caching, partitioning, etc

这一部分的估算详见此贴

For Twitter, e.g.

User: UserID, Name, Email, DoB, CreationData, LastLogin, etc.

Tweet: TweetID, Content, TweetLocation, NumberOfLikes, TimeStamp, etc.

UserFollowos: UserdID1, UserID2

FavoriteTweets: UserID, TweetID, TimeStamp

画block diagram设计图。

For Twitter, at a high level, we would need multiple application servers to serve all the read/write requests with load balancers in front of them for traffic distributions. If we’re assuming that we’ll have a lot more read traffic (as compared to write), we can decide to have separate servers for handling these scenarios. On the backend, we need an efficient database that can store all the tweets and can support a huge number of reads. We would also need a distributed file storage system to store photos and videos.


A plain english introduction to CAP Theorem


1. Horizontal vs Vertical Scaling

2. Load Balancer(负载均衡器)

通常在一个scalable website 的前端会放置一个load balancer, this allows a system to distribute the load evenly so that one server doesn’t crash and take down the whole system

3. Denormalization and NoSQL

Denormalization means adding redundant information into a database to speed up reads

4. Database Partitioning (Sharding)

Sharding means split the data across multiple machines while 你仍然知道哪些data在哪些机器上

5. Caching

缓存,类比LRU cache

6. Asynchronous Processing (异步处理)

耗时较长的slow operation最好be done asynchronously, 否则用户将会被阻塞过长时间。有时可以通过pre-process来加速。

7. Networking metrics

8. Read-heavy vs. Write-heavy

Write-heavy: queuing up the writes (but think about the potential failure)

Read-heavy: cache大法好


设计Facebook, 实现find shortest path between two people 的功能

Step1: 不考虑数据规模,利用bidirectional-BFS找两人之间最短路径

Step 2: Scale up. 实现以下接口,

int machine_index = getMachineIDForUser(personID);

//go to machine # machine_index

Person friend = getPersonWithID(person_id);

首先找server中存储personID的机器ID,再从机器中找到对应的人

Step 3: Optimization.