How to design TikTok
Functional Requirements
- Video Upload: Allow users to upload videos to the platform.
- Video Playback: Enable users to watch videos.
These two features are enough to build a minimum viable product.
Non-functional Requirements
- Scalability: Support a massive user base with high concurrency.
- Availability: Ensure the system remains operational even during partial failures.
- Low Latency: Minimize delays in video loading and playback.
- Fault Tolerance: Handle hardware/network failures gracefully without data loss.
Assumptions
- User Base: 1 billion daily active users (DAU).
- Usage Patterns:
  - Each user watches 100 videos per day.
  - Each user uploads 1 video per day.
- Video Size: Average video size is 10MB.
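The assumptions above can be turned into a quick back-of-envelope estimate. The sketch below only multiplies out the stated numbers; the constant and function names are illustrative, and it uses decimal units (1 PB = 10^9 MB) and assumes traffic is spread evenly over the day, which real traffic is not.

```python
# Back-of-envelope capacity estimate from the stated assumptions.
DAU = 1_000_000_000        # daily active users
VIEWS_PER_USER = 100       # videos watched per user per day
UPLOADS_PER_USER = 1       # videos uploaded per user per day
VIDEO_SIZE_MB = 10         # average video size
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_PB = 1_000_000_000  # decimal units: 1 PB = 10^9 MB


def estimate() -> dict[str, float]:
    """Return average request rates and daily storage growth."""
    uploads_per_day = DAU * UPLOADS_PER_USER
    views_per_day = DAU * VIEWS_PER_USER
    return {
        # Averages only; peak load will be higher than this.
        "uploads_per_sec": uploads_per_day / SECONDS_PER_DAY,
        "views_per_sec": views_per_day / SECONDS_PER_DAY,
        # Raw uploads only; encoded copies and replicas are not counted.
        "new_storage_pb_per_day": uploads_per_day * VIDEO_SIZE_MB / MB_PER_PB,
        "read_write_ratio": views_per_day / uploads_per_day,
    }


for name, value in estimate().items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the system averages roughly 11,600 uploads and 1.16 million views per second, adds about 10 PB of raw video per day, and is read-heavy by a factor of 100. That read-heavy ratio is why the playback path gets most of the design attention.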
Database Selection: SQL vs NoSQL
- SQL Databases:
  - Pros: Strong consistency, relational data support.
  - Cons: Challenges with sharding and hotspot management.
- NoSQL Databases:
  - Pros: Cost-effective, horizontally scalable.
  - Cons: Limited transactional support.
Video Storage Strategy
Blob Storage (Binary Large Object):
- Optimized for unstructured data (videos, images, audio).
- Well suited to storing and retrieving very large numbers of relatively small files, such as 10MB short videos.
How to upload a video
- Because uploaded content is untrusted, exposing the blob storage directly to the upload interface is unsafe.
- A better option is to allocate a temporary staging area that holds the original videos uploaded by users.
- During upload, the client can split a video into small segments. This supports resumable uploads (after an interruption, only the missing segments are resent) and parallel uploads (multiple segments are sent simultaneously). Both matter for a mobile app, where the network environment is unstable.
- Once all segments are uploaded, a message queue and a worker pool can merge the segments and verify file integrity. The video is then encoded into several formats and quality levels, so that playback can match each device and network condition.
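The segment, resume, merge, and verify steps above can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `UploadSession`, `split_into_chunks`, and the tiny chunk size are hypothetical names and values, SHA-256 is one possible integrity check, and a real system would keep segments in the staging area and run the merge in a queue-driven worker.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Client side: cut the video into fixed-size segments."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class UploadSession:
    """Server side: staging state for one upload (hypothetical helper)."""

    def __init__(self, total_chunks: int, expected_sha256: str) -> None:
        self.total_chunks = total_chunks
        self.expected_sha256 = expected_sha256
        self.received: dict[int, bytes] = {}

    def put_chunk(self, index: int, chunk: bytes) -> None:
        """Accept segments in any order; re-sending one is harmless."""
        if not 0 <= index < self.total_chunks:
            raise IndexError(f"chunk index {index} out of range")
        self.received[index] = chunk

    def missing_chunks(self) -> list[int]:
        """What a resuming client still has to send."""
        return [i for i in range(self.total_chunks) if i not in self.received]

    def merge_and_verify(self) -> bytes:
        """Worker step: merge segments in order and check the checksum."""
        if self.missing_chunks():
            raise ValueError("upload incomplete")
        merged = b"".join(self.received[i] for i in range(self.total_chunks))
        if hashlib.sha256(merged).hexdigest() != self.expected_sha256:
            raise ValueError("checksum mismatch")
        return merged


video = b"0123456789"
chunks = split_into_chunks(video, chunk_size=4)
session = UploadSession(len(chunks), hashlib.sha256(video).hexdigest())

session.put_chunk(2, chunks[2])   # segments may arrive out of order
session.put_chunk(0, chunks[0])
print(session.missing_chunks())   # [1]: the connection dropped before segment 1

session.put_chunk(1, chunks[1])   # resume by sending only what is missing
assert session.merge_and_verify() == video
```

Keying segments by index makes each upload idempotent, which is what lets the client retry or parallelize freely without coordinating with the server.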
How to watch a video
- To avoid hotspots caused by frequent access to popular videos, we can deploy a CDN near user locations to offload traffic from the blob storage.
- Although serving videos from a CDN speeds up delivery and reduces latency, it is also expensive, so only the most popular videos should be cached there.
- An extractor service can periodically identify popular videos in the blob storage and push them to the CDN, replacing videos that are no longer popular.
- We can also adopt a streaming protocol such as Apple's HTTP Live Streaming (HLS) to play videos in a "stream-as-you-go" manner: playback starts before the whole file has been downloaded, which improves user experience.
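One way to picture the extractor service described above is as a periodic top-K selection over recent view counts. The sketch below is an assumption about how such a job might work, not a description of any real CDN API: `select_for_cdn`, `plan_refresh`, the view-count dictionary, and the fixed capacity are all hypothetical.

```python
import heapq


def select_for_cdn(view_counts: dict[str, int], capacity: int) -> set[str]:
    """Pick the `capacity` most-viewed video IDs."""
    return set(heapq.nlargest(capacity, view_counts, key=view_counts.get))


def plan_refresh(
    cached: set[str], view_counts: dict[str, int], capacity: int
) -> tuple[list[str], list[str]]:
    """Compare the current cache with the new top-K set.

    Returns (video IDs to push to the CDN, video IDs to evict).
    """
    target = select_for_cdn(view_counts, capacity)
    return sorted(target - cached), sorted(cached - target)


recent_views = {"a": 50, "b": 900, "c": 10, "d": 700}
to_push, to_evict = plan_refresh(cached={"a", "b"}, view_counts=recent_views, capacity=2)
print(to_push)   # ['d']
print(to_evict)  # ['a']
```

Computing the difference against what is already cached means each run only moves the videos whose popularity changed, which keeps traffic between the blob storage and the CDN small.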
Show off
- We can introduce a recommendation system that suggests videos to users instead of pushing a purely chronological feed.
- We can introduce a Two-Tower Model to embed user and video features into separate vectors. When a client requests videos, the system can recommend those that match their interests based on vector similarity.
- The benefit is that users spend more time watching videos; the drawback is the extra cost of hiring a team to build and deploy the model.
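The retrieval step of the Two-Tower approach can be illustrated with a toy example. The two towers themselves are trained neural networks and are not shown here; the sketch assumes their output vectors already exist, and the hand-written embeddings, the function names, and the choice of cosine similarity are all illustrative assumptions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(
    user_vec: list[float], video_vecs: dict[str, list[float]], k: int
) -> list[str]:
    """Return the k video IDs whose embeddings are closest to the user's."""
    ranked = sorted(
        video_vecs,
        key=lambda video_id: cosine(user_vec, video_vecs[video_id]),
        reverse=True,
    )
    return ranked[:k]


# Hand-made 2-D embeddings standing in for the outputs of the two towers.
user = [1.0, 0.0]
videos = {
    "cooking": [0.9, 0.1],
    "gaming": [0.0, 1.0],
    "baking": [0.7, 0.7],
}
print(recommend(user, videos, k=2))  # ['cooking', 'baking']
```

Because the video vectors do not depend on the user, they can be computed offline and indexed ahead of time. This exhaustive scan is only practical for a small catalog; at the scale assumed in this design, an approximate nearest-neighbor index would typically replace it.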
