YouTube Video Streaming
Quick Reference Guide for System Design Interviews
Problem Statement
Design a video streaming platform like YouTube that supports video upload, transcoding, storage, and streaming to millions of concurrent users.
Requirements
Functional Requirements
- Upload videos
- Stream/playback videos
- Video transcoding (multiple resolutions)
- Adaptive bitrate streaming
- Comments, likes, subscriptions
- Search and recommendations
Non-Functional Requirements
- Availability: 99.99%
- Latency: < 200ms video start
- Scale: 2B users, 1B hours watched/day
- Storage: Petabytes of video
Back of Envelope Estimation
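A rough estimate can be derived from the stated scale of 1B hours watched per day. The sketch below is a back-of-envelope calculation, not measured data: the 5 Mbps blended bitrate is an assumption, and the function names are hypothetical helpers introduced only for this example.

```python
HOURS_WATCHED_PER_DAY = 1_000_000_000  # from the non-functional requirements
ASSUMED_AVG_BITRATE_MBPS = 5           # assumption: blended average across resolutions


def average_concurrent_viewers(hours_per_day: float) -> float:
    """Viewer-hours per day spread evenly over 24 hours."""
    return hours_per_day / 24


def egress_tbps(viewers: float, bitrate_mbps: float) -> float:
    """Aggregate egress in terabits per second (1 Tbps = 1,000,000 Mbps)."""
    return viewers * bitrate_mbps / 1_000_000


def daily_egress_petabytes(hours_per_day: float, bitrate_mbps: float) -> float:
    """Data delivered per day in petabytes (decimal units, 1 PB = 1e9 MB)."""
    megabits = hours_per_day * 3600 * bitrate_mbps
    megabytes = megabits / 8
    return megabytes / 1_000_000_000


viewers = average_concurrent_viewers(HOURS_WATCHED_PER_DAY)
print(f"average concurrent viewers: {viewers:,.0f}")
print(f"average egress: {egress_tbps(viewers, ASSUMED_AVG_BITRATE_MBPS):,.1f} Tbps")
print(f"daily egress: {daily_egress_petabytes(HOURS_WATCHED_PER_DAY, ASSUMED_AVG_BITRATE_MBPS):,.0f} PB")
```

Under these assumptions the average works out to roughly 41.7 million concurrent viewers, about 208 Tbps of egress, and about 2,250 PB delivered per day. Peak traffic would be higher than the average, which is one reason delivery is pushed to a CDN rather than served from origin.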
High-Level Architecture
Video Upload Pipeline
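Large uploads are usually split into chunks so that an interrupted transfer can resume from the last acknowledged chunk instead of restarting. The following is a minimal sketch of the client-side bookkeeping only; it performs no network calls. The `UploadSession` class, its method names, and the 8 MiB chunk size are all assumptions made for illustration. In a real pipeline each chunk would be sent to object storage (for example through a presigned URL), and transcoding would start in the background once every chunk has arrived.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 8 * 1024 * 1024  # assumption: 8 MiB chunks


@dataclass
class UploadSession:
    """Tracks which chunks of one file have been acknowledged (hypothetical helper)."""
    file_size: int
    chunk_size: int = CHUNK_SIZE
    uploaded: set = field(default_factory=set)

    @property
    def total_chunks(self) -> int:
        # Ceiling division so a trailing partial chunk is counted.
        return (self.file_size + self.chunk_size - 1) // self.chunk_size

    def byte_range(self, index: int) -> tuple:
        """Half-open [start, end) byte range covered by chunk `index`."""
        if not 0 <= index < self.total_chunks:
            raise IndexError(f"chunk {index} out of range")
        start = index * self.chunk_size
        return start, min(start + self.chunk_size, self.file_size)

    def mark_uploaded(self, index: int) -> None:
        self.byte_range(index)  # validates the index
        self.uploaded.add(index)

    def pending_chunks(self) -> list:
        """Chunks that still need to be sent, e.g. after a dropped connection."""
        return [i for i in range(self.total_chunks) if i not in self.uploaded]

    def is_complete(self) -> bool:
        return len(self.uploaded) == self.total_chunks


# A 20 MiB file needs three chunks; chunks 0 and 2 arrived before a disconnect.
session = UploadSession(file_size=20 * 1024 * 1024)
session.mark_uploaded(0)
session.mark_uploaded(2)
print(session.pending_chunks())  # only the missing chunk is retried
```

Tracking acknowledged chunk indexes rather than a single byte offset lets chunks be uploaded in parallel and out of order.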
Video Transcoding
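Transcoding at scale is commonly organized as a queue of independent per-segment tasks that distributed workers pull from, with higher-priority uploads served first. The sketch below illustrates that idea with an in-memory heap. The `TranscodeQueue` and `SegmentTask` names, the three-rung rendition list, and the use of subscriber count as the priority signal are assumptions for illustration; a production system would use a durable queue rather than process memory.

```python
import heapq
import itertools
from dataclasses import dataclass

RENDITIONS = ("1080p", "720p", "480p")  # assumption: example output qualities


@dataclass(frozen=True)
class SegmentTask:
    video_id: str
    segment_index: int
    quality: str


class TranscodeQueue:
    """Priority queue of per-segment encode tasks (hypothetical helper)."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, video_id: str, segment_count: int, channel_subscribers: int) -> None:
        # One task per (segment, quality) so workers can encode segments in parallel.
        for segment_index in range(segment_count):
            for quality in RENDITIONS:
                task = SegmentTask(video_id, segment_index, quality)
                # heapq is a min-heap, so negate the subscriber count:
                # larger channels get smaller keys and are popped first.
                heapq.heappush(self._heap, (-channel_subscribers, next(self._counter), task))

    def next_task(self):
        """Return the highest-priority task, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = TranscodeQueue()
queue.submit("small_channel_video", segment_count=2, channel_subscribers=100)
queue.submit("big_channel_video", segment_count=1, channel_subscribers=5_000_000)
print(queue.next_task())  # a task from the larger channel, despite later submission
```

Because every segment and quality is its own task, adding workers shortens the time to publish a long video roughly in proportion, until the queue itself or storage throughput becomes the bottleneck.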
Adaptive Bitrate Streaming
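With adaptive bitrate streaming the video is split into short segments, each encoded at several qualities, and the player picks a quality per segment based on the bandwidth it measures. The sketch below shows one simple throughput-based selection rule. The bitrate ladder values, the 0.8 safety factor, and the function names are assumptions for illustration; real players typically also take buffer level into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    quality: str
    bitrate_kbps: int


# Assumption: an example ladder; actual bitrates depend on codec and content.
LADDER = [
    Rendition("1080p", 5000),
    Rendition("720p", 2500),
    Rendition("480p", 1000),
    Rendition("360p", 600),
]
SAFETY_FACTOR = 0.8  # assumption: spend only 80% of the estimated bandwidth


def estimate_bandwidth_kbps(samples: list) -> float:
    """Harmonic mean of recent per-segment throughput samples (all must be > 0).

    The harmonic mean is dominated by slow samples, which makes the
    estimate conservative after a throughput dip.
    """
    if not samples:
        raise ValueError("need at least one throughput sample")
    return len(samples) / sum(1 / s for s in samples)


def choose_rendition(bandwidth_kbps: float, ladder=LADDER) -> Rendition:
    """Pick the highest rendition that fits the bandwidth budget."""
    budget = bandwidth_kbps * SAFETY_FACTOR
    ordered = sorted(ladder, key=lambda r: r.bitrate_kbps, reverse=True)
    for rendition in ordered:
        if rendition.bitrate_kbps <= budget:
            return rendition
    return ordered[-1]  # nothing fits: fall back to the lowest quality


recent_samples = [4000, 1000]  # kbps measured on the last two segment downloads
estimate = estimate_bandwidth_kbps(recent_samples)
print(estimate, choose_rendition(estimate).quality)
```

The selection runs before each segment request, so a drop in throughput lowers the quality at the next segment boundary rather than stalling playback.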
CDN Architecture
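A multi-tier CDN answers a segment request from the closest cache that holds it and falls back toward the origin on a miss, filling the caches on the way back. The sketch below models two cache tiers (edge and regional) in front of an origin, with LRU eviction and a pre-warm step for popular content. The class names, tier names, capacities, and the dictionary standing in for origin storage are all assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache backed by an OrderedDict."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class TieredCDN:
    """Edge -> regional -> origin lookup (hypothetical model)."""

    def __init__(self, origin: dict, edge_capacity: int = 2, regional_capacity: int = 4) -> None:
        self.origin = origin  # stands in for origin object storage
        self.edge = LRUCache(edge_capacity)
        self.regional = LRUCache(regional_capacity)

    def fetch(self, segment_key: str):
        """Return (data, tier) where tier names the layer that served the request."""
        data = self.edge.get(segment_key)
        if data is not None:
            return data, "edge"
        data = self.regional.get(segment_key)
        if data is not None:
            self.edge.put(segment_key, data)  # fill the edge on the way back
            return data, "regional"
        data = self.origin[segment_key]  # raises KeyError for unknown segments
        self.regional.put(segment_key, data)
        self.edge.put(segment_key, data)
        return data, "origin"

    def prewarm(self, segment_keys) -> None:
        """Pull popular segments into the caches before viewers request them."""
        for key in segment_keys:
            self.fetch(key)


cdn = TieredCDN({"v1/seg0": b"A", "v1/seg1": b"B", "v1/seg2": b"C"})
print(cdn.fetch("v1/seg0")[1])  # first request goes to the origin
print(cdn.fetch("v1/seg0")[1])  # repeat request is served by the edge
```

The regional tier is larger than the edge, so a segment evicted from a small edge cache can still be served without touching the origin.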
Video Playback Flow
Data Models
-- Videos table
CREATE TABLE videos (
    video_id VARCHAR(20) PRIMARY KEY,
    user_id UUID NOT NULL,
    title VARCHAR(500) NOT NULL,
    description TEXT,
    duration INT,
    status ENUM('uploading', 'processing', 'published', 'failed'),
    privacy ENUM('public', 'unlisted', 'private'),
    view_count BIGINT DEFAULT 0,
    like_count BIGINT DEFAULT 0,
    created_at TIMESTAMP NOT NULL,
    published_at TIMESTAMP,
    INDEX idx_user_id (user_id),
    INDEX idx_created (created_at DESC)
);
-- Video files (per quality)
CREATE TABLE video_files (
    video_id VARCHAR(20) NOT NULL,
    quality VARCHAR(10) NOT NULL, -- '1080p', '720p', etc.
    codec VARCHAR(20),
    file_size BIGINT,
    manifest_url VARCHAR(500),
    PRIMARY KEY (video_id, quality)
);
-- View history (for analytics and recommendations)
CREATE TABLE view_events (
    event_id UUID PRIMARY KEY,
    video_id VARCHAR(20) NOT NULL,
    user_id UUID,
    watch_time INT, -- seconds watched
    timestamp TIMESTAMP,
    device_type VARCHAR(20),
    INDEX idx_video (video_id, timestamp),
    INDEX idx_user (user_id, timestamp)
);
Recommendations
Storage Architecture
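Storing petabytes of video economically usually relies on tiered storage, where a lifecycle policy moves rarely watched files to cheaper, slower tiers. The sketch below classifies videos into hot, warm, and cold tiers by time since last view and lists the moves a lifecycle job would make. The 30-day and 180-day thresholds and the function names are assumptions chosen for illustration, not values from any particular storage service.

```python
from datetime import datetime, timedelta

HOT_MAX_IDLE = timedelta(days=30)    # assumption: viewed within the last 30 days
WARM_MAX_IDLE = timedelta(days=180)  # assumption: viewed within the last 180 days


def storage_tier(last_viewed_at: datetime, now: datetime) -> str:
    """Classify a video by how long it has gone without a view."""
    idle = now - last_viewed_at
    if idle <= HOT_MAX_IDLE:
        return "hot"
    if idle <= WARM_MAX_IDLE:
        return "warm"
    return "cold"


def plan_transitions(last_viewed: dict, current_tier: dict, now: datetime) -> dict:
    """Return {video_id: (from_tier, to_tier)} for videos whose tier should change."""
    moves = {}
    for video_id, viewed_at in last_viewed.items():
        target = storage_tier(viewed_at, now)
        if current_tier.get(video_id) != target:
            moves[video_id] = (current_tier.get(video_id), target)
    return moves


now = datetime(2024, 1, 1)
last_viewed = {
    "v1": now - timedelta(days=5),
    "v2": now - timedelta(days=90),
    "v3": now - timedelta(days=400),
}
current_tier = {"v1": "hot", "v2": "hot", "v3": "hot"}
print(plan_transitions(last_viewed, current_tier, now))
```

The same function also promotes a video back to the hot tier when an old upload starts receiving views again, since the target tier depends only on the most recent view time.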
Interview Discussion Points
- How do you handle video upload for large files?
  - Chunked upload (resumable)
  - Direct to S3 with presigned URLs
  - Background transcoding
- How does adaptive bitrate streaming work?
  - Video split into segments
  - Multiple quality versions
  - Player switches based on bandwidth
- How do you scale video delivery?
  - Multi-tier CDN
  - Edge caching
  - Pre-warm popular content
- How do you handle transcoding at scale?
  - Distributed workers
  - Parallel segment encoding
  - Priority queues (popular channels first)
- How do you store petabytes of video?
  - Tiered storage (hot/warm/cold)
  - Lifecycle policies
  - Cost optimization
- How do you handle live streaming?
  - Different pipeline (RTMP ingest)
  - Near real-time transcoding
  - Ultra-low latency CDN