Twitter Timeline System
Quick Reference Guide for System Design Interviews
Problem Statement
Design Twitter's core functionality: users can post tweets, follow other users, and see a timeline of tweets from users they follow. The system should handle high read and write throughput with low latency.
Requirements
Functional Requirements
- Post tweets (up to 280 characters, media optional)
- Follow/unfollow users
- Home timeline (tweets from followed users)
- User timeline (user's own tweets)
- Like, retweet, reply to tweets
- Search tweets
Non-Functional Requirements
- Latency: < 200 ms for timeline reads
- Availability: 99.99%
- Scale: 500M users, 300M DAU
- Consistency: eventual consistency is acceptable
Back of Envelope Estimation
High-Level Architecture
Timeline Generation Approaches
Approach 1: Pull Model (Fan-out on Read)
Approach 2: Push Model (Fan-out on Write)
Approach 3: Hybrid Model (Recommended)
Data Models
Database Schema
-- Users table
CREATE TABLE users (
    user_id         UUID PRIMARY KEY,
    username        VARCHAR(50) UNIQUE NOT NULL,
    display_name    VARCHAR(100),
    bio             VARCHAR(500),
    follower_count  BIGINT DEFAULT 0,
    following_count BIGINT DEFAULT 0,
    is_celebrity    BOOLEAN DEFAULT FALSE,
    created_at      TIMESTAMP NOT NULL
);
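-- Tweets table (sharded by user_id); PostgreSQL syntax
CREATE TABLE tweets (
    tweet_id      BIGINT PRIMARY KEY,  -- Snowflake ID
    user_id       UUID NOT NULL,
    content       VARCHAR(280) NOT NULL,
    media_urls    TEXT[],
    reply_to      BIGINT,              -- tweet_id being replied to, if any
    retweet_of    BIGINT,              -- tweet_id being retweeted, if any
    like_count    INT DEFAULT 0,
    retweet_count INT DEFAULT 0,
    reply_count   INT DEFAULT 0,
    created_at    TIMESTAMP NOT NULL
);
-- PostgreSQL has no inline INDEX clause, so the index is created separately.
-- Serves user timeline reads: one user's tweets, newest first.
CREATE INDEX idx_user_time ON tweets (user_id, created_at DESC);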
-- Follower relationship (graph database recommended)
CREATE TABLE follows (
    follower_id UUID NOT NULL,
    followee_id UUID NOT NULL,
    created_at  TIMESTAMP NOT NULL,
    PRIMARY KEY (follower_id, followee_id)
);
-- Reverse lookup: all followers of a given user.
CREATE INDEX idx_followee ON follows (followee_id);
-- Likes
CREATE TABLE likes (
    user_id    UUID NOT NULL,
    tweet_id   BIGINT NOT NULL,
    created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (user_id, tweet_id)
);
Snowflake ID Generation
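The discussion points in this guide describe Snowflake IDs as 64-bit, time-ordered, and generated in a distributed fashion with an embedded timestamp. The following is a minimal Python sketch of that idea, not a reference implementation. The bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence), the epoch constant, and the names `SnowflakeGenerator`, `next_id`, and `timestamp_ms` are assumptions made for illustration.

```python
import threading
import time

# Assumed layout: 41 bits timestamp | 10 bits worker | 12 bits sequence.
EPOCH_MS = 1288834974657  # assumed custom epoch in milliseconds
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Generates time-ordered 64-bit IDs on a single worker (illustrative)."""

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        # The clock is injectable so the generator can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the per-millisecond sequence.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


def timestamp_ms(snowflake_id):
    """Recover the creation time (Unix ms) embedded in an ID."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

Because the timestamp occupies the high bits, sorting by `tweet_id` approximates sorting by creation time, which is what makes ID-based cursors workable for pagination.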
Fanout Service
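Under the hybrid model, a fan-out service pushes a new tweet's ID into each follower's cached home timeline, except when the author is a celebrity. A rough Python sketch of that write path follows. It uses in-memory dictionaries where a real system would use a cache and the `follows` table, and the names `FanoutService`, `on_tweet`, and the `TIMELINE_CAP` value are hypothetical.

```python
from collections import defaultdict, deque

TIMELINE_CAP = 800  # assumed maximum number of entries kept per cached timeline


class FanoutService:
    """Fan-out on write for regular users; celebrities are skipped (illustrative)."""

    def __init__(self, followers, celebrities, cap=TIMELINE_CAP):
        self.followers = followers      # followee_id -> set of follower_ids
        self.celebrities = celebrities  # user_ids whose is_celebrity flag is set
        # follower_id -> newest-first tweet_ids; the oldest entry is evicted at the cap.
        self.timelines = defaultdict(lambda: deque(maxlen=cap))

    def on_tweet(self, author_id, tweet_id):
        """Push tweet_id to each follower's timeline; return the number of writes."""
        if author_id in self.celebrities:
            # Celebrity tweets are fetched at read time instead.
            return 0
        targets = self.followers.get(author_id, ())
        for follower_id in targets:
            self.timelines[follower_id].appendleft(tweet_id)
        return len(targets)
```

The write cost grows with the author's follower count, which is why skipping fan-out for celebrities keeps posting fast for accounts with very large audiences.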
Timeline Service
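On the read path, the hybrid model merges the precomputed timeline with recent tweets pulled from followed celebrities, and pages through the result with a tweet ID cursor ("get tweets with ID < cursor"). The sketch below illustrates this under the assumption that tweet IDs are time-ordered, so sorting by ID is sorting by recency. The function name `home_timeline` and its parameters are hypothetical.

```python
import heapq
from itertools import islice


def home_timeline(cached_ids, celebrity_feeds, cursor=None, limit=20):
    """Return (page, next_cursor) for a hybrid home timeline (illustrative).

    cached_ids:      newest-first tweet_ids pushed by fan-out on write.
    celebrity_feeds: one newest-first tweet_id list per followed celebrity.
    cursor:          only tweets with ID < cursor are returned.
    """
    # Every input is already sorted descending, so a k-way merge suffices.
    merged = heapq.merge(cached_ids, *celebrity_feeds, reverse=True)
    if cursor is not None:
        merged = (tweet_id for tweet_id in merged if tweet_id < cursor)
    page = list(islice(merged, limit))
    # A short page means the timeline is exhausted.
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor
```

A client would pass the returned `next_cursor` back on the next request. Unlike offset-based paging, the cursor stays stable when new tweets arrive at the head of the timeline.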
Caching Strategy
Search Architecture
Data Partitioning
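The schema in this guide notes that the tweets table is sharded by `user_id`. One simple way to map a user to a shard is to hash the ID and take it modulo the shard count, as in the Python sketch below. The shard count, the choice of MD5 as a stable hash, and the name `shard_for_user` are assumptions for illustration only.

```python
import hashlib
import uuid

NUM_SHARDS = 64  # hypothetical shard count


def shard_for_user(user_id: uuid.UUID, num_shards: int = NUM_SHARDS) -> int:
    """Map a user_id to a shard index using a stable hash (illustrative)."""
    # A stable digest is used instead of Python's built-in hash(), whose
    # output is not guaranteed to be the same across processes.
    digest = hashlib.md5(user_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With this scheme all of a user's tweets land on one shard, so a user timeline query touches a single shard. The trade-off is that very active accounts can create hot shards, and changing `num_shards` remaps most users.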
Interview Discussion Points
- Push vs Pull for timeline?
  - Push for regular users (fast reads)
  - Pull for celebrities (avoid slow writes)
  - Hybrid approach is best
- How do you handle celebrities?
  - Don't fan-out on write
  - Fetch their tweets on timeline read
  - Cache their recent tweets
- How do you generate unique tweet IDs?
  - Snowflake IDs: time-ordered, distributed
  - 64-bit, contains timestamp
- How do you handle timeline pagination?
  - Cursor-based with tweet_id
  - "Get tweets with ID < cursor"
- How do you rank the timeline?
  - Chronological (simple)
  - ML-based ranking (engagement, relevance)
  - Mix of followed + recommended
- How do you handle deletes?
  - Soft delete in DB
  - Async removal from timeline caches
  - Eventual consistency acceptable
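The delete flow described in the last point can be sketched as follows: the tweet is flagged as deleted in the database right away, and a background worker later removes its ID from cached timelines. This is a minimal in-memory Python illustration. The class `TweetStore` and its methods are hypothetical stand-ins for a database, a timeline cache, and a job queue.

```python
from collections import deque


class TweetStore:
    """Soft delete with asynchronous cache cleanup (illustrative)."""

    def __init__(self):
        self.tweets = {}        # tweet_id -> {"content": str, "deleted": bool}
        self.timelines = {}     # user_id -> list of cached tweet_ids
        self.pending = deque()  # deleted tweet_ids awaiting cache cleanup

    def soft_delete(self, tweet_id):
        # Flag the row instead of removing it, then queue the cache cleanup.
        self.tweets[tweet_id]["deleted"] = True
        self.pending.append(tweet_id)

    def drain_pending(self):
        """Background step: purge queued IDs from every cached timeline."""
        removed = set(self.pending)
        self.pending.clear()
        if removed:
            for user_id in list(self.timelines):
                self.timelines[user_id] = [
                    t for t in self.timelines[user_id] if t not in removed
                ]
        return len(removed)
```

Between `soft_delete` and the next `drain_pending` run, cached timelines may still contain the deleted ID. That window is the eventual consistency the requirements accept.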