Twitter Timeline System¶

Quick Reference Guide for System Design Interviews

Problem Statement¶

Design Twitter's core functionality: users can post tweets, follow other users, and see a timeline of tweets from users they follow. The system should handle high read and write throughput with low latency.

Requirements¶

Functional Requirements¶

Post tweets (up to 280 characters, media optional)
Follow/unfollow users
Home timeline (tweets from followed users)
User timeline (user's own tweets)
Like, retweet, reply to tweets
Search tweets

Non-Functional Requirements¶

Latency: < 200 ms to load a timeline
Availability: 99.99%
Scale: 500M users, 300M daily active users (DAU)
Consistency: eventual consistency is acceptable

Back of Envelope Estimation¶
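A rough sketch of how such an estimate might be set up. Only the 300M DAU figure comes from the requirements above. The per-user rates and the average tweet size are placeholder assumptions chosen for illustration, and the function name `estimate` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, tweets_per_user_per_day, timeline_loads_per_user_per_day,
             bytes_per_tweet):
    """Derive average QPS and daily storage from per-user rates."""
    tweets_per_day = dau * tweets_per_user_per_day
    reads_per_day = dau * timeline_loads_per_user_per_day
    return {
        "write_qps": tweets_per_day / SECONDS_PER_DAY,
        "read_qps": reads_per_day / SECONDS_PER_DAY,
        "read_write_ratio": reads_per_day / tweets_per_day,
        "storage_gb_per_day": tweets_per_day * bytes_per_tweet / 1e9,
    }


result = estimate(
    dau=300_000_000,                     # from the requirements
    tweets_per_user_per_day=2,           # ASSUMPTION
    timeline_loads_per_user_per_day=20,  # ASSUMPTION
    bytes_per_tweet=300,                 # ASSUMPTION: text plus metadata, no media
)

for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumed inputs the workload is read-heavy by a factor of ten, which is the kind of ratio that motivates precomputing timelines. Peak traffic is typically a multiple of these averages.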

High-Level Architecture¶

Timeline Generation Approaches¶

Approach 1: Pull Model (Fan-out on Read)¶
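In the pull model nothing is precomputed: the home timeline is assembled when the user asks for it, by merging the recent tweets of everyone they follow. The following is a minimal in-memory sketch, assuming tweets are represented by time-ordered IDs and each author's list is already sorted newest-first. The function name and the dictionary shapes are hypothetical.

```python
import heapq
from itertools import islice


def pull_timeline(user_id, following, tweets_by_user, limit=20):
    """Build a home timeline at read time (fan-out on read).

    following:      user_id -> iterable of followee ids
    tweets_by_user: user_id -> list of tweet ids, newest first
    """
    streams = [tweets_by_user.get(followee, [])
               for followee in following.get(user_id, ())]
    # heapq.merge lazily merges already-sorted inputs; reverse=True
    # expects each input to be sorted from largest to smallest.
    merged = heapq.merge(*streams, reverse=True)
    return list(islice(merged, limit))


following = {"alice": ["bob", "carol"]}
tweets_by_user = {"bob": [9, 4, 1], "carol": [7, 5]}
print(pull_timeline("alice", following, tweets_by_user, limit=3))
```

Writes are cheap here (a tweet is stored once), but every read touches one list per followee, so read latency grows with the number of accounts followed.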

Approach 2: Push Model (Fan-out on Write)¶
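In the push model each new tweet is copied into the precomputed timeline of every follower at write time, so a read is a single lookup. The following is a minimal in-memory sketch. The class name `PushTimelines` and the timeline cap of 800 entries are assumptions for illustration, not values from this guide.

```python
from collections import defaultdict, deque

TIMELINE_CAP = 800  # ASSUMPTION: illustrative per-user cap


class PushTimelines:
    def __init__(self, cap=TIMELINE_CAP):
        self.followers = defaultdict(set)  # followee_id -> follower ids
        # A bounded deque drops the oldest entry once the cap is reached.
        self.timelines = defaultdict(lambda: deque(maxlen=cap))

    def follow(self, follower_id, followee_id):
        self.followers[followee_id].add(follower_id)

    def post(self, author_id, tweet_id):
        """Fan-out on write: one insert per follower."""
        for follower_id in self.followers[author_id]:
            self.timelines[follower_id].appendleft(tweet_id)

    def home_timeline(self, user_id, limit=20):
        return list(self.timelines[user_id])[:limit]


p = PushTimelines(cap=2)
p.follow("alice", "bob")
for tweet_id in (1, 2, 3):
    p.post("bob", tweet_id)
print(p.home_timeline("alice"))
```

The cost of `post` is proportional to the author's follower count, which is why this approach becomes slow for accounts with very large followings.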

Approach 3: Hybrid Model (Recommended)¶
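The hybrid model pushes tweets from regular users into follower timelines at write time and pulls celebrity tweets at read time, merging the two. The following is a minimal in-memory sketch. The class name, the follower-count threshold, and the dictionary layout are assumptions for illustration. It also assumes tweet IDs increase over time, so newest-first lists stay sorted.

```python
import heapq

CELEBRITY_FOLLOWER_THRESHOLD = 10_000  # ASSUMPTION: illustrative cutoff


class HybridTimelines:
    def __init__(self, threshold=CELEBRITY_FOLLOWER_THRESHOLD):
        self.threshold = threshold
        self.followers = {}  # followee_id -> set of follower ids
        self.following = {}  # follower_id -> set of followee ids
        self.inbox = {}      # user_id -> pushed tweet ids, newest first
        self.authored = {}   # user_id -> own tweet ids, newest first

    def follow(self, follower_id, followee_id):
        self.followers.setdefault(followee_id, set()).add(follower_id)
        self.following.setdefault(follower_id, set()).add(followee_id)

    def is_celebrity(self, user_id):
        return len(self.followers.get(user_id, ())) >= self.threshold

    def post(self, author_id, tweet_id):
        self.authored.setdefault(author_id, []).insert(0, tweet_id)
        if self.is_celebrity(author_id):
            return  # no fan-out; followers pull these at read time
        for follower_id in self.followers.get(author_id, ()):
            self.inbox.setdefault(follower_id, []).insert(0, tweet_id)

    def home_timeline(self, user_id, limit=20):
        celebrity_streams = [
            self.authored.get(followee, [])
            for followee in self.following.get(user_id, ())
            if self.is_celebrity(followee)
        ]
        merged = heapq.merge(self.inbox.get(user_id, []),
                             *celebrity_streams, reverse=True)
        timeline = []
        for tweet_id in merged:
            if len(timeline) >= limit:
                break
            # Skip adjacent duplicates: a tweet pushed before its author
            # crossed the threshold also appears in the pulled stream.
            if timeline and timeline[-1] == tweet_id:
                continue
            timeline.append(tweet_id)
        return timeline


h = HybridTimelines(threshold=2)
h.follow("alice", "star")
h.follow("dave", "star")
h.follow("alice", "bob")
h.post("bob", 1)
h.post("star", 2)
h.post("bob", 3)
print(h.home_timeline("alice"))
```

In this sketch the celebrity check is recomputed from the follower set. A stored flag such as the `is_celebrity` column in the schema would serve the same purpose.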

Data Models¶

Database Schema¶

-- Users table
CREATE TABLE users (
    user_id         UUID PRIMARY KEY,
    username        VARCHAR(50) UNIQUE NOT NULL,
    display_name    VARCHAR(100),
    bio             VARCHAR(500),
    follower_count  BIGINT DEFAULT 0,
    following_count BIGINT DEFAULT 0,
    is_celebrity    BOOLEAN DEFAULT FALSE,
    created_at      TIMESTAMP NOT NULL
);

-- Tweets table (sharded by user_id)
CREATE TABLE tweets (
    tweet_id        BIGINT PRIMARY KEY,  -- Snowflake ID
    user_id         UUID NOT NULL,
    content         VARCHAR(280) NOT NULL,
    media_urls      TEXT[],
    reply_to        BIGINT,
    retweet_of      BIGINT,
    like_count      INT DEFAULT 0,
    retweet_count   INT DEFAULT 0,
    reply_count     INT DEFAULT 0,
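    created_at      TIMESTAMP NOT NULL
);

-- PostgreSQL declares secondary indexes outside CREATE TABLE
-- (an inline INDEX clause is MySQL syntax)
CREATE INDEX idx_user_time ON tweets (user_id, created_at DESC);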

-- Follower relationship (graph database recommended)
CREATE TABLE follows (
    follower_id     UUID NOT NULL,
    followee_id     UUID NOT NULL,
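    created_at      TIMESTAMP NOT NULL,

    PRIMARY KEY (follower_id, followee_id)
);

-- Reverse lookup ("who follows this user?"), declared separately
-- because PostgreSQL has no inline INDEX clause
CREATE INDEX idx_followee ON follows (followee_id);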

-- Likes
CREATE TABLE likes (
    user_id         UUID NOT NULL,
    tweet_id        BIGINT NOT NULL,
    created_at      TIMESTAMP NOT NULL,

    PRIMARY KEY (user_id, tweet_id)
);

Snowflake ID Generation¶
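A Snowflake-style ID packs a millisecond timestamp, a worker number, and a per-millisecond sequence into one 64-bit integer, so IDs sort by creation time without a central counter. The sketch below assumes the commonly described layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits, and uses the widely cited Twitter epoch. The class name and the injectable `clock` parameter are hypothetical.

```python
import threading
import time

EPOCH_MS = 1288834974657  # widely cited Twitter custom epoch (Nov 2010)
WORKER_BITS = 10          # ASSUMPTION: commonly described layout
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)


def timestamp_ms(snowflake_id):
    """Recover the creation time (Unix ms) from an ID."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS


gen = SnowflakeGenerator(worker_id=5)
new_id = gen.next_id()
print(new_id, timestamp_ms(new_id))
```

Because the timestamp occupies the high bits, comparing two IDs numerically compares their creation times first, which is what makes `tweet_id` usable as a pagination cursor.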

Fanout Service¶

Timeline Service¶
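Cursor-based pagination on `tweet_id` (the "get tweets with ID < cursor" rule from the discussion points) can be sketched as follows. This is a minimal in-memory version that assumes the timeline is a list of tweet IDs sorted newest-first. The function name `page` and its return shape are hypothetical.

```python
def page(timeline_ids, cursor=None, limit=20):
    """Return (items, next_cursor) for a newest-first list of tweet ids.

    cursor is the last tweet id the client has already seen;
    only ids strictly smaller than it are returned.
    """
    if cursor is None:
        start = 0
    else:
        # Binary search on a descending list for the first id < cursor.
        lo, hi = 0, len(timeline_ids)
        while lo < hi:
            mid = (lo + hi) // 2
            if timeline_ids[mid] >= cursor:
                lo = mid + 1
            else:
                hi = mid
        start = lo
    items = timeline_ids[start:start + limit]
    has_more = start + limit < len(timeline_ids)
    next_cursor = items[-1] if has_more else None
    return items, next_cursor


ids = [50, 40, 30, 20, 10]
first, cursor = page(ids, limit=2)
second, cursor = page(ids, cursor=cursor, limit=2)
print(first, second, cursor)
```

Unlike an offset, the cursor stays valid when new tweets arrive at the head of the timeline: the next page still starts just below the last ID the client saw.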

Caching Strategy¶

Search Architecture¶

Data Partitioning¶
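The schema above notes that tweets are sharded by `user_id`. One simple way to route a user to a shard is to hash the ID and take it modulo the shard count, sketched below. The shard count of 64, the choice of SHA-256, and the function name are assumptions for illustration.

```python
import hashlib
import uuid

NUM_SHARDS = 64  # ASSUMPTION: illustrative shard count


def shard_for_user(user_id: uuid.UUID, num_shards: int = NUM_SHARDS) -> int:
    """Map a user to a shard with a stable hash.

    Python's built-in hash() is randomized per process for some types,
    so a cryptographic digest is used to keep routing deterministic.
    """
    digest = hashlib.sha256(user_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


user = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(shard_for_user(user))
```

Keying on `user_id` keeps all of one user's tweets on a single shard, so a user timeline is a one-shard query. Plain modulo routing remaps most keys when the shard count changes; consistent hashing is a common alternative when resharding is expected.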

Interview Discussion Points¶

Push vs Pull for timeline?
    Push for regular users (fast reads)
    Pull for celebrities (avoid slow writes)
    Hybrid approach is best

How do you handle celebrities?
    Don't fan-out on write
    Fetch their tweets on timeline read
    Cache their recent tweets

How do you generate unique tweet IDs?
    Snowflake IDs: time-ordered, distributed
    64-bit, contains timestamp

How do you handle timeline pagination?
    Cursor-based with tweet_id
    "Get tweets with ID < cursor"

How do you rank the timeline?
    Chronological (simple)
    ML-based ranking (engagement, relevance)
    Mix of followed + recommended

How do you handle deletes?
    Soft delete in DB
    Async removal from timeline caches
    Eventual consistency acceptable
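
The delete flow in the last point (soft delete, then asynchronous cache cleanup) might look like the following in-memory sketch. The class name, the queue-based worker, and the read-time filter are assumptions for illustration. The filter in particular is an optional guard that goes beyond what the points above require.

```python
from collections import deque


class TweetStore:
    def __init__(self):
        self.tweets = {}          # tweet_id -> {"content": str, "deleted": bool}
        self.timeline_cache = {}  # user_id -> list of tweet ids, newest first
        self.purge_queue = deque()

    def soft_delete(self, tweet_id):
        """Mark the row deleted and schedule cache cleanup; return quickly."""
        self.tweets[tweet_id]["deleted"] = True
        self.purge_queue.append(tweet_id)

    def read_timeline(self, user_id):
        # ASSUMPTION: optional guard that hides deleted tweets
        # even before the purge worker has run.
        return [t for t in self.timeline_cache.get(user_id, [])
                if not self.tweets[t]["deleted"]]

    def run_purge_worker(self):
        """Stand-in for an async job that scrubs cached timelines."""
        while self.purge_queue:
            tweet_id = self.purge_queue.popleft()
            for cached_ids in self.timeline_cache.values():
                if tweet_id in cached_ids:
                    cached_ids.remove(tweet_id)


store = TweetStore()
store.tweets = {1: {"content": "hello", "deleted": False},
                2: {"content": "oops", "deleted": False}}
store.timeline_cache = {"alice": [2, 1]}
store.soft_delete(2)
print(store.read_timeline("alice"))
store.run_purge_worker()
print(store.timeline_cache["alice"])
```

Between `soft_delete` and the worker run, the cached timeline still holds the deleted ID, which is the window of eventual consistency the design accepts.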
