Dropbox File Sync System
Quick Reference Guide for System Design Interviews
Problem Statement
Design a cloud file storage and synchronization system like Dropbox that allows users to store files, sync across devices, and share with others.
Requirements
Functional Requirements
- Upload/download files
- Sync files across devices
- Share files/folders with others
- Version history
- Offline access
- Conflict resolution
Non-Functional Requirements
- Availability: 99.99%
- Durability: 99.999999999% (11 nines)
- Consistency: Eventually consistent (sync)
- Scale: 500M users, 1B files uploaded per day
Back of Envelope Estimation
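The scale requirement (500M users, 1B files uploaded per day) can be turned into rough throughput figures. The sketch below is only illustrative: the user and upload counts come from the requirements, while `AVG_FILE_SIZE_BYTES` is a placeholder assumption that should be replaced with whatever average the interviewer agrees on.

```python
# Back-of-envelope sketch. USERS and FILES_PER_DAY come from the
# requirements; AVG_FILE_SIZE_BYTES is a hypothetical placeholder.
USERS = 500_000_000
FILES_PER_DAY = 1_000_000_000
AVG_FILE_SIZE_BYTES = 1_000_000  # ASSUMPTION: 1 MB average file size

SECONDS_PER_DAY = 24 * 60 * 60


def estimate(users: int, files_per_day: int, avg_file_size: int) -> dict:
    """Derive rough per-second and per-day load from the daily totals."""
    bytes_per_day = files_per_day * avg_file_size
    return {
        "uploads_per_sec": files_per_day / SECONDS_PER_DAY,
        "files_per_user_per_day": files_per_day / users,
        "ingest_bytes_per_day": bytes_per_day,
        "ingest_bytes_per_sec": bytes_per_day / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in estimate(USERS, FILES_PER_DAY, AVG_FILE_SIZE_BYTES).items():
        print(f"{name}: {value:,.0f}")
```

The only figure that does not depend on the assumed file size is the upload rate: 1B files per day is about 11,574 uploads per second on average, before accounting for peaks.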
High-Level Architecture
File Chunking
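A minimal sketch of fixed-size chunking, assuming SHA-256 as the chunk hash (its 64-character hex digest fits the `chunk_hash VARCHAR(64)` column in the data model) and a 4 MiB chunk size. Both choices, and the function names, are assumptions for illustration rather than part of the original design.

```python
import hashlib
from typing import Iterator, List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # ASSUMPTION: 4 MiB fixed-size chunks

Manifest = List[Tuple[int, str]]  # (chunk_index, chunk_hash) pairs


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size slices; the last one may be shorter."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def chunk_manifest(data: bytes, chunk_size: int = CHUNK_SIZE) -> Manifest:
    """Hash every chunk, producing rows shaped like the file_chunks table."""
    return [
        (index, hashlib.sha256(chunk).hexdigest())
        for index, chunk in enumerate(split_into_chunks(data, chunk_size))
    ]


def changed_chunks(old: Manifest, new: Manifest) -> List[int]:
    """Return indexes in `new` whose hash differs from `old` or is new."""
    old_by_index = dict(old)
    return [index for index, digest in new if old_by_index.get(index) != digest]
```

Comparing the manifests of two versions tells the client which chunks to upload. With fixed-size chunks, an in-place edit touches only the affected chunk, but an insertion near the start of a file shifts every later boundary, so all following chunks hash differently.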
Upload Flow
Sync Protocol
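Cursor-based delta sync can be sketched as an append-only change log per user, where each device remembers the last position it has seen. The `ChangeLog` class below is a hypothetical in-memory stand-in; it uses an integer sequence number as the cursor for simplicity, whereas the `sync_cursors` table stores the cursor as an opaque string.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Change:
    seq: int            # position in the log, starting at 1
    path: str
    version: int
    is_deleted: bool = False


@dataclass
class ChangeLog:
    """Hypothetical per-user append-only log of metadata changes."""
    changes: List[Change] = field(default_factory=list)

    def append(self, path: str, version: int, is_deleted: bool = False) -> int:
        """Record a change and return its sequence number."""
        seq = len(self.changes) + 1
        self.changes.append(Change(seq, path, version, is_deleted))
        return seq

    def delta(self, cursor: int, limit: int = 100) -> Tuple[List[Change], int]:
        """Return up to `limit` changes after `cursor`, plus the new cursor."""
        batch = self.changes[cursor:cursor + limit]
        new_cursor = batch[-1].seq if batch else cursor
        return batch, new_cursor
```

A device calls `delta` with its stored cursor, applies the returned changes, and persists the new cursor. An empty batch means the device is up to date and can wait for a notification before asking again.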
Conflict Resolution
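One way to sketch the "conflicted copy" approach: the client sends the version its edit was based on, and the server compares it with the current version. If they match, the upload becomes the next version; if not, the upload is saved under a new name so neither edit is lost. The function names and the naming pattern for the copy are assumptions for illustration.

```python
import posixpath
from typing import Tuple


def conflicted_copy_path(path: str, device_name: str, date_str: str) -> str:
    """Build a sibling path for the losing edit, keeping the file extension."""
    root, ext = posixpath.splitext(path)
    return f"{root} ({device_name}'s conflicted copy {date_str}){ext}"


def resolve_upload(
    server_version: int,
    base_version: int,
    path: str,
    device_name: str,
    date_str: str,
) -> Tuple[str, int]:
    """Return the (path, version) under which the upload should be stored."""
    if base_version == server_version:
        # The edit was made on top of the latest version: accept it in place.
        return path, server_version + 1
    # Someone else changed the file first: keep both, store this one as a copy.
    return conflicted_copy_path(path, device_name, date_str), 1
```

The user then merges the two files manually. Syncing changes quickly shrinks the window in which two devices can edit the same base version.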
Deduplication
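Deduplication follows from addressing chunks by their content hash: identical chunks map to the same key and are stored once. The sketch below is a hypothetical in-memory `ChunkStore` that mirrors the `ref_count` column of the `chunks` table; SHA-256 is assumed as the hash.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Hypothetical content-addressed store with reference counting."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.ref_count: Dict[str, int] = {}

    def missing(self, hashes: List[str]) -> List[str]:
        """Return the hashes the client still needs to upload."""
        return [h for h in hashes if h not in self.blobs]

    def put(self, data: bytes) -> str:
        """Store a chunk once; repeated content only bumps the ref count."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.blobs:
            self.ref_count[digest] += 1
        else:
            self.blobs[digest] = data
            self.ref_count[digest] = 1
        return digest

    def release(self, digest: str) -> None:
        """Drop one reference and delete the chunk when none remain."""
        self.ref_count[digest] -= 1
        if self.ref_count[digest] == 0:
            del self.blobs[digest]
            del self.ref_count[digest]
```

Before uploading, a client can send only the chunk hashes and transfer just the ones reported by `missing`, which saves bandwidth as well as storage.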
Data Models
-- Files metadata (PostgreSQL syntax)
CREATE TABLE files (
    file_id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    path VARCHAR(4096) NOT NULL,
    version INT NOT NULL,
    size BIGINT,
    is_folder BOOLEAN DEFAULT FALSE,
    is_deleted BOOLEAN DEFAULT FALSE,
    modified_at TIMESTAMP,
    created_at TIMESTAMP,
    UNIQUE (user_id, path, version)
);
-- PostgreSQL has no inline INDEX clause, so the lookup index is created separately
CREATE INDEX idx_user_path ON files (user_id, path);
-- File to chunk mapping
CREATE TABLE file_chunks (
    file_id UUID,
    version INT,
    chunk_index INT,
    chunk_hash VARCHAR(64),
    PRIMARY KEY (file_id, version, chunk_index)
);
-- Chunks storage reference
CREATE TABLE chunks (
    chunk_hash VARCHAR(64) PRIMARY KEY,
    size INT,
    ref_count INT DEFAULT 1,
    storage_url VARCHAR(500),
    created_at TIMESTAMP
);
-- Sync state per device
CREATE TABLE sync_cursors (
    user_id UUID,
    device_id UUID,
    cursor VARCHAR(100),
    last_sync TIMESTAMP,
    PRIMARY KEY (user_id, device_id)
);
Block Storage
Notification System
Interview Discussion Points
- Why chunk files?
  - Efficient sync (only changed parts)
  - Deduplication saves storage
  - Parallel transfers
  - Resume capability
- How do you handle conflicts?
  - Create "conflicted copy"
  - User manually resolves
  - Prevention via real-time sync
- How do you ensure durability?
  - Replicate chunks across data centers
  - Erasure coding
  - Verify checksums
- How does sync work efficiently?
  - Cursor-based delta sync
  - Only changed chunks uploaded
  - Long-polling or WebSocket
- How do you handle deduplication with encryption?
  - Trade-off: security vs efficiency
  - Convergent encryption (same content = same key)
  - Or accept lower dedup rate
- How do you scale to exabytes?
  - Custom block storage
  - Sharding by user and chunk hash
  - Tiered storage (hot/cold)
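As a rough illustration of the "sharding by user and chunk hash" point, the sketch below maps a key to a shard number. The shard count, function names, and the use of simple modulo placement are assumptions; metadata would be sharded by user ID and chunk data by chunk hash.

```python
import hashlib

NUM_SHARDS = 1024  # ASSUMPTION: illustrative shard count


def shard_for_chunk(chunk_hash: str, num_shards: int = NUM_SHARDS) -> int:
    """Place a chunk using the leading 64 bits of its hex content hash."""
    return int(chunk_hash[:16], 16) % num_shards


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash the user ID first so sequential IDs spread across shards."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest[:16], 16) % num_shards
```

Because a content hash is already uniformly distributed, chunk shards fill evenly without extra work. Plain modulo placement moves most keys when the shard count changes, so a production design would more likely use consistent hashing or a lookup table of key ranges.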