
Dropbox File Sync System

Quick Reference Guide for System Design Interviews


Problem Statement

Design a cloud file storage and synchronization system like Dropbox that allows users to store files, sync across devices, and share with others.


Requirements

Functional Requirements

  • Upload/download files
  • Sync files across devices
  • Share files/folders with others
  • Version history
  • Offline access
  • Conflict resolution

Non-Functional Requirements

  • Availability: 99.99%
  • Durability: 99.999999999% (11 nines)
  • Consistency: Eventual consistency for sync across devices
  • Scale: 500M users, 1B file uploads/day

Back of Envelope Estimation

Capacity Estimation
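
As a rough sketch of the arithmetic, the average upload rate follows directly from the 1B files/day target in the requirements. The average file size and the peak factor below are placeholder assumptions, not figures from this guide; substitute your own.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(files_per_day: int, avg_file_bytes: int, peak_factor: float = 2.0) -> dict:
    """Return average and peak upload rates plus raw daily ingest in bytes."""
    avg_rate = files_per_day / SECONDS_PER_DAY
    return {
        "avg_uploads_per_sec": avg_rate,
        "peak_uploads_per_sec": avg_rate * peak_factor,
        "ingest_bytes_per_day": files_per_day * avg_file_bytes,
    }


if __name__ == "__main__":
    # 1B files/day is from the requirements; the 1 MiB average size is an assumption.
    result = estimate(files_per_day=1_000_000_000, avg_file_bytes=1024 * 1024)
    print(f"average uploads/sec: {result['avg_uploads_per_sec']:,.0f}")  # 11,574
    print(f"peak uploads/sec:    {result['peak_uploads_per_sec']:,.0f}")
    print(f"ingest TiB/day:      {result['ingest_bytes_per_day'] / 2**40:,.1f}")
```

Deduplication and compression would lower the stored volume, while replication or erasure coding would raise it, so the ingest figure is only a starting point.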


High-Level Architecture



File Chunking
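
A minimal sketch of fixed-size chunking, assuming SHA-256 hashes (which fit the 64-character chunk_hash column in the data model) and a 4 MiB chunk size. Both choices, and the helper names chunk_bytes and changed_chunks, are illustrative assumptions rather than the system's confirmed design.

```python
import hashlib
from typing import Iterator, List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB; tune per workload


def chunk_bytes(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (chunk_index, chunk_hash, chunk_data) for fixed-size chunks."""
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        yield index, hashlib.sha256(chunk).hexdigest(), chunk


def changed_chunks(old_hashes: List[str], new_hashes: List[str]) -> List[int]:
    """Return indices in the new version whose hash is new or differs from the old one."""
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != h
    ]


if __name__ == "__main__":
    # Tiny chunk size so the example is easy to follow.
    v1 = [h for _, h, _ in chunk_bytes(b"aaaabbbbcccc", chunk_size=4)]
    v2 = [h for _, h, _ in chunk_bytes(b"aaaaXXXXcccc", chunk_size=4)]
    print(changed_chunks(v1, v2))  # [1]
```

Fixed-size chunks are the simplest option. An insertion near the start of a file shifts every later chunk boundary, which content-defined chunking is designed to avoid.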



Upload Flow
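
One plausible shape for the upload flow, sketched under assumptions: the client hashes its chunks, asks the server which hashes it lacks, uploads only those, then commits the file's chunk list. InMemoryBlockServer and its methods are hypothetical stand-ins for the block and metadata services, and the 4-byte chunk size exists only to keep the demo small.

```python
import hashlib
from typing import Dict, List


class InMemoryBlockServer:
    """Hypothetical stand-in for the block and metadata services."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}     # chunk_hash -> chunk bytes
        self.files: Dict[str, List[str]] = {}  # path -> ordered chunk hashes

    def missing(self, hashes: List[str]) -> List[str]:
        """Return the hashes the server does not hold yet."""
        return [h for h in hashes if h not in self.chunks]

    def put_chunk(self, chunk_hash: str, data: bytes) -> None:
        # Verify the checksum before accepting the chunk.
        if hashlib.sha256(data).hexdigest() != chunk_hash:
            raise ValueError("checksum mismatch")
        self.chunks[chunk_hash] = data

    def commit(self, path: str, hashes: List[str]) -> None:
        # Metadata is written only once every referenced chunk is stored.
        if self.missing(hashes):
            raise ValueError("cannot commit: chunks missing")
        self.files[path] = list(hashes)


def upload(server: InMemoryBlockServer, path: str, data: bytes, chunk_size: int = 4) -> int:
    """Upload a file and return how many chunks were actually transferred."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    hashes = [hashlib.sha256(c).hexdigest() for c in chunks]
    by_hash = dict(zip(hashes, chunks))
    needed = set(server.missing(hashes))
    for chunk_hash in needed:
        server.put_chunk(chunk_hash, by_hash[chunk_hash])
    server.commit(path, hashes)
    return len(needed)


if __name__ == "__main__":
    server = InMemoryBlockServer()
    print(upload(server, "/a.txt", b"aaaabbbbcccc"))  # 3: all chunks are new
    print(upload(server, "/a.txt", b"aaaaXXXXcccc"))  # 1: only the edited chunk
```

Committing metadata last means a failed or interrupted upload never leaves a file entry pointing at chunks that are not stored, and a retry only needs to send whatever is still missing.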



Sync Protocol
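
A sketch of cursor-based delta sync, assuming the server keeps an append-only change journal per user and hands each device an opaque cursor (compare the cursor column in sync_cursors). ChangeJournal, Change, and pull_all are hypothetical names, and the cursor here is simply the last sequence number seen, encoded as a string.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Change:
    seq: int
    path: str
    version: int
    is_deleted: bool = False


class ChangeJournal:
    """Hypothetical per-user, append-only log of file changes."""

    def __init__(self) -> None:
        self._changes: List[Change] = []

    def append(self, path: str, version: int, is_deleted: bool = False) -> Change:
        change = Change(len(self._changes) + 1, path, version, is_deleted)
        self._changes.append(change)
        return change

    def changes_since(self, cursor: str, limit: int = 100) -> Tuple[List[Change], str, bool]:
        """Return (page of changes, new cursor, has_more) after the given cursor."""
        start = int(cursor) if cursor else 0  # empty cursor means "from the beginning"
        page = self._changes[start:start + limit]
        new_cursor = str(page[-1].seq) if page else (cursor or "0")
        has_more = start + len(page) < len(self._changes)
        return page, new_cursor, has_more


def pull_all(journal: ChangeJournal, cursor: str, limit: int = 2) -> Tuple[List[Change], str]:
    """Client loop: page through changes until caught up, then return the new cursor."""
    applied: List[Change] = []
    has_more = True
    while has_more:
        page, cursor, has_more = journal.changes_since(cursor, limit=limit)
        applied.extend(page)
    return applied, cursor
```

The device persists the returned cursor after applying each page, so a sync interrupted halfway resumes from the last applied change instead of starting over.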



Conflict Resolution
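
A minimal sketch of version-based conflict detection, assuming each edit carries the version it was based on. If the server's latest version has moved on, the edit is saved under a "conflicted copy" name instead of overwriting. MetadataStore and the exact filename format are assumptions for illustration.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Dict


def conflicted_copy_path(path: str, device_name: str, on: date) -> str:
    """Build a sibling path such as 'report (phone's conflicted copy 2024-01-15).txt'."""
    p = PurePosixPath(path)
    name = f"{p.stem} ({device_name}'s conflicted copy {on.isoformat()}){p.suffix}"
    return str(p.with_name(name))


class MetadataStore:
    """Hypothetical metadata service tracking the latest version per path."""

    def __init__(self) -> None:
        self.versions: Dict[str, int] = {}  # path -> latest committed version

    def commit(self, path: str, base_version: int, device_name: str, on: date) -> str:
        """Commit an edit made on top of base_version; return the path written."""
        current = self.versions.get(path, 0)
        if base_version == current:
            # Fast path: nobody else committed since this device last synced.
            self.versions[path] = current + 1
            return path
        # Another device committed first: keep both versions, let the user merge.
        copy_path = conflicted_copy_path(path, device_name, on)
        self.versions[copy_path] = self.versions.get(copy_path, 0) + 1
        return copy_path


if __name__ == "__main__":
    store = MetadataStore()
    today = date(2024, 1, 15)
    print(store.commit("/docs/report.txt", 0, "laptop", today))
    print(store.commit("/docs/report.txt", 0, "phone", today))
```

In a real service the version check and the write would need to happen atomically (for example in one database transaction); this single-threaded sketch does not show that.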



Deduplication
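
A sketch of content-addressed deduplication with reference counting, mirroring the chunk_hash and ref_count columns of the chunks table. ChunkStore is a hypothetical in-memory stand-in; the assumption is that a chunk's bytes are deleted once no file version references it.

```python
import hashlib
from typing import Dict


class ChunkStore:
    """In-memory stand-in for the chunks table plus the blob store behind it."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self.ref_count: Dict[str, int] = {}

    def add_reference(self, data: bytes) -> str:
        """Store the chunk if unseen, otherwise just bump its reference count."""
        chunk_hash = hashlib.sha256(data).hexdigest()
        if chunk_hash in self._blobs:
            self.ref_count[chunk_hash] += 1
        else:
            self._blobs[chunk_hash] = data
            self.ref_count[chunk_hash] = 1
        return chunk_hash

    def release(self, chunk_hash: str) -> None:
        """Drop one reference; reclaim the bytes when nothing points at the chunk."""
        self.ref_count[chunk_hash] -= 1
        if self.ref_count[chunk_hash] == 0:
            del self.ref_count[chunk_hash]
            del self._blobs[chunk_hash]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


if __name__ == "__main__":
    store = ChunkStore()
    h = store.add_reference(b"shared chunk")  # first user uploads it
    store.add_reference(b"shared chunk")      # second user: no new bytes stored
    print(store.ref_count[h], store.stored_bytes())  # 2 12
```

Production systems often defer the delete step to a background garbage collector so that a chunk is not reclaimed while a concurrent upload is about to reference it.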



Data Models

-- Files metadata
CREATE TABLE files (
    file_id         UUID PRIMARY KEY,
    user_id         UUID NOT NULL,
    path            VARCHAR(4096) NOT NULL,
    version         INT NOT NULL,
    size            BIGINT,
    is_folder       BOOLEAN DEFAULT FALSE,
    is_deleted      BOOLEAN DEFAULT FALSE,
    modified_at     TIMESTAMP,
    created_at      TIMESTAMP,

    UNIQUE (user_id, path, version)
);

-- Declared separately: inline INDEX clauses are not portable across SQL dialects
CREATE INDEX idx_user_path ON files (user_id, path);

-- File to chunk mapping
CREATE TABLE file_chunks (
    file_id         UUID,
    version         INT,
    chunk_index     INT,
    chunk_hash      VARCHAR(64),

    PRIMARY KEY (file_id, version, chunk_index)
);

-- Chunks storage reference
CREATE TABLE chunks (
    chunk_hash      VARCHAR(64) PRIMARY KEY,
    size            INT,
    ref_count       INT DEFAULT 1,
    storage_url     VARCHAR(500),
    created_at      TIMESTAMP
);

-- Sync state per device
CREATE TABLE sync_cursors (
    user_id         UUID,
    device_id       UUID,
    cursor          VARCHAR(100),
    last_sync       TIMESTAMP,

    PRIMARY KEY (user_id, device_id)
);

Block Storage
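
A sketch of two block-storage ideas under stated assumptions: placing chunks on shards by hash, and verifying checksums on read so a corrupted replica is skipped. ReplicatedBlockStore and shard_for are hypothetical, and the replicas are plain dictionaries standing in for separate data centers.

```python
import hashlib
from typing import Dict, List


def shard_for(chunk_hash: str, num_shards: int) -> int:
    """Map a hex chunk hash to a shard using its leading 64 bits."""
    return int(chunk_hash[:16], 16) % num_shards


class ReplicatedBlockStore:
    """Hypothetical store that writes every chunk to all replicas."""

    def __init__(self, num_replicas: int = 3) -> None:
        self.replicas: List[Dict[str, bytes]] = [{} for _ in range(num_replicas)]

    def put(self, data: bytes) -> str:
        chunk_hash = hashlib.sha256(data).hexdigest()
        for replica in self.replicas:
            replica[chunk_hash] = data
        return chunk_hash

    def get(self, chunk_hash: str) -> bytes:
        """Return the first replica copy whose checksum matches the requested hash."""
        for replica in self.replicas:
            data = replica.get(chunk_hash)
            if data is not None and hashlib.sha256(data).hexdigest() == chunk_hash:
                return data
        raise KeyError(f"no intact replica for chunk {chunk_hash}")


if __name__ == "__main__":
    store = ReplicatedBlockStore()
    h = store.put(b"hello")
    store.replicas[0][h] = b"hellx"  # simulate silent corruption on one replica
    print(store.get(h))              # b'hello', served from an intact replica
    print(shard_for(h, num_shards=16))
```

Plain modulo placement remaps most chunks whenever the shard count changes; consistent hashing is a common way to limit that movement.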



Notification System
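
A minimal in-process sketch of long-polling, assuming the server tracks a per-user sequence number and a device's request blocks until that number advances or a timeout expires. ChangeNotifier is a hypothetical name, and a real deployment would sit behind HTTP or WebSocket connections rather than threads in one process.

```python
import threading


class ChangeNotifier:
    """Hypothetical long-poll hub for a single user's change sequence."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._latest_seq = 0

    def publish(self) -> int:
        """Record a new change and wake every waiting device."""
        with self._cond:
            self._latest_seq += 1
            self._cond.notify_all()
            return self._latest_seq

    def wait_for_change(self, since_seq: int, timeout: float) -> int:
        """Block until a change newer than since_seq exists or the timeout expires.

        Returns the latest sequence number; if it equals since_seq, the poll
        timed out and the client should simply poll again.
        """
        with self._cond:
            self._cond.wait_for(lambda: self._latest_seq > since_seq, timeout=timeout)
            return self._latest_seq


if __name__ == "__main__":
    notifier = ChangeNotifier()
    # Simulate another device committing a change shortly after we start waiting.
    timer = threading.Timer(0.05, notifier.publish)
    timer.start()
    print(notifier.wait_for_change(since_seq=0, timeout=2.0))  # 1
    timer.join()
```

The notification only says that something changed; the device then fetches the actual changes through the sync protocol using its cursor.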



Interview Discussion Points

  1. Why chunk files?
     • Efficient sync (only changed parts)
     • Deduplication saves storage
     • Parallel transfers
     • Resume capability

  2. How do you handle conflicts?
     • Create "conflicted copy"
     • User manually resolves
     • Prevention via real-time sync

  3. How do you ensure durability?
     • Replicate chunks across data centers
     • Erasure coding
     • Verify checksums

  4. How does sync work efficiently?
     • Cursor-based delta sync
     • Only changed chunks uploaded
     • Long-polling or WebSocket

  5. How do you handle deduplication with encryption?
     • Trade-off: security vs efficiency
     • Convergent encryption (same content = same key)
     • Or accept lower dedup rate

  6. How do you scale to exabytes?
     • Custom block storage
     • Sharding by user and chunk hash
     • Tiered storage (hot/cold)
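
To illustrate the convergent-encryption point above: the key is derived from the content itself, so identical plaintext yields identical ciphertext and can still be deduplicated. The sketch below uses a toy XOR keystream built from SHAKE-256 purely to stay within the standard library. It is not a secure cipher, and a production system would use a vetted cipher from a cryptography library.

```python
import hashlib
from typing import Tuple


def convergent_key(plaintext: bytes) -> bytes:
    """Derive the key from the content: same content gives the same key."""
    return hashlib.sha256(plaintext).digest()


def _keystream(key: bytes, length: int) -> bytes:
    # TOY keystream for illustration only; not a real cipher.
    return hashlib.shake_256(key).digest(length)


def encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Return (key, ciphertext). The key stays with the user; the ciphertext is stored."""
    key = convergent_key(plaintext)
    stream = _keystream(key, len(plaintext))
    return key, bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    stream = _keystream(key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


if __name__ == "__main__":
    key_a, ct_a = encrypt(b"same chunk")  # user A
    key_b, ct_b = encrypt(b"same chunk")  # user B, independently
    print(ct_a == ct_b)                   # True: the server can dedup the ciphertext
    print(decrypt(key_a, ct_a))           # b'same chunk'
```

This is where the security trade-off shows up: anyone who can guess a chunk's content can compute its ciphertext and confirm whether the service already stores it.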