
System Design Interview Delivery Framework

A structured approach to delivering system design interviews confidently and effectively.


Interview Timeline (45-60 minutes)

Phase | Time | Focus
------|------|------
1. Requirements | 5-7 min | Clarify scope, constraints, scale
2. Estimations | 3-5 min | Traffic, storage, bandwidth
3. High-Level Design | 10-15 min | Core components, data flow
4. Deep Dive | 15-20 min | Detailed design of critical components
5. Bottlenecks & Trade-offs | 5-10 min | Scaling, failure handling
6. Wrap-up | 2-3 min | Summary, future improvements

Phase 1: Requirements Clarification (5-7 min)

Why This Matters

  • Shows structured thinking
  • Prevents designing the wrong system
  • Demonstrates communication skills
  • Sets scope to avoid rabbit holes

Functional Requirements

Ask about core features:

"What are the core features we need to support?"
"Who are the users of this system?"
"What actions can users perform?"

Example (Twitter):
  • Post tweets
  • Follow users
  • View timeline/feed
  • Search tweets

Non-Functional Requirements

Always clarify:

Requirement | Questions to Ask
------------|-----------------
Scale | How many users? DAU? Requests/sec?
Performance | Latency requirements? (p50, p99)
Availability | 99.9%? 99.99%? Can we tolerate downtime?
Consistency | Strong or eventual?
Durability | Can we lose data? How critical is it?

Scope Definition

Explicitly state what you will/won't cover:

"For this design, I'll focus on:
 - Core posting and reading flow
 - Feed generation
 - Basic scaling

I'll leave out of scope:
 - Analytics dashboard
 - Ad serving
 - Content moderation"

Sample Questions by System

System | Key Questions
-------|--------------
URL Shortener | Custom URLs? Expiration? Analytics?
Chat App | Group chat? Read receipts? Media?
Rate Limiter | Per user? Per IP? Distributed?
News Feed | Real-time? Ranking algorithm?
File Storage | Max file size? Sharing? Versioning?

Phase 2: Back-of-Envelope Estimations (3-5 min)

When to Do This

  • After requirements are clear
  • Before diving into design
  • Shows you think about scale

What to Estimate

1. Traffic (QPS)
   - Write QPS = DAU × writes_per_user / 86,400
   - Read QPS = DAU × reads_per_user / 86,400
   - Peak QPS = Average × 2-3

2. Storage
   - Daily = requests × data_per_request
   - Yearly = daily × 365
   - 5-year = yearly × 5 × replication_factor

3. Bandwidth
   - Incoming = Write_QPS × request_size
   - Outgoing = Read_QPS × response_size

Example Calculation (Verbalize This)

"Let's estimate for a URL shortener:
- 100M new URLs/month → ~40 writes/sec
- 100:1 read ratio → 4,000 reads/sec
- Peak: ~10K reads/sec
- Storage: 500 bytes/URL × 100M × 12 × 5 years = 3TB

So we need a system handling 10K QPS reads with 3TB storage."
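The arithmetic above can be double-checked with a quick script; the inputs are exactly the assumptions stated in the walkthrough (100M URLs/month, 100:1 reads, 500 bytes/URL, 5-year retention), and the 2.5× peak multiplier is one point in the "average × 2-3" range:

```python
# Back-of-envelope numbers for the URL-shortener example above.
SECONDS_PER_MONTH = 30 * 86_400

new_urls_per_month = 100_000_000
read_write_ratio = 100
peak_multiplier = 2.5          # "average x 2-3"
bytes_per_url = 500
retention_years = 5

write_qps = new_urls_per_month / SECONDS_PER_MONTH   # ~40 writes/sec
read_qps = write_qps * read_write_ratio              # ~4,000 reads/sec
peak_read_qps = read_qps * peak_multiplier           # ~10K reads/sec
storage_bytes = bytes_per_url * new_urls_per_month * 12 * retention_years

print(f"write QPS: {write_qps:.0f}")
print(f"peak read QPS: {peak_read_qps:.0f}")
print(f"5-year storage: {storage_bytes / 1e12:.1f} TB")  # ~3 TB
```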

Pro Tip

Don't spend too long here. Round numbers are fine:
  • 86,400 seconds/day ≈ 100,000
  • Stick to powers of 10


Phase 3: High-Level Design (10-15 min)

Start with the Basics

Draw core components:

[Diagram: basic system architecture]

Build Incrementally

  1. Single server → Add components as needed
  2. Identify bottlenecks → Address with scaling
  3. Explain each addition → Show your reasoning

Core Components to Consider

Component | Purpose | When to Add
----------|---------|------------
Load Balancer | Distribute traffic | Multiple servers
CDN | Serve static content | Global users, media
Cache | Reduce DB load | Read-heavy systems
Message Queue | Async processing | Decouple components
Database | Persistent storage | Always needed
Search Index | Full-text search | Search features
Blob Storage | Files, media | Images, videos

Data Flow

Always explain the data flow:

"When a user posts a tweet:
1. Request hits load balancer
2. Routed to application server
3. Server validates and stores in DB
4. Async job updates follower feeds
5. Invalidates relevant caches
6. Returns success to user"

API Design

Define key endpoints:

POST /tweets
  - body: { content, media_ids }
  - returns: { tweet_id, created_at }

GET /feed?user_id=123&cursor=xxx
  - returns: { tweets: [...], next_cursor }

GET /tweets/{tweet_id}
  - returns: { tweet }

Database Schema

Sketch key tables:

users
  - id (PK)
  - username
  - email
  - created_at

tweets
  - id (PK)
  - user_id (FK)
  - content
  - created_at

follows
  - follower_id (PK)
  - followee_id (PK)
  - created_at
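As a sanity check, the schema sketch above can be expressed as real DDL. Here is a minimal SQLite version; the column types, constraints, and the (user_id, created_at) index (suggested later in the deep dive) are illustrative assumptions, since the sketch doesn't specify them:

```python
import sqlite3

# In-memory DB just to validate the schema sketch above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id         INTEGER PRIMARY KEY,
    username   TEXT NOT NULL UNIQUE,
    email      TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE tweets (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    content    TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Supports the "timeline for one user, newest first" query pattern.
CREATE INDEX idx_tweets_user_time ON tweets (user_id, created_at);

CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);
""")
conn.execute("INSERT INTO users (username, email) VALUES ('alice', 'a@example.com')")
conn.execute("INSERT INTO tweets (user_id, content) VALUES (1, 'hello world')")
```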

Phase 4: Deep Dive (15-20 min)

How to Choose What to Deep Dive

Option 1: Ask the interviewer

"I can deep dive into feed generation, real-time updates,
or the caching strategy. Which interests you most?"

Option 2: Pick the most critical or complex component
  • Feed ranking for social media
  • Consistency for financial systems
  • Real-time delivery for chat apps

Deep Dive Topics

Database Design

"For the database, I'd use:

Primary DB: PostgreSQL for users/tweets (ACID, relational)
- Partition tweets by user_id
- Index on (user_id, created_at) for timeline

Cache: Redis
- User profiles (high read, low write)
- Hot tweets (viral content)
- Session data

Why not NoSQL for tweets?
- Need joins for mentions, replies
- Strong consistency for tweet creation
- But could shard by user_id at scale"

Caching Strategy

"I'd implement multi-layer caching:

L1: CDN for static assets
L2: Redis for computed feeds
L3: Local cache on app servers

Cache invalidation:
- TTL-based (5 min for feeds)
- Event-driven (new tweet invalidates author's feed)

Cache-aside pattern:
1. Check cache
2. If miss, query DB
3. Populate cache
4. Return data"
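The cache-aside steps above can be sketched in a few lines of Python; plain dicts stand in for Redis and the database here, and the 5-minute TTL matches the invalidation policy described:

```python
import time

db = {"user:1": {"name": "alice"}}  # stand-in for the real database
cache = {}                          # stand-in for Redis: key -> (value, expires_at)
TTL_SECONDS = 300                   # the 5-minute TTL mentioned above

def get(key):
    entry = cache.get(key)
    if entry and entry[1] > time.time():             # 1. check cache
        return entry[0]
    value = db.get(key)                              # 2. on miss, query DB
    cache[key] = (value, time.time() + TTL_SECONDS)  # 3. populate cache
    return value                                     # 4. return data

def invalidate(key):
    # Event-driven invalidation, e.g. a new tweet invalidates a feed.
    cache.pop(key, None)
```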

Scaling Approach

"To scale reads:
1. Add read replicas
2. Cache hot data
3. CDN for static content

To scale writes:
1. Shard database
2. Async processing via queue
3. Rate limiting

Sharding strategy:
- Shard by user_id (even distribution)
- Consistent hashing (easy to add nodes)"
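The consistent-hashing idea above can be sketched as a small hash ring. Virtual nodes smooth out the key distribution; the 100 vnodes per shard and the shard names are illustrative choices, not prescribed values:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node appears `vnodes` times on the ring.
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}:{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first vnode at or after the key's hash.
        i = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["shard-a", "shard-b", "shard-c"])
shard = ring.node_for("user:12345")   # deterministic shard assignment
```

Adding a node only remaps the keys that fall between the new vnodes and their predecessors, which is what makes it "easy to add nodes."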

Phase 5: Bottlenecks & Trade-offs (5-10 min)

Identify Bottlenecks

Proactively address:

"Potential bottlenecks I see:

1. Database writes at scale
   → Solution: Sharding by user_id

2. Feed generation for users with many followers
   → Solution: Pre-compute feeds, push model

3. Hot partitions (celebrity users)
   → Solution: Separate hot/cold data, caching"

Discuss Trade-offs

Show you understand there's no perfect solution:

Decision | Trade-off
---------|----------
Push vs Pull feed | Storage vs Latency
SQL vs NoSQL | Consistency vs Scale
Sync vs Async | Latency vs Complexity
Strong vs Eventual consistency | Correctness vs Availability
Cache vs No cache | Speed vs Staleness

Example Trade-off Discussion

"For feed generation, we have two options:

PUSH Model (Fan-out on write):
+ Fast reads (feed pre-computed)
- Slow writes for popular users
- Storage cost for duplicate data
- Best for: Most users

PULL Model (Fan-out on read):
+ Fast writes
+ Less storage
- Slow reads (compute on demand)
- Best for: Celebrity users

Hybrid approach:
- Push for regular users
- Pull for celebrities (>1M followers)"
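The hybrid approach can be sketched in-process. The 1M-follower threshold comes from the discussion above; the data structures, the `followed_celebrities` argument (in reality derived from the follow graph), and everything else are illustrative:

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000   # ">1M followers" from above

followers = defaultdict(set)      # author -> follower ids
precomputed = defaultdict(list)   # follower -> pushed tweet ids (push path)
author_tweets = defaultdict(list) # author -> own tweet ids (pull path)

def post_tweet(author, tweet_id):
    author_tweets[author].append(tweet_id)
    if len(followers[author]) < CELEBRITY_THRESHOLD:
        # PUSH: fan out on write to each follower's precomputed feed.
        for f in followers[author]:
            precomputed[f].append(tweet_id)
    # Celebrities skip fan-out; their tweets are merged at read time.

def read_feed(user, followed_celebrities):
    # PULL: merge the precomputed feed with celebrity tweets on demand.
    feed = list(precomputed[user])
    for celeb in followed_celebrities:
        feed.extend(author_tweets[celeb])
    return feed
```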

Failure Handling

"For fault tolerance:

1. Server failure
   → Multiple instances behind LB
   → Health checks, auto-replacement

2. Database failure
   → Multi-AZ replication
   → Automated failover

3. Cache failure
   → Fall back to database
   → Cache warming on recovery

4. Datacenter failure
   → Cross-region replication
   → DNS failover"

Phase 6: Wrap-up (2-3 min)

Summarize Key Points

"To summarize, we designed a Twitter-like system that:
- Handles 500K read QPS, 5K write QPS
- Uses PostgreSQL sharded by user_id
- Implements hybrid push/pull for feeds
- Caches with Redis for hot data
- Achieves 99.99% availability with multi-AZ"

Mention Future Improvements

"If I had more time, I'd explore:
- Search functionality with Elasticsearch
- Real-time notifications with WebSockets
- Analytics pipeline with Kafka
- ML-based feed ranking"

Communication Best Practices

DO ✅

Practice | Example
---------|--------
Think out loud | "I'm considering two options here..."
Draw diagrams | Always visualize the architecture
State assumptions | "I'm assuming 80% reads, 20% writes"
Ask for feedback | "Does this direction make sense?"
Acknowledge trade-offs | "This approach sacrifices X for Y"
Use numbers | "This handles ~10K QPS"

DON'T ❌

Anti-pattern | Why It's Bad
-------------|-------------
Jumping to a solution | Shows lack of process
Over-engineering | Adds unnecessary complexity
Ignoring scale | Misses the point of the interview
One-way monologue | Not collaborative
Getting stuck in details | Loses the big picture
Buzzwords without understanding | Easily exposed

Handling Uncertainty

Good: "I'm not 100% sure about X, but my understanding is..."
Good: "I'd need to research X, but I'd approach it by..."
Bad:  Making up facts or pretending to know

Handling Pushback

Interviewer: "What if this doesn't scale?"

Response: "Good point. At higher scale, we could:
1. Add more shards
2. Implement read replicas
3. Use a distributed cache

Shall I explore any of these?"

Common Patterns Reference

Scaling Reads

  1. Caching (Redis, Memcached)
  2. Read replicas
  3. CDN
  4. Database denormalization

Scaling Writes

  1. Sharding/partitioning
  2. Message queues
  3. Batch processing
  4. Write-behind caching

High Availability

  1. Load balancing
  2. Multi-AZ deployment
  3. Database replication
  4. Circuit breakers
  5. Graceful degradation
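Of the patterns above, the circuit breaker is the least self-explanatory: after repeated failures it "opens" and fails fast instead of hammering a struggling dependency, then allows a trial call after a cooldown. A toy in-process version (thresholds are illustrative):

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None   # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()   # trip the breaker
            raise
        self.failures = 0   # a success closes the circuit again
        return result
```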

Low Latency

  1. Caching
  2. CDN
  3. Edge computing
  4. Connection pooling
  5. Async processing

System Design Template

Use this mental checklist:

□ Requirements
  □ Functional (what features?)
  □ Non-functional (scale, latency, availability?)
  □ Scope (what's in/out?)

□ Estimations
  □ Traffic (QPS read/write)
  □ Storage (how much data?)
  □ Bandwidth (network needs?)

□ High-Level Design
  □ Core components
  □ Data flow
  □ API design
  □ Data model

□ Deep Dive
  □ Critical component details
  □ Database choices
  □ Caching strategy
  □ Scaling approach

□ Trade-offs
  □ Identified bottlenecks
  □ Discussed alternatives
  □ Justified decisions

□ Wrap-up
  □ Summary
  □ Future improvements

Quick Reference: Component Selection

Need | Solution
-----|---------
Distribute traffic | Load Balancer (ALB/NLB)
Serve static content globally | CDN (CloudFront)
Reduce database load | Cache (Redis/ElastiCache)
Handle async tasks | Message Queue (SQS/Kafka)
Store structured data | SQL (PostgreSQL/MySQL)
Store unstructured data | NoSQL (DynamoDB/MongoDB)
Store files | Object Storage (S3)
Full-text search | Search Engine (Elasticsearch)
Real-time communication | WebSockets/Pub-Sub
Workflow orchestration | Step Functions/Airflow
Rate limiting | Redis + Token Bucket
Service discovery | DNS/Consul/Cloud Map
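The "Redis + Token Bucket" entry above combines two pieces: the token-bucket algorithm and shared counters. An in-process sketch of the algorithm (a real deployment would keep the bucket state in Redis so the limit holds across servers; the rate and capacity here are illustrative):

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1        # spend one token for this request
            return True
        return False                # bucket empty: reject the request

bucket = TokenBucket(rate=5, capacity=10)   # 5 req/s, bursts up to 10
```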

Sample Opening Statement

"Before diving in, I'd like to understand the requirements.
Let me start with a few clarifying questions:

1. What's the expected scale - daily active users?
2. What are the core features we must support?
3. Are there specific latency or availability requirements?
4. Should I focus on any particular aspect?

[After getting answers]

Great. Based on that, let me outline our scope:
- We'll design X, Y, Z features
- Targeting N users with K QPS
- Aiming for 99.9% availability

Let me start with some quick estimations, then move
to the high-level architecture. Sound good?"

Final Tips

  1. Practice drawing - Diagrams should be clear and quick
  2. Know your numbers - Latencies, capacities, common scales
  3. Have opinions - "I prefer X because..." shows experience
  4. Stay calm - It's a conversation, not an interrogation
  5. Be collaborative - Treat interviewer as a teammate
  6. Manage time - Don't spend 20 min on requirements
  7. Show depth - Anyone can draw boxes; show you understand why