Amazon DynamoDB¶

Introduction¶

Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. It's a key-value and document database delivering single-digit millisecond latency at any scale.

Key Features¶

Fully managed - No servers, patching, or maintenance
Serverless - Scales automatically, pay per use
Single-digit millisecond - Consistent low latency
Virtually unlimited throughput - Scales horizontally across partitions; per-partition and account quotas still apply
Built-in security - Encryption, IAM, VPC endpoints
Global tables - Multi-region, multi-active replication

When to Use¶

Ideal Use Cases¶

Session management - User sessions, shopping carts
Gaming - Leaderboards, player data, game state
IoT - Time-series data, device status
Mobile backends - User profiles, app data
Ad tech - Real-time bidding, user tracking
E-commerce - Product catalog, order processing
Caching - With DAX for microsecond latency

Signs DynamoDB is Right for You¶

Need consistent single-digit millisecond latency
Have simple access patterns (known queries)
Need to scale massively
Want zero operational overhead
Have high read or write request volumes
Don't need complex joins or long-running, multi-statement transactions

Data Model¶

Core Concepts¶

Concept	Description
Table	Collection of items
Item	Single record (like a row)
Attribute	Data element (like a column)
Primary Key	Unique identifier for an item (partition key alone, or partition key plus sort key)
Partition Key (PK)	Hash key, determines data distribution
Sort Key (SK)	Optional range key; orders items within a partition and enables range queries

Key Types¶

[Figure: DynamoDB Key Types]
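
As a minimal sketch of the two key shapes (partition key only, or partition key plus sort key), the helper below builds the parameters one might pass to boto3's `create_table`. The table and attribute names (`Sessions`, `Orders`, `pk`, `sk`) are hypothetical, and on-demand billing is assumed only to keep the example short.

```python
def table_params(table_name, partition_key, sort_key=None):
    """Build create_table parameters for a simple or composite primary key."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attribute_definitions = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        # A sort key turns the simple primary key into a composite one.
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attribute_definitions.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand mode
    }


simple = table_params("Sessions", "pk")          # hypothetical table names
composite = table_params("Orders", "pk", "sk")
# With AWS credentials configured, these could be passed on as:
#   boto3.client("dynamodb").create_table(**composite)
```

Only key attributes are declared up front; all other attributes are schemaless and can vary per item.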

Secondary Indexes¶

Type	Description	Use Case
Local Secondary Index (LSI)	Same PK, different SK	Alternative sort orders
Global Secondary Index (GSI)	Different PK and SK	Different access patterns
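
To make the GSI row concrete, here is a hedged sketch of an index definition and of the parameters for querying it. The index name `by-email`, the table `Users`, and the attribute `email` are hypothetical; the dictionaries follow the shape of the low-level `create_table` and `query` request parameters.

```python
def gsi_definition(index_name, partition_key, sort_key=None):
    """Describe a GSI whose key differs from the base table's key."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return {
        "IndexName": index_name,
        "KeySchema": key_schema,
        "Projection": {"ProjectionType": "ALL"},  # copy every attribute into the index
    }


def query_index_params(table_name, index_name, key_name, key_value):
    """Build query parameters that target the index instead of the base table."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_name},
        "ExpressionAttributeValues": {":v": {"S": key_value}},
    }


gsi = gsi_definition("by-email", "email")
params = query_index_params("Users", "by-email", "email", "a@example.com")
# The GSI would go under "GlobalSecondaryIndexes" in create_table (its key
# attributes must also appear in AttributeDefinitions), and the query could be
# run with boto3.client("dynamodb").query(**params).
```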

What to Be Careful About¶

Data Modeling¶

Access patterns first - Design tables around queries, not entities
Hot partitions - Uneven key distribution causes throttling
GSI limits - Default quota of 20 GSIs per table
LSI limits - Max 5 LSIs per table; they must be defined at table creation
Item size - Max 400 KB per item
No joins - Denormalize or use application-level joins

Cost Management¶

Provisioned vs On-Demand - Choose based on traffic predictability
Over-provisioning - Paying for unused capacity
GSI costs - Each GSI has its own capacity
Storage costs - $0.25/GB/month
Scans - Expensive; use queries instead
Data transfer - Cross-region replication costs

Performance¶

Hot keys - Distribute traffic across partitions
Burst capacity - Limited, don't rely on it
Strongly consistent reads - 2x the RCU cost of eventually consistent reads
Large items - Split across multiple items
Scans - Consume lots of capacity, use sparingly
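
One way to spread a hot key across partitions is write sharding. The sketch below uses a suffix calculated from a hash of the item ID rather than a purely random one, so a reader can recompute the shard for a known item. The shard count of 10 and the key format `BASE#n` are assumptions for illustration.

```python
import hashlib

SHARD_COUNT = 10  # hypothetical; tune to the write volume of the hot key


def sharded_key(base_key, item_id, shard_count=SHARD_COUNT):
    """Append a deterministic shard suffix so writes spread over several partition keys."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key, shard_count=SHARD_COUNT):
    """List every shard key; reading the full logical key means querying each one."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


write_key = sharded_key("EVENTS#2024-01-01", "order-123")
read_keys = all_shard_keys("EVENTS#2024-01-01")
```

The trade-off is on the read side: fetching everything under the logical key takes one query per shard, with the results merged in the application.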

Consistency¶

Eventually consistent - Default, might return stale data
Strongly consistent - Guaranteed latest, 2x RCU cost
Transactions - 2x capacity cost, in exchange for ACID guarantees
Global tables - Eventually consistent across regions
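
The consistency options above map onto request parameters. The following sketch builds the parameters for a strongly consistent `get_item` and for a two-item `transact_write_items` call; the `Accounts` table, the `pk` key, and the `balance` attribute are hypothetical.

```python
def get_item_params(table_name, key, strongly_consistent=False):
    """ConsistentRead=False (the default) requests an eventually consistent read."""
    return {"TableName": table_name, "Key": key, "ConsistentRead": strongly_consistent}


def transfer_params(table_name, from_id, to_id, amount):
    """Debit one item and credit another atomically; both updates apply or neither does."""
    names = {"#b": "balance"}
    values = {":amt": {"N": str(amount)}}
    debit = {"Update": {
        "TableName": table_name,
        "Key": {"pk": {"S": from_id}},
        "UpdateExpression": "SET #b = #b - :amt",
        "ConditionExpression": "#b >= :amt",  # refuse to overdraw
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }}
    credit = {"Update": {
        "TableName": table_name,
        "Key": {"pk": {"S": to_id}},
        "UpdateExpression": "SET #b = #b + :amt",
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }}
    return {"TransactItems": [debit, credit]}


read = get_item_params("Accounts", {"pk": {"S": "ACCOUNT#1"}}, strongly_consistent=True)
transfer = transfer_params("Accounts", "ACCOUNT#1", "ACCOUNT#2", 10)
# Possible use: client.get_item(**read) and client.transact_write_items(**transfer)
```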

Capacity Modes¶

On-Demand¶

Pay per request
No capacity planning
Scales automatically with request volume
Best for: Unpredictable traffic, new applications

Provisioned¶

Specify Read/Write Capacity Units (RCU/WCU)
Auto-scaling available
Reserved capacity for discounts
Best for: Predictable traffic, cost optimization

Capacity Units¶

Unit	What it covers
1 RCU	1 strongly consistent read/sec (item up to 4 KB)
1 RCU	2 eventually consistent reads/sec (items up to 4 KB)
1 WCU	1 write/sec (item up to 1 KB)
Transactional	2x the RCUs/WCUs of the equivalent standard read or write
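
The table above can be turned into a small calculator. This is a sketch of the arithmetic only, assuming item sizes are rounded up to the next 4 KB (reads) or 1 KB (writes) and that 1 KB means 1024 bytes.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second,
                        strongly_consistent=False, transactional=False):
    """RCUs needed for a steady read rate at a given item size."""
    units_per_read = max(1, math.ceil(item_size_bytes / 4096))  # 4 KB blocks
    if transactional:
        units_per_read *= 2      # transactional reads cost double
    elif not strongly_consistent:
        units_per_read /= 2      # eventually consistent reads cost half
    return math.ceil(units_per_read * reads_per_second)


def write_capacity_units(item_size_bytes, writes_per_second, transactional=False):
    """WCUs needed for a steady write rate at a given item size."""
    units_per_write = max(1, math.ceil(item_size_bytes / 1024))  # 1 KB blocks
    if transactional:
        units_per_write *= 2     # transactional writes cost double
    return units_per_write * writes_per_second


# A 6 KB item spans two 4 KB blocks: 100 reads/sec need 100 RCUs when
# eventually consistent and 200 RCUs when strongly consistent.
eventual = read_capacity_units(6 * 1024, 100)
strong = read_capacity_units(6 * 1024, 100, strongly_consistent=True)
```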

Key Features¶

DynamoDB Streams¶

Capture item-level changes
Time-ordered sequence
24-hour retention
Trigger Lambda functions
Use for: Replication, analytics, notifications
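
A Lambda function triggered by a stream receives batches of change records. The handler below is a minimal sketch that only summarizes each record; the function name and the returned structure are assumptions, while `eventName` and `dynamodb.Keys` are fields of the stream record format.

```python
def handler(event, context=None):
    """Summarize item-level changes delivered by a DynamoDB stream."""
    changes = []
    for record in event.get("Records", []):
        change = {
            "action": record["eventName"],        # INSERT, MODIFY, or REMOVE
            "keys": record["dynamodb"]["Keys"],
        }
        # NewImage is only present for some stream view types and never for REMOVE.
        new_image = record["dynamodb"].get("NewImage")
        if new_image is not None:
            change["new_image"] = new_image
        changes.append(change)
    return changes


sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"pk": {"S": "USER#1"}},
                      "NewImage": {"pk": {"S": "USER#1"}, "name": {"S": "Ada"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"pk": {"S": "USER#2"}}}},
    ]
}
summary = handler(sample_event)
```

A real handler would forward each change to its destination (a search index, a queue, an audit log) instead of returning a list.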

Global Tables¶

Multi-region, multi-active
Automatic replication
Replication latency typically under one second
Conflict resolution: Last writer wins
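
The idea behind last-writer-wins can be illustrated in a few lines. This is a simplified model of the concept, not a description of DynamoDB's internal mechanism; the `updated_at` field and the region names are hypothetical.

```python
def last_writer_wins(versions):
    """Pick the version with the latest timestamp; earlier concurrent writes are discarded."""
    return max(versions, key=lambda version: version["updated_at"])


# Two regions accept a write to the same item at nearly the same time.
us_write = {"region": "us-east-1", "status": "shipped", "updated_at": 1700000005}
eu_write = {"region": "eu-west-1", "status": "cancelled", "updated_at": 1700000007}
winner = last_writer_wins([us_write, eu_write])
```

The practical consequence is that the earlier write is silently lost, so workloads that cannot tolerate this usually route writes for a given item to a single region.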

DAX (DynamoDB Accelerator)¶

In-memory cache
Microsecond latency
Compatible with DynamoDB API
Use for: Read-heavy workloads

TTL (Time to Live)¶

Automatic item deletion
No additional cost
Use for: Session data, logs, temporary data
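
TTL works from a Number attribute holding an expiry time in Unix epoch seconds. The sketch below computes that value and builds the parameters for boto3's `update_time_to_live`; the table name `Sessions` and the attribute name `expires_at` are hypothetical.

```python
import time


def expires_at(ttl_seconds, now=None):
    """Return the expiry time as Unix epoch seconds."""
    now = time.time() if now is None else now
    return int(now) + ttl_seconds


def session_item(session_id, ttl_seconds, now=None):
    """Build a session item that becomes eligible for deletion after ttl_seconds."""
    return {
        "pk": {"S": f"SESSION#{session_id}"},
        "expires_at": {"N": str(expires_at(ttl_seconds, now))},
    }


ttl_params = {
    "TableName": "Sessions",
    "TimeToLiveSpecification": {"Enabled": True, "AttributeName": "expires_at"},
}
item = session_item("abc123", 3600, now=1_700_000_000)
# Possible use: boto3.client("dynamodb").update_time_to_live(**ttl_params)
```

Deletion happens in the background some time after expiry, so readers that must never see expired items should also filter on the attribute.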

Common Interview Questions¶

1. When would you choose DynamoDB over RDS?
    Need near-unlimited scale
    Have simple, known access patterns
    Need single-digit millisecond latency
    Don't need complex joins or multi-statement transactions
    Want serverless operation with no maintenance
2. How do you avoid hot partitions?
    Use high-cardinality partition keys
    Add a random or calculated suffix to distribute writes (write sharding)
    Cache frequently read keys (for example with DAX)
    Remember that on-demand mode does not remove per-partition throughput limits, so key design still matters
3. What's the difference between GSI and LSI?
    LSI: Same PK, different SK, created at table creation, shares the base table's capacity
    GSI: Different PK/SK, can be added later, has its own capacity
4. How do you handle large items?
    Compress large attribute values
    Store large attributes in S3 and keep a reference in DynamoDB
    Split the data across multiple items
5. Explain DynamoDB Streams use cases
    Trigger Lambda on data changes
    Replicate data to other systems
    Build materialized views
    Audit logging
    Cross-region replication (Global Tables use Streams)
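
The large-item strategies can be combined: compress first, then fall back to an S3 pointer when the result is still too big. The sketch below assumes a hypothetical inline limit of 350 KB (headroom under the 400 KB item limit), a hypothetical bucket `example-bucket`, and hypothetical attribute names `body` and `body_s3_uri`. It only decides where the data should go and does not upload anything.

```python
import json
import zlib

INLINE_LIMIT_BYTES = 350 * 1024  # hypothetical headroom below the 400 KB item limit


def plan_item(item_key, payload, inline_limit=INLINE_LIMIT_BYTES, bucket="example-bucket"):
    """Return an item holding the compressed payload, or a pointer to where it would live in S3."""
    blob = zlib.compress(json.dumps(payload).encode("utf-8"))
    if len(blob) <= inline_limit:
        return {"pk": {"S": item_key}, "body": {"B": blob}}
    # Too large even when compressed: store only a reference in DynamoDB.
    return {"pk": {"S": item_key}, "body_s3_uri": {"S": f"s3://{bucket}/{item_key}"}}


def unpack(blob):
    """Reverse the compression applied by plan_item."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


small = plan_item("DOC#1", {"title": "hello"})
pointer = plan_item("DOC#2", {"title": "hello"}, inline_limit=4)  # force the S3 path
```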

Single Table Design¶

Pattern: Entity per Item¶

Store multiple entity types in one table:

[Figure: DynamoDB Single Table Design]
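
A common way to lay out such a table is to give each entity type its own key prefix. The sketch below is one possible layout, not a prescribed one: the `USER#` and `ORDER#` prefixes, the `PROFILE` sort key, and the attribute names are all hypothetical, and `query_partition` is an in-memory stand-in for a DynamoDB Query with an optional `begins_with` condition on the sort key.

```python
def user_item(user_id, name):
    """A user profile item."""
    return {"pk": f"USER#{user_id}", "sk": "PROFILE", "type": "User", "name": name}


def order_item(user_id, order_id, total):
    """An order item stored in the same partition as its user."""
    return {"pk": f"USER#{user_id}", "sk": f"ORDER#{order_id}", "type": "Order", "total": total}


def query_partition(items, pk, sk_prefix=""):
    """Return items sharing a partition key, filtered by sort key prefix, ordered by sort key."""
    matches = [i for i in items if i["pk"] == pk and i["sk"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["sk"])


table = [
    user_item("42", "Ada"),
    order_item("42", "1002", 25),
    order_item("42", "1001", 40),
    user_item("7", "Grace"),
]
everything = query_partition(table, "USER#42")           # profile and orders together
orders_only = query_partition(table, "USER#42", "ORDER#")
```

Because the profile and the orders share a partition key, one query returns both, which is the main point of the pattern.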

Benefits¶

Single query fetches related data
Can reduce cost (fewer requests needed to assemble related data)
Simplifies operations

Alternatives¶

AWS Alternatives¶

Service	When to Use Instead
RDS/Aurora	Complex queries, joins, transactions
ElastiCache	Pure caching, sub-millisecond latency
Neptune	Graph relationships
DocumentDB	MongoDB compatibility needed
Keyspaces	Cassandra compatibility needed
Timestream	Time-series data

External Alternatives¶

Provider	Service
Google Cloud	Firestore, Bigtable
Azure	Cosmos DB
MongoDB	MongoDB Atlas
ScyllaDB	ScyllaDB (DynamoDB-compatible API)
Apache Cassandra	Cassandra (self-managed)

Best Practices¶

Design for access patterns - Know your queries before designing
Use composite keys - Enable flexible queries
Avoid scans - Use queries with partition key
Distribute partition keys - Prevent hot partitions
Use sparse indexes - GSIs only contain items with index attributes
Enable Point-in-Time Recovery - For backup/restore
Use TTL - Automatically expire old data
Consider single-table design - For related entities
Use DAX for caching - Read-heavy workloads
Monitor with CloudWatch - Throttling, latency, errors

Pricing Summary¶

Component	Cost (US East)
On-demand write request unit	$1.25 per million
On-demand read request unit	$0.25 per million
Storage	$0.25 per GB/month
Global Tables replicated write	1.5x standard write cost
Streams	$0.02 per 100K read request units
DAX	Billed per node-hour
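
As a rough worked example, the rates in the table above can be plugged into a monthly estimate. The rates are copied from the table and may not match current AWS pricing; the workload figures are made up, and the sketch ignores free tier, Streams, DAX, and data transfer.

```python
WRITE_PRICE_PER_MILLION = 1.25     # USD, from the table above
READ_PRICE_PER_MILLION = 0.25      # USD, from the table above
STORAGE_PRICE_PER_GB_MONTH = 0.25  # USD, from the table above


def monthly_cost(write_request_units, read_request_units, storage_gb):
    """Estimate a monthly on-demand bill in USD from request units and stored GB."""
    writes = write_request_units / 1_000_000 * WRITE_PRICE_PER_MILLION
    reads = read_request_units / 1_000_000 * READ_PRICE_PER_MILLION
    storage = storage_gb * STORAGE_PRICE_PER_GB_MONTH
    return round(writes + reads + storage, 2)


# Hypothetical workload: 10M write units, 50M read units, 20 GB stored.
estimate = monthly_cost(10_000_000, 50_000_000, 20)  # 12.50 + 12.50 + 5.00
```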