Amazon DynamoDB

Introduction

Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. It's a key-value and document database delivering single-digit millisecond latency at any scale.

Key Features

  • Fully managed - No servers, patching, or maintenance
  • Serverless - Scales automatically, pay per use
  • Single-digit millisecond - Consistent low latency
  • Virtually unlimited throughput - Scales horizontally, subject to account quotas and per-partition limits
  • Built-in security - Encryption, IAM, VPC endpoints
  • Global tables - Multi-region, multi-active replication

When to Use

Ideal Use Cases

  • Session management - User sessions, shopping carts
  • Gaming - Leaderboards, player data, game state
  • IoT - Time-series data, device status
  • Mobile backends - User profiles, app data
  • Ad tech - Real-time bidding, user tracking
  • E-commerce - Product catalog, order processing
  • Caching - With DAX for microsecond latency

Signs DynamoDB is Right for You

  • Need consistent single-digit millisecond latency
  • Have simple access patterns (known queries)
  • Need to scale massively
  • Want zero operational overhead
  • Have high read and write volumes
  • Don't need joins or complex multi-statement transactions

Data Model

Core Concepts

Concept               Description
Table                 Collection of items
Item                  Single record (like a row)
Attribute             Data element (like a column)
Primary Key           Unique identifier for items
Partition Key (PK)    Hash key, determines data distribution
Sort Key (SK)         Optional range key, enables queries

Key Types

[Figure: DynamoDB key types]
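
As a rough sketch of the two primary key shapes (partition key only, or partition key plus sort key), the helper below builds keyword arguments in the shape that boto3's `create_table` accepts. The helper name `table_definition` and the table and attribute names are illustrative assumptions, and nothing here calls AWS.

```python
def table_definition(table_name, partition_key, sort_key=None):
    """Build create_table-style kwargs for a simple or composite primary key.

    Assumes string ("S") key attributes and on-demand billing for brevity.
    """
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attributes = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        # A sort key turns the simple primary key into a composite one.
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attributes.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,
        "BillingMode": "PAY_PER_REQUEST",
    }


# Simple key: each session_id identifies exactly one item.
simple = table_definition("Sessions", "session_id")

# Composite key: many orders per customer, sorted by order_date.
composite = table_definition("Orders", "customer_id", "order_date")
```

With boto3 installed and credentials configured, either dictionary could be passed as `boto3.client("dynamodb").create_table(**composite)`.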

Secondary Indexes

Type                            Description             Use Case
Local Secondary Index (LSI)     Same PK, different SK   Alternative sort orders
Global Secondary Index (GSI)    Different PK and SK     Different access patterns
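
To make the LSI/GSI distinction concrete, here is a minimal sketch of the index entries that would go into the `LocalSecondaryIndexes` or `GlobalSecondaryIndexes` list of a `create_table` request. The function name `index_definition` and all index and attribute names are assumptions for illustration. In provisioned mode a GSI entry would also need a `ProvisionedThroughput` setting, which is omitted here.

```python
def index_definition(index_name, partition_key, sort_key=None, projection="ALL"):
    """Build one secondary index entry in create_table format."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return {
        "IndexName": index_name,
        "KeySchema": key_schema,
        "Projection": {"ProjectionType": projection},
    }


# LSI: reuses the table's partition key (customer_id) with a different sort key.
by_total = index_definition("OrdersByTotal", "customer_id", "order_total")

# GSI: a different partition key, so it supports a different access pattern.
by_status = index_definition("OrdersByStatus", "order_status", "order_date")
```

Every attribute used in an index key schema must also be listed in the table's `AttributeDefinitions`.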

What to Be Careful About

Data Modeling

  • Access patterns first - Design tables around queries, not entities
  • Hot partitions - Uneven key distribution causes throttling
  • GSI limits - Max 20 GSIs per table
  • LSI limits - Max 5 LSIs, must be created at table creation
  • Item size - Max 400 KB per item
  • No joins - Denormalize or use application-level joins

Cost Management

  • Provisioned vs On-Demand - Choose based on traffic predictability
  • Over-provisioning - Paying for unused capacity
  • GSI costs - Each GSI has its own capacity
  • Storage costs - $0.25/GB/month
  • Scans - Expensive; use queries instead
  • Data transfer - Cross-region replication costs

Performance

  • Hot keys - Distribute traffic across partitions
  • Burst capacity - Limited, don't rely on it
  • Strongly consistent reads - 2x the cost of eventually consistent reads
  • Large items - Split across multiple items
  • Scans - Consume lots of capacity, use sparingly
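
One common way to spread a hot key across partitions is write sharding: append a suffix to the partition key on write, then read from every suffix and merge. The sketch below is an assumption-laden illustration; the shard count of 10, the `#` separator, and the function names are all hypothetical choices.

```python
import hashlib
import random

SHARD_COUNT = 10  # assumed value; tune to the write rate of the hot key


def random_shard_key(base_key, shard_count=SHARD_COUNT):
    """Pick a random suffix so writes for one logical key spread out."""
    return f"{base_key}#{random.randrange(shard_count)}"


def calculated_shard_key(base_key, item_id, shard_count=SHARD_COUNT):
    """Derive the suffix from another attribute so single-item reads can find it."""
    digest = hashlib.md5(item_id.encode("utf-8")).hexdigest()
    return f"{base_key}#{int(digest, 16) % shard_count}"


def all_shard_keys(base_key, shard_count=SHARD_COUNT):
    """Keys a reader must query (one query per shard) to see every item."""
    return [f"{base_key}#{n}" for n in range(shard_count)]
```

The trade-off is that reads for the logical key now fan out to `SHARD_COUNT` queries, so this fits write-heavy keys better than read-heavy ones.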

Consistency

  • Eventually consistent - Default, might return stale data
  • Strongly consistent - Guaranteed latest, 2x RCU cost
  • Transactions - 2x cost, but ACID guarantees
  • Global tables - Eventually consistent across regions
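
Read consistency is chosen per request. As a small sketch, the helper below builds `get_item`-style parameters in the low-level client format, where `ConsistentRead` switches between the two modes. The helper name `get_item_request` and the table and key names are assumptions.

```python
def get_item_request(table_name, key, strongly_consistent=False):
    """Build get_item kwargs; eventually consistent unless asked otherwise."""
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


session_key = {"session_id": {"S": "abc123"}}

cheap_read = get_item_request("Sessions", session_key)
latest_read = get_item_request("Sessions", session_key, strongly_consistent=True)
```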

Capacity Modes

On-Demand

  • Pay per request
  • No capacity planning
  • Instantly scales
  • Best for: Unpredictable traffic, new applications

Provisioned

  • Specify Read/Write Capacity Units (RCU/WCU)
  • Auto-scaling available
  • Reserved capacity for discounts
  • Best for: Predictable traffic, cost optimization

Capacity Units

Unit             Capacity
1 RCU            1 strongly consistent read/sec (up to 4 KB)
1 RCU            2 eventually consistent reads/sec (up to 4 KB)
1 WCU            1 write/sec (up to 1 KB)
Transactional    2x RCU/WCU
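
The capacity unit rules can be turned into a small calculator. This is a sketch under the rules listed above (4 KB per read unit, 1 KB per write unit, item sizes rounded up, half cost for eventually consistent reads, double for transactional requests); the function names are illustrative.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read unit covers up to 4 KB
WRITE_UNIT_BYTES = 1024      # one write unit covers up to 1 KB

READ_MULTIPLIER = {"eventual": 0.5, "strong": 1, "transactional": 2}


def read_capacity_units(item_size_bytes, reads_per_second, mode="strong"):
    """RCUs needed to sustain the given read rate for items of this size."""
    units_per_read = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    return math.ceil(units_per_read * READ_MULTIPLIER[mode] * reads_per_second)


def write_capacity_units(item_size_bytes, writes_per_second, transactional=False):
    """WCUs needed to sustain the given write rate for items of this size."""
    units_per_write = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    multiplier = 2 if transactional else 1
    return units_per_write * multiplier * writes_per_second
```

For example, a 6,000-byte item rounds up to two read units, so 10 strongly consistent reads per second need 20 RCUs, while the same rate of eventually consistent reads needs 10.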

Key Features

DynamoDB Streams

  • Capture item-level changes
  • Time-ordered sequence
  • 24-hour retention
  • Trigger Lambda functions
  • Use for: Replication, analytics, notifications
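
A Lambda function triggered by a stream receives batches of change records. The handler below is a minimal sketch that summarizes each record; the `sample_event` is hand-written for illustration and includes only the fields the handler reads. Whether `NewImage` and `OldImage` are present depends on the stream view type configured on the table.

```python
def handler(event, context=None):
    """Summarize DynamoDB Streams records delivered to a Lambda function."""
    changes = []
    for record in event.get("Records", []):
        data = record["dynamodb"]
        changes.append({
            "action": record["eventName"],   # INSERT, MODIFY, or REMOVE
            "keys": data["Keys"],
            "new": data.get("NewImage"),     # absent for REMOVE events
            "old": data.get("OldImage"),     # absent for INSERT events
        })
    return changes


# Hand-written sample in the attribute-value format used by stream records.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"session_id": {"S": "abc"}},
                "NewImage": {"session_id": {"S": "abc"}, "user": {"S": "ada"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {
                "Keys": {"session_id": {"S": "old"}},
                "OldImage": {"session_id": {"S": "old"}},
            },
        },
    ]
}

changes = handler(sample_event)
```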

Global Tables

  • Multi-region, multi-active
  • Automatic replication
  • Typically < 1 second replication latency
  • Conflict resolution: Last writer wins
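
The effect of last-writer-wins can be illustrated with a toy model: when two regions update the same item concurrently, the version with the latest write timestamp survives and the other update is silently lost. This is a simplified sketch, not how the service is implemented; the field names `region`, `written_at`, and `value` are hypothetical.

```python
def resolve_conflict(versions):
    """Return the version with the latest write timestamp (last writer wins)."""
    return max(versions, key=lambda version: version["written_at"])


concurrent_writes = [
    {"region": "us-east-1", "written_at": 1700000000.120, "value": "cart: 2 items"},
    {"region": "eu-west-1", "written_at": 1700000000.450, "value": "cart: 3 items"},
]

winner = resolve_conflict(concurrent_writes)
```

Because the losing write disappears without an error, designs that cannot tolerate lost updates usually route writes for a given item to a single region.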

DAX (DynamoDB Accelerator)

  • In-memory cache
  • Microsecond latency
  • Compatible with DynamoDB API
  • Use for: Read-heavy workloads

TTL (Time to Live)

  • Automatic item deletion
  • No additional cost
  • Use for: Session data, logs, temporary data
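
TTL works off a Number attribute holding an expiry time in Unix epoch seconds; the attribute name is chosen when TTL is enabled on the table. The sketch below assumes an attribute called `expires_at` (a hypothetical name) and also shows a read-side check, since deletion happens in the background and expired items can still be returned for a while.

```python
import time


def with_ttl(item, lifetime_seconds, now=None, ttl_attribute="expires_at"):
    """Return a copy of the item with an epoch-seconds expiry attribute."""
    now = time.time() if now is None else now
    return {**item, ttl_attribute: int(now) + lifetime_seconds}


def is_expired(item, now=None, ttl_attribute="expires_at"):
    """Treat items past their expiry as gone, even if not yet deleted."""
    now = time.time() if now is None else now
    return item.get(ttl_attribute, float("inf")) <= now


# A session that should expire one hour after a fixed reference time.
session = with_ttl({"session_id": "abc"}, 3600, now=1_700_000_000)
```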

Common Interview Questions

  1. When would you choose DynamoDB over RDS?
       • Need unlimited scale
       • Have simple access patterns
       • Need single-digit millisecond latency
       • Don't need complex joins/transactions
       • Want serverless/zero maintenance

  2. How do you avoid hot partitions?
       • Use high-cardinality partition keys
       • Add random suffix to distribute writes
       • Use write sharding patterns
       • Use on-demand capacity mode to absorb spikes (per-partition limits still apply)

  3. What's the difference between GSI and LSI?
       • LSI: Same PK, different SK, created at table creation, shares capacity with the table
       • GSI: Different PK/SK, can be added later, separate capacity

  4. How do you handle large items?
       • Compress data (for example, large document attributes)
       • Store large attributes in S3, reference in DynamoDB
       • Split across multiple items

  5. Explain DynamoDB Streams use cases
       • Trigger Lambda on data changes
       • Replicate data to other systems
       • Build materialized views
       • Audit logging
       • Cross-region replication (Global Tables use Streams)
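
For the large-item question, two of the listed techniques (compressing an attribute and splitting a value across several items) can be sketched with the standard library alone. The 350 KB chunk size is an assumption chosen to leave headroom under the 400 KB item limit for keys and other attributes; the function names are illustrative.

```python
import json
import zlib

MAX_ITEM_BYTES = 400 * 1024      # DynamoDB item size limit
CHUNK_BYTES = 350 * 1024         # assumed headroom for keys and metadata


def compress_attribute(value):
    """Serialize a JSON-compatible value and compress it to bytes."""
    return zlib.compress(json.dumps(value).encode("utf-8"))


def decompress_attribute(blob):
    """Reverse compress_attribute."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def split_into_chunks(blob, chunk_bytes=CHUNK_BYTES):
    """Split bytes into pieces that can each be stored in its own item."""
    return [blob[i:i + chunk_bytes] for i in range(0, len(blob), chunk_bytes)]


document = {"notes": "lorem ipsum " * 1000}
blob = compress_attribute(document)
chunks = split_into_chunks(b"x" * (800 * 1024))
```

Each chunk would typically be stored under the same partition key with a sort key such as a chunk number, so that one query returns all pieces in order.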

Single Table Design

Pattern: Entity per Item

Store multiple entity types in one table:

[Figure: DynamoDB single-table design]

Benefits

  • Single query fetches related data
  • Reduces costs (fewer tables)
  • Simplifies operations
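
As an illustration of how one query can fetch related data, the sketch below models a single table as an in-memory list and mimics a query by partition key with an optional sort key prefix. The `CUSTOMER#` and `ORDER#` key prefixes, the attribute names, and the `query` helper are assumptions for the example, not a prescribed schema.

```python
# Two entity types (customer profiles and orders) share one table.
items = [
    {"PK": "CUSTOMER#42", "SK": "PROFILE", "name": "Ada"},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2024-03-02", "total": 75},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2024-01-15", "total": 30},
    {"PK": "CUSTOMER#7", "SK": "PROFILE", "name": "Lin"},
]


def query(table, pk, sk_prefix=""):
    """Mimic a query: exact partition key match, sort key prefix, sorted by SK."""
    matches = [
        item for item in table
        if item["PK"] == pk and item["SK"].startswith(sk_prefix)
    ]
    return sorted(matches, key=lambda item: item["SK"])


everything_for_customer = query(items, "CUSTOMER#42")
orders_only = query(items, "CUSTOMER#42", "ORDER#")
```

Against a real table, the second call corresponds to a key condition of the form `PK = :pk AND begins_with(SK, :prefix)`.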

Alternatives

AWS Alternatives

Service        When to Use Instead
RDS/Aurora     Complex queries, joins, transactions
ElastiCache    Pure caching, sub-millisecond latency
Neptune        Graph relationships
DocumentDB     MongoDB compatibility needed
Keyspaces      Cassandra compatibility needed
Timestream     Time-series data

External Alternatives

Provider            Service
Google Cloud        Firestore, Bigtable
Azure               Cosmos DB
MongoDB             MongoDB Atlas
ScyllaDB            DynamoDB-compatible
Apache Cassandra    Self-managed

Best Practices

  1. Design for access patterns - Know your queries before designing
  2. Use composite keys - Enable flexible queries
  3. Avoid scans - Use queries with partition key
  4. Distribute partition keys - Prevent hot partitions
  5. Use sparse indexes - GSIs only contain items with index attributes
  6. Enable Point-in-Time Recovery - For backup/restore
  7. Use TTL - Automatically expire old data
  8. Consider single-table design - For related entities
  9. Use DAX for caching - Read-heavy workloads
  10. Monitor with CloudWatch - Throttling, latency, errors
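
The sparse index practice relies on one behavior: an item appears in a GSI only if it carries the index's key attributes. The toy model below shows the effect with an in-memory list; the `open_since` attribute and the `sparse_index` helper are hypothetical names for illustration.

```python
def sparse_index(table, index_partition_key):
    """Model a GSI: only items that have the index key attribute are included."""
    return [item for item in table if index_partition_key in item]


orders = [
    {"order_id": "o-1", "status": "shipped"},
    {"order_id": "o-2", "status": "open", "open_since": "2024-06-01"},
    {"order_id": "o-3", "status": "delivered"},
    {"order_id": "o-4", "status": "open", "open_since": "2024-06-03"},
]

# Only open orders carry open_since, so the index stays small and cheap to read.
open_orders = sparse_index(orders, "open_since")
```

Removing the attribute when an order closes would drop the item from the index, which keeps the index limited to the items worth querying.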

Pricing Summary

Component             Cost (US East)
Write Request Unit    $1.25 per million
Read Request Unit     $0.25 per million
Storage               $0.25 per GB/month
Global Tables         1.5x write cost
Streams               $0.02 per 100K reads
DAX                   Instance hours
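
A back-of-the-envelope estimate can be built from the figures in the table above. This sketch covers only request units and storage, takes the listed prices at face value (they may not match current AWS pricing), and uses an illustrative function name.

```python
# Prices copied from the summary table above (US East); treat as assumptions.
WRITE_PRICE_PER_MILLION = 1.25
READ_PRICE_PER_MILLION = 0.25
STORAGE_PRICE_PER_GB_MONTH = 0.25


def monthly_on_demand_cost(write_units, read_units, storage_gb):
    """Estimate a monthly on-demand bill from request units and stored GB."""
    writes = write_units / 1_000_000 * WRITE_PRICE_PER_MILLION
    reads = read_units / 1_000_000 * READ_PRICE_PER_MILLION
    storage = storage_gb * STORAGE_PRICE_PER_GB_MONTH
    return writes + reads + storage


# 10M writes, 50M reads, and 20 GB stored in a month.
estimate = monthly_on_demand_cost(10_000_000, 50_000_000, 20)
```

In this example the writes and reads each come to $12.50 and storage to $5.00, for an estimate of $30.00.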