
Scaling Writes

The Problem

When write traffic overwhelms your database:

- Single database becomes a bottleneck
- Transaction throughput is limited
- Disk I/O saturates
- Replication lag increases
- Write latency grows

Write-heavy systems include:

- Logging/metrics collection
- IoT sensor data
- Social media posts/activity
- Gaming leaderboards
- Real-time analytics
- Message queues


Options & Trade-offs

1. Database Sharding (Horizontal Partitioning)

Philosophy: "Divide data across multiple database instances"


Sharding Strategies

Hash-Based Sharding:

public class HashShardRouter {
    private static final int NUM_SHARDS = 8;

    public int getShard(String userId) {
        // floorMod avoids the negative shard that Math.abs(hashCode()) yields
        // when hashCode() returns Integer.MIN_VALUE
        return Math.floorMod(userId.hashCode(), NUM_SHARDS);
    }
}

import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Better for resharding: consistent hashing moves only ~1/N of keys
// when a shard is added or removed
public class ConsistentHashRouter {
    private final TreeMap<Long, Integer> ring = new TreeMap<>();
    private final int virtualNodes = 100;

    public void addShard(int shardId) {
        // Place each shard at many virtual positions to even out distribution
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(shardId + "-" + i), shardId);
        }
    }

    public int getShard(String key) {
        // First ring position at or after the key's hash; wrap to the start if none
        Map.Entry<Long, Integer> entry = ring.ceilingEntry(hash(key));
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    private long hash(String key) {
        // FNV-1a 64-bit: simple and deterministic; production systems often use MurmurHash
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= b;
            h *= 0x100000001b3L;
        }
        return h;
    }
}
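
How the router plugs into the write path: a minimal sketch, assuming one DataSource per shard and a users table (ShardedUserDao and its wiring are illustrative, not from the original):

import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

public class ShardedUserDao {
    private final HashShardRouter router = new HashShardRouter();
    private final List<DataSource> shards; // index == shard id

    public ShardedUserDao(List<DataSource> shards) {
        this.shards = shards;
    }

    public void insertUser(String userId, String name) throws SQLException {
        // Route the write to the single shard that owns this key
        DataSource ds = shards.get(router.getShard(userId));
        try (var conn = ds.getConnection();
             var ps = conn.prepareStatement(
                 "INSERT INTO users (id, name) VALUES (?, ?)")) {
            ps.setString(1, userId);
            ps.setString(2, name);
            ps.executeUpdate();
        }
    }
}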

| Pros | Cons |
|------|------|
| Even distribution | Cross-shard queries complex |
| Easy to add shards | Resharding is hard |
| Good for random access | No range queries |

Range-Based Sharding:

public class RangeShardRouter {
    // Users A-H → Shard 1, I-P → Shard 2, Q-Z → Shard 3
    public int getShard(String username) {
        char first = Character.toUpperCase(username.charAt(0));
        if (first <= 'H') return 1;
        if (first <= 'P') return 2;
        return 3;
    }
}

| Pros | Cons |
|------|------|
| Range queries work | Hotspots (uneven distribution) |
| Logical grouping | Resharding still hard |
| Easy to understand | Popular ranges get hammered |

Directory-Based Sharding:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DirectoryShardRouter {
    // Directory mapping entity -> shard; in production this would be a
    // replicated lookup service, not an in-process map
    private final Map<String, Integer> lookupTable = new ConcurrentHashMap<>();
    private static final int DEFAULT_SHARD = 0;

    public int getShard(String entityId) {
        // Fall back to a default shard for entities not yet in the directory
        return lookupTable.getOrDefault(entityId, DEFAULT_SHARD);
    }

    public void moveShard(String entityId, int newShard) {
        lookupTable.put(entityId, newShard);
    }
}

| Pros | Cons |
|------|------|
| Flexible placement | Lookup overhead |
| Easy rebalancing | Directory is a SPOF |
| Custom logic | Additional latency |

Geographic Sharding:

public class GeoShardRouter {
    public int getShard(String region) {
        return switch (region) {
            case "US" -> 1;
            case "EU" -> 2;
            case "ASIA" -> 3;
            default -> 1;
        };
    }
}

| Pros | Cons |
|------|------|
| Data locality | Global queries hard |
| Lower latency | Uneven distribution |
| Compliance (data residency) | Cross-region operations |

Cross-Shard Operations

// Scatter-gather for cross-shard queries: fan the query out to every
// shard in parallel, then merge and rank the partial results
public List<User> searchUsers(String query) {
    List<CompletableFuture<List<User>>> futures = shards.stream()
        .map(shard -> CompletableFuture.supplyAsync(() ->
            shard.search(query)))
        .collect(Collectors.toList());

    return futures.stream()
        .map(CompletableFuture::join)   // wait for every shard to answer
        .flatMap(List::stream)          // merge the per-shard result lists
        .sorted(Comparator.comparing(User::getRelevance).reversed())
        .limit(100)                     // keep only the global top 100
        .collect(Collectors.toList());
}

2. Write-Behind Caching (Async Writes)

Philosophy: "Accept writes fast, persist asynchronously"


Implementation:

public class WriteBehindService {
    private final Cache cache;
    private final BlockingQueue<WriteOperation> writeQueue;
    private final ScheduledExecutorService scheduler;

    public void save(Entity entity) {
        // Immediate cache update
        cache.put(entity.getId(), entity);

        // Queue for async database write; offer() drops silently if a
        // bounded queue is full, and queued data is lost on a crash
        writeQueue.offer(new WriteOperation(entity));
    }

    @PostConstruct
    public void startBackgroundWriter() {
        scheduler.scheduleWithFixedDelay(() -> {
            List<WriteOperation> batch = new ArrayList<>();
            writeQueue.drainTo(batch, 100);

            if (!batch.isEmpty()) {
                try {
                    database.batchInsert(batch);
                    log.info("Persisted {} entities", batch.size());
                } catch (Exception e) {
                    // Re-queue failed operations for the next run (ordering is
                    // not preserved; addAll throws if a bounded queue fills up)
                    writeQueue.addAll(batch);
                    log.error("Batch write failed, re-queued", e);
                }
            }
        }, 0, 1, TimeUnit.SECONDS);
    }
}
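
A cache-first read path softens the read-your-writes concern on a single node, since pending writes already sit in the cache. A minimal sketch, assuming the fields of the class above plus a database.findById method (not shown in the original):

public Entity find(String id) {
    // Pending writes are visible immediately because save() put them in the
    // cache; other nodes with separate caches may still read stale data
    Entity cached = cache.get(id);
    return cached != null ? cached : database.findById(id);
}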

| Pros | Cons |
|------|------|
| Very fast writes | Data loss risk |
| Batching = efficient | Complexity |
| Reduces DB load | Read-your-writes issues |
| Smooths traffic spikes | |

When to use:

- Analytics/metrics
- Logging
- Non-critical data
- When some data loss is acceptable


3. Event Sourcing

Philosophy: "Store changes as immutable events, not current state"


Implementation:

// Events
public sealed interface AccountEvent {
    String accountId();
}

public record AccountCreated(String accountId, String name,
                             BigDecimal initialBalance) implements AccountEvent {}
public record MoneyDeposited(String accountId, BigDecimal amount) implements AccountEvent {}
public record MoneyWithdrawn(String accountId, BigDecimal amount) implements AccountEvent {}

// Event Store
@Repository
public class EventStore {
    public void append(String streamId, AccountEvent event) {
        EventRecord record = new EventRecord(
            UUID.randomUUID(),
            streamId,
            event.getClass().getSimpleName(),
            serialize(event),
            Instant.now()
        );
        eventRepository.save(record);
    }

    public List<AccountEvent> getEvents(String streamId) {
        return eventRepository.findByStreamIdOrderByTimestamp(streamId)
            .stream()
            .map(this::deserialize)
            .collect(Collectors.toList());
    }
}

// Aggregate (reconstructed from events)
public class Account {
    private String id;
    private String name;
    private BigDecimal balance = BigDecimal.ZERO;

    public static Account reconstitute(List<AccountEvent> events) {
        Account account = new Account();
        events.forEach(account::apply);
        return account;
    }

    private void apply(AccountEvent event) {
        switch (event) {
            case AccountCreated e -> {
                this.id = e.accountId();
                this.name = e.name();
                this.balance = e.initialBalance();
            }
            case MoneyDeposited e -> this.balance = balance.add(e.amount());
            case MoneyWithdrawn e -> this.balance = balance.subtract(e.amount());
        }
    }
}
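
The write path itself stays append-only. A minimal sketch, assuming an AccountService wrapper and a getBalance() accessor on Account (neither appears in the original): reconstitute the aggregate, validate the command, then append exactly one new event.

import java.math.BigDecimal;

public class AccountService {
    private final EventStore eventStore;

    public AccountService(EventStore eventStore) {
        this.eventStore = eventStore;
    }

    public void withdraw(String accountId, BigDecimal amount) {
        // Rebuild current state from history, validate, then append one
        // event; the append is the only database write that occurs
        Account account = Account.reconstitute(eventStore.getEvents(accountId));
        if (account.getBalance().compareTo(amount) < 0) { // getBalance() assumed
            throw new IllegalStateException("Insufficient funds");
        }
        eventStore.append(accountId, new MoneyWithdrawn(accountId, amount));
    }
}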

| Pros | Cons |
|------|------|
| Complete audit trail | Complexity |
| Append-only (fast writes) | Event schema evolution |
| Temporal queries | Eventual consistency |
| Replay/debug | Storage grows |
| Natural fit for CQRS | |

When to use:

- Audit requirements (finance, healthcare)
- Complex business logic
- When history matters
- Event-driven architectures


4. Message Queues (Async Processing)

Philosophy: "Decouple write acceptance from processing"


Implementation:

// Producer (API)
@PostMapping("/orders")
public ResponseEntity<OrderResponse> createOrder(@RequestBody OrderRequest request) {
    String orderId = UUID.randomUUID().toString();

    // Publish to queue
    OrderMessage message = new OrderMessage(orderId, request);
    kafkaTemplate.send("orders", orderId, message);

    // Return immediately
    return ResponseEntity.accepted()
        .body(new OrderResponse(orderId, "PENDING"));
}

// Consumer (Worker)
@KafkaListener(topics = "orders", groupId = "order-processors")
public void processOrder(OrderMessage message) {
    try {
        Order order = orderService.create(message);
        database.save(order);

        // Publish success event
        eventPublisher.publish(new OrderCreatedEvent(order));
    } catch (Exception e) {
        // Send to dead letter queue
        kafkaTemplate.send("orders-dlq", message);
    }
}

Scaling Workers:

# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-worker
  minReplicas: 2
  maxReplicas: 20
  # Scale on Kafka consumer lag; external metrics require a metrics
  # adapter (e.g. Prometheus Adapter or KEDA) to expose this metric
  metrics:
  - type: External
    external:
      metric:
        name: kafka_consumer_lag
      target:
        type: AverageValue
        averageValue: "1000"

| Pros | Cons |
|------|------|
| Decouples components | Complexity |
| Handles traffic spikes | Eventual consistency |
| Retry capability | Monitoring overhead |
| Scale workers independently | Message ordering |
| Natural backpressure | |

5. Batch Processing

Philosophy: "Accumulate writes, process in bulk"

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BatchWriter {
    private final List<Entity> buffer = new ArrayList<>();
    private final int batchSize = 1000;
    private final Duration maxWait = Duration.ofSeconds(5);
    private Instant lastFlush = Instant.now();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    public BatchWriter() {
        // Timer-driven flush so a partially filled buffer is not stranded
        // when writes stop arriving
        scheduler.scheduleWithFixedDelay(this::timedFlush, 1, 1, TimeUnit.SECONDS);
    }

    public synchronized void write(Entity entity) {
        buffer.add(entity);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    private synchronized void timedFlush() {
        if (Duration.between(lastFlush, Instant.now()).compareTo(maxWait) > 0) {
            flush();
        }
    }

    private void flush() {
        if (buffer.isEmpty()) return;

        List<Entity> batch = new ArrayList<>(buffer);
        buffer.clear();
        lastFlush = Instant.now();

        // Bulk insert: one round trip instead of thousands
        database.batchInsert(batch);
    }
}

// Using Spring Batch
@Bean
public Job importJob(JobRepository jobRepository, Step step1) {
    return new JobBuilder("importJob", jobRepository)
        .start(step1)
        .build();
}

@Bean
public Step step1(JobRepository jobRepository,
                  PlatformTransactionManager transactionManager) {
    return new StepBuilder("step1", jobRepository)
        .<InputData, Entity>chunk(1000, transactionManager)
        .reader(reader())
        .processor(processor())
        .writer(writer())
        .build();
}

| Pros | Cons |
|------|------|
| Efficient bulk operations | Latency (waiting for a batch to fill) |
| Reduces DB round trips | Memory usage |
| Better I/O patterns | Partial failure handling |
| Less overhead | |

6. Time-Series Databases

Philosophy: "Use databases optimized for time-stamped data"

Options:

- InfluxDB - Popular, easy to use
- TimescaleDB - PostgreSQL extension
- ClickHouse - Very fast analytics
- Apache Druid - Real-time analytics
- Prometheus - Metrics focused

-- TimescaleDB example
CREATE TABLE metrics (
    time        TIMESTAMPTZ NOT NULL,
    sensor_id   INTEGER,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION
);

-- Convert to a hypertable: automatic partitioning by time,
-- efficient time-range queries, built-in compression
SELECT create_hypertable('metrics', 'time');

-- Retention policy (TimescaleDB 2.x): drop chunks older than 30 days
SELECT add_retention_policy('metrics', INTERVAL '30 days');

| Pros | Cons |
|------|------|
| Optimized for time data | Limited query patterns |
| High write throughput | Learning curve |
| Built-in retention | May need migration |
| Efficient aggregations | |

7. LSM Tree Databases

Philosophy: "Optimize for writes with log-structured storage"

How LSM Trees Work:

Writes land in an in-memory sorted structure (the memtable), typically alongside an append-only write-ahead log for durability. When the memtable fills, it is flushed to disk as an immutable sorted file (an SSTable). Background compaction merges SSTables to keep reads fast, at the cost of extra I/O (write amplification).
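
A toy sketch of that write path (illustrative only; a real engine adds a write-ahead log, bloom filters, and background compaction, and the "SSTables" here are just in-memory sorted maps):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

public class ToyLsmStore {
    private static final int MEMTABLE_LIMIT = 4; // tiny, for illustration
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final Deque<TreeMap<String, String>> sstables = new ArrayDeque<>();

    public void put(String key, String value) {
        memtable.put(key, value);        // in-memory insert, no random disk I/O
        if (memtable.size() >= MEMTABLE_LIMIT) {
            sstables.addFirst(memtable); // "flush": memtable becomes immutable
            memtable = new TreeMap<>();
        }
    }

    public String get(String key) {
        // Reads check newest data first: memtable, then SSTables new-to-old
        String v = memtable.get(key);
        if (v != null) return v;
        for (Map<String, String> table : sstables) {
            v = table.get(key);
            if (v != null) return v;
        }
        return null;
    }
}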

Databases using LSM trees:

- Cassandra
- RocksDB
- LevelDB
- HBase
- ScyllaDB

// Cassandra example - Write-optimized
@Table("user_activity")
public class UserActivity {
    @PartitionKey
    private UUID userId;

    @ClusteringColumn
    private Instant timestamp;

    private String action;
    private String details;
}

// Writes are extremely fast (append-only)
// Reads require merging multiple SSTables

| Pros | Cons |
|------|------|
| Very fast writes | Slower reads |
| Append-only (durable) | Write amplification |
| SSD-friendly | Space amplification |
| Good for time-series | Compaction CPU usage |

8. Distributed Databases

Philosophy: "Built for scale from the ground up"

Options:

| Database | Type | Best For |
|----------|------|----------|
| Cassandra | Wide-column | High write throughput |
| CockroachDB | Distributed SQL | ACID at scale |
| TiDB | Distributed SQL | MySQL compatible |
| YugabyteDB | Distributed SQL | PostgreSQL compatible |
| Vitess | MySQL sharding | MySQL at scale |
| Spanner | Distributed SQL | Global consistency |

// CockroachDB - Distributed PostgreSQL-compatible
// Automatic sharding, replication, and rebalancing

@Entity
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private UUID id;  // UUID recommended for distribution

    private String customerId;
    private BigDecimal amount;
}

// Same JPA code, database handles distribution

9. Write Optimization Techniques

Disable/Defer Indexes:

-- Bulk load without secondary-index maintenance
-- (MySQL; effective for MyISAM tables, InnoDB ignores DISABLE KEYS)
ALTER TABLE orders DISABLE KEYS;
-- Load data
LOAD DATA INFILE 'orders.csv' INTO TABLE orders;
-- Re-enable and rebuild indexes in a single pass
ALTER TABLE orders ENABLE KEYS;

Prepared Statements (Reduce Parse Time):

PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)"
);

for (Order order : orders) {
    ps.setString(1, order.getId());
    ps.setString(2, order.getCustomerId());
    ps.setBigDecimal(3, order.getAmount());
    ps.addBatch();
}

ps.executeBatch();

Connection Pooling:

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/db");
config.setMaximumPoolSize(20);
config.setMinimumIdle(5);
config.setConnectionTimeout(30000);

HikariDataSource dataSource = new HikariDataSource(config);


Comparison Matrix

| Strategy | Throughput | Consistency | Complexity | Best For |
|----------|------------|-------------|------------|----------|
| Sharding | Very High | Strong | High | Horizontal scale |
| Write-Behind | Very High | Eventual | Medium | Non-critical data |
| Event Sourcing | High | Eventual | Very High | Audit, complex domains |
| Message Queue | High | Eventual | Medium | Async workloads |
| Batch Processing | High | Strong | Low | Bulk operations |
| Time-Series DB | Very High | Varies | Medium | Metrics, IoT |
| LSM Databases | Very High | Eventual | Medium | Write-heavy workloads |
| Distributed DB | Very High | Varies | High | Global scale |

Decision Tree

[Diagram: Decision Tree]


Scaling Writes Architecture Example

[Diagram: Scaling Writes Architecture]


Key Takeaways

  1. Shard when single DB isn't enough - Most impactful scaling strategy
  2. Use queues for async writes - Decouple acceptance from processing
  3. Batch where possible - Dramatically improves throughput
  4. Choose the right database - LSM for writes, B-tree for reads
  5. Consider event sourcing - Natural append-only pattern
  6. Time-series for time-series - Purpose-built is better
  7. Accept eventual consistency - Often worth the trade-off
  8. Monitor write amplification - Hidden cost in LSM databases