Skip to content

Database Internals


Storage Engines

B-Tree (Most Common)

B-Tree Structure

B+ Tree (Variation)

B+ Tree Structure

LSM Tree (Log-Structured Merge Tree)

LSM Tree

B-Tree vs LSM Tree

B-Tree vs LSM Tree


Indexing

Index Types

Index Types

Composite Index

-- Order matters!
CREATE INDEX idx_composite ON orders(customer_id, status, created_at);

-- This index can be used for:
WHERE customer_id = ?                              -- ✓
WHERE customer_id = ? AND status = ?               -- ✓
WHERE customer_id = ? AND status = ? AND created_at > ? -- ✓
WHERE status = ?                                   -- ✗ (need leftmost)
WHERE customer_id = ? AND created_at > ?           -- Partial (skips status)

-- Index-only scan (covering index)
CREATE INDEX idx_covering ON orders(customer_id)
    INCLUDE (status, total);  -- Avoids table lookup

Index Selection Guidelines

When to Create Indexes: - Column frequently in WHERE clauses - Column used in JOIN conditions - Column used in ORDER BY - High selectivity (many distinct values)

Avoid index when: - Small tables (full scan is fast) - Low selectivity (few distinct values) - Heavy write workload (index maintenance) - Column rarely queried

Index Maintenance Costs: - INSERT: Update all relevant indexes - UPDATE: Update if indexed column changes - DELETE: Update all relevant indexes - Storage: Each index uses disk space


Query Execution

Query Processing Pipeline

Query Processing Pipeline

EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending'
  AND o.created_at > '2024-01-01';

-- Output:
Hash Join  (cost=10.50..100.25 rows=50 width=200) (actual time=0.5..1.2 rows=45 loops=1)
  Hash Cond: (o.customer_id = c.id)
  -> Index Scan using idx_orders_status on orders o  (actual time=0.1..0.4 rows=100 loops=1)
       Index Cond: (status = 'pending')
       Filter: (created_at > '2024-01-01')
       Rows Removed by Filter: 20
  -> Hash  (actual time=0.3..0.3 rows=1000 loops=1)
       -> Seq Scan on customers c  (actual time=0.01..0.2 rows=1000 loops=1)
Planning Time: 0.5 ms
Execution Time: 1.5 ms

Join Algorithms

Nested Loop Join

for each row in outer:
    for each row in inner:
        if match: output
O(n x m), good for small tables or indexed lookups

Hash Join 1. Build hash table from smaller table 2. Probe hash table with larger table

O(n + m), good for equality joins, needs memory

Merge Join (Sort-Merge) 1. Sort both tables on join key 2. Merge sorted lists

O(n log n + m log m), good if already sorted


Transactions & ACID

ACID Properties

ACID Properties

Isolation Levels

Isolation Levels

MVCC (Multi-Version Concurrency Control)

MVCC


Write-Ahead Logging (WAL)

Write-Ahead Logging


Buffer Pool / Page Cache

Buffer Pool


Partitioning

Table Partitioning


Common Interview Questions

  1. B-Tree vs LSM Tree?
  2. B-Tree: Better reads, in-place updates
  3. LSM: Better writes, append-only, compaction

  4. What is MVCC?

  5. Keep multiple versions of rows
  6. Readers see consistent snapshot
  7. No read locks needed

  8. How does WAL work?

  9. Log changes before data pages
  10. Sequential writes (fast)
  11. Enables crash recovery

  12. When to add an index?

  13. Frequent WHERE conditions
  14. JOIN columns
  15. High selectivity
  16. Avoid on write-heavy, low-selectivity columns

  17. Explain isolation levels

  18. Trade-off between consistency and concurrency
  19. Read Committed most common default
  20. Serializable for full isolation