Database Internals¶

Storage Engines¶

B-Tree (Most Common)¶

B-Tree Structure

B+ Tree (Variation)¶

B+ Tree Structure

LSM Tree (Log-Structured Merge Tree)¶

LSM Tree

B-Tree vs LSM Tree¶

B-Tree vs LSM Tree

Indexing¶

Index Types¶

Index Types

Composite Index¶

-- Order matters!
CREATE INDEX idx_composite ON orders(customer_id, status, created_at);

-- This index can be used for:
WHERE customer_id = ?                              -- ✓
WHERE customer_id = ? AND status = ?               -- ✓
WHERE customer_id = ? AND status = ? AND created_at > ? -- ✓
WHERE status = ?                                   -- ✗ (need leftmost)
WHERE customer_id = ? AND created_at > ?           -- Partial (skips status)

-- Index-only scan (covering index)
CREATE INDEX idx_covering ON orders(customer_id)
    INCLUDE (status, total);  -- Avoids table lookup

Index Selection Guidelines¶

When to Create Indexes: - Column frequently in WHERE clauses - Column used in JOIN conditions - Column used in ORDER BY - High selectivity (many distinct values)

Avoid index when: - Small tables (full scan is fast) - Low selectivity (few distinct values) - Heavy write workload (index maintenance) - Column rarely queried

Index Maintenance Costs: - INSERT: Update all relevant indexes - UPDATE: Update if indexed column changes - DELETE: Update all relevant indexes - Storage: Each index uses disk space

Query Execution¶

Query Processing Pipeline¶

Query Processing Pipeline

EXPLAIN ANALYZE¶

EXPLAIN ANALYZE
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending'
  AND o.created_at > '2024-01-01';

-- Output:
Hash Join  (cost=10.50..100.25 rows=50 width=200) (actual time=0.5..1.2 rows=45 loops=1)
  Hash Cond: (o.customer_id = c.id)
  -> Index Scan using idx_orders_status on orders o  (actual time=0.1..0.4 rows=100 loops=1)
       Index Cond: (status = 'pending')
       Filter: (created_at > '2024-01-01')
       Rows Removed by Filter: 20
  -> Hash  (actual time=0.3..0.3 rows=1000 loops=1)
       -> Seq Scan on customers c  (actual time=0.01..0.2 rows=1000 loops=1)
Planning Time: 0.5 ms
Execution Time: 1.5 ms

Join Algorithms¶

Nested Loop Join

for each row in outer:
    for each row in inner:
        if match: output

O(n x m), good for small tables or indexed lookups

Hash Join 1. Build hash table from smaller table 2. Probe hash table with larger table

O(n + m), good for equality joins, needs memory

Merge Join (Sort-Merge) 1. Sort both tables on join key 2. Merge sorted lists

O(n log n + m log m), good if already sorted

Transactions & ACID¶

ACID Properties¶

ACID Properties

Isolation Levels¶

Isolation Levels

MVCC (Multi-Version Concurrency Control)¶

MVCC

Write-Ahead Logging (WAL)¶

Write-Ahead Logging

Buffer Pool / Page Cache¶

Buffer Pool

Partitioning¶

Table Partitioning

Common Interview Questions¶

B-Tree vs LSM Tree?
B-Tree: Better reads, in-place updates
LSM: Better writes, append-only, compaction
What is MVCC?
Keep multiple versions of rows
Readers see consistent snapshot
No read locks needed
How does WAL work?
Log changes before data pages
Sequential writes (fast)
Enables crash recovery
When to add an index?
Frequent WHERE conditions
JOIN columns
High selectivity
Avoid on write-heavy, low-selectivity columns
Explain isolation levels
Trade-off between consistency and concurrency
Read Committed most common default
Serializable for full isolation