Skip to content

MySQL

What is MySQL?

MySQL is the world's most popular open-source relational database management system (RDBMS), known for its reliability, ease of use, and performance. It's widely used for web applications and is a core component of the LAMP stack.

  • Type: Relational Database (RDBMS)
  • Written in: C, C++
  • License: GPL (Community) / Commercial (Enterprise)
  • Protocol: MySQL wire protocol over TCP
  • Default Port: 3306
  • ACID: Full compliance (with InnoDB)
  • Owned by: Oracle Corporation

Core Features

MySQL offers: - ACID transactions (InnoDB) - Multiple storage engines (InnoDB, MyISAM, Memory) - Replication (async, semi-sync, group replication) - Partitioning - Full-text search - JSON support - Stored procedures, triggers, views - User-defined functions - Query caching - Connection pooling - SSL/TLS encryption - Role-based access control


Architecture

MySQL Server Architecture

InnoDB Architecture

InnoDB Storage Engine Architecture


Storage Engines

Engine ACID Transactions Locking Use Case
InnoDB Yes Yes Row-level Default, general purpose
MyISAM No No Table-level Legacy, read-heavy
Memory No No Table-level Temporary, caching
Archive No No Row-level Logging, archival
NDB Yes Yes Row-level MySQL Cluster

InnoDB vs MyISAM

Feature InnoDB MyISAM
Transactions Yes No
Foreign Keys Yes No
Crash Recovery Yes Limited
Locking Row-level Table-level
Full-text Search Yes (5.6+) Yes
Count(*) Speed Slower Fast (stored)
Storage Larger Smaller

Data Types

Numeric Types

-- Integer types
TINYINT      -- 1 byte (-128 to 127)
SMALLINT     -- 2 bytes (-32768 to 32767)
MEDIUMINT    -- 3 bytes
INT/INTEGER  -- 4 bytes (-2B to 2B)
BIGINT       -- 8 bytes

-- Decimal types
DECIMAL(10,2)  -- Exact precision (for money)
FLOAT          -- 4 bytes, approximate
DOUBLE         -- 8 bytes, approximate

-- Auto-increment
id INT AUTO_INCREMENT PRIMARY KEY
id BIGINT AUTO_INCREMENT PRIMARY KEY

String Types

-- Fixed/Variable length
CHAR(255)      -- Fixed length, padded
VARCHAR(255)   -- Variable length (max 65535 bytes)

-- Text types
TINYTEXT       -- 255 bytes
TEXT           -- 65KB
MEDIUMTEXT     -- 16MB
LONGTEXT       -- 4GB

-- Binary
BINARY(255)
VARBINARY(255)
BLOB, MEDIUMBLOB, LONGBLOB

Date/Time Types

DATE           -- 'YYYY-MM-DD'
TIME           -- 'HH:MM:SS'
DATETIME       -- 'YYYY-MM-DD HH:MM:SS' (no timezone)
TIMESTAMP      -- UTC, auto-converts to session timezone
YEAR           -- YYYY

-- Auto-updating timestamps
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

JSON Type (MySQL 5.7+)

-- JSON column
CREATE TABLE products (
    id INT PRIMARY KEY,
    attributes JSON
);

-- Insert JSON
INSERT INTO products VALUES (1, '{"color": "red", "size": "L"}');

-- Query JSON
SELECT
    attributes->>'$.color' AS color,
    attributes->'$.size' AS size
FROM products
WHERE JSON_EXTRACT(attributes, '$.color') = 'red';

-- JSON functions
JSON_EXTRACT(json, '$.path')
JSON_SET(json, '$.path', value)
JSON_INSERT(json, '$.path', value)
JSON_REMOVE(json, '$.path')
JSON_CONTAINS(json, value, '$.path')
JSON_ARRAY(), JSON_OBJECT()

Indexing

Index Types

Type Description Use Case
B-tree Default, balanced tree Equality, range queries
Hash Hash table (Memory only) Exact matches
Full-text Text search Natural language search
Spatial R-tree Geospatial data

Index Examples

-- Primary key (clustered index in InnoDB)
CREATE TABLE users (
    id INT PRIMARY KEY,
    email VARCHAR(255)
);

-- Unique index
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Composite index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Supports: WHERE user_id = ? AND created_at > ?
-- Supports: WHERE user_id = ?
-- Does NOT support: WHERE created_at > ? (leftmost prefix rule)

-- Covering index
CREATE INDEX idx_covering ON orders(user_id, status, total);
-- SELECT status, total FROM orders WHERE user_id = ?
-- Can be satisfied entirely from index

-- Prefix index (for long strings)
CREATE INDEX idx_users_email_prefix ON users(email(20));

-- Descending index (MySQL 8.0+)
CREATE INDEX idx_orders_date_desc ON orders(created_at DESC);

-- Full-text index
CREATE FULLTEXT INDEX idx_articles_content ON articles(title, body);
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('mysql performance' IN NATURAL LANGUAGE MODE);

-- Invisible index (for testing)
CREATE INDEX idx_test ON users(name) INVISIBLE;
ALTER INDEX idx_test VISIBLE;

Index Analysis

-- Check if index is used
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';

-- Show index info
SHOW INDEX FROM users;

-- Index usage statistics (Performance Schema)
SELECT * FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = 'mydb' AND object_name = 'users';

-- Unused indexes
SELECT * FROM sys.schema_unused_indexes;

-- Redundant indexes
SELECT * FROM sys.schema_redundant_indexes;

Query Optimization

EXPLAIN Output

EXPLAIN SELECT * FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'pending';

-- Key columns:
-- type: ALL (bad), index, range, ref, eq_ref, const (best)
-- possible_keys: Indexes that could be used
-- key: Index actually used
-- rows: Estimated rows to examine
-- Extra: Using where, Using index, Using filesort, Using temporary

Optimization Tips

-- Use covering index
SELECT user_id, status FROM orders WHERE user_id = 123;
-- With: INDEX(user_id, status) - no table lookup needed

-- Avoid SELECT *
SELECT id, name, email FROM users;  -- Better than SELECT *

-- Use LIMIT for pagination
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 100;
-- Better: Keyset pagination
SELECT * FROM orders WHERE created_at < ? ORDER BY created_at DESC LIMIT 20;

-- Avoid functions on indexed columns
-- Bad:
SELECT * FROM users WHERE YEAR(created_at) = 2024;
-- Good:
SELECT * FROM users WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- Use EXISTS instead of IN for subqueries
-- Instead of:
SELECT * FROM orders WHERE user_id IN (SELECT id FROM users WHERE status = 'active');
-- Use:
SELECT * FROM orders o WHERE EXISTS (
    SELECT 1 FROM users u WHERE u.id = o.user_id AND u.status = 'active'
);

-- Batch INSERT
INSERT INTO logs (message, created_at) VALUES
('msg1', NOW()),
('msg2', NOW()),
('msg3', NOW());
-- Much faster than individual INSERTs

Transactions & Locking

Transaction Isolation Levels

-- Read Uncommitted (dirty reads possible)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Read Committed (no dirty reads)
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Repeatable Read (default in MySQL)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Serializable (strictest)
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Isolation Level Comparison

Level Dirty Read Non-repeatable Read Phantom Read
Read Uncommitted Yes Yes Yes
Read Committed No Yes Yes
Repeatable Read No No No*
Serializable No No No

*InnoDB prevents phantom reads in Repeatable Read using gap locking

Locking Examples

-- Shared lock (read lock)
SELECT * FROM accounts WHERE id = 1 LOCK IN SHARE MODE;
-- or (MySQL 8.0+)
SELECT * FROM accounts WHERE id = 1 FOR SHARE;

-- Exclusive lock (write lock)
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

-- Skip locked rows (queue processing)
SELECT * FROM tasks WHERE status = 'pending'
FOR UPDATE SKIP LOCKED
LIMIT 1;

-- No wait (fail immediately if locked)
SELECT * FROM accounts WHERE id = 1
FOR UPDATE NOWAIT;

-- Transaction example
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

Deadlock Handling

-- View recent deadlocks
SHOW ENGINE INNODB STATUS;

-- Deadlock prevention:
-- 1. Access tables in same order
-- 2. Keep transactions short
-- 3. Use lower isolation level if possible
-- 4. Add appropriate indexes

-- Application handling
try {
    executeTransaction();
} catch (DeadlockException e) {
    // Retry transaction
    executeTransaction();
}

Replication

Replication Types

1. Asynchronous Replication (default): Primary writes to binary log, replicas may lag behind.

2. Semi-synchronous Replication: Primary waits for ACK from at least one replica before committing.

3. Group Replication (MySQL 5.7.17+): Paxos-based consensus with multi-primary or single-primary mode.

Replication Setup

-- On Primary
-- Enable binary logging in my.cnf
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW

-- Create replication user
CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- Get binary log position
SHOW MASTER STATUS;

-- On Replica
CHANGE REPLICATION SOURCE TO
    SOURCE_HOST='primary_host',
    SOURCE_USER='repl',
    SOURCE_PASSWORD='password',
    SOURCE_LOG_FILE='mysql-bin.000001',
    SOURCE_LOG_POS=154;

START REPLICA;
SHOW REPLICA STATUS\G

Read/Write Splitting

MySQL Read/Write Splitting


Partitioning

Partition Types

-- Range partitioning
CREATE TABLE orders (
    id INT,
    order_date DATE,
    amount DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- List partitioning
CREATE TABLE orders (
    id INT,
    region VARCHAR(20)
)
PARTITION BY LIST COLUMNS (region) (
    PARTITION p_east VALUES IN ('NY', 'NJ', 'PA'),
    PARTITION p_west VALUES IN ('CA', 'WA', 'OR'),
    PARTITION p_other VALUES IN ('TX', 'FL')
);

-- Hash partitioning
CREATE TABLE sessions (
    id INT,
    user_id INT,
    data TEXT
)
PARTITION BY HASH(user_id)
PARTITIONS 8;

-- Key partitioning (hash on primary key)
CREATE TABLE cache (
    id INT PRIMARY KEY,
    data TEXT
)
PARTITION BY KEY()
PARTITIONS 4;

Partition Management

-- Add partition
ALTER TABLE orders ADD PARTITION (
    PARTITION p2025 VALUES LESS THAN (2026)
);

-- Drop partition (fast delete!)
ALTER TABLE orders DROP PARTITION p2022;

-- Reorganize partitions
ALTER TABLE orders REORGANIZE PARTITION pmax INTO (
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- Check partition pruning
EXPLAIN SELECT * FROM orders WHERE order_date = '2024-01-15';
-- Should show: partitions: p2024

Common Use Cases

1. User Management

CREATE TABLE users (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    name VARCHAR(255),
    status ENUM('active', 'inactive', 'suspended') DEFAULT 'active',
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    UNIQUE KEY idx_email (email),
    KEY idx_status (status)
) ENGINE=InnoDB;

2. E-commerce Orders

CREATE TABLE orders (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    status ENUM('pending', 'paid', 'shipped', 'delivered', 'cancelled'),
    total DECIMAL(10,2),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    KEY idx_user_id (user_id),
    KEY idx_status_created (status, created_at),
    FOREIGN KEY (user_id) REFERENCES users(id)
) ENGINE=InnoDB;

CREATE TABLE order_items (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    order_id BIGINT NOT NULL,
    product_id BIGINT NOT NULL,
    quantity INT NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    KEY idx_order_id (order_id),
    FOREIGN KEY (order_id) REFERENCES orders(id)
) ENGINE=InnoDB;

3. Session Storage

CREATE TABLE sessions (
    id VARCHAR(128) PRIMARY KEY,
    user_id BIGINT,
    data TEXT,
    expires_at DATETIME NOT NULL,
    KEY idx_expires (expires_at),
    KEY idx_user (user_id)
) ENGINE=InnoDB;

-- Cleanup expired sessions
DELETE FROM sessions WHERE expires_at < NOW();

-- Or use MySQL Event Scheduler
CREATE EVENT cleanup_sessions
ON SCHEDULE EVERY 1 HOUR
DO DELETE FROM sessions WHERE expires_at < NOW();

4. Audit Logging

CREATE TABLE audit_log (
    id BIGINT AUTO_INCREMENT,
    table_name VARCHAR(64),
    record_id BIGINT,
    action ENUM('INSERT', 'UPDATE', 'DELETE'),
    old_values JSON,
    new_values JSON,
    user_id BIGINT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p_2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p_2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    -- ...
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

Performance Tuning

Key Configuration Parameters

[mysqld]
# InnoDB Buffer Pool (70-80% of RAM for dedicated server)
innodb_buffer_pool_size = 12G
innodb_buffer_pool_instances = 8

# Log files
innodb_log_file_size = 1G
innodb_log_buffer_size = 64M

# Flush behavior
innodb_flush_log_at_trx_commit = 1  # ACID compliant
# Set to 2 for better performance (risk: 1 sec of data on crash)

innodb_flush_method = O_DIRECT

# Connections
max_connections = 500
thread_cache_size = 100

# Query cache (deprecated in 8.0, disable in 5.7)
query_cache_type = 0
query_cache_size = 0

# Temp tables
tmp_table_size = 256M
max_heap_table_size = 256M

# Sort/Join buffers
sort_buffer_size = 4M
join_buffer_size = 4M
read_buffer_size = 2M

# Binary log
binlog_format = ROW
sync_binlog = 1  # ACID compliant
expire_logs_days = 7

Monitoring Queries

-- Slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- seconds

-- Current processes
SHOW PROCESSLIST;
SHOW FULL PROCESSLIST;

-- Kill long-running query
KILL <process_id>;

-- Table statistics
SHOW TABLE STATUS LIKE 'users';

-- InnoDB status
SHOW ENGINE INNODB STATUS\G

-- Buffer pool stats
SELECT * FROM information_schema.INNODB_BUFFER_POOL_STATS;

-- Performance Schema
SELECT * FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;

Trade-offs

Pros Cons
Mature and battle-tested Vertical scaling primarily
Excellent tooling ecosystem Complex replication setup
Wide hosting support Query optimizer less advanced than PostgreSQL
Fast reads with proper indexing No native JSON indexing (GIN)
Strong community Fewer data types than PostgreSQL
Multiple storage engines Oracle ownership concerns
Easy to learn No native table inheritance

MySQL vs Alternatives

Feature MySQL PostgreSQL MariaDB Oracle
ACID Yes (InnoDB) Yes Yes Yes
JSON Good Excellent Good Good
Replication Good Good Good Excellent
Partitioning Good Good Good Excellent
Full-text Good Good Good Excellent
License GPL/Commercial PostgreSQL GPL Commercial
Complexity Low Medium Low High
Cloud Support Excellent Excellent Good Excellent

Best Practices

  1. Always use InnoDB - Unless you have a very specific reason not to
  2. Proper indexing - Analyze slow queries, add appropriate indexes
  3. Use connection pooling - Avoid connection overhead
  4. Normalize, then denormalize - Start normalized, denormalize for performance
  5. Use prepared statements - Prevent SQL injection, better performance
  6. Set appropriate character set - Use utf8mb4 for full Unicode support
  7. Monitor slow queries - Enable slow query log
  8. Regular backups - mysqldump, mysqlpump, or Percona XtraBackup
  9. Keep statistics updated - ANALYZE TABLE periodically
  10. Use EXPLAIN - Before deploying queries

Essential Commands

-- Database
CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE mydb;
DROP DATABASE mydb;

-- User management
CREATE USER 'app'@'%' IDENTIFIED BY 'password';
GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.* TO 'app'@'%';
FLUSH PRIVILEGES;

-- Backup
mysqldump -u root -p mydb > backup.sql
mysqldump -u root -p --all-databases > all_backup.sql

-- Restore
mysql -u root -p mydb < backup.sql

-- Table maintenance
ANALYZE TABLE users;  -- Update statistics
OPTIMIZE TABLE users;  -- Defragment (locks table!)
CHECK TABLE users;     -- Check for errors
REPAIR TABLE users;    -- Repair MyISAM table

-- Show variables
SHOW VARIABLES LIKE 'innodb%';
SHOW GLOBAL STATUS LIKE 'Threads%';

-- Size queries
SELECT
    table_schema AS 'Database',
    ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
FROM information_schema.tables
GROUP BY table_schema;