NVMe Storage for Databases (PostgreSQL, MySQL)
Databases are among the most demanding random-I/O workloads in production. Every query, every index scan, and every WAL write touches storage. NVMe SSDs, and NVMe over Fabrics for shared clusters, deliver the sub-millisecond latency that transforms query response times.
The Storage Challenge
- B-tree index scans generate thousands of random 4KB–8KB reads per query — exactly the workload where HDD collapses to 150 IOPS
- Write-Ahead Log (WAL) fsyncs block commit latency; slow fsync means slow transactions
- Autovacuum in PostgreSQL and InnoDB compaction in MySQL generate background I/O that competes with foreground queries
- Shared storage clusters (read replicas, HA failover) require consistent low-latency block access across multiple nodes
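The index-scan pattern in the first bullet can be approximated with fio. The job below is a minimal sketch: the target directory, file size, and queue depth are illustrative assumptions, not tuned values.

```ini
; random 8K reads with O_DIRECT, several parallel workers --
; a rough stand-in for concurrent B-tree index scans
[btree-scan]
ioengine=libaio
direct=1
rw=randread
bs=8k
iodepth=32
numjobs=4
size=4g
runtime=60
time_based=1
; assumption: a scratch directory on the NVMe filesystem, not a raw device
directory=/mnt/nvme/fio
```

Run it with `fio btree-scan.fio` and compare the reported IOPS and completion latencies across devices.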
Why NVMe Storage Fits
500K–1M random IOPS per device
A single NVMe SSD handles 500K–1M random 4KB IOPS. A PostgreSQL server with 64 parallel worker processes can saturate this; a SATA SSD at 80K IOPS cannot keep up.
10–20µs device latency
PostgreSQL fsync latency drops from 1–5ms (HDD) or 50–100µs (SATA SSD) to 10–20µs on NVMe. That directly cuts transaction commit times for write-heavy OLTP.
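PostgreSQL ships a tool for measuring exactly this: pg_test_fsync reports per-method fsync latency for a given path, using the same write pattern as the WAL. The mount point below is an assumption; point it at whatever device will hold pg_wal.

```shell
# Measure fsync latency using PostgreSQL's own WAL write pattern.
# -f: location of the test file (put it on the candidate WAL device)
# -s: seconds per test method
pg_test_fsync -f /mnt/pg_wal/fsync-test -s 5
```

Comparing the output between an HDD, a SATA SSD, and an NVMe namespace makes the commit-latency difference concrete before any database tuning.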
NVMe-oF for shared Postgres clusters
NVMe over Fabrics (NVMe-oF), including NVMe/TCP, extends NVMe block devices over standard 10/25GbE Ethernet. A Postgres primary and its read replicas all access the same NVMe-oF volume at 25–40µs total latency, faster than a local SATA SSD.
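Attaching an NVMe/TCP volume from a host follows the standard nvme-cli workflow. The target address, port, and NQN below are placeholders for your own deployment.

```shell
# Load the NVMe/TCP initiator module
modprobe nvme_tcp

# Discover subsystems exported by the target (address/port are assumptions)
nvme discover -t tcp -a 10.0.0.10 -s 4420

# Connect to a specific subsystem by its NQN (placeholder NQN)
nvme connect -t tcp -a 10.0.0.10 -s 4420 -n nqn.2024-01.io.example:pgdata

# The volume now appears as an ordinary local block device, e.g. /dev/nvme1n1
nvme list
```

From the database's point of view the connected volume is indistinguishable from a local NVMe drive: same block interface, same multi-queue path.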
No head-of-line blocking under autovacuum
NVMe supports up to ~64K I/O queues, each up to ~64K commands deep, so autovacuum I/O does not block foreground query I/O. On SATA AHCI (a single queue of 32 commands), background I/O competes head-to-head with foreground queries, causing latency spikes.
Reference Architecture
| Layer | Recommendation |
|---|---|
| Primary storage | NVMe SSD (PCIe 4.0) or NVMe-oF/TCP volume |
| WAL / redo log | Separate NVMe namespace for isolation |
| Shared cluster | NVMe-oF target with ANA multipath for HA |
| Kubernetes | NVMe-oF CSI driver (e.g. simplyblock) → PVC per pod |
| Filesystem | ext4 or xfs; disable atime; O_DIRECT for large DBs |
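The filesystem and WAL rows of the table translate into a setup along these lines. This is a sketch, not a runbook: the device names and the Debian-style PostgreSQL 16 paths are assumptions for illustration.

```shell
# Format data and WAL namespaces (assumed device names)
mkfs.xfs /dev/nvme0n1          # data
mkfs.xfs /dev/nvme0n2          # separate namespace for WAL isolation

# Mount with atime disabled, per the table above
mount -o noatime /dev/nvme0n1 /var/lib/postgresql
mount -o noatime /dev/nvme0n2 /mnt/pg_wal

# Relocate pg_wal onto its own namespace (stop the server first)
systemctl stop postgresql
mv /var/lib/postgresql/16/main/pg_wal /mnt/pg_wal/
ln -s /mnt/pg_wal/pg_wal /var/lib/postgresql/16/main/pg_wal
systemctl start postgresql
```

Keeping WAL on its own namespace means checkpoint and autovacuum traffic on the data volume never queues behind commit-critical fsyncs.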
Benchmark This Workload
70/30 read/write, 8K block — approximates PostgreSQL OLTP pattern
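A fio job approximating that pattern might look like the following; the queue depth, job count, and target directory are assumptions to adjust for your hardware.

```ini
; 70% random reads / 30% random writes at 8K,
; roughly matching PostgreSQL's OLTP page access pattern
[pg-oltp]
ioengine=libaio
direct=1
rw=randrw
rwmixread=70
bs=8k
iodepth=16
numjobs=8
size=8g
runtime=120
time_based=1
; assumption: run against the filesystem that will hold the data directory
directory=/var/lib/postgresql/fio-test
```

Watch the p99 completion latency as much as the IOPS total: OLTP commit times track the tail, not the average.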
Need shared block storage at NVMe speed?
NVMe over Fabrics (NVMe-oF) extends NVMe performance across standard Ethernet — delivering 25–40µs block storage to any host in your cluster. NVMe/TCP guide →
simplyblock provides production NVMe/TCP block storage for Kubernetes and bare-metal — no proprietary hardware required.
Managed PostgreSQL on NVMe
vela.run is a managed PostgreSQL platform built on NVMe/TCP storage — delivering the latency of local NVMe with the flexibility of cloud-native managed Postgres.
vela.run →