NVMe Storage for Databases (PostgreSQL, MySQL)
Databases are among the most demanding random-I/O workloads in production. Every query, every index scan, and every WAL write touches storage. NVMe SSDs, and NVMe over Fabrics for shared clusters, deliver the sub-millisecond latency that transforms query response times.
The Storage Challenge
- B-tree index scans generate thousands of random 4KB–8KB reads per query — exactly the workload where HDD collapses to 150 IOPS
- Write-Ahead Log (WAL) fsyncs block commit latency; slow fsync means slow transactions
- Autovacuum in PostgreSQL and InnoDB compaction in MySQL generate background I/O that competes with foreground queries
- Shared storage clusters (read replicas, HA failover) require consistent low-latency block access across multiple nodes
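The index-scan pattern in the first bullet can be approximated with fio. The job below is a minimal sketch: the target directory, file size, and queue depth are illustrative assumptions, not tuned values.

```ini
; random 8K reads with O_DIRECT, several parallel workers --
; a rough stand-in for concurrent B-tree index scans
[btree-scan]
ioengine=libaio
direct=1
rw=randread
bs=8k
iodepth=32
numjobs=4
size=4g
runtime=60
time_based=1
; assumption: a scratch directory on the NVMe filesystem, not a raw device
directory=/mnt/nvme/fio
```

Run it with `fio btree-scan.fio` and compare the reported IOPS and completion latencies across devices.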
Why NVMe Storage Fits
500K–1M random IOPS per device
A single NVMe SSD handles 500K–1M random 4KB IOPS. A PostgreSQL server with 64 parallel worker processes can saturate this; a SATA SSD at 80K IOPS cannot keep up.
10–20µs device latency
PostgreSQL fsync latency drops from 1–5ms (HDD) or 50–100µs (SATA SSD) to 10–20µs on NVMe. That directly cuts transaction commit times for write-heavy OLTP.
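PostgreSQL ships a tool for measuring exactly this: pg_test_fsync reports per-method fsync latency for a given path, using the same write pattern as the WAL. The mount point below is an assumption; point it at whatever device will hold pg_wal.

```shell
# Measure fsync latency using PostgreSQL's own WAL write pattern.
# -f: location of the test file (put it on the candidate WAL device)
# -s: seconds per test method
pg_test_fsync -f /mnt/pg_wal/fsync-test -s 5
```

Comparing the output between an HDD, a SATA SSD, and an NVMe namespace makes the commit-latency difference concrete before any database tuning.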
NVMe-oF for shared Postgres clusters
NVMe over Fabrics (NVMe-oF), including NVMe/TCP, extends NVMe block devices over standard 10/25GbE Ethernet. A Postgres primary and its read replicas all access the same NVMe-oF volume at 25–40µs total latency, faster than a local SATA SSD.
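Attaching an NVMe/TCP volume from a host follows the standard nvme-cli workflow. The target address, port, and NQN below are placeholders for your own deployment.

```shell
# Load the NVMe/TCP initiator module
modprobe nvme_tcp

# Discover subsystems exported by the target (address/port are assumptions)
nvme discover -t tcp -a 10.0.0.10 -s 4420

# Connect to a specific subsystem by its NQN (placeholder NQN)
nvme connect -t tcp -a 10.0.0.10 -s 4420 -n nqn.2024-01.io.example:pgdata

# The volume now appears as an ordinary local block device, e.g. /dev/nvme1n1
nvme list
```

From the database's point of view the connected volume is indistinguishable from a local NVMe drive: same block interface, same multi-queue path.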
No head-of-line blocking under autovacuum
NVMe supports up to ~64K I/O queues, each up to ~64K commands deep, so autovacuum I/O does not block foreground query I/O. On SATA AHCI (a single queue of 32 commands), background I/O competes head-to-head with foreground queries, causing latency spikes.
Reference Architecture
| Layer | Recommendation |
|---|---|
| Primary storage | NVMe SSD (PCIe 4.0) or NVMe-oF/TCP volume |
| WAL / redo log | Separate NVMe namespace for isolation |
| Shared cluster | NVMe-oF target with ANA multipath for HA |
| Kubernetes | NVMe-oF CSI driver (e.g. simplyblock) → PVC per pod |
| Filesystem | ext4 or xfs; disable atime; O_DIRECT for large DBs |
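The filesystem and WAL rows of the table translate into a setup along these lines. This is a sketch, not a runbook: the device names and the Debian-style PostgreSQL 16 paths are assumptions for illustration.

```shell
# Format data and WAL namespaces (assumed device names)
mkfs.xfs /dev/nvme0n1          # data
mkfs.xfs /dev/nvme0n2          # separate namespace for WAL isolation

# Mount with atime disabled, per the table above
mount -o noatime /dev/nvme0n1 /var/lib/postgresql
mount -o noatime /dev/nvme0n2 /mnt/pg_wal

# Relocate pg_wal onto its own namespace (stop the server first)
systemctl stop postgresql
mv /var/lib/postgresql/16/main/pg_wal /mnt/pg_wal/
ln -s /mnt/pg_wal/pg_wal /var/lib/postgresql/16/main/pg_wal
systemctl start postgresql
```

Keeping WAL on its own namespace means checkpoint and autovacuum traffic on the data volume never queues behind commit-critical fsyncs.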
Benchmark This Workload
70/30 read/write, 8K block — approximates PostgreSQL OLTP pattern
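A fio job approximating that pattern might look like the following; the queue depth, job count, and target directory are assumptions to adjust for your hardware.

```ini
; 70% random reads / 30% random writes at 8K,
; roughly matching PostgreSQL's OLTP page access pattern
[pg-oltp]
ioengine=libaio
direct=1
rw=randrw
rwmixread=70
bs=8k
iodepth=16
numjobs=8
size=8g
runtime=120
time_based=1
; assumption: run against the filesystem that will hold the data directory
directory=/var/lib/postgresql/fio-test
```

Watch the p99 completion latency as much as the IOPS total: OLTP commit times track the tail, not the average.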
Need shared block storage at NVMe speed?
NVMe over Fabrics (NVMe-oF) extends NVMe performance across standard Ethernet — delivering 25–40µs block storage to any host in your cluster. NVMe/TCP guide →
simplyblock provides production NVMe/TCP block storage for Kubernetes and bare-metal — no proprietary hardware required.
Managed PostgreSQL on NVMe
vela.run is a managed PostgreSQL platform built on NVMe/TCP storage — delivering the latency of local NVMe with the flexibility of cloud-native managed Postgres.
vela.run →