# NVMe Storage on Bare-Metal Servers
Bare-metal servers give you full access to NVMe hardware without hypervisor overhead. The architectural choice is local NVMe (maximum single-node performance) vs NVMe-oF disaggregation (independent scaling of compute and storage across a cluster).
## The Storage Challenge
- Local NVMe is the highest-performance option but ties storage capacity to compute — scaling storage means scaling servers
- NVMe-oF disaggregation decouples compute and storage but adds ~20µs network latency
- Linux kernel NVMe driver tuning (queue depth, IRQ affinity, CPU pinning) is required to achieve rated device IOPS
- NUMA topology must be respected — NVMe queues should be bound to CPUs on the same NUMA node as the PCIe slot
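The NUMA point above is worth making concrete. A minimal sketch of NUMA-aware IRQ pinning, assuming a device named `nvme0` and standard sysfs/procfs paths (run as root; the device name is illustrative):

```shell
# Bind nvme0's interrupts to CPUs on the NUMA node local to its PCIe slot.
DEV=nvme0

# Discover the PCIe-local NUMA node and its CPU list
NODE=$(cat /sys/class/nvme/${DEV}/device/numa_node)
CPUS=$(cat /sys/devices/system/node/node${NODE}/cpulist)
echo "${DEV} is local to NUMA node ${NODE} (CPUs ${CPUS})"

# Stop irqbalance so manual affinity settings persist
systemctl stop irqbalance

# Pin each of the device's per-queue IRQs to the local node's CPUs
for irq in $(grep "${DEV}" /proc/interrupts | awk -F: '{print $1}'); do
  echo "${CPUS}" > /proc/irq/${irq}/smp_affinity_list
done
```

Cross-node submissions traverse the inter-socket interconnect, which is exactly the latency this pinning avoids.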
## Why NVMe Storage Fits
### Zero hypervisor overhead
Bare metal gives the OS direct PCIe access to NVMe devices. No virtio-blk, no emulation layer, no hypervisor scheduler: the full rated IOPS of the device is available to applications.
### NVMe-oF as a scale-out strategy
NVMe-oF disaggregation lets you add storage capacity independently of compute. A cluster of 10 bare-metal compute nodes can share a single NVMe-oF storage pool, balancing storage utilization without adding compute servers.
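From a compute node, attaching a shared pool looks like ordinary nvme-cli usage. A sketch using the NVMe/TCP transport; the target address and subsystem NQN are placeholders for your environment:

```shell
# Load the NVMe/TCP initiator module
modprobe nvme-tcp

# Discover subsystems exported by the storage pool
nvme discover -t tcp -a 10.0.0.10 -s 4420

# Connect; the remote namespace appears as a local block device
nvme connect -t tcp -a 10.0.0.10 -s 4420 \
  -n nqn.2024-01.io.example:shared-pool
```

After `nvme connect`, the shared namespace shows up in `nvme list` like any local drive, so applications and filesystems need no changes.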
### Kernel NVMe tuning
The Linux kernel NVMe driver exposes per-queue depth, CPU affinity, and poll-mode settings. With proper tuning (io_uring plus NVMe poll mode), bare-metal NVMe can achieve sub-5µs submission latency.
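A sketch of that tuning, assuming a device named `/dev/nvme0n1` and an illustrative queue count. Poll queues must be configured at module load time (or on the kernel command line), and fio's `--hipri` flag requests polled completions through io_uring:

```shell
# Allocate dedicated NVMe poll queues (requires reloading the driver)
modprobe -r nvme && modprobe nvme poll_queues=4

# Polled, direct, queue-depth-1 random reads expose raw submission latency
fio --name=polltest --filename=/dev/nvme0n1 --ioengine=io_uring \
    --hipri --direct=1 --rw=randread --bs=4k --iodepth=1 \
    --runtime=30 --time_based
```

Polling trades CPU cycles for latency: the core spins on the completion queue instead of waiting for an interrupt.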
### SPDK for maximum throughput
SPDK (Storage Performance Development Kit) bypasses the kernel block layer entirely, achieving 10M+ IOPS from a single NVMe device on bare metal with a dedicated CPU core.
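A minimal SPDK workflow sketch, assuming SPDK has been cloned and built under `./spdk`; the PCIe address and core mask are illustrative:

```shell
# Unbind NVMe devices from the kernel driver and reserve hugepages
sudo ./spdk/scripts/setup.sh

# SPDK's perf example: 4K random reads, queue depth 128, pinned to core 1
sudo ./spdk/build/examples/perf -q 128 -o 4096 -w randread -t 30 \
  -c 0x2 -r 'trtype:PCIe traddr:0000:65:00.0'
```

Note that once `setup.sh` rebinds a device to vfio-pci, it disappears from the kernel block layer until setup is reset, so keep SPDK devices separate from your boot drive.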
## Reference Architecture
| Layer | Recommendation |
|---|---|
| Local NVMe use case | Single-node maximum performance, no HA requirement |
| NVMe-oF use case | Multi-node cluster with shared storage pool |
| Kernel config | io_uring, nvme poll mode, NUMA-aware IRQ affinity |
| SPDK | User-space NVMe driver for 10M+ IOPS (dedicated core) |
| Transport (NVMe-oF) | NVMe/TCP (software) or NVMe/RoCE (RDMA NICs) |
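On the storage side of the table above, the in-kernel target (`nvmet`) can export a local NVMe namespace over TCP through configfs. A sketch with placeholder address and NQN values:

```shell
# Load the target core and TCP transport
modprobe nvmet nvmet-tcp
cd /sys/kernel/config/nvmet

# Create a subsystem and expose a local namespace through it
SUB=subsystems/nqn.2024-01.io.example:shared-pool
mkdir ${SUB}
echo 1 > ${SUB}/attr_allow_any_host
mkdir ${SUB}/namespaces/1
echo -n /dev/nvme0n1 > ${SUB}/namespaces/1/device_path
echo 1 > ${SUB}/namespaces/1/enable

# Create a TCP port and link the subsystem to it
mkdir ports/1
echo tcp       > ports/1/addr_trtype
echo ipv4      > ports/1/addr_adrfam
echo 10.0.0.10 > ports/1/addr_traddr
echo 4420      > ports/1/addr_trsvcid
ln -s $(pwd)/${SUB} ports/1/subsystems/
```

The same configfs layout works for RDMA transports by loading `nvmet-rdma` and setting `addr_trtype` to `rdma`.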
## Benchmark This Workload
Use fio's io_uring engine with direct I/O to benchmark bare-metal NVMe.
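A baseline throughput job sketch; the device name and job sizing are illustrative, and direct I/O bypasses the page cache so results reflect the device rather than DRAM:

```shell
# Read-only 4K random IOPS benchmark against a raw NVMe namespace
fio --name=nvme-iops --filename=/dev/nvme0n1 --ioengine=io_uring \
    --direct=1 --rw=randread --bs=4k --iodepth=64 --numjobs=4 \
    --runtime=60 --time_based --group_reporting
```

Compare the reported IOPS against the device's datasheet rating; a large gap usually points back to the IRQ-affinity and NUMA tuning discussed above.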
## Need shared block storage at NVMe speed?
NVMe over Fabrics (NVMe-oF) extends NVMe performance across standard Ethernet — delivering 25–40µs block storage to any host in your cluster. NVMe/TCP guide →
simplyblock provides production NVMe/TCP block storage for Kubernetes and bare-metal — no proprietary hardware required.