# NVMe Storage on Bare-Metal Servers
Bare-metal servers give you full access to NVMe hardware without hypervisor overhead. The architectural choice is local NVMe (maximum single-node performance) vs NVMe-oF disaggregation (independent scaling of compute and storage across a cluster).
## The Storage Challenge
- Local NVMe is the highest-performance option but ties storage capacity to compute — scaling storage means scaling servers
- NVMe-oF disaggregation decouples compute and storage but adds ~20µs network latency
- Linux kernel NVMe driver tuning (queue depth, IRQ affinity, CPU pinning) is required to achieve rated device IOPS
- NUMA topology must be respected — NVMe queues should be bound to CPUs on the same NUMA node as the PCIe slot
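The NUMA point above is worth making concrete. A minimal sketch of NUMA-aware IRQ pinning, assuming a device named `nvme0` and standard sysfs/procfs paths (run as root; the device name is illustrative):

```shell
# Bind nvme0's interrupts to CPUs on the NUMA node local to its PCIe slot.
DEV=nvme0

# Discover the PCIe-local NUMA node and its CPU list
NODE=$(cat /sys/class/nvme/${DEV}/device/numa_node)
CPUS=$(cat /sys/devices/system/node/node${NODE}/cpulist)
echo "${DEV} is local to NUMA node ${NODE} (CPUs ${CPUS})"

# Stop irqbalance so manual affinity settings persist
systemctl stop irqbalance

# Pin each of the device's per-queue IRQs to the local node's CPUs
for irq in $(grep "${DEV}" /proc/interrupts | awk -F: '{print $1}'); do
  echo "${CPUS}" > /proc/irq/${irq}/smp_affinity_list
done
```

Cross-node submissions traverse the inter-socket interconnect, which is exactly the latency this pinning avoids.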
## Why NVMe Storage Fits
### Zero hypervisor overhead
Bare metal gives the OS direct PCIe access to NVMe devices. No virtio-blk, no emulation layer, no hypervisor scheduler: the full rated IOPS of the device is available to applications.
### NVMe-oF as a scale-out strategy
NVMe-oF disaggregation lets you add storage capacity independently of compute. A cluster of 10 bare-metal compute nodes can share a single NVMe-oF storage pool, balancing storage utilization without adding compute servers.
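From a compute node, attaching a shared pool looks like ordinary nvme-cli usage. A sketch using the NVMe/TCP transport; the target address and subsystem NQN are placeholders for your environment:

```shell
# Load the NVMe/TCP initiator module
modprobe nvme-tcp

# Discover subsystems exported by the storage pool
nvme discover -t tcp -a 10.0.0.10 -s 4420

# Connect; the remote namespace appears as a local block device
nvme connect -t tcp -a 10.0.0.10 -s 4420 \
  -n nqn.2024-01.io.example:shared-pool
```

After `nvme connect`, the shared namespace shows up in `nvme list` like any local drive, so applications and filesystems need no changes.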
### Kernel NVMe tuning
The Linux kernel NVMe driver exposes per-queue depth, CPU affinity, and poll-mode settings. With proper tuning (io_uring plus NVMe poll mode), bare-metal NVMe can achieve sub-5µs submission latency.
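A sketch of that tuning, assuming a device named `/dev/nvme0n1` and an illustrative queue count. Poll queues must be configured at module load time (or on the kernel command line), and fio's `--hipri` flag requests polled completions through io_uring:

```shell
# Allocate dedicated NVMe poll queues (requires reloading the driver)
modprobe -r nvme && modprobe nvme poll_queues=4

# Polled, direct, queue-depth-1 random reads expose raw submission latency
fio --name=polltest --filename=/dev/nvme0n1 --ioengine=io_uring \
    --hipri --direct=1 --rw=randread --bs=4k --iodepth=1 \
    --runtime=30 --time_based
```

Polling trades CPU cycles for latency: the core spins on the completion queue instead of waiting for an interrupt.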
### SPDK for maximum throughput
SPDK (Storage Performance Development Kit) bypasses the kernel block layer entirely, achieving 10M+ IOPS from a single NVMe device on bare metal with a dedicated CPU core.
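A minimal SPDK workflow sketch, assuming SPDK has been cloned and built under `./spdk`; the PCIe address and core mask are illustrative:

```shell
# Unbind NVMe devices from the kernel driver and reserve hugepages
sudo ./spdk/scripts/setup.sh

# SPDK's perf example: 4K random reads, queue depth 128, pinned to core 1
sudo ./spdk/build/examples/perf -q 128 -o 4096 -w randread -t 30 \
  -c 0x2 -r 'trtype:PCIe traddr:0000:65:00.0'
```

Note that once `setup.sh` rebinds a device to vfio-pci, it disappears from the kernel block layer until setup is reset, so keep SPDK devices separate from your boot drive.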
## Reference Architecture
| Layer | Recommendation |
|---|---|
| Local NVMe use case | Single-node maximum performance, no HA requirement |
| NVMe-oF use case | Multi-node cluster with shared storage pool |
| Kernel config | io_uring, nvme poll mode, NUMA-aware IRQ affinity |
| SPDK | User-space NVMe driver for 10M+ IOPS (dedicated core) |
| Transport (NVMe-oF) | NVMe/TCP (software) or NVMe/RoCE (RDMA NICs) |
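On the storage side of the table above, the in-kernel target (`nvmet`) can export a local NVMe namespace over TCP through configfs. A sketch with placeholder address and NQN values:

```shell
# Load the target core and TCP transport
modprobe nvmet nvmet-tcp
cd /sys/kernel/config/nvmet

# Create a subsystem and expose a local namespace through it
SUB=subsystems/nqn.2024-01.io.example:shared-pool
mkdir ${SUB}
echo 1 > ${SUB}/attr_allow_any_host
mkdir ${SUB}/namespaces/1
echo -n /dev/nvme0n1 > ${SUB}/namespaces/1/device_path
echo 1 > ${SUB}/namespaces/1/enable

# Create a TCP port and link the subsystem to it
mkdir ports/1
echo tcp       > ports/1/addr_trtype
echo ipv4      > ports/1/addr_adrfam
echo 10.0.0.10 > ports/1/addr_traddr
echo 4420      > ports/1/addr_trsvcid
ln -s $(pwd)/${SUB} ports/1/subsystems/
```

The same configfs layout works for RDMA transports by loading `nvmet-rdma` and setting `addr_trtype` to `rdma`.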
## Benchmark This Workload
Use fio's io_uring engine with direct I/O to benchmark bare-metal NVMe.
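A baseline throughput job sketch; the device name and job sizing are illustrative, and direct I/O bypasses the page cache so results reflect the device rather than DRAM:

```shell
# Read-only 4K random IOPS benchmark against a raw NVMe namespace
fio --name=nvme-iops --filename=/dev/nvme0n1 --ioengine=io_uring \
    --direct=1 --rw=randread --bs=4k --iodepth=64 --numjobs=4 \
    --runtime=60 --time_based --group_reporting
```

Compare the reported IOPS against the device's datasheet rating; a large gap usually points back to the IRQ-affinity and NUMA tuning discussed above.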
## Need shared block storage at NVMe speed?
NVMe over Fabrics (NVMe-oF) extends NVMe performance across standard Ethernet — delivering 25–40µs block storage to any host in your cluster. NVMe/TCP guide →
simplyblock provides production NVMe/TCP block storage for Kubernetes and bare-metal — no proprietary hardware required.