NVMe-oF (NVMe over Fabrics):
Architecture, Transports & Use Cases
NVMe-oF extends the NVMe command set across a network fabric, giving hosts access to remote flash storage with near-local latency. It is the foundation of modern disaggregated storage infrastructure.
What is NVMe-oF?
NVMe-oF (NVMe over Fabrics) is a specification from the NVM Express organization that defines how to transport the NVMe command set across a network instead of a local PCIe bus. A host running an NVMe-oF initiator opens a connection to a storage node running an NVMe-oF target and issues I/O as if the storage were locally attached, complete with NVMe's parallel queue model (up to 64K queues of up to 64K commands each).
Because NVMe retains its massively parallel queue model over the fabric, NVMe-oF achieves dramatically lower latency than legacy protocols such as iSCSI and NFS, which were designed before flash storage existed. Disaggregating storage with NVMe-oF also lets compute nodes and storage nodes scale independently, so high-capacity SSDs no longer have to be co-located with compute.
NVMe-oF Transport Options
The NVMe-oF specification defines three transport bindings. Each makes different trade-offs between performance, hardware requirements, and deployment complexity.
NVMe/TCP
NVMe commands encapsulated in TCP segments. Runs on any IP network with standard NICs.
- ✓ Standard Ethernet — no special hardware
- ✓ 25–40µs typical latency
- ✓ Works on any cloud provider
- ✓ Linux kernel support since 5.0
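As a sketch of how a host attaches to an NVMe/TCP target with nvme-cli (the address, port, and NQN below are placeholder values, not real endpoints):

```shell
# Load the NVMe/TCP initiator module (mainline since Linux 5.0)
modprobe nvme-tcp

# Discover subsystems exported by the target
# (10.0.0.5 and 4420 are placeholder address/port values)
nvme discover -t tcp -a 10.0.0.5 -s 4420

# Connect to a discovered subsystem by its NQN (placeholder NQN)
nvme connect -t tcp -a 10.0.0.5 -s 4420 -n nqn.2024-01.io.example:subsys1

# The remote namespace now appears as a local block device
nvme list
```

After `nvme connect`, the remote namespace shows up as an ordinary block device and can be partitioned, formatted, and mounted like a local SSD.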
NVMe/RDMA
Uses RDMA (Remote Direct Memory Access) via RoCE or iWARP for kernel-bypass I/O.
- ✓ Sub-20µs latency possible
- ✓ Highest raw throughput
- ⚠ Requires RDMA-capable NICs
- ⚠ Lossless network recommended
NVMe/FC
NVMe over Fibre Channel. A migration path for enterprises with existing FC SAN infrastructure.
- ✓ Reuses FC fabric investment
- ✓ 30–50µs latency
- ✗ Requires FC HBAs and switches
- ✗ High cost, limited scalability
NVMe-oF vs Local NVMe
| Attribute | Local NVMe (PCIe) | NVMe-oF (TCP) |
|---|---|---|
| Latency | 5–20µs | 25–40µs |
| High Availability | None (single node) | Multi-path, HA failover |
| Independent scaling | No — tied to host | Yes — scale storage separately |
| Capacity per host | Limited by PCIe slots | Effectively unlimited |
| Hardware cost | High (SSDs per server) | Shared pool, lower TCO |
| Cloud compatible | Bare metal only | Any cloud, any VM |
NVMe-oF vs iSCSI
| Attribute | iSCSI | NVMe-oF (TCP) |
|---|---|---|
| Protocol overhead | High (SCSI command set) | Minimal (NVMe native) |
| Command queues | 1 queue / 128 commands | 64K queues / 64K commands |
| Latency | 100–200µs | 25–40µs |
| IOPS (same HW) | Baseline | Up to 3.5× higher |
| Hardware req. | Standard NICs | Standard NICs |
| Linux kernel support | Since ~2002 | Since 5.0 (2019) |
NVMe-oF Use Cases
Kubernetes Block Storage
CSI drivers expose NVMe-oF volumes as Kubernetes PersistentVolumes, so pods get local-SSD performance from a shared, HA-protected storage pool without any direct-attached storage (DAS).
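As an illustrative sketch, a pod consumes such a volume through an ordinary PersistentVolumeClaim; the `storageClassName` here is hypothetical and would match whatever class the NVMe-oF CSI driver registers in your cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data
spec:
  accessModes:
    - ReadWriteOnce
  # Hypothetical class name; use the one provided by your CSI driver
  storageClassName: nvme-of-tcp
  resources:
    requests:
      storage: 100Gi
```

The application pod then references `pg-data` in its volume spec; the CSI driver handles the NVMe-oF connection and device attachment behind the scenes.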
AI / ML Training Data
GPU clusters stream training datasets from a centralized NVMe-oF pool. High IOPS and bandwidth prevent GPU starvation between epochs.
OLTP Databases
PostgreSQL, MySQL, and MongoDB on NVMe-oF achieve sub-millisecond query latency and high transaction throughput with independent storage scaling.
VDI & Virtual Desktops
Boot storms from hundreds of simultaneous VM starts are absorbed by the parallel queue model — no more IOPS contention at peak hours.
Linux Kernel Support
NVMe-oF is fully supported in the mainline Linux kernel. All three transports ship as loadable modules:
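For example, on a recent mainline kernel (root privileges assumed):

```shell
# Load the initiator-side transport modules as needed
modprobe nvme-tcp    # NVMe/TCP
modprobe nvme-rdma   # NVMe/RDMA (RoCE or iWARP)
modprobe nvme-fc     # NVMe/FC

# Verify which NVMe transport modules are loaded
lsmod | grep nvme
```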
Requires: Linux ≥ 5.0, nvme-cli package. Full NVMe/TCP setup guide at nvme-tcp.com →
Deploy NVMe-oF in Kubernetes today
simplyblock.io provides production-ready NVMe/TCP block storage for Kubernetes — native CSI driver, multi-AZ replication, and zero special hardware required.
Talk to an expert →