DirectNVM: Hardware-accelerated NVMe SSDs for High-performance Embedded Computing

Yu Zou, Amro Awad, Mingjie Lin

doi:10.1145/3463911

Abstract

With data-intensive artificial intelligence (AI) and machine learning (ML) applications surging, modern high-performance embedded systems with heterogeneous computing resources critically demand low-latency, high-bandwidth data communication. As such, the emerging NVMe (Non-Volatile Memory Express) protocol, with parallel queuing, access prioritization, and optimized I/O arbitration, is being widely adopted as a de facto fast I/O communication interface. However, effectively leveraging the potential of modern NVMe storage proves to be nontrivial and demands fine-grained control, high processing concurrency, and application-specific optimization. Fortunately, modern FPGA devices, capable of efficient parallel processing and application-specific programmability, readily meet the underlying physical layer requirements of the NVMe protocol, therefore providing unprecedented opportunities to implement a feature-rich NVMe middleware that benefits modern high-performance embedded computing. In this article, we rethink existing access mechanisms for NVMe storage and devise hardware-assisted solutions for accelerating NVMe data access in high-performance embedded computing systems. Our key idea is to exploit the massively parallel I/O queuing capability of the NVMe storage system by leveraging FPGAs’ reconfigurability and native hardware computing power to operate transparently to the main processor.
Specifically, our DirectNVM system aims to provide effective hardware constructs for high-performance and scalable userspace storage applications by (1) hardening all the essential NVMe driver functionalities, thereby avoiding expensive OS syscalls and enabling zero-copy data access from the application, (2) relying on hardware rather than OS-level interrupts for I/O communication control, which can significantly reduce both total I/O latency and its variance, and (3) exposing application-specific weighted-round-robin I/O traffic scheduling to the userspace. To validate our design methodology, we developed a complete DirectNVM system on the Xilinx Zynq MPSoC architecture, which incorporates a high-performance application processor (APU) equipped with DDR4 system memory and a hardened configurable PCIe Gen3 block in its programmable logic part. We then measured the storage bandwidth and I/O latency of both our DirectNVM system and a conventional OS-based system when executing the standard FIO benchmark suite [2]. Compared against the PetaLinux built-in kernel driver code running on a Zynq MPSoC, DirectNVM achieves up to 18.4× higher throughput and up to 4.5× lower latency. To ensure the fairness of our performance comparison, we also measured our DirectNVM system against the Intel SPDK [26], a highly optimized userspace asynchronous NVMe I/O framework, running on an x86 PC system. Our experimental results show that DirectNVM, even though it runs on an embedded ARM processor considerably less powerful than a full-scale AMD processor, achieves up to 2.2× higher throughput and 1.3× lower latency. Furthermore, by experimenting with a multi-threaded test case, we demonstrate that DirectNVM’s weighted-round-robin scheduling can significantly optimize the bandwidth allocation between latency-constrained frontend applications and other backend applications in real-time systems.
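To make the weighted-round-robin idea concrete, the following is a minimal software sketch of a generic weighted-round-robin arbiter. It is not DirectNVM's hardware scheduler, nor the exact arbitration scheme of the NVMe specification; the function name, the queue names ("frontend", "backend"), and the weights are all assumptions chosen for illustration.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Yield (queue_name, command) pairs in weighted-round-robin order.

    queues:  dict mapping a queue name to a deque of pending commands
    weights: dict mapping a queue name to a positive integer weight
    (Both are hypothetical structures used only for this illustration.)
    """
    # Keep cycling over the queues until every queue is drained.
    while any(queues.values()):
        for name, pending in queues.items():
            # Serve at most `weight` commands from this queue per round.
            for _ in range(weights[name]):
                if not pending:
                    break
                yield name, pending.popleft()


# Illustrative workload: a latency-sensitive frontend queue gets 3x the
# service share of a backend queue while both have pending commands.
queues = {
    "frontend": deque(f"f{i}" for i in range(6)),
    "backend": deque(f"b{i}" for i in range(6)),
}
weights = {"frontend": 3, "backend": 1}
order = [name for name, _ in weighted_round_robin(queues, weights)]
```

While both queues are backlogged, the frontend queue receives three service slots for every backend slot; once the frontend queue drains, the backend queue takes all remaining slots, so no bandwidth is left idle.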
Finally, we have developed a theoretical performance-modeling framework, based on classic queuing theory, that quantitatively relates a system’s I/O performance to its I/O implementation.
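The abstract does not state which queuing model the framework uses. As general background only, one classic queuing-theory result that ties I/O concurrency, throughput, and latency together is Little's law. The symbols below are assumptions introduced here: $N$ is the average number of outstanding I/O commands, $X$ is the throughput in I/O operations per second, and $R$ is the mean latency per operation.

```latex
N = X \cdot R
\qquad\Longleftrightarrow\qquad
X = \frac{N}{R}
```

For example, with purely illustrative numbers, a steady queue depth of $N = 32$ and a mean latency of $R = 100\,\mu\text{s}$ would correspond to $X = 32 / (100 \times 10^{-6}\,\text{s}) = 320{,}000$ operations per second.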

Journal: ACM Transactions on Embedded Computing Systems
Publication Date: Jan 31, 2022
Citations: 2

