Abstract

Small I/O requests are important for a large number of modern workloads in the datacenter. Traditionally, storage systems have delivered only low I/O rates for small I/O operations because of hard disk drive (HDD) limitations: an HDD is capable of only about 100–150 IOPS (I/O operations per second) per spindle. Relative to these low rates, host CPU processing capacity and network link throughput have therefore been abundant. With new storage device technologies, such as NAND Flash Solid State Drives (SSDs) and non-volatile memory (NVM), it is becoming common to design storage systems that are able to support millions of small IOPS. At these rates, however, both the server CPU and the network protocol emerge as the main bottlenecks for small I/O requests. Most storage systems in datacenters deliver I/O operations over some network protocol. Although there has been extensive work on low-latency, high-throughput networks, such as InfiniBand, Ethernet has dominated the datacenter. In this work we examine how networked storage protocols over raw Ethernet can achieve low host CPU overhead and high network link efficiency for small I/O requests. We first analyze in detail the latency and overhead of a networked storage protocol running directly over Ethernet and point out the main inefficiencies. Then, we examine how storage protocols can take advantage of context switch elimination and adaptive batching to reduce CPU and network overhead. Our results show that raw Ethernet is appropriate for supporting fast storage systems. For 4kB requests we reduce server CPU overhead by up to 45% and improve link utilization by up to 56%, achieving more than 88% of the theoretical link throughput. Effectively, at the same CPU utilization, our techniques serve 56% more I/O operations over a 10 Gbit/s link than a baseline protocol that does not include our optimizations. Overall, to the best of our knowledge, this is the first work to present a system that achieves 14μs host CPU overhead on both initiator and target for small networked I/Os over raw Ethernet without hardware support. In addition, our approach achieves 287K 4kB IOPS out of the 315K IOPS that are theoretically possible over a 1.2 GBytes/s link.
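To make the adaptive batching idea concrete, the following is a minimal sketch, not the paper's implementation: all names (io_req, transmit_batch, MAX_BATCH) are illustrative assumptions. It shows the core mechanism implied by the abstract: instead of issuing one transmission (and potentially one context switch) per small I/O, the sender drains whatever requests are already queued, up to a cap, and sends them as one batch, so the batch size adapts to load without a timer.

```c
/*
 * Sketch of adaptive batching for a networked storage sender.
 * Under light load a "batch" degenerates to a single request, so latency
 * is not sacrificed; under heavy load batches grow and per-request CPU
 * and header overhead is amortized. All identifiers are hypothetical.
 */
#include <stddef.h>
#include <stdio.h>

#define MAX_BATCH 32          /* cap so one batch cannot monopolize the link */

struct io_req {
    unsigned long lba;        /* logical block address */
    size_t        len;        /* payload length, e.g. 4 KiB */
};

/* Hypothetical stand-in for building and sending the raw Ethernet frames
 * that carry `n` coalesced requests. */
static void transmit_batch(const struct io_req *reqs, size_t n)
{
    printf("sending batch of %zu request(s)\n", n);
    (void)reqs;
}

/* Drain the submission queue adaptively: take as many requests as are
 * already pending (bounded by MAX_BATCH) without waiting for more. */
static void sender_loop(struct io_req *queue, size_t *head, size_t *tail)
{
    while (*head != *tail) {
        size_t pending = *tail - *head;
        size_t batch   = pending < MAX_BATCH ? pending : MAX_BATCH;

        transmit_batch(&queue[*head], batch);
        *head += batch;       /* these requests are now on the wire */
    }
}

int main(void)
{
    struct io_req queue[64];
    size_t head = 0, tail = 0;

    /* Simulate a burst of 4 KiB requests arriving before the sender runs. */
    for (int i = 0; i < 40; i++)
        queue[tail++] = (struct io_req){ .lba = 8UL * i, .len = 4096 };

    sender_loop(queue, &head, &tail);   /* emits batches of 32 and 8 */
    return 0;
}
```

The design point illustrated here is that batching is driven purely by queue occupancy rather than by a delay timer, which amortizes per-request protocol and CPU costs at high IOPS while leaving the single-request latency path untouched.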
