Abstract

For applications that frequently issue small random accesses to datasets exceeding DRAM capacity, placing the datasets on SSDs can result in poor performance. For the read-intensive workloads we focus on in this paper, low-latency flash memory with microsecond read latency is a promising alternative. However, when many such devices are used in parallel to achieve high IOPS (input/output operations per second), the CPU processing required to issue and complete IO requests becomes a bottleneck. To tackle this problem, we propose a new access method that combines two approaches: 1) optimizing the issuance and completion of IO requests to reduce CPU overhead, and 2) running many contexts with lightweight context switches implemented as stackless coroutines. Together, these reduce the CPU overhead per request to less than 10 ns, enabling read accesses with DRAM-like CPU cost, while the longer-than-DRAM access latency is hidden by the context switches. We apply the proposed method to graph algorithms such as BFS (breadth-first search), which involve many small random read accesses. In our evaluation, large graph data is placed on microsecond-latency flash memories on prototype boards and accessed with the proposed method. For synthetic and real-world graphs, the execution times of the graph algorithms are 88--141% of those when all the data are placed in DRAM.
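To illustrate the latency-hiding idea, below is a minimal, self-contained C++ sketch, not the authors' implementation: each context is a hand-rolled stackless coroutine (the kind of state machine a compiler generates for a stackless coroutine), the flash device is simulated with a tick-based completion queue, and a simple scheduler resumes whichever contexts have completed reads. All names here (FlashSim, Context, the tick-based latency model) are hypothetical stand-ins for the real device queues and coroutine runtime.

// Conceptual sketch (assumptions: simulated device, hypothetical names).
// Many contexts issue reads and suspend; the scheduler resumes them when
// their reads complete, so one microsecond-class latency overlaps many others.
#include <cstdint>
#include <cstdio>
#include <queue>
#include <vector>

// Simulated flash device: a read completes LATENCY_TICKS scheduler ticks
// after it is issued (a stand-in for microsecond device latency).
struct FlashSim {
    static constexpr uint64_t LATENCY_TICKS = 8;
    struct Pending { uint64_t done_at; int ctx_id; uint64_t value; };
    struct Cmp {
        bool operator()(const Pending& a, const Pending& b) const {
            return a.done_at > b.done_at;
        }
    };
    std::priority_queue<Pending, std::vector<Pending>, Cmp> inflight;
    uint64_t now = 0;

    void issue_read(int ctx_id, uint64_t addr) {
        // Issuing is cheap; the "data read" is derived from the address.
        inflight.push({now + LATENCY_TICKS, ctx_id, addr * 2});
    }
    // Poll the completion queue; return the reads finished by this tick.
    std::vector<Pending> poll() {
        std::vector<Pending> done;
        while (!inflight.empty() && inflight.top().done_at <= now) {
            done.push_back(inflight.top());
            inflight.pop();
        }
        return done;
    }
};

// One stackless coroutine: its entire state fits in this struct, so a
// "context switch" is just choosing another Context to resume.
struct Context {
    enum class State { IssueRead, AwaitRead, Done } state = State::IssueRead;
    uint64_t next_addr = 0;
    uint64_t remaining = 0;  // dependent reads this context still performs
    uint64_t sum = 0;

    // Resume the coroutine; returns true once it has finished.
    bool resume(int id, FlashSim& flash, uint64_t completed_value) {
        switch (state) {
        case State::IssueRead:
            flash.issue_read(id, next_addr);
            state = State::AwaitRead;        // suspend until completion
            return false;
        case State::AwaitRead:
            sum += completed_value;          // process the completed read
            next_addr = completed_value;     // pointer-chasing dependency
            if (--remaining == 0) { state = State::Done; return true; }
            state = State::IssueRead;
            return resume(id, flash, 0);     // immediately issue next read
        case State::Done:
            return true;
        }
        return true;
    }
};

int main() {
    constexpr int NUM_CTX = 64;  // many in-flight contexts keep the device busy
    FlashSim flash;
    std::vector<Context> ctxs(NUM_CTX);
    for (int i = 0; i < NUM_CTX; ++i) {
        ctxs[i].next_addr = i + 1;
        ctxs[i].remaining = 4;
        ctxs[i].resume(i, flash, 0);         // kick off the first read
    }
    int alive = NUM_CTX;
    while (alive > 0) {
        ++flash.now;
        for (const auto& c : flash.poll())   // completion -> resume context
            if (ctxs[c.ctx_id].resume(c.ctx_id, flash, c.value)) --alive;
    }
    uint64_t total = 0;
    for (const auto& c : ctxs) total += c.sum;
    std::printf("all %d contexts done, checksum = %llu\n", NUM_CTX,
                (unsigned long long)total);
    return 0;
}

Because each context's state is a small struct, switching between contexts costs only a function call rather than a stack swap, which is what makes per-request CPU overhead in the nanosecond range plausible while dozens of reads are kept in flight.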
