Boosting Compaction in B-Tree Based Key-Value Store by Exploiting Parallel Reads in Flash SSDs

Jongbaeg Lee, Sang-Won Lee, Gihwan Oh

doi:10.1109/access.2021.3072378

Abstract

Append-only B-tree based key-value stores provide superior search and update performance owing to their structural characteristics; however, they periodically require a compaction task that incurs significant I/O overhead. In this paper, we show that the poor read performance of compaction degrades the overall performance of ForestDB, a representative append-only B-tree engine. We demonstrate that, despite the exceptional performance of the SSD, the slow reads are caused by underutilization of the SSD's internal parallelism, because the read operations use synchronous I/O. We therefore propose a novel compaction method that improves the read performance of compaction by exploiting the SSD's internal parallelism: it requests multiple read operations in a batch using asynchronous I/O. We implemented the proposed method on ForestDB using two Linux asynchronous I/O interfaces, AIO and io_uring. The evaluation results confirm that our method improves the compaction's read performance by up to ten times compared to the conventional compaction method. In particular, the proposed method using io_uring, the latest asynchronous I/O interface, is effective regardless of the file I/O mode and outperforms the others in all cases.

Highlights

With the active use of mobile devices and the recent spread of Internet services such as social media, search engines, and e-commerce, there has been increasing interest in deploying storage techniques that effectively store and retrieve massive amounts of unstructured data
We found that the overhead of tree search and read operations during ForestDB's compaction is considerable (Section III)
To overcome the slow read performance of ForestDB's compaction, we propose a new compaction method, parallel fetch (p-fetch), that submits multiple read requests in a single call and fetches the I/O results in parallel through async I/O, thereby improving the read performance of the compaction by exploiting the internal parallelism of solid-state drives (SSDs)
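
The contrast between one-at-a-time synchronous reads and a batch of reads kept in flight together can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the paper uses Linux AIO and io_uring inside ForestDB, whereas this sketch substitutes a thread pool issuing `os.pread` calls. The names `read_sync`, `read_batched`, `make_demo_file`, `demo`, `BLOCK_SIZE`, and `depth` are hypothetical and chosen for the example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def read_sync(fd, offsets, size=BLOCK_SIZE):
    """Conventional pattern: each read blocks until the previous one returns."""
    return [os.pread(fd, size, off) for off in offsets]


def read_batched(fd, offsets, size=BLOCK_SIZE, depth=8):
    """Batched pattern: keep up to `depth` reads outstanding at once.

    A thread pool stands in for AIO/io_uring here (an assumption of this
    sketch). Results are returned in the same order as `offsets`.
    """
    with ThreadPoolExecutor(max_workers=depth) as pool:
        return list(pool.map(lambda off: os.pread(fd, size, off), offsets))


def make_demo_file(num_blocks=16):
    """Create a temporary file whose i-th block is filled with byte value i."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    for i in range(num_blocks):
        tmp.write(bytes([i]) * BLOCK_SIZE)
    tmp.close()
    return tmp.name


def demo():
    """Check that both patterns return identical data for scattered offsets."""
    path = make_demo_file()
    fd = os.open(path, os.O_RDONLY)
    try:
        offsets = [i * BLOCK_SIZE for i in (3, 11, 0, 7)]
        return read_sync(fd, offsets) == read_batched(fd, offsets)
    finally:
        os.close(fd)
        os.remove(path)
```

Both functions return the same bytes; the difference is only in how many requests the device can see at once. With a real asynchronous interface the batch would be submitted in one call rather than through worker threads.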

Summary

Introduction

With the active use of mobile devices and the recent spread of Internet services such as social media, search engines, and e-commerce, there has been increasing interest in deploying storage techniques that effectively store and retrieve massive amounts of unstructured data. NoSQL databases, especially key-value stores, use flexible data structures without schemas and are designed to run on clusters of multiple machines [1], [2]. They effectively handle large amounts of unstructured data and are widely used in data-intensive applications [3], [4]. In HB+-trie, each B+-tree's leaf node points to the disk location of either another B+-tree's root node or the actual data. Through this index structure, ForestDB minimizes the number of disk blocks accessed for each key-value operation and quickly retrieves the desired data from persistent storage. Previous studies have proposed methods that convert the random writes of a B-tree index into sequential writes to make full use of storage bandwidth [10]–[12].
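
The leaf-pointer idea behind HB+-trie can be illustrated with a toy in-memory model. This is a sketch under several assumptions, not ForestDB's on-disk format: plain dictionaries stand in for B+-trees, keys are split into fixed-size chunks of `CHUNK` bytes with one chunk indexed per level, and an empty terminator chunk is added so that a key that is a prefix of another still gets its own slot. `DocPointer`, `chunks`, `insert`, and `lookup` are hypothetical names.

```python
CHUNK = 4  # assumed chunk length in bytes, chosen only for illustration


class DocPointer:
    """Stand-in for a leaf entry that points at the actual data."""

    def __init__(self, key, value):
        self.key = key
        self.value = value


def chunks(key):
    """Split a bytes key into fixed-size chunks plus an empty terminator."""
    return [key[i:i + CHUNK] for i in range(0, len(key), CHUNK)] + [b""]


def insert(tree, key, value, depth=0):
    """Insert into a dict-based tree; a child tree is created only on conflict."""
    part = chunks(key)[depth]
    slot = tree.get(part)
    if slot is None or (isinstance(slot, DocPointer) and slot.key == key):
        tree[part] = DocPointer(key, value)      # leaf points at data
    elif isinstance(slot, DocPointer):
        child = {}                               # leaf now points at a child tree
        insert(child, slot.key, slot.value, depth + 1)
        insert(child, key, value, depth + 1)
        tree[part] = child
    else:
        insert(slot, key, value, depth + 1)      # descend into existing child tree


def lookup(tree, key):
    """Follow one chunk per level until a data pointer or a miss is reached."""
    node = tree
    for part in chunks(key):
        entry = node.get(part)
        if entry is None:
            return None
        if isinstance(entry, DocPointer):
            return entry.value if entry.key == key else None
        node = entry
    return None
```

For example, inserting `b"abcd"` and then `b"abcdefgh"` turns the `b"abcd"` slot of the top-level tree into a child tree with two entries, while an unrelated key such as `b"zz"` stays a direct data pointer at the top level.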

Methods

Results

Conclusion