Abstract
Append-only B-tree based key-value stores provide superior search and update performance owing to their structural characteristics; however, they periodically require a compaction task that incurs significant I/O overhead. In this paper, we show that degraded read performance during compaction deteriorates the overall performance of ForestDB, a representative append-only B-tree engine. We demonstrate that, despite the exceptional performance of the SSD, the slow reads are caused by underutilization of the SSD's internal parallelism, because the read operations use synchronous I/O. Furthermore, this paper proposes a novel compaction method that improves the compaction's read performance by requesting multiple read operations in a batch through asynchronous I/O, thereby exploiting the SSD's internal parallelism. We implemented the proposed method in ForestDB using two Linux asynchronous I/O interfaces, AIO and io_uring. The evaluation results confirm that our method improves the compaction's read performance by up to ten times compared with the conventional compaction method. In particular, we confirmed that the proposed method using io_uring, the latest asynchronous I/O interface, is effective regardless of the file I/O mode and outperforms the others in all cases.
Highlights
With the active use of mobile devices and the recent spread of Internet services such as social media, search engines, and e-commerce, there has been increasing interest in deploying storage techniques that effectively store and retrieve massive amounts of unstructured data.
We found that the overhead of tree search and read operations during ForestDB's compaction is considerable (Section III).
To overcome the slow read performance of ForestDB's compaction, we propose a new compaction method, parallel fetch (p-fetch), that submits multiple read requests in a single call and fetches the I/O results in parallel through asynchronous I/O, thereby improving the read performance of the compaction by exploiting the internal parallelism of solid-state drives (SSDs).
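The contrast between one-at-a-time synchronous reads and batched reads can be pictured with a minimal sketch. This is not the paper's implementation: the authors use Linux AIO and io_uring inside ForestDB, whereas the sketch below swaps in a Python thread pool as a stand-in so that it runs without extra libraries. The names `sync_fetch`, `batched_fetch`, `make_demo_file`, the fixed `DOC_SIZE`, and the batch size are all hypothetical and chosen only for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

DOC_SIZE = 64  # hypothetical fixed record size, for illustration only


def sync_fetch(fd, offsets):
    """Conventional path: issue one blocking read, wait, then issue the next."""
    return [os.pread(fd, DOC_SIZE, off) for off in offsets]


def batched_fetch(fd, offsets, batch_size=8):
    """Batched path: put a whole batch of reads in flight, then collect them.

    A thread pool stands in for a kernel asynchronous I/O interface here;
    it is an assumption of this sketch, not the mechanism used in the paper.
    """
    docs = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(offsets), batch_size):
            batch = offsets[start:start + batch_size]
            # Every read of the batch is submitted before any result is awaited.
            futures = [pool.submit(os.pread, fd, DOC_SIZE, off) for off in batch]
            # Results are gathered in submission order.
            docs.extend(f.result() for f in futures)
    return docs


def make_demo_file(num_docs):
    """Write num_docs fixed-size records; return the file path and offsets."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_docs):
            f.write(f"doc-{i:04d}".encode().ljust(DOC_SIZE, b"."))
    return path, [i * DOC_SIZE for i in range(num_docs)]


if __name__ == "__main__":
    path, offsets = make_demo_file(20)
    fd = os.open(path, os.O_RDONLY)
    try:
        same = sync_fetch(fd, offsets) == batched_fetch(fd, offsets)
        print("both paths return the same documents:", same)
    finally:
        os.close(fd)
        os.remove(path)
```

Both functions return the same documents in the same order; the difference is only in how many requests the storage device sees at once. With synchronous reads the device has a single outstanding request, while the batched path keeps up to `batch_size` requests outstanding, which is the condition under which an SSD's internal parallelism can be used.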
Summary
With the active use of mobile devices and the recent spread of Internet services such as social media, search engines, and e-commerce, there has been increasing interest in deploying storage techniques that effectively store and retrieve massive amounts of unstructured data. NoSQL databases, especially key-value stores, use flexible data structures without schemas and are designed to run on a cluster consisting of multiple machines [1], [2]. They effectively handle large amounts of unstructured data and are widely used in data-intensive applications [3], [4]. In HB+-trie, each B+-tree's leaf node points to the disk location of either another B+-tree's root node or the actual data. Through this index structure, ForestDB minimizes the number of disk blocks accessed for each key-value operation and quickly retrieves the desired data from persistent storage. Previous studies have proposed methods that convert the random writes of a B-tree index into sequential writes to make full use of storage bandwidth [10]–[12].
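The HB+-trie structure mentioned above, in which a leaf entry points either to another tree's root or to the actual data, can be pictured with a toy in-memory model. This is only a sketch under stated assumptions: Python dictionaries stand in for on-disk B+-trees, keys are split into fixed-size chunks of `CHUNK` bytes, and the names `Doc`, `insert`, and `lookup` are hypothetical. It does not reflect ForestDB's actual on-disk layout or API.

```python
from dataclasses import dataclass

CHUNK = 4  # hypothetical chunk size in bytes, for illustration only


@dataclass
class Doc:
    """Stands in for the actual data a leaf entry can point to."""
    key: bytes
    value: bytes


def _chunk(key, depth):
    """Return the depth-th fixed-size chunk of the key (empty if exhausted)."""
    return key[depth * CHUNK:(depth + 1) * CHUNK]


def insert(tree, key, value, depth=0):
    """Insert into a tree whose entries are either a Doc or a child tree."""
    chunk = _chunk(key, depth)
    entry = tree.get(chunk)
    if entry is None:
        tree[chunk] = Doc(key, value)          # entry points to the data
    elif isinstance(entry, Doc):
        if entry.key == key:
            entry.value = value                # overwrite the same key
        else:
            # Two keys share this chunk: replace the entry with a child
            # tree and index both keys by their next chunk.
            child = {}
            tree[chunk] = child
            insert(child, entry.key, entry.value, depth + 1)
            insert(child, key, value, depth + 1)
    else:
        insert(entry, key, value, depth + 1)   # entry points to another tree


def lookup(tree, key):
    """Follow one entry per level until a Doc or a missing entry is found."""
    depth = 0
    while True:
        entry = tree.get(_chunk(key, depth))
        if entry is None:
            return None
        if isinstance(entry, Doc):
            return entry.value if entry.key == key else None
        tree, depth = entry, depth + 1


if __name__ == "__main__":
    root = {}
    insert(root, b"user0001", b"alice")
    insert(root, b"user0002", b"bob")
    insert(root, b"item", b"book")
    print(lookup(root, b"user0002"), lookup(root, b"item"))
```

In this model a lookup visits one tree per key chunk, and a second level is created only when two keys share a chunk, which illustrates why such an index can keep the number of tree nodes touched per operation small.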