Abstract

Though various applications, such as flash memory, cache, storage systems, and even indexing for enterprise big data search, adopt hot data identification schemes, relatively little research has been devoted to holistically examining alternative strategies. Rather, researchers tend to classify hot data simplistically by considering one or more frequency metrics, thereby disregarding recency, which is also an important consideration. In practice, different workloads mandate different treatment to achieve effective hot data decisions. This paper proposes a dynamic hot data identification scheme that adopts a workload stack distance approximation. Stack distance is a good recency measure, but computing it traditionally incurs high computational complexity as well as additional space. Since efficient stack distance calculation is a core component of our dynamic feature design, this paper additionally proposes a stack distance approximation algorithm that significantly reduces both computation and space requirements. To our knowledge, the proposed scheme is the first dynamic hot data identification scheme that judiciously assigns more weight to either recency or frequency based on workload characteristics. Our experiments with diverse realistic workloads demonstrate that our stack distance approximation achieves excellent accuracy (up to a 0.1% error rate) and our dynamic scheme improves performance by as much as 49.8%.
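To make the cost of exact stack distance concrete, the following is a minimal sketch of the naive computation, not the approximation algorithm proposed in the paper. It assumes the convention that the stack distance of an access is the number of distinct other items referenced since the previous access to the same item (some texts use the 1-based stack position instead). The function name `stack_distances` is hypothetical.

```python
from typing import Hashable, Iterable, List, Optional


def stack_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """Return the exact LRU stack distance of every access in `trace`.

    A first-time access has no previous reference, so it is reported as None.
    """
    stack: List[Hashable] = []  # LRU stack; the most recently used item is last
    distances: List[Optional[int]] = []
    for item in trace:
        if item in stack:
            pos = stack.index(item)
            # Items above `pos` are the distinct items touched since the
            # previous access to `item`.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(item)  # move (or push) the item to the top of the stack
    return distances


if __name__ == "__main__":
    print(stack_distances(["a", "b", "c", "a", "b", "b"]))
```

Each access scans a list that holds every distinct item seen so far, so time grows with the product of trace length and working-set size, and space grows with the working-set size. This is the overhead that motivates an approximation.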

Highlights

  • Hot data identification is a paramount issue in numerous fields [1]

  • To resolve the inevitable compromise of the Multiple Bloom Filter-based scheme (MBF), this paper proposes a dynamic hot data identification scheme that exploits a stack distance approximation

  • To conduct an extensive and objective evaluation, our proposed dynamic hot data identification scheme is compared with four other schemes: the Multiple Bloom Filter-based scheme [1], the Multiple Hash Function scheme [14], our proposed dynamic baseline scheme, and the Direct Address Method [14]


Summary

INTRODUCTION

Hot data identification is a paramount issue in numerous fields [1]. For example, NAND (Not-AND) flash-based storage devices, such as SSDs (Solid State Drives) and USB (Universal Serial Bus) flash drives, must adopt an intermediate software layer named the FTL (Flash Translation Layer) to hide the idiosyncrasies of NAND flash memory. MBF, the Multiple Bloom Filter-based scheme [1], suggests considering recency together with frequency for effective hot data identification. To capture both considerations, it proposed a new data structure (multiple bloom filters) that records LBA (Logical Block Address) hash values, selecting one bloom filter in a round-robin manner for each request. The main contributions of this paper are as follows. A dynamic hot data identification scheme: our proposed dynamic scheme effectively captures both recency and frequency by judiciously selecting a bloom filter. The paper also describes our stack distance approximation algorithm, one of our core features.
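The round-robin bloom filter idea summarized above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the MBF implementation or the scheme proposed in this paper: the class name `MultiBloomHotDetector`, the rule of skipping a filter that already holds the LBA, the periodic clearing of the oldest filter, the unweighted hit count, and all parameter values are assumptions introduced here for illustration.

```python
import hashlib
from typing import List


class MultiBloomHotDetector:
    """Toy hot-data detector built from several bloom filters (hypothetical)."""

    def __init__(self, num_filters: int = 4, bits: int = 2048,
                 num_hashes: int = 2, decay_interval: int = 512,
                 hot_threshold: int = 2) -> None:
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.filters = [bytearray(bits) for _ in range(num_filters)]
        self.bits = bits
        self.num_hashes = num_hashes          # must be <= 8 with SHA-256 below
        self.decay_interval = decay_interval  # assumed aging period, in requests
        self.hot_threshold = hot_threshold    # assumed: filters needed for "hot"
        self.current = 0   # filter chosen for the next request (round robin)
        self.oldest = 0    # filter cleared at the next aging step
        self.requests = 0

    def _positions(self, lba: int) -> List[int]:
        # Derive the bit positions from slices of one SHA-256 digest.
        digest = hashlib.sha256(str(lba).encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits
                for i in range(self.num_hashes)]

    @staticmethod
    def _contains(bloom: bytearray, positions: List[int]) -> bool:
        return all(bloom[p] for p in positions)

    def record_write(self, lba: int) -> None:
        positions = self._positions(lba)
        n = len(self.filters)
        # Start at the round-robin filter; skip filters that already hold the
        # LBA so that repeated writes spread across filters (assumption).
        for step in range(n):
            bloom = self.filters[(self.current + step) % n]
            if not self._contains(bloom, positions):
                for p in positions:
                    bloom[p] = 1
                break
        self.current = (self.current + 1) % n
        self.requests += 1
        # Aging (assumption): periodically clear one filter so stale
        # information expires, which is how recency enters the decision.
        if self.requests % self.decay_interval == 0:
            self.filters[self.oldest] = bytearray(self.bits)
            self.oldest = (self.oldest + 1) % n

    def is_hot(self, lba: int) -> bool:
        positions = self._positions(lba)
        hits = sum(self._contains(b, positions) for b in self.filters)
        return hits >= self.hot_threshold


if __name__ == "__main__":
    detector = MultiBloomHotDetector()
    for _ in range(3):
        detector.record_write(42)
    detector.record_write(7)
    print(detector.is_hot(42), detector.is_hot(7))
```

In this sketch the number of filters holding an LBA stands in for frequency, while the periodic clearing stands in for recency. A dynamic scheme of the kind this paper proposes would adjust how these two signals are weighted according to the workload.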

TWO LOCALITY MEASURES
WEIGHT ALLOCATOR
A NEW BASELINE ALGORITHM
EXPERIMENTS
Findings
CONCLUSION