Dynamic Hot Data Identification Using a Stack Distance Approximation

Hyeonji Ha, Hyeyin Lee, Daeun Shim, Dongchul Park

doi:10.1109/access.2021.3084851

Abstract

Although various applications, such as flash memory, caches, storage systems, and even indexing for enterprise big data search, adopt hot data identification schemes, relatively little research has been devoted to holistically examining alternative strategies. Instead, researchers tend to classify hot data simplistically by considering one or more frequency metrics, thereby disregarding recency, which is also an important consideration. In practice, different workloads require different treatment to achieve effective hot data decisions. This paper proposes a dynamic hot data identification scheme that adopts a workload stack distance approximation. Stack distance is a good recency measure, but computing it traditionally requires high computational complexity as well as additional space. Since efficient stack distance calculation is a core component of our dynamic design, this paper additionally proposes a stack distance approximation algorithm that significantly reduces both computation and space requirements. To our knowledge, the proposed scheme is the first dynamic hot data identification scheme that judiciously assigns more weight to either recency or frequency based on workload characteristics. Our experiments with diverse realistic workloads demonstrate that our stack distance approximation achieves excellent accuracy (up to a 0.1% error rate) and that our dynamic scheme improves performance by as much as 49.8%.
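To make the cost argument concrete, the sketch below computes exact stack distances with the textbook LRU-stack method. It is not the authors' approximation algorithm. Here the stack distance of a reference to address x is taken to be the number of distinct addresses touched since the previous reference to x (0 means an immediate re-reference; some authors count from 1). First references get `None` (infinite distance). The function name `stack_distances` is illustrative. Each access may scan the whole stack, so the method costs O(M) time per reference and O(M) space for M distinct addresses, which is the overhead the paper's approximation aims to reduce.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, List, Optional


def stack_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """Exact LRU stack distance for every reference in `trace` (naive O(M) scan)."""
    stack: "OrderedDict[Hashable, None]" = OrderedDict()  # most recently used key is last
    distances: List[Optional[int]] = []
    for addr in trace:
        if addr in stack:
            # Walk down from the top of the stack (most recent) until addr is found;
            # the number of keys passed equals the distinct addresses seen since addr.
            depth = 0
            for key in reversed(stack):
                if key == addr:
                    break
                depth += 1
            distances.append(depth)
            stack.move_to_end(addr)  # addr becomes the most recently used entry
        else:
            distances.append(None)   # first reference: infinite stack distance
            stack[addr] = None
    return distances
```

For the trace `a b c a b b d a`, the second reference to `a` sees `b` and `c` in between, so its distance is 2, while the back-to-back `b b` yields 0.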

Highlights

Hot data identification is a paramount issue in numerous fields [1].
To resolve the inevitable compromise of MBF (the Multiple Bloom Filter-based scheme [1]), this paper proposes a dynamic hot data identification scheme that exploits a stack distance approximation.
Evaluation setup: to conduct an extensive and objective evaluation, our proposed dynamic hot data identification scheme is compared with four other schemes: the Multiple Bloom Filter-based scheme [1], the Multiple Hash Function scheme [14], our proposed dynamic baseline scheme, and the Direct Address Method [14].
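The core dynamic idea, giving more weight to recency or to frequency depending on the workload, can be illustrated with a deliberately simple rule. The rule below is purely hypothetical and is not the paper's weight allocator: it takes a list of per-reference stack distances (with `None` marking first references), treats the share of re-references that fall within an assumed `window` as the recency weight, and blends normalized recency and frequency scores accordingly. The names `recency_weight`, `hotness`, and `window` are illustrative assumptions.

```python
from typing import List, Optional


def recency_weight(distances: List[Optional[int]], window: int) -> float:
    """Hypothetical rule: share of re-references whose stack distance is below `window`.

    Reuses that mostly happen within a short window indicate strong temporal
    locality, so recency gets more weight; otherwise frequency dominates.
    First references (None) are ignored; with no reuses at all, the rule
    falls back to an even 0.5 split.
    """
    reuses = [d for d in distances if d is not None]
    if not reuses:
        return 0.5
    return sum(1 for d in reuses if d < window) / len(reuses)


def hotness(recency_score: float, frequency_score: float, w_recency: float) -> float:
    """Blend normalized [0, 1] recency and frequency scores with a workload-dependent weight."""
    return w_recency * recency_score + (1.0 - w_recency) * frequency_score
```

With distances `[None, None, None, 2, 2, 0, None, 2]` and `window=2`, only one of four reuses is short, so recency receives a weight of 0.25 and frequency 0.75. A real design would also have to decide how often to recompute the weight as the workload shifts.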

Summary

INTRODUCTION

Hot data identification is a paramount issue in numerous fields [1]. For example, NAND (Not-AND) flash-based storage devices, such as SSDs (Solid State Drives) and USB (Universal Serial Bus) flash drives, must adopt an intermediate software layer called the FTL (Flash Translation Layer) to hide the idiosyncrasies of NAND flash memory, and FTL tasks such as garbage collection and wear leveling benefit from separating hot data from cold data.

The Multiple Bloom Filter-based scheme (MBF) [1] suggests considering recency together with frequency for effective hot data identification. To capture both, it proposed a new data structure (i.e., multiple bloom filters) and recorded information (i.e., hash values of LBAs, Logical Block Addresses) in the bloom filters, selecting one bloom filter in a round-robin manner for each request.

The main contributions of this paper are as follows. A dynamic hot data identification scheme: our proposed dynamic scheme effectively captures both recency and frequency together by judiciously selecting a bloom filter. A stack distance approximation: the paper also presents our stack distance approximation algorithm, one of our core features.
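The multiple-bloom-filter idea described above can be sketched as a toy identifier. This is a minimal illustration in the spirit of MBF [1], not its published implementation. All parameters (`num_filters`, `bits`, `num_hashes`, `hot_threshold`, `decay_period`) and the decay schedule (clearing the oldest filter every `decay_period` requests) are assumptions chosen for illustration. Each request is recorded in one filter chosen round-robin, skipping ahead if that filter already holds the LBA. The number of filters reporting an LBA serves as its frequency estimate, and periodic clearing lets old accesses fade, which approximates recency.

```python
import hashlib
from typing import List


class MultipleBloomFilterSketch:
    """Toy recency-and-frequency hot data identifier (hypothetical parameters)."""

    def __init__(self, num_filters: int = 4, bits: int = 1024, num_hashes: int = 2,
                 hot_threshold: int = 2, decay_period: int = 512) -> None:
        # num_hashes must be <= 8: each position uses 4 bytes of a 32-byte SHA-256 digest.
        self.filters: List[List[bool]] = [[False] * bits for _ in range(num_filters)]
        self.bits = bits
        self.num_hashes = num_hashes
        self.hot_threshold = hot_threshold
        self.decay_period = decay_period
        self.current = 0   # round-robin pointer: filter tried first for the next request
        self.oldest = 0    # filter holding the oldest history, cleared at the next decay
        self.requests = 0

    def _positions(self, lba: int) -> List[int]:
        # Derive num_hashes bit positions from one SHA-256 digest of the (non-negative) LBA.
        digest = hashlib.sha256(lba.to_bytes(8, "little")).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.bits
                for i in range(self.num_hashes)]

    def _contains(self, idx: int, lba: int) -> bool:
        return all(self.filters[idx][p] for p in self._positions(lba))

    def record(self, lba: int) -> None:
        """Record one write request for `lba`."""
        n = len(self.filters)
        # Start at the round-robin filter; if it already holds this LBA, move on,
        # so repeated requests for the same LBA land in different filters.
        for step in range(n):
            idx = (self.current + step) % n
            if not self._contains(idx, lba):
                for p in self._positions(lba):
                    self.filters[idx][p] = True
                break
        self.current = (self.current + 1) % n
        self.requests += 1
        if self.requests % self.decay_period == 0:
            # Decay (assumed schedule): forget the oldest filter so stale accesses stop counting.
            self.filters[self.oldest] = [False] * self.bits
            self.oldest = (self.oldest + 1) % n

    def frequency(self, lba: int) -> int:
        # Filters reporting the LBA; bloom filters allow false positives, not false negatives.
        return sum(self._contains(i, lba) for i in range(len(self.filters)))

    def is_hot(self, lba: int) -> bool:
        return self.frequency(lba) >= self.hot_threshold
```

Three writes to the same LBA spread over three filters, so its frequency estimate becomes 3 and it is classified as hot under a threshold of 2. A fixed scheme like this balances recency and frequency only through its fixed decay period and threshold.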

TWO LOCALITY MEASURES

WEIGHT ALLOCATOR

A NEW BASELINE ALGORITHM

EXPERIMENTS

Findings

CONCLUSION

Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 4
License type: CC BY 4.0
