Transfer Learning Based Performance Modeling And Effective Storage Management In Big Data Ecosystems

Fatimah Alsayoud

20 Dec 2021

doi:10.32920/17190185

Abstract

Big data ecosystems contain a mix of sophisticated hardware storage components to support heterogeneous workloads. Storage components and workloads interact and affect each other; therefore, their relationship has to be considered when modeling workloads or managing storage. Efficient workload modeling guides optimal storage management decisions, and the right decisions help guarantee the workload’s needs. The first part of this thesis focuses on workload modeling efficiency, and the second part focuses on cost-effective storage management.

Workload performance modeling is an essential step in management decisions. The standard modeling approach constructs the model from a historical dataset collected under one set of setups (a scenario), so the model has to be reconstructed from scratch every time the setup changes. To address this issue, we propose a cross-scenario modeling approach that improves the workload’s performance classification accuracy by up to 78% by adopting Transfer Learning (TL).

The storage system is the most crucial component of the big data ecosystem: a workload’s execution starts by fetching data from it and ends by storing data into it. Thus, the workload’s performance is directly affected by storage capability. To provide high I/O performance, Solid State Drives (SSDs) are used as a tier or as a cache in distributed big data ecosystems. SSDs have a short lifespan that is affected by data size and the number of write operations. Balancing performance requirements against SSD lifespan consumption is never easy, and it is even harder when dealing with huge amounts of data and heterogeneous I/O patterns. In this thesis, we analyze the impact of big data workloads’ I/O patterns on SSD lifespan when the SSD is used as a tier or as a cache. Then, we design a Hidden Markov Model (HMM) based I/O pattern controller that manages workload placement and guarantees cost-effective storage, enhancing workload performance by up to 60% and improving SSD lifespan by up to 40%.

The designed transfer learning modeling approach and the storage management solutions improve workload modeling accuracy and the quality of the storage management policies when the testing setup changes.
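
To make the idea of an HMM-based I/O pattern controller more concrete, the following is a minimal sketch of Viterbi decoding over observed I/O request types. It is not the thesis’s controller: the two hidden states, the three observation symbols, and every probability below are assumptions chosen only for illustration. The decoded state sequence is the kind of signal a placement policy could consume.

```python
import math

# Hypothetical hidden I/O pattern states and model parameters (illustrative only).
STATES = ("sequential", "random")
START = {"sequential": 0.5, "random": 0.5}
TRANS = {
    "sequential": {"sequential": 0.8, "random": 0.2},
    "random": {"sequential": 0.3, "random": 0.7},
}
EMIT = {
    "sequential": {"large_write": 0.6, "small_write": 0.1, "read": 0.3},
    "random": {"large_write": 0.1, "small_write": 0.6, "read": 0.3},
}


def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    if not observations:
        return []
    # Work in log space to avoid underflow on long traces.
    best = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in STATES:
            # Best predecessor state r for landing in state s.
            score, arg = max(
                (prev[r] + math.log(TRANS[r][s]), r) for r in STATES
            )
            best[s] = score + math.log(EMIT[s][obs])
            pointers[s] = arg
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    trace = ["large_write", "large_write", "small_write", "small_write"]
    print(viterbi(trace))
```

In practice the transition and emission probabilities would be estimated from recorded workload traces rather than written by hand.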

Highlights

Today’s enterprises employ different types of analytics software and techniques to extract the needed knowledge from a massive amount of data.
Performance prediction is an essential aspect of several critical system design decisions, such as workload scheduling and resource planning.
Developing a model with higher prediction accuracy is a challenging task in big data systems due to the stack complexity and environmental heterogeneity.

Summary

Introduction

Today’s enterprises employ different types of analytics software and techniques to extract the needed knowledge from a massive amount of data. To support different workload types and to meet various requirements, the ecosystems contain and interact with complex and heterogeneous software stacks and hardware elements. Big data ecosystems have become one of the main elements in today’s information technology environments. These ecosystems support big data sets and provide a variety of execution methods to meet system workload requirements. Flash-based SSDs have become one of the main components of most of today’s storage systems. They provide highly attractive characteristics such as low latency and low power consumption. Balancing the capacity requirements of big data systems against SSD capacity limitations is not a straightforward task. For this reason, there is strong interest in both industry and academia in solutions that can manage and control SSDs in order to reduce storage costs and improve overall system performance. The model’s results are used to define optimal storage management decisions through adopting policy-based management technology, as explained in Subsections 2.3.1 and 2.3.2.
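
As a rough illustration of why write volume matters when SSDs are managed for cost, the sketch below estimates what fraction of a drive’s rated write endurance a workload has consumed. This is a common first-order approximation, not the model used in the thesis; the function name, the parameters, and the example numbers are all assumptions.

```python
def endurance_used(host_bytes_written, capacity_bytes, rated_pe_cycles,
                   write_amplification=1.0):
    """Estimate the fraction of rated SSD write endurance consumed.

    First-order approximation: flash bytes written (host writes scaled by
    write amplification) divided by the total bytes the flash can absorb
    (capacity times rated program/erase cycles).
    """
    if capacity_bytes <= 0 or rated_pe_cycles <= 0:
        raise ValueError("capacity and rated P/E cycles must be positive")
    if host_bytes_written < 0 or write_amplification < 1.0:
        raise ValueError("writes must be >= 0 and write amplification >= 1")
    flash_bytes_written = host_bytes_written * write_amplification
    return flash_bytes_written / (capacity_bytes * rated_pe_cycles)


if __name__ == "__main__":
    TB = 10**12
    # Hypothetical drive: 1 TB, 3000 rated P/E cycles, write amplification 2.
    used = endurance_used(600 * TB, 1 * TB, 3000, write_amplification=2.0)
    print(f"endurance consumed: {used:.0%}")
```

Under this approximation, halving either the bytes written or the write amplification halves the endurance consumed, which is why placing write-heavy workloads carefully can extend SSD lifespan.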

