Abstract

Modeling and analyzing the performance of distributed file systems (DFSs) improves the reliability and quality of data processing in data-intensive applications. The Hadoop Distributed File System (HDFS) is a typical representative of DFSs. Its internal heterogeneity and complexity, as well as external disturbances, give HDFS built-in nonlinearity and randomness at the system level, which poses a great challenge to modeling these features. In particular, the randomness makes the HDFS performance model uncertain. Because analytical models have complex mathematical structures and parameters that are hard to estimate, building an explicit and precise analytical model of the randomness is highly complicated and computationally intractable. The measurement-based methodology is a promising way to model HDFS performance in terms of randomness, since it requires no knowledge of the system's internal behavior. In this paper, estimating HDFS performance models that account for the randomness is transformed into an optimization problem: finding the truly best design of the performance-model structure within a large design space. Core ideas of ordinal optimization (OO) are introduced to solve this problem with a limited computing budget, and a piecewise linear (PL) model is applied to approximate the nonlinear characteristics and randomness of HDFS performance. The experimental results show that the proposed method is effective and practical for estimating the optimal design of the PL-based performance-model structure for HDFS. It not only provides a globally consistent evaluation of the design space but also guarantees the goodness of the solution with high probability. Moreover, it improves the accuracy of system model-based HDFS performance models.
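
As a rough illustration of the measurement-based PL approach, the sketch below fits independent linear segments to synthetic throughput measurements by least squares. The breakpoint, data, and segment structure are hypothetical placeholders, not the paper's identified model:

```python
# Hedged sketch: fitting a piecewise linear (PL) model to measured
# performance data. The breakpoint and the synthetic data below are
# hypothetical; the paper's actual model structure and identification
# procedure are not reproduced here.
import numpy as np

def fit_piecewise_linear(x, y, breakpoints):
    """Fit an independent (slope, intercept) pair on each interval
    defined by the given breakpoints, via least squares."""
    edges = np.concatenate(([x.min()], breakpoints, [x.max()]))
    segments = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        A = np.vstack([x[mask], np.ones(mask.sum())]).T
        (slope, intercept), *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        segments.append((lo, hi, slope, intercept))
    return segments

def predict(segments, x):
    """Evaluate the fitted PL model at the points x."""
    y = np.empty_like(x, dtype=float)
    for lo, hi, slope, intercept in segments:
        mask = (x >= lo) & (x <= hi)
        y[mask] = slope * x[mask] + intercept
    return y

# Synthetic stand-in for measured request sizes (MB) and latencies (s),
# with a slope change at 128 MB plus observation noise (the randomness).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(1, 512, 400))
y = np.where(x < 128, 0.02 * x, 2.56 + 0.05 * (x - 128)) + rng.normal(0, 0.1, 400)

model = fit_piecewise_linear(x, y, breakpoints=np.array([128.0]))
rmse = np.sqrt(np.mean((predict(model, x) - y) ** 2))
print(f"per-segment fits: {model}\nRMSE: {rmse:.3f}")
```

In this framing, the design space explored by OO would consist of candidate segmentations (number and placement of breakpoints), each evaluated by a goodness-of-fit measure such as the RMSE above.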

Highlights

  • Rapidly growing internet services such as Google, Yahoo!, Amazon, and Facebook are a representative class of data-intensive applications

  • The comparison shows that the optimal Hadoop Distributed File System (HDFS) write/read (W/R) performance model built by the proposed method improves the accuracy of system model-based HDFS W/R performance models

  • In practice, the randomness makes the HDFS performance model uncertain. Since building explicit and precise analytical models of the randomness is highly complicated and mathematically intractable, a piecewise linear (PL) model is applied to approximately characterize HDFS performance, driven by the measurement-based methodology combined with system identification


Summary

INTRODUCTION

Rapidly growing internet services such as Google, Yahoo!, Amazon, and Facebook are a representative class of data-intensive applications. A classical methodology is to build analytical models of the system's internal architecture, components, or working mechanisms (e.g., [3], [15], [16]), which requires solid professional knowledge of the system. With this methodology, it is highly complicated and computationally intractable to build explicit and precise analytical performance models that capture the randomness of performance, and a heavy computational burden, including time and resource costs, ensues owing to the large design space. Instead, the estimation of HDFS performance on account of the randomness is transformed into an optimization problem of finding the truly best design (a design is a solution to an optimization problem) of the performance model using a limited computing budget, as sketched below.
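
The following sketch illustrates the OO selection idea referenced above: screen many sampled designs with a cheap, noisy evaluation, keep a small top-ranked selected set, and spend the limited exact-evaluation budget only on that set. Both evaluators and the scalar design encoding are hypothetical stand-ins for the paper's PL model structures:

```python
# Hedged sketch of ordinal-optimization (OO) style selection under a
# limited computing budget. The design encoding and both evaluators are
# toy placeholders, not the paper's actual performance models.
import random

def crude_eval(design):
    # Cheap but noisy estimate of model error for a candidate design
    # (here: a fake quadratic loss plus observation noise).
    return (design - 0.3) ** 2 + random.gauss(0, 0.05)

def exact_eval(design):
    # Expensive, accurate evaluation (noise-free in this toy example).
    return (design - 0.3) ** 2

random.seed(42)
N, s = 1000, 20                      # sampled designs, selected-set size
designs = [random.random() for _ in range(N)]

# Order designs by the crude model and keep the top s (the selected set).
# OO's key claim is that, with high probability, this set contains
# good-enough designs even though each crude estimate is noisy.
selected = sorted(designs, key=crude_eval)[:s]

# Spend the limited exact-evaluation budget only on the selected set.
best = min(selected, key=exact_eval)
print(f"best design ~ {best:.3f}, exact loss = {exact_eval(best):.5f}")
```

The point of ordering rather than precisely estimating is that ranks converge much faster than values, so a crude model suffices to narrow a large design space before any expensive evaluation is performed.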

OVERVIEW OF ORDINAL OPTIMIZATION
PROCEDURE OF THE PROPOSED METHOD
EXPERIMENTS AND ANALYSIS
INITIAL PARAMETER SETTINGS
RELATED WORK
CONCLUSION
