Abstract

Modeling and analyzing the performance of distributed file systems (DFSs) improves the reliability and quality of data processing in data-intensive applications. The Hadoop Distributed File System (HDFS) is a typical representative of DFSs. Its internal heterogeneity and complexity, as well as external disturbances, give HDFS built-in nonlinearity and randomness at the system level, which poses a great challenge to modeling these features. In particular, the randomness makes the HDFS performance model uncertain. Because analytical models have complex mathematical structures and parameters that are hard to estimate, building an explicit and precise analytical model of the randomness is highly complicated and computationally intractable. The measurement-based methodology is a promising way to model HDFS performance in terms of randomness, since it requires no knowledge of the system's internal behavior. In this paper, estimating HDFS performance models that account for the randomness is transformed into an optimization problem: finding the truly best design of the performance-model structure within a large design space. Core ideas of ordinal optimization (OO) are introduced to solve this problem with a limited computing budget, and a piecewise linear (PL) model is applied to approximate the nonlinear characteristics and randomness of HDFS performance. The experimental results show that the proposed method is effective and practical for estimating the optimal design of the PL-based performance-model structure for HDFS. It not only provides a globally consistent evaluation of the design space but also guarantees the goodness of the solution with high probability. Moreover, it improves the accuracy of system model-based HDFS performance models.
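
As a rough illustration of the measurement-based PL approach, the sketch below fits independent linear segments to synthetic throughput measurements by least squares. The breakpoint, data, and segment structure are hypothetical placeholders, not the paper's identified model:

```python
# Hedged sketch: fitting a piecewise linear (PL) model to measured
# performance data. The breakpoint and the synthetic data below are
# hypothetical; the paper's actual model structure and identification
# procedure are not reproduced here.
import numpy as np

def fit_piecewise_linear(x, y, breakpoints):
    """Fit an independent (slope, intercept) pair on each interval
    defined by the given breakpoints, via least squares."""
    edges = np.concatenate(([x.min()], breakpoints, [x.max()]))
    segments = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        A = np.vstack([x[mask], np.ones(mask.sum())]).T
        (slope, intercept), *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        segments.append((lo, hi, slope, intercept))
    return segments

def predict(segments, x):
    """Evaluate the fitted PL model at the points x."""
    y = np.empty_like(x, dtype=float)
    for lo, hi, slope, intercept in segments:
        mask = (x >= lo) & (x <= hi)
        y[mask] = slope * x[mask] + intercept
    return y

# Synthetic stand-in for measured request sizes (MB) and latencies (s),
# with a slope change at 128 MB plus observation noise (the randomness).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(1, 512, 400))
y = np.where(x < 128, 0.02 * x, 2.56 + 0.05 * (x - 128)) + rng.normal(0, 0.1, 400)

model = fit_piecewise_linear(x, y, breakpoints=np.array([128.0]))
rmse = np.sqrt(np.mean((predict(model, x) - y) ** 2))
print(f"per-segment fits: {model}\nRMSE: {rmse:.3f}")
```

In this framing, the design space explored by OO would consist of candidate segmentations (number and placement of breakpoints), each evaluated by a goodness-of-fit measure such as the RMSE above.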

Highlights

  • Rapidly growing internet services such as Google, Yahoo!, Amazon, and Facebook are a representative class of data-intensive applications

  • The comparison shows that the optimal Hadoop Distributed File System (HDFS) write/read (W/R) performance model built by the proposed method improves the accuracy of system model-based HDFS W/R performance models

  • In practice, the randomness makes the HDFS performance model uncertain. Since building explicit and precise analytical models of the randomness is highly complicated and mathematically intractable, a piecewise linear (PL) model is applied to approximately characterize HDFS performance, driven by the measurement-based methodology combined with system identification


Summary

INTRODUCTION

Rapidly growing internet services such as Google, Yahoo!, Amazon, and Facebook are a representative class of data-intensive applications. A classical methodology is to build analytical models of the system's internal architecture, components, or working mechanisms (e.g., [3], [15], [16]), which requires solid professional knowledge of the system. With this methodology, it is highly complicated and computationally intractable to build explicit and precise analytical performance models that capture the randomness of performance, and a heavy computational burden, including time and resource costs, ensues owing to the large design space. Instead, the estimation of HDFS performance on account of the randomness is transformed into an optimization problem of finding the truly best design (a design is a solution to an optimization problem) of the performance model using a limited computing budget, as sketched below.
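
The following sketch illustrates the OO selection idea referenced above: screen many sampled designs with a cheap, noisy evaluation, keep a small top-ranked selected set, and spend the limited exact-evaluation budget only on that set. Both evaluators and the scalar design encoding are hypothetical stand-ins for the paper's PL model structures:

```python
# Hedged sketch of ordinal-optimization (OO) style selection under a
# limited computing budget. The design encoding and both evaluators are
# toy placeholders, not the paper's actual performance models.
import random

def crude_eval(design):
    # Cheap but noisy estimate of model error for a candidate design
    # (here: a fake quadratic loss plus observation noise).
    return (design - 0.3) ** 2 + random.gauss(0, 0.05)

def exact_eval(design):
    # Expensive, accurate evaluation (noise-free in this toy example).
    return (design - 0.3) ** 2

random.seed(42)
N, s = 1000, 20                      # sampled designs, selected-set size
designs = [random.random() for _ in range(N)]

# Order designs by the crude model and keep the top s (the selected set).
# OO's key claim is that, with high probability, this set contains
# good-enough designs even though each crude estimate is noisy.
selected = sorted(designs, key=crude_eval)[:s]

# Spend the limited exact-evaluation budget only on the selected set.
best = min(selected, key=exact_eval)
print(f"best design ~ {best:.3f}, exact loss = {exact_eval(best):.5f}")
```

The point of ordering rather than precisely estimating is that ranks converge much faster than values, so a crude model suffices to narrow a large design space before any expensive evaluation is performed.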

OVERVIEW OF ORDINAL OPTIMIZATION
PROCEDURE OF THE PROPOSED METHOD
EXPERIMENTS AND ANALYSIS
INITIAL PARAMETER SETTINGS
RELATED WORK
CONCLUSION
