Abstract

Big Data processing systems (e.g., Spark) have a number of resource configuration parameters, such as memory size, CPU allocation, and the number of running nodes. Regular users and even expert administrators struggle to understand the relationship between different parameter configurations and the overall performance of the system. In this paper, we address this challenge by proposing a performance prediction framework, called d-Simplexed, to build performance models over the configurable parameters of Spark. Taking inspiration from the field of Computational Geometry, we construct a d-dimensional mesh using Delaunay Triangulation over a selected set of features, and from this mesh we predict the execution time of unseen feature configurations. To minimize the time and resources spent building a bootstrap model over a large space of configuration values, we propose an adaptive sampling technique that collects only as many training points as required. Our evaluation on a cluster of computers using the WordCount, PageRank, Kmeans, and Join workloads of the HiBench benchmark suite shows that we achieve an estimation error of less than 5% while sampling less than 1% of the data.

Highlights

  • Numerous Big Data frameworks have been introduced to address the problem of organizing large-scale fault-tolerant computation in a clustered environment

  • We propose a framework, called d-Simplexed, that uses a Delaunay Triangulation (DT) model to predict performance for a given parameter configuration, together with heuristic adaptive sampling to reduce the number of samples needed for training

  • We introduce the following main steps to build and use the DT model for performance modeling and prediction (sketched in the code below): 1) Triangulation: given a set of d features {f1, f2, ..., fd} with concrete values such as {16 GB, 4 vcores}, we build a Delaunay Triangulation model in R^d space; 2) Projection: for each d-simplex returned by the Delaunay Triangulation, we use the running times of its (d + 1) vertices to compute a hyperplane; 3) Prediction: given a new parameter configuration, we predict its running time from the model constructed in the previous steps
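
Below is a minimal sketch of these three steps in Python, using SciPy's Delaunay triangulation. The configuration points, the measured runtimes, and the predict helper are illustrative placeholders, not the paper's actual implementation.

    import numpy as np
    from scipy.spatial import Delaunay

    # 1) Triangulation: configurations as points in R^d, e.g., (memory in GB, vcores).
    configs = np.array([[4, 1], [4, 8], [16, 1], [16, 8], [8, 4]], dtype=float)
    runtimes = np.array([310.0, 120.0, 240.0, 95.0, 150.0])  # measured seconds (made-up values)
    tri = Delaunay(configs)

    def predict(query):
        """2) Projection + 3) Prediction: locate the enclosing d-simplex and evaluate
        the hyperplane through its (d + 1) vertices at the query point."""
        query = np.asarray(query, dtype=float)
        simplex = int(tri.find_simplex(query[None, :])[0])
        if simplex < 0:
            raise ValueError("query lies outside the triangulated configuration space")
        vertices = tri.simplices[simplex]
        # Barycentric coordinates of the query inside the simplex ...
        T = tri.transform[simplex]
        bary = T[:-1].dot(query - T[-1])
        weights = np.append(bary, 1.0 - bary.sum())
        # ... give the linear interpolation of the vertices' running times.
        return float(weights.dot(runtimes[vertices]))

    print(predict([12, 4]))  # predicted running time for an unseen configuration

Interpolating with barycentric weights inside a simplex is equivalent to evaluating the hyperplane fitted through that simplex's vertices; SciPy's LinearNDInterpolator wraps the same computation if a ready-made routine is preferred.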


Summary

INTRODUCTION

Alvaro [16], OtterTune [1], and CDBTune [42] use several regressors to tune a set of parameters. They train their models to maximize a single objective, i.e., to predict a locally optimal performance point. Modeling the whole performance topography is expensive when the parameter space is large, and randomly chosen samples do not guarantee the desired accuracy. Determining both the right fraction and the appropriate representatives of the samples for building a model is not trivial.
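
As a rough illustration of choosing sample representatives adaptively rather than at random, the sketch below iteratively measures the configuration where a piecewise-linear Delaunay model looks least reliable. This is a heuristic under our own assumptions (centroid refinement of the simplex with the largest runtime spread), not necessarily the exact d-Simplexed strategy, and run_workload is a placeholder for an actual Spark run.

    import numpy as np
    from scipy.spatial import Delaunay

    def run_workload(config):
        """Placeholder: launch the Spark job with this configuration and return its runtime in seconds."""
        raise NotImplementedError

    def adaptive_sample(corner_configs, budget):
        """Start from the corner configurations of the parameter space; at each step,
        measure the centroid of the simplex whose vertex runtimes vary the most,
        i.e., where a single hyperplane is least likely to fit well."""
        configs = [np.asarray(c, dtype=float) for c in corner_configs]
        times = [run_workload(c) for c in configs]
        while len(configs) < budget:
            pts, ys = np.array(configs), np.array(times)
            tri = Delaunay(pts)
            # Runtime spread across each simplex's vertices as an uncertainty score.
            spread = ys[tri.simplices].max(axis=1) - ys[tri.simplices].min(axis=1)
            worst = int(np.argmax(spread))
            configs.append(pts[tri.simplices[worst]].mean(axis=0))
            times.append(run_workload(configs[-1]))
        return np.array(configs), np.array(times)

The loop stops once the measurement budget is exhausted; in practice one could also stop early when the model's error on a small held-out set drops below a target threshold.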

Spark Preliminaries
Delaunay Triangulation Primitives
PROBLEM STATEMENT
Runtime Modeling
DELAUNAY TRIANGULATION
Result
Modeling
Prediction
ADAPTIVE SAMPLING
EMPIRICAL EVALUATION
Experiment Design
Experiment Setting
Overview of Workload Evaluation
Model Evaluation
Sampling Evaluation
More Evaluation Results
Evaluation Summary
RELATED WORK
CONCLUSIONS AND FUTURE WORK