An Evaluation Model and Benchmark for Parallel Computing Frameworks

Weibei Fan, Ruchuan Wang, Zhijie Han

doi:10.1155/2018/3890341

Open Access

https://doi.org/10.1155/2018/3890341

Journal: Mobile Information Systems	Publication Date: Jan 1, 2018
Citations: 6	License type: CC BY 4.0

Affiliation: Soochow University, Henan University

Abstract

MARS and Spark are two popular parallel computing frameworks that are widely used for large-scale data analysis. In this paper, we first propose a performance evaluation model based on the support vector machine (SVM), which is used to analyze the performance of parallel computing frameworks. We then present representative results from a set of analyses with the proposed analytical performance model and perform a comparative evaluation of MARS and Spark using representative workloads, considering factors such as performance and scalability. The experiments show that our evaluation model predicts execution time more accurately than multifactor linear regression (MLR), and that it also provides a resource consumption requirement. Finally, we study benchmark experiments between MARS and Spark. MARS outperforms Spark in both throughput and speedup when executing logistic regression and Bayesian classification, because MARS has a large number of GPU threads that can handle higher parallelism. The experiments also show that Spark has lower latency than MARS in the execution of the four benchmarks.

Highlights

The use of cloud computing has increased exponentially because of the growing demand for storing, processing, and retrieving large amounts of data in a cloud cluster.
In MARS, data are loaded into main memory in the form of key/value pairs, and the key/value pairs are copied into graphics memory when GPU computing starts and the Map/Reduce operations are performed.
We propose a performance evaluation model based on the support vector machine (SVM) for distributed computing frameworks and apply it to MARS and Spark. There are two major steps to build the performance model: the first step is selecting a number of systems that have been scored by experts as the sample data set
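
The key/value data flow described for MARS in the highlights can be illustrated with a small CPU-only sketch. This is not the MARS API and involves no GPU code; the function names (`map_phase`, `group_by_key`, `reduce_phase`) and the word-count workload are hypothetical, chosen only to show how records expressed as key/value pairs pass through a Map step and a Reduce step.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every input (key, value) pair and collect what it emits."""
    emitted = []
    for key, value in records:
        emitted.extend(map_fn(key, value))
    return emitted


def group_by_key(pairs):
    """Group intermediate values by key, as happens between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Collapse each key's list of values into a single result."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical workload: word count over two small documents.
def word_count_map(_doc_id, text):
    return [(word, 1) for word in text.split()]


def word_count_reduce(_word, counts):
    return sum(counts)


records = [(0, "map reduce map"), (1, "reduce gpu")]
intermediate = map_phase(records, word_count_map)
counts = reduce_phase(group_by_key(intermediate), word_count_reduce)
```

In a GPU framework such as MARS, the input pairs would additionally be copied from main memory to graphics memory before the Map step runs; that transfer is omitted here.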

Summary

Introduction

The use of cloud computing has increased exponentially because of the growing demand for storing, processing, and retrieving large amounts of data in a cloud cluster. Given the increasing use of parallel computing frameworks, the design of methods that allow one to understand and predict the performance of such applications is appealing [9]. Expanding the computational complexity of matrix multiplication and reducing its computation time will meet the requirements to deal with large-scale data of the MLR prediction model. SVM and MLR are two major performance evaluation methods. We apply both to the parallel computing framework to determine which one is more accurate in terms of execution time and resource consumption requirement. (1) We propose a performance evaluation model for parallel computing frameworks based on the machine learning method SVM and compare it with MLR.
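
As a rough illustration of the MLR baseline mentioned above, the following sketch fits a multifactor linear model of execution time by least squares. It is a minimal example under stated assumptions, not the authors' model: the two factors (input size and worker count), the coefficients, and the data points are synthetic and hypothetical, and the SVM-based model proposed in the paper is not reproduced here. The fit solves the normal equations, whose matrix products (X^T X and X^T y) are where the cost of matrix multiplication enters for large data sets.

```python
def fit_mlr(rows, targets):
    """Least-squares fit of targets ~ b0 + b1*x1 + ... + bk*xk (normal equations)."""
    design = [[1.0] + [float(x) for x in row] for row in rows]  # prepend intercept
    n = len(design[0])
    # Build X^T X and X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(n)] for i in range(n)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(n)]
    # Solve the linear system by Gaussian elimination with partial pivoting.
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs


def predict(coeffs, row):
    """Predicted execution time for one configuration."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Synthetic, hypothetical samples: (input size in GB, number of workers).
configs = [(1, 2), (2, 2), (4, 4), (8, 4), (8, 8), (16, 8)]
# Synthetic execution times generated from an assumed exact linear relation.
times = [5.0 + 3.0 * gb - 0.5 * workers for gb, workers in configs]

coeffs = fit_mlr(configs, times)
estimate = predict(coeffs, (10, 4))
```

Because the synthetic times are exactly linear in the two factors, the fit recovers the generating coefficients; real execution times are typically not linear in such factors, which is the situation in which the abstract reports higher accuracy for the SVM-based model.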

Related Works

Analytical Performance Modeling Techniques

Performance Model

Experiment Evaluation

Conclusion