Abstract

MARS and Spark are two popular parallel computing frameworks that are widely used for large-scale data analysis. In this paper, we first propose a performance evaluation model based on a support vector machine (SVM), which is used to analyze the performance of parallel computing frameworks. We then present representative results from a set of analyses with the proposed analytical performance model and perform a comparative evaluation of MARS and Spark using representative workloads and considering factors such as performance and scalability. The experiments show that our evaluation model predicts execution time more accurately than multifactor linear regression (MLR), and that it also provides a resource consumption requirement. Finally, we study benchmark experiments between MARS and Spark. MARS outperforms Spark in both throughput and speedup when executing logistic regression and Bayesian classification, because MARS has a large number of GPU threads that can handle higher parallelism. The experiments also show that Spark has lower latency than MARS in the execution of the four benchmarks.

Highlights

  • Cloud computing has grown exponentially because of the increasing demand for storing, processing, and retrieving large amounts of data in a cloud cluster

  • In MARS, data are loaded into main memory in the form of key/value pairs, and the key/value pairs are copied into graphics memory when GPU computing starts and the Map/Reduce operations are performed

  • We propose an evaluation performance model based on support vector machine (SVM) for the distributed computing framework and apply it to MARS and Spark. There are two major steps to build the performance model: the first step is selecting a number of systems that have been scored by experts as the sample data set

Summary

Introduction

Cloud computing has grown exponentially because of the increasing demand for storing, processing, and retrieving large amounts of data in a cloud cluster. Given the increasing use of parallel computing frameworks, the design of methods that allow one to understand and predict the performance of such applications is appealing [9]. Expanding the computational complexity of matrix multiplication and reducing its computation time will meet the requirements to deal with large-scale data of the MLR prediction model. SVM and MLR are two major performance evaluation methods. We apply both to the parallel computing framework to determine which one has higher accuracy in terms of execution time and resource consumption requirement. (1) We propose a performance evaluation model based on the machine learning method SVM for parallel computing frameworks and compare it with MLR.
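To make the MLR baseline concrete, the following is a minimal sketch of fitting a linear model that predicts execution time from several factors by solving the least-squares normal equations. It is not the authors' implementation. The two factors (input size and number of worker nodes), the function names, and all numbers are hypothetical and chosen only for illustration; the times are synthetic, so the fit is exact.

```python
# Minimal multifactor linear regression (MLR) baseline in pure Python.
# All factor names and numbers are hypothetical illustration data,
# not measurements from the paper.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; factors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(features, times):
    """Return [intercept, w1, ..., wk] minimizing the squared error."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature):
    """Predicted execution time for one factor vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


# Hypothetical runs: (input size in GB, number of worker nodes).
runs = [(1.0, 2), (2.0, 2), (4.0, 4), (8.0, 4), (8.0, 8), (16.0, 8)]
# Synthetic times in seconds, generated from 5 + 3*size - 0.5*nodes.
times = [5.0 + 3.0 * s - 0.5 * n for s, n in runs]

weights = fit_mlr(runs, times)
predicted = predict(weights, (10.0, 4))
```

The product X^T X formed in `fit_mlr` is the matrix multiplication whose cost grows with the number of samples and factors. Real measurements are rarely this linear, which is one reason to compare such a baseline against a nonlinear model such as an SVM.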

Related Works
Analytical Performance Modeling Techniques
Performance Model
Experiment Evaluation
Conclusion