Abstract

Kubernetes makes it easier to automate the deployment and scaling of containerized applications and to achieve near-native performance in cloud environments. However, a systematic comparison of how Spark applications perform on bare metal versus on Kubernetes is still lacking. In this paper, we evaluate the performance of these applications in the two environments through a series of experiments. Based on these experiments, we identify which stages cause the performance gap and reveal its root causes by analysing the workflows of the Spark applications and their resource costs. Through extensive measurements, we find that Spark on bare metal almost always performs better than Spark on Kubernetes. Higher CPU usage by executors and better data locality on bare metal are the root causes of the gap. Spark on Kubernetes, in turn, has some advantages over Spark on bare metal in terms of disk write IOPS (W-IOPS). This work can help practitioners and researchers make more informed decisions when tuning their cloud environments and configuring big data applications, so as to achieve better performance and higher resource utilization.


Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call