Abstract

Docker containers are developing rapidly with industry support and are widely used in large-scale production cloud environments, owing to their fast launch times and small memory footprint. However, the performance of big data applications (e.g., Spark) running in Docker containers remains unclear because of complex parameter configuration and interference between neighboring containers. This paper investigates the impact of Docker configuration and resource interference on the performance of big data applications in a typical container environment. In particular, we first conduct a series of experiments that measure the performance impact of adjusting Docker configuration parameters, such as resource limits, and observe that Spark performance does not scale linearly with the resources allocated to containers. We then evaluate the interference between multiple containers by controlling resource competition, and we detect performance interference between them. Finally, we propose a performance prediction model based on Support Vector Regression (SVR) to predict application performance under different configurations and resource competition settings. Experimental results show that the prediction error is less than 10% for all four typical Spark applications.
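
To make the prediction step concrete, the following is a minimal sketch of epsilon-insensitive support vector regression over container settings. It is not the model from the paper: the abstract does not specify the kernel, features, or hyperparameters, so this sketch assumes a linear SVR trained by plain subgradient descent, three hypothetical features (CPU cores, memory in GB, number of competing neighbor containers), and a made-up linear runtime formula that stands in for real Spark measurements. All function names and constants are illustrative.

```python
from itertools import product


def standardize(values):
    """Scale a list of numbers to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
    return [(v - mean) / std for v in values], mean, std


def train_linear_svr(X, y, epsilon=0.05, C=10.0, lr=0.01, epochs=500):
    """Subgradient descent on 0.5*||w||^2 + C * mean(max(0, |residual| - epsilon))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # the regulariser contributes w itself
        for xi, yi in zip(X, y):
            residual = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if abs(residual) > epsilon:  # only points outside the tube count
                sign = 1.0 if residual > 0 else -1.0
                for j in range(d):
                    grad_w[j] -= C * sign * xi[j] / n
                grad_b -= C * sign / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def synthetic_runs():
    """Made-up (cpus, mem_gb, neighbours) -> runtime_s samples; NOT measurements."""
    configs = list(product([1, 2, 4, 8], [2, 4, 8], [0, 1, 2, 3]))
    runtimes = [200.0 - 15.0 * c - 5.0 * m + 20.0 * k for c, m, k in configs]
    return configs, runtimes


def fit_and_score():
    """Fit on the synthetic grid and return weights plus mean relative errors."""
    configs, runtimes = synthetic_runs()
    columns = [standardize([float(v) for v in col])[0] for col in zip(*configs)]
    X = [list(row) for row in zip(*columns)]
    y_scaled, y_mean, y_std = standardize(runtimes)
    w, b = train_linear_svr(X, y_scaled)
    predicted = [(sum(wj * xj for wj, xj in zip(w, xi)) + b) * y_std + y_mean
                 for xi in X]
    # Mean absolute relative error of the model and of a predict-the-mean baseline.
    mape = sum(abs(p - t) / t for p, t in zip(predicted, runtimes)) / len(runtimes)
    baseline = sum(abs(y_mean - t) / t for t in runtimes) / len(runtimes)
    return w, mape, baseline


if __name__ == "__main__":
    weights, model_error, baseline_error = fit_and_score()
    print("weights (cpus, mem_gb, neighbours):", weights)
    print("mean relative error:", model_error, "baseline:", baseline_error)
```

The sketch scores the model on the same samples it was trained on, so the printed error says nothing about generalization; a real evaluation would hold out configurations. Because the abstract reports that Spark performance is not linear in the allocated resources, a nonlinear kernel (for example RBF) would likely be a closer match than the linear model assumed here.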
