An online algorithm for scheduling big data analysis jobs in cloud environments

Youyou Kang, Li Pan, Shijun Liu

doi:10.1016/j.knosys.2022.108628

Abstract

Cloud computing has become a popular platform for processing big data analysis jobs because of its high availability, elasticity, and cost efficiency. Many big data analysis service providers use cloud instances to process users’ job execution requests, and they need efficient scheduling algorithms to improve both job execution efficiency and economic benefits. This paper addresses the problem of minimizing the execution time of a batch of big data analysis jobs without changing the number of cloud instances. Solving this problem not only improves job execution efficiency in cloud environments and user satisfaction, but also brings higher economic benefits to big data analysis service providers. This paper proposes an online scheduling algorithm that makes full use of the parallelism of big data analysis jobs to optimize scheduling decisions when job execution times cannot be accurately known. To evaluate the proposed algorithm, a traditional two-phase scheduling algorithm is introduced as a benchmark for comparison. Theoretical analysis and extensive simulation experiments based on real datasets show that the proposed online scheduling algorithm achieves more stable performance than the benchmark two-phase scheduling algorithm.

Full Text