Abstract

Backfill scheduling is a common strategy in high-performance computing systems that lets low-priority jobs run ahead of their turn to make better use of available resources. Job running time is an important parameter affecting the performance of backfill scheduling algorithms. However, to avoid having jobs killed for exceeding their time limit, users often request running times several times longer than the actual running time, which wastes resources. To improve resource utilization, a new job running time prediction algorithm is proposed that combines classification and ensemble learning methods. The algorithm first classifies the historical job set by application type, then uses the Jaccard coefficient to calculate the similarity between jobs and classifies them further. In parallel, a separate ensemble model is constructed for the jobs of each application type. A new job is assigned to a class, and that class's ensemble model is used to predict its running time. The algorithm was tested on historical job data from the National Supercomputing Center Kunshan, the Hefei Advanced Computing Center, and the "Wuzhen Light" Supercomputing Center, and compared with the GA-sim and IRPA algorithms. The experimental results show that, compared with the IRPA algorithm, the proposed algorithm improves the mean absolute error by 60% on average across the three data sets. Compared with the GA-sim algorithm, it improves the average prediction accuracy by 20% on average across the three data sets. Based on an in-depth analysis of the experimental results, an amplification method is given for the underestimated running times of long and short jobs.
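
The classify-then-predict pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the job attributes placed in each feature set, the similarity threshold of 0.5, the greedy grouping rule, and the application names are all assumptions made for illustration, and a simple mean of past running times stands in for the per-class ensemble model, which the abstract does not specify.

```python
from collections import defaultdict
from statistics import mean


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_clusters(history, threshold=0.5):
    """Group historical jobs by application type, then by Jaccard similarity.

    history: list of (app_type, feature_set, runtime_seconds) tuples.
    threshold: assumed similarity cutoff for joining an existing cluster.
    Returns {app_type: [cluster, ...]}, where each cluster holds a
    representative feature set and the running times of its members.
    """
    clusters = defaultdict(list)
    for app, features, runtime in history:
        for cluster in clusters[app]:
            if jaccard(features, cluster["rep"]) >= threshold:
                cluster["runtimes"].append(runtime)
                break
        else:
            # No sufficiently similar cluster: this job starts a new one.
            clusters[app].append({"rep": set(features), "runtimes": [runtime]})
    return clusters


def predict_runtime(clusters, app, features):
    """Predict from the most similar cluster of the same application type.

    The mean is only a placeholder for a per-class ensemble model.
    Returns None when the application type has no history.
    """
    candidates = clusters.get(app)
    if not candidates:
        return None
    best = max(candidates, key=lambda c: jaccard(features, c["rep"]))
    return mean(best["runtimes"])


# Hypothetical job records: (application, attribute set, running time in seconds).
history = [
    ("vasp", {"user:alice", "nodes:4", "queue:normal"}, 3600),
    ("vasp", {"user:alice", "nodes:4", "queue:debug"}, 3000),
    ("vasp", {"user:bob", "nodes:64", "queue:large"}, 86400),
    ("lammps", {"user:carol", "nodes:8", "queue:normal"}, 7200),
]
clusters = build_clusters(history)
estimate = predict_runtime(
    clusters, "vasp", {"user:alice", "nodes:4", "queue:normal"}
)
```

In this sketch the two similar jobs from the same user share a cluster, so the new job's estimate is drawn only from them and is not skewed by the much longer 64-node run. A scheduler could pass such an estimate to backfilling in place of the user-requested running time.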

