High-Performance Computing (HPC) systems offer massive computational power for executing large-scale applications. However, the availability of thousands of CPU cores in HPC systems has also triggered a significant increase in energy consumption, which translates into higher energy costs for system providers and higher carbon emissions. Efficient job schedulers that can balance user-desired performance against the conflicting objective of energy efficiency are therefore essential. Job scheduling in HPC systems is a known NP-hard problem for which meta-heuristics may provide near-optimal solutions. Cuckoo Search (CS) is a well-known, robust swarm-intelligence-based meta-heuristic that has been applied extensively to optimization problems because of its strong search efficiency and its need for very few tuning parameters. However, it is prone to becoming trapped in local minima and to losing solution diversity towards the end of the run. These drawbacks could lead to unacceptable results when CS is applied to the parallel job scheduling problem. To overcome these limitations and improve the search efficiency of traditional CS, we propose a multi-objective hybrid scheduling algorithm, MOHCSFA, to schedule batches of parallel jobs in an HPC Grid. The proposed MOHCSFA policy combines the solution search mechanisms of both Cuckoo Search (CS) and the Firefly Algorithm (FA) during each generation. The policy is further integrated with an efficient resource allocation (ERA) heuristic to improve job scheduler performance by making effective use of multi-site resource allocation. The experiments were conducted on the GridSim simulator, and the proposed algorithm was benchmarked using real data sets extracted from two supercomputing workload logs.
The simulation results showed that the proposed MOHCSFA policy outperforms many heuristic and meta-heuristic scheduling policies across different test cases on both performance and energy-efficiency objectives. Specifically, for the Unilu-Gaia workloads, MOHCSFA achieved improvements of 5.87–24.05%, 3.46–28.50%, and 7.06–26.76% in makespan, energy consumption, and average flowtime, respectively, over the other tested scheduling policies. Statistical tests validated the stability and robustness of the proposed policy relative to the other scheduling policies.