Abstract

This study proposes a novel approach combining Machine Learning (ML) techniques and Genetic Algorithms (GA) for predicting High-Performance Computing (HPC) job runtimes. The objective is to create a prediction method universally applicable to any HPC system, irrespective of workload characteristics, application-specific parameters, user behavior, or hardware architecture. Since user-supplied runtime estimates are often inaccurate, we aim to categorize job runtimes into several classes, allowing users to select appropriate classes for their jobs. A Genetic Algorithm is developed to optimally define these runtime classes, determining both the number of classes and the time intervals they represent. Four Machine Learning algorithms (K-Nearest Neighbours, Support Vector Regression, Extreme Gradient Boosting, and Deep Neural Networks) are implemented for runtime prediction. A unique set of features extracted from historical job data serves as input to the Machine Learning models. The generalized nature of our method is demonstrated by validating its performance on data from six clusters with distinct configurations, applications, and runtime distributions. Our results illustrate the superior performance of Machine Learning models incorporating GA-defined runtime classes across all datasets. On all six datasets, our method achieves R² scores exceeding 0.8 and accuracy greater than 0.7.
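To illustrate the idea of GA-defined runtime classes, the following is a minimal sketch of a genetic algorithm that searches for class boundaries over a set of historical job runtimes. The fitness function here (rewarding balanced class populations), the fixed number of classes, and all operator choices are illustrative assumptions; the abstract does not specify the paper's actual encoding or objective.

```python
import random

def make_classes(cuts):
    # Turn sorted cut points into half-open runtime intervals,
    # e.g. [10, 60] -> [(0, 10), (10, 60), (60, inf)].
    return list(zip([0.0] + cuts, cuts + [float("inf")]))

def fitness(cuts, runtimes):
    # Hypothetical objective: prefer boundaries that split the jobs
    # into classes of roughly equal size (negative variance of shares).
    classes = make_classes(cuts)
    n = len(runtimes)
    shares = [sum(lo <= t < hi for t in runtimes) / n for lo, hi in classes]
    mean = 1.0 / len(classes)
    return -sum((s - mean) ** 2 for s in shares)

def crossover(a, b):
    # Uniform crossover: pick each boundary from either parent, then re-sort.
    return sorted(random.choice(pair) for pair in zip(a, b))

def mutate(cuts, max_t):
    # Jitter each boundary by up to 10% of the maximum runtime.
    return sorted(max(1e-6, c + random.uniform(-0.1, 0.1) * max_t)
                  for c in cuts)

def ga_optimize(runtimes, n_classes=4, pop_size=30, generations=50, seed=0):
    # Evolve cut-point vectors; each individual has n_classes - 1 boundaries.
    random.seed(seed)
    max_t = max(runtimes)
    pop = [sorted(random.uniform(0, max_t) for _ in range(n_classes - 1))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, runtimes), reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            children.append(mutate(crossover(a, b), max_t))
        pop = elite + children
    return max(pop, key=lambda c: fitness(c, runtimes))
```

A model would then predict the class of a new job rather than an exact runtime, which is the classification framing the abstract describes.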

