Abstract

This study proposes a novel approach combining Machine Learning (ML) techniques and Genetic Algorithms (GA) for predicting High-Performance Computing (HPC) job runtimes. The objective is to create a prediction method universally applicable to any HPC system, irrespective of workload characteristics, application-specific parameters, user behavior, or hardware architecture. Since user-supplied runtime estimates are often inaccurate, we aim to categorize job runtimes into several classes, allowing users to select appropriate classes for their jobs. A Genetic Algorithm is developed to optimally define these runtime classes, determining both the number of classes and the time intervals they represent. Four Machine Learning algorithms (K-Nearest Neighbours, Support Vector Regression, Extreme Gradient Boosting, and Deep Neural Networks) are implemented for runtime prediction. A unique set of features extracted from historical job data serves as input to the Machine Learning models. The generalized nature of our method is demonstrated by validating its performance on data from six clusters with distinct configurations, applications, and runtime distributions. Our results illustrate the superior performance of Machine Learning models incorporating GA-defined runtime classes across all datasets. On all six datasets, our method achieves R² scores exceeding 0.8 and accuracy greater than 0.7.
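To illustrate the idea of GA-defined runtime classes, the following is a minimal sketch of a genetic algorithm that searches for class boundaries over a set of historical job runtimes. The fitness function here (rewarding balanced class populations), the fixed number of classes, and all operator choices are illustrative assumptions; the abstract does not specify the paper's actual encoding or objective.

```python
import random

def make_classes(cuts):
    # Turn sorted cut points into half-open runtime intervals,
    # e.g. [10, 60] -> [(0, 10), (10, 60), (60, inf)].
    return list(zip([0.0] + cuts, cuts + [float("inf")]))

def fitness(cuts, runtimes):
    # Hypothetical objective: prefer boundaries that split the jobs
    # into classes of roughly equal size (negative variance of shares).
    classes = make_classes(cuts)
    n = len(runtimes)
    shares = [sum(lo <= t < hi for t in runtimes) / n for lo, hi in classes]
    mean = 1.0 / len(classes)
    return -sum((s - mean) ** 2 for s in shares)

def crossover(a, b):
    # Uniform crossover: pick each boundary from either parent, then re-sort.
    return sorted(random.choice(pair) for pair in zip(a, b))

def mutate(cuts, max_t):
    # Jitter each boundary by up to 10% of the maximum runtime.
    return sorted(max(1e-6, c + random.uniform(-0.1, 0.1) * max_t)
                  for c in cuts)

def ga_optimize(runtimes, n_classes=4, pop_size=30, generations=50, seed=0):
    # Evolve cut-point vectors; each individual has n_classes - 1 boundaries.
    random.seed(seed)
    max_t = max(runtimes)
    pop = [sorted(random.uniform(0, max_t) for _ in range(n_classes - 1))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, runtimes), reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            children.append(mutate(crossover(a, b), max_t))
        pop = elite + children
    return max(pop, key=lambda c: fitness(c, runtimes))
```

A model would then predict the class of a new job rather than an exact runtime, which is the classification framing the abstract describes.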

