CLUTCH: A Clustering-Driven Runtime Estimation Scheme for Scientific Simulations

Young-Kyoon Suh, Seounghyeon Kim, Jeeyoung Kim

doi:10.1109/access.2020.3042596

Open Access

https://doi.org/10.1109/access.2020.3042596

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 25
License type: CC BY 4.0

Affiliation: Kyungpook National University, Samsung (South Korea)

Abstract

Efficient scheduling among simultaneous simulation jobs is of critical importance in the allocation of limited computing and I/O resources. The difficulty of predicting when a job will complete can cause nontrivial problems for system administrators and users, e.g., squandered resources, long waiting times, and simulation plan delays. To alleviate these problems, we propose a novel simulation runtime estimation scheme termed CLUTCH, which employs a well-orchestrated ensemble of clustering, classification, and regression techniques. The proposed scheme trains a runtime estimation model through a series of steps: (i) grouping past simulation provenance records by clustering, (ii) labeling each of the grouped records by classification, and (iii) performing regression on the execution times in each group. Given a simulation and its external arguments, the trained model predicts the simulation's runtime with high accuracy in a black-box fashion, using only basic external arguments and no extra information. We additionally propose two optimization algorithms that significantly reduce training overhead without sacrificing estimation quality. In experiments with real datasets, our model achieved approximately 14.2% higher estimation accuracy than the most recent state-of-the-art method; with our optimizations applied, the model was trained 16 times faster while retaining that accuracy.
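The three training steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (CLUTCH itself was written in R), and every concrete choice here is an assumption made for illustration: records are grouped by 1-D k-means on their observed runtimes, a new job is assigned to a group by 1-nearest-neighbor on its arguments, and each group gets a least-squares line over the first argument only. The names `kmeans_1d`, `fit_line`, `train`, and `predict`, and the toy records, are hypothetical.

```python
from statistics import mean

def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means (assumed clustering step); returns one label per value."""
    srt = sorted(values)
    # Deterministic initialization: spread the centers across the sorted values.
    centers = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; constant fit if all xs are equal."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, float(my)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def train(records, k=2):
    """records: list of (args_tuple, runtime) pairs from past runs."""
    labels = kmeans_1d([rt for _, rt in records], k)       # (i) cluster the records
    labeled = [(args, lab) for (args, _), lab in zip(records, labels)]  # (ii) labels
    lines = {}
    for j in set(labels):                                   # (iii) regression per group
        xs = [args[0] for (args, _), lab in zip(records, labels) if lab == j]
        ys = [rt for (_, rt), lab in zip(records, labels) if lab == j]
        lines[j] = fit_line(xs, ys)
    return {"labeled": labeled, "lines": lines}

def predict(model, args):
    """Classify by 1-nearest-neighbor on the arguments, then apply that group's line."""
    _, label = min(
        model["labeled"],
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], args)),
    )
    a, b = model["lines"][label]
    return a * args[0] + b

# Toy provenance records (made up): args = (problem size n, mode flag).
records = [((n, 0), 2 * n + 1) for n in range(1, 6)] + \
          [((n, 1), 10 * n + 100) for n in range(1, 6)]
model = train(records, k=2)
print(predict(model, (6, 0)), predict(model, (6, 1)))
```

Fitting a separate regressor per group lets runs with very different cost regimes (here, the two values of the mode flag) avoid being averaged into a single poor fit, which is the general motivation for a cluster-then-regress design.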

Highlights

Runtime estimation has long been an important task for black box-based online simulation platform services [1].
Our proposed scheme CLUTCH was developed in R [36], and the source code is currently available in a GitHub repository.
The first metric is the Mean Absolute Percentage Error (MAPE) [38], the most basic measure that can be considered for comparison with competing methods.
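For reference, MAPE in its standard form averages the absolute relative errors and expresses the result as a percentage. The short Python sketch below uses that textbook definition; the function name `mape` and the sample runtimes are illustrative assumptions and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error: (100 / n) * sum(|(y_i - yhat_i) / y_i|)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Made-up runtimes in seconds: two estimates are off by 10%, one is exact.
print(mape([100, 200, 400], [110, 180, 400]))
```

Because each error is divided by the actual runtime, short and long simulations contribute on the same relative scale, but the metric cannot be used when an actual runtime is zero.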

Summary

INTRODUCTION

Runtime estimation has long been an important task for black box-based online simulation platform services [1]. The main concern is that many simulations depend on high-performance computing (HPC) and storage resources and incur very high execution costs in time, sometimes running for months. Such long execution times can lead to a variety of issues, such as (i) leaving users to sit and wait with no information about when their simulation will end; (ii) unexpectedly delaying simulation schedules; and (iii) wasting limited online simulation resources, occasionally because of an infinite loop triggered by a wrong combination of simulation input values. To address these concerns, this paper proposes a CLUsTering-based sCHeme for estimating simulation execution time, which we call CLUTCH.

RELATED WORK

PROBLEM FORMULATION

OUR APPROACH

PROPOSED SCHEME

OPTIMIZATION

EXPERIMENT

EVALUATION RESULTS

OPTIMIZATION PERFORMANCE

CONCLUSION AND FUTURE WORK


Similar Papers

Classification Technique and its Combination with Clustering and Association Rule Mining in Educational Data Mining — A survey
Sunita M Dol ... Pradip M Jawandhiya
Engineering Applications of Artificial Intelligence | VOL. 122
11 Mar 2023

A novel hybrid antlion optimization algorithm for multi-objective task scheduling problems in cloud computing environments
Laith Abualigah ... Ali Diabat
Cluster Computing | VOL. 24
12 Mar 2020

Data mining and machine learning in textile industry
Pelin Yildirim ... Tuba Alpyildiz
WIREs Data Mining and Knowledge Discovery | VOL. 8
02 Oct 2017

Optimization of spatial join using constraints based- clustering techniques
V Pattabiraman
Journal of Engineering and Computer Innovations | VOL. 3
01 Feb 2012
