Abstract

As recent heterogeneous systems comprise multi-core CPUs and multiple GPUs, efficiently allocating multiple data-parallel applications has become a primary goal for achieving both maximum total performance and efficiency. However, efficient orchestration of multiple applications is highly challenging because detailed runtime status, such as the expected remaining time and available memory size of each computing device, is hidden. To solve these problems, we propose a dynamic data-parallel application allocation framework called ADAMS. Evaluations show that our framework improves the average total device execution time by 1.85× over a round-robin policy on a non-shared-memory system with a small data set.

Highlights

  • High performance and energy efficiency are critical parameters for emerging applications such as vision and various machine-learning applications [1,2,3,4,5]

  • Though execution time estimation is nearly impossible for general programs, recent studies have shown that the execution time of general-purpose GPU (GPGPU) tasks is fairly predictable from input problem sizes [9,10,11]; we therefore use a problem-size-based regression model for execution time prediction built from offline profile data, similar to the approach of MKMD [10]

  • We propose a dynamic multiple data-parallel application allocation framework (ADAMS) that efficiently allocates multiple processes to multiple devices
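The problem-size-based regression mentioned above can be sketched as a simple least-squares fit over offline profile data. The profile numbers, the linear model form, and the function names below are illustrative assumptions, not details taken from the paper.

```python
# Sketch: problem-size-based execution-time regression from offline profiles,
# in the spirit of the MKMD-style approach. All numbers are made up.
import numpy as np

# Hypothetical offline profile: (input problem size, measured kernel time in ms).
profile_sizes = np.array([1e4, 5e4, 1e5, 5e5, 1e6])
profile_times = np.array([0.8, 3.9, 7.7, 38.2, 76.5])

# Fit time ≈ a * size + b with ordinary least squares.
a, b = np.polyfit(profile_sizes, profile_times, deg=1)

def predict_time_ms(problem_size: float) -> float:
    """Predict kernel execution time (ms) for an unseen problem size."""
    return a * problem_size + b
```

A real predictor might use per-kernel models or higher-order terms; the point is only that predicted times can be derived from problem size before launch.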


Summary

Introduction

High performance and energy efficiency are critical parameters for emerging applications such as vision and various machine-learning applications [1,2,3,4,5]. In this situation, load-balancing failures among multiple devices leave some devices underused, and the maximum performance of the system cannot be achieved. Recent GPU evolution trends (NVIDIA Pascal [16] and Volta [17]) improve both throughput and latency by allowing concurrent execution of multiple kernels through preemptive [18] or spatial [19] multitasking. Efficient memory management has become crucial for achieving better multitasking performance because concurrent kernel execution requires more memory to handle all co-running kernels. To address this issue, we introduce an automatic device allocation management system (ADAMS). Its shared memory holds the allocated application list and the remaining total execution time of each device.
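The per-device bookkeeping described above (an allocated application list plus remaining total execution time, checked against available memory) can be sketched as follows. The data-structure fields and the selection rule are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch of remaining-time-based device selection with a memory check.
# Field names and policy details are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    free_mem_mb: int
    remaining_ms: float = 0.0              # sum of predicted times of queued apps
    app_list: list = field(default_factory=list)

def allocate(app_name: str, predicted_ms: float, mem_mb: int, devices):
    """Pick the device with the least remaining work whose memory fits the app."""
    candidates = [d for d in devices if d.free_mem_mb >= mem_mb]
    if not candidates:
        return None                        # no device can hold the working set
    best = min(candidates, key=lambda d: d.remaining_ms)
    best.app_list.append(app_name)
    best.remaining_ms += predicted_ms
    best.free_mem_mb -= mem_mb
    return best
```

Compared with round-robin, this kind of policy only dispatches to a device that both has the memory headroom and the shortest predicted backlog.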

OpenCL Programming Model
Target Device-Selection Challenge
Execution Time Prediction of Data-Parallel Applications
Limitations in Non-Shared-Memory Systems
Limitations in Shared-Memory Systems
Overview
Allocation Manager
Global Memory Analyzer
Concurrent Time Estimator
Time Prediction
Evaluation and Discussion
Allocation Policy on Non-Shared-Memory Systems
Memory Consideration on Non-Shared-Memory Systems
Memory Consideration on Shared-Memory Systems
Case Study
Overhead
Findings
Related Work
Conclusions