Heterogeneous GPU Research Articles

GPU-based clusters that run deep learning (DL) tasks are characterized by heterogeneous GPU devices, a variety of deep neural network (DNN) models, and high computational complexity. Traditional power capping methods designed for CPU-based clusters or small-scale GPU devices therefore cannot be applied to them. This article develops a cooperative distributed GPU power capping (CD-GPC) system for GPU-based clusters that aims to minimize the training completion time of invoked DL tasks without exceeding a limited power budget. Specifically, we first design a frequency scaling (FS) approach that uses online model estimation based on the recursive least squares method. This approach accurately tunes the DL task training time and the power usage of GPU devices without offline profiling. We then formulate the FS problem as a Lagrangian dual decomposition-based economic model predictive control problem for large-scale heterogeneous GPU clusters. We evaluate performance with both lab-scale experiments on real NVIDIA GPUs and simulations driven by real job traces. Experimental results show that the proposed system achieves a power capping mean absolute error of less than 1% and reduces the deadline violation ratio of invoked DL tasks by 21.5% compared with recent counterparts.
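The online model estimation mentioned in this abstract can be illustrated with a generic recursive least squares (RLS) update. The following is a minimal sketch, not the article's implementation: the linear power-versus-frequency model, the coefficients 50 and 100, the forgetting factor, and all names (`RecursiveLeastSquares`, `demo`) are assumptions made for illustration only.

```python
class RecursiveLeastSquares:
    """Online estimator for y ~ theta . x with exponential forgetting."""

    def __init__(self, n_params, forgetting=0.99, init_cov=1e4):
        self.n = n_params
        self.lam = forgetting            # assumed forgetting factor
        self.theta = [0.0] * n_params    # parameter estimates
        # Large initial covariance: little confidence in the starting theta.
        self.P = [[init_cov if i == j else 0.0 for j in range(n_params)]
                  for i in range(n_params)]

    def predict(self, x):
        return sum(t * xi for t, xi in zip(self.theta, x))

    def update(self, x, y):
        """Fold one observation (x, y) into the estimate; return the prior error."""
        n = self.n
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        error = y - self.predict(x)
        self.theta = [self.theta[i] + gain[i] * error for i in range(n)]
        # P is symmetric, so x^T P equals (P x)^T and Px can be reused here.
        self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.lam
                   for j in range(n)] for i in range(n)]
        return error


def demo():
    """Fit a hypothetical model: power_w = 50 + 100 * freq_ghz (made-up numbers)."""
    rls = RecursiveLeastSquares(n_params=2)
    for _ in range(20):
        for freq_ghz in (0.6, 0.9, 1.2, 1.5, 1.8):
            power_w = 50.0 + 100.0 * freq_ghz   # synthetic, noise-free sample
            rls.update([1.0, freq_ghz], power_w)
    return rls
```

Because each update costs a fixed amount of work and needs only the latest sample, an estimator of this kind can track a changing workload at run time, which is consistent with the abstract's point about avoiding offline profiling.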

Embedded systems today are built on heterogeneous multi-core architectures that combine CPUs and GPUs. These architectures offer substantial performance benefits when an application is mapped to the appropriate processing core. Typically, programmers map sequential applications to the CPU and parallel applications to the GPU. Task mapping becomes challenging as CPU- and GPU-based architectures grow more complex and continue to evolve. This paper presents an approach for mapping OpenCL applications to a heterogeneous multi-core architecture by determining application suitability and processing capability. The classification is performed by a machine learning-based device suitability classifier that predicts which processor has the highest computational compatibility for running an OpenCL application. We propose 20 distinct features, extracted with an LLVM-based static analyzer developed for this work. To select the best subset of features, we apply both correlation analysis and a feature importance method. To address class imbalance, we apply the synthetic minority over-sampling method and compare results with and without feature selection. Instead of hand-tuning the machine learning classifier, we use a tree-based pipeline optimization method to select the best classifier and its hyper-parameters. We then compare the optimized method with traditional algorithms, i.e., random forest, decision tree, Naïve Bayes, and KNN. We apply our approach to widely used OpenCL benchmark suites, i.e., AMD and Polybench. The dataset contains 653 training and 277 testing applications. We evaluate the classification results using four performance metrics, i.e., F-measure, precision, recall, and R^2. The optimized model with the reduced feature subset achieved an F-measure of 0.91 and an R^2 of 0.76. The proposed framework automatically distributes the workload based on application requirements and processor compatibility.
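For readers unfamiliar with the reported metrics, precision, recall, and F-measure for a device suitability classifier can be computed as sketched below. This is an illustrative example only: the function name, the "cpu"/"gpu" labels, and the sample predictions are hypothetical and are not taken from the paper's dataset.

```python
def precision_recall_f_measure(y_true, y_pred, positive):
    """Return (precision, recall, F-measure) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical labels: the best device for five applications vs. the predictions.
actual = ["gpu", "gpu", "cpu", "cpu", "gpu"]
predicted = ["gpu", "cpu", "cpu", "gpu", "gpu"]
precision, recall, f_measure = precision_recall_f_measure(actual, predicted, "gpu")
```

In this toy case two GPU applications are predicted correctly, one CPU application is wrongly sent to the GPU, and one GPU application is missed, so precision, recall, and F-measure all equal 2/3.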

Articles published on Heterogeneous GPU

LBB: load-balanced batching for efficient distributed learning on heterogeneous GPU cluster

Non-Clairvoyant Scheduling of Distributed Machine Learning with Inter-job and Intra-job Parallelism on Heterogeneous GPUs

SLoB: Suboptimal Load Balancing Scheduling in Local Heterogeneous GPU Clusters for Large Language Model Inference

Utilization-prediction-aware energy optimization approach for heterogeneous GPU clusters

MSHGN: Multi-scenario adaptive hierarchical spatial graph convolution network for GPU utilization prediction in heterogeneous GPU clusters

Mixtran: an efficient and fair scheduler for mixed deep learning workloads in heterogeneous GPU environments

PipePar: Enabling fast DNN pipeline parallel training in heterogeneous GPU clusters

Hydra: Deadline-Aware and Efficiency-Oriented Scheduling for Deep Learning Jobs on Heterogeneous GPUs

Using heterogeneous GPU nodes with a Cabana-based implementation of MPCD

HetSev: Exploiting Heterogeneity-Aware Autoscaling and Resource-Efficient Scheduling for Cost-Effective Machine-Learning Model Serving

Cooperative Distributed GPU Power Capping for Deep Learning Clusters

Performance and accuracy predictions of approximation methods for shortest-path algorithms on GPUs

Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems

Parallel Fine-Grained Comparison of Long DNA Sequences in Homogeneous and Heterogeneous GPU Platforms With Pruning

Hyperspectral Parallel Image Compression on Edge GPUs

Distributed programming of a hyperspectral image registration algorithm for heterogeneous GPU clusters

Multi-GPU Design and Performance Evaluation of Homomorphic Encryption on GPU Clusters

AEML: An Acceleration Engine for Multi-GPU Load-balancing in Distributed Heterogeneous Environment

Activity-Driven Task Allocation in Energy-Constrained Heterogeneous GPUs Systems

A load balance multi-scheduling model for OpenCL kernel tasks in an integrated cluster
