Abstract

Energy optimization is an increasingly important aspect of today’s high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted solution for balancing performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies manually, so as to minimize energy consumption while maximizing performance. This article focuses on modeling the energy consumption and speedup of GPU applications under different frequency configurations. The task is not straightforward, because of the large and non-uniformly distributed set of possible configurations, and because of the multi-objective nature of the problem, which requires minimizing energy consumption while maximizing performance. This article proposes a machine learning-based method to predict the best core and memory frequency configurations on GPUs for an input OpenCL kernel. The method is based on two models, one predicting speedup and one predicting normalized energy, both relative to the default frequency configuration. These are then combined into a multi-objective approach that predicts a Pareto set of frequency configurations. Results show that our approach is very accurate at predicting extrema and the Pareto set, and finds frequency configurations that dominate the default configuration in either energy or performance.
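
To make the multi-objective step concrete, the following minimal Python sketch (with hypothetical configurations and made-up prediction values; not the authors' implementation) derives a Pareto set from per-configuration predictions of speedup and normalized energy:

```python
# Illustrative sketch (not the authors' code): deriving a Pareto set from
# per-configuration predictions of speedup (to maximize) and normalized
# energy (to minimize), both relative to the default configuration.

def pareto_set(predictions):
    """predictions: list of (config, speedup, norm_energy) tuples."""
    front = []
    for cand in predictions:
        _, s_c, e_c = cand
        # cand is dominated if some other point is at least as fast and
        # at least as energy-efficient, and strictly better in one of the two.
        dominated = any(
            s >= s_c and e <= e_c and (s > s_c or e < e_c)
            for _, s, e in predictions
        )
        if not dominated:
            front.append(cand)
    return front

# Hypothetical (mem MHz, core MHz) configurations with made-up predictions:
preds = [((3505, 1000), 1.00, 1.00),   # default configuration
         ((3505, 1164), 1.12, 1.09),   # faster, but costs more energy
         ((810,  595),  0.60, 0.55),   # slower, but saves energy
         ((810,  1164), 0.58, 0.60)]   # dominated by the point above
print(pareto_set(preds))               # the first three points survive
```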

Highlights

  • Power consumption is a major concern of modern computing platforms, from small-scale embedded systems to large-scale compute clusters

  • By using the NVIDIA Management Library (NVML), a programmer can tune core and memory frequencies for a specific application, and different applications may show different energy consumption and performance depending on the selected frequency setting: for instance, a compute-bound kernel is mainly affected by core frequency changes, while a memory-bound one is more sensitive to memory frequency

  • Modeling performance and energy under different frequency configurations poses an additional challenge on NVIDIA GTX Titan X graphics processing units (GPUs): the set of available frequency configurations exposed by NVML is not evenly distributed and is highly imbalanced (see the sketch below)
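
The imbalance mentioned in the last highlight can be inspected directly. The following minimal sketch, assuming the pynvml bindings and an NVML-capable GPU such as the GTX Titan X, enumerates the supported (memory, core) clock pairs and shows how the core clocks are spread across memory clocks:

```python
# Minimal sketch using the pynvml bindings: enumerate the supported
# (memory, core) frequency configurations and count how many core
# clocks each memory clock admits.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

total = 0
for mem_mhz in pynvml.nvmlDeviceGetSupportedMemoryClocks(handle):
    core_clocks = pynvml.nvmlDeviceGetSupportedGraphicsClocks(handle, mem_mhz)
    total += len(core_clocks)
    print(f"memory {mem_mhz} MHz: {len(core_clocks)} supported core clocks")
print(f"{total} configurations in total")  # 219 on a GTX Titan X

pynvml.nvmlShutdown()
```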

Summary

Introduction

Power consumption is a major concern of modern computing platforms, from small-scale embedded systems to large-scale compute clusters. An NVIDIA GTX Titan X supports a total of 219 possible configurations, spanning four memory frequencies and 85 core frequencies (note that not all memory-core combinations are supported; e.g., it is not possible to pair the maximal core frequency with the maximal memory frequency). Sampling such a large space is not a viable option for many applications; this work focuses on the design and implementation of a predictive approach that aims at minimizing both the energy-per-task and the running time by solving a multi-objective optimization problem. The effect of the different frequency settings highly depends on the kernel: memory-bound applications are more sensitive to increases in memory frequency, while compute-bound ones are mainly affected by core frequency increases.
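
For concreteness, the sketch below shows how one such configuration could be applied and evaluated through NVML's Python bindings; run_kernel is a hypothetical placeholder for the OpenCL kernel under test, and the single end-of-run power sample is only a crude stand-in for the continuous power measurement a real harness would use:

```python
# Illustrative sketch (assumes pynvml and sufficient privileges to set
# application clocks): apply one (memory, core) clock pair, run a
# workload, and return its runtime and an estimated energy-per-task.
import time
import pynvml

def measure(handle, mem_mhz, core_mhz, run_kernel):
    """Return (runtime [s], estimated energy [J]) for one configuration."""
    # Setting application clocks typically requires root privileges.
    pynvml.nvmlDeviceSetApplicationsClocks(handle, mem_mhz, core_mhz)
    start = time.time()
    run_kernel()                  # hypothetical: the OpenCL kernel under test
    elapsed = time.time() - start
    # nvmlDeviceGetPowerUsage reports milliwatts; one sample at the end is
    # a crude estimate -- a real harness would sample power during the run.
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
    return elapsed, power_w * elapsed
```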

Related Work
Background
Overview
Features
Training Data
Predictive Modeling
Speedup Prediction
Normalized Energy Model
Deriving the Pareto Set
Modeling Imbalanced Dataset
Frequency Domain and Test Setting
Imbalance of Available Frequency Configurations
Experimental Evaluation
Experimental Evaluation of Oversampling
Application Characterization Analysis
Input-Size Analysis
Accuracy of Speedup and Normalized Energy Predictions
Accuracy of the Predicted Pareto Set
Findings
Conclusions