Abstract

It has been more than a decade since general-purpose applications were first ported to GPUs to benefit from the enormous processing power they offer. However, not all applications gain speedup when running on GPUs. If an application does not have enough parallel computation to hide memory latency, running it on a GPU will degrade performance compared to what it could achieve on a CPU. Conversely, the efficiency that a highly parallel application can achieve on a GPU depends on how well the application's memory and computational demands are balanced against the GPU's resources.

In this work we tackle the problem of finding a GPU configuration that performs well on a set of GPGPU applications. To achieve this, we propose two models. First, we study the design space of 20 GPGPU applications and show that the relationship between the architectural parameters of a GPU and the power and performance of the application it runs can be learned by a neural network (NN). We propose application-specific NN-based predictors that train on 5% of the design space and predict the power and performance of the remaining 95% of configurations (the blind set). Although the models make accurate predictions, a few configurations have mispredicted power and performance. We propose a filtering heuristic that captures most of the predictions with large errors while marking only 5% of the configurations in the blind set as outliers.

Using the models and the filtering heuristic, one obtains power and performance values for every configuration in an application's design space. Searching the design space for a set of configurations that meet given power and performance constraints can be a tedious task, as some applications have large design spaces.
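To make the first model concrete, the predictor idea can be sketched as a small neural network trained on a 5% sample of a design space and evaluated on the remaining 95% (the blind set). Everything below is an illustrative stand-in, not the paper's setup: the architectural parameters, design-space size, network shape, and targets are all synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design space: each row is one GPU configuration described by
# 4 normalized architectural parameters (placeholders, not the paper's set).
n_configs = 2000
X = rng.uniform(0.0, 1.0, (n_configs, 4))
# Synthetic targets: [power, performance] as some nonlinear function of X.
Y = np.column_stack([
    1.0 + 2.0 * X[:, 0] + X[:, 1] ** 2,
    3.0 * np.tanh(X[:, 2] + X[:, 3]),
])

# Train on 5% of the design space; the remaining 95% is the blind set.
n_train = int(0.05 * n_configs)
Xtr, Ytr = X[:n_train], Y[:n_train]
Xte, Yte = X[n_train:], Y[n_train:]

# One hidden layer with tanh activation, trained by plain gradient descent.
H = 16
W1 = rng.normal(0, 0.5, (4, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 2)); b2 = np.zeros(2)
lr = 0.05
for _ in range(3000):
    A = np.tanh(Xtr @ W1 + b1)          # hidden activations
    P = A @ W2 + b2                     # predicted [power, performance]
    E = P - Ytr                         # prediction error
    # Backpropagate the mean-squared-error gradients.
    gW2 = A.T @ E / n_train; gb2 = E.mean(0)
    dA = (E @ W2.T) * (1 - A ** 2)      # tanh derivative
    gW1 = Xtr.T @ dA / n_train; gb1 = dA.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Predict power and performance for the blind set.
pred = np.tanh(Xte @ W1 + b1) @ W2 + b2
mae = np.abs(pred - Yte).mean(0)        # per-target mean absolute error
```

On this toy data the learned predictor beats the trivial predict-the-training-mean baseline on the blind set, which is the minimal sanity check for a design-space model of this kind.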
In the second model, we propose to employ the Pareto-front multiobjective optimization technique to obtain the subset of the design space that runs the application optimally in terms of power and performance. We show that the optimum configurations predicted by our model are very close to the actual optimum configurations. While this method gives the optimum configurations for each application, given a set of GPGPU applications one may instead look for a single configuration that performs well across all of them. We therefore propose a method to find such a configuration with respect to different performance objectives.
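The Pareto-front step can be illustrated on toy data. A configuration is Pareto-optimal when no other configuration is at least as good on both objectives (here: lower-or-equal power, higher-or-equal performance) and strictly better on one. The configuration names and power/performance numbers below are invented for the example, not taken from the paper.

```python
def pareto_front(configs):
    """Return the names of Pareto-optimal configurations.

    configs: list of (name, power, performance) tuples, where power is
    minimized and performance is maximized.
    """
    front = []
    for name, pw, perf in configs:
        dominated = any(
            (pw2 <= pw and pf2 >= perf) and (pw2 < pw or pf2 > perf)
            for _, pw2, pf2 in configs
        )
        if not dominated:
            front.append(name)
    return front

configs = [
    ("A", 60.0, 1.2),    # low power, low performance: on the front
    ("B", 90.0, 2.0),    # balanced: on the front
    ("C", 95.0, 1.9),    # dominated by B (more power, less performance)
    ("D", 140.0, 2.6),   # high power, high performance: on the front
]
front = pareto_front(configs)
print(front)  # ['A', 'B', 'D']
```

This quadratic-time dominance check is the simplest formulation; real design spaces with many configurations would use a sort-based sweep instead, but the optimality criterion is the same.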


