Abstract
This work presents an approach that automatically optimizes component computations on graphics processing unit (GPU) devices from different vendors. The approach consists of a two-level optimization: the first level considers the linear part of the computation for vectorization and applies mixed matrix formats to further increase computational throughput. The second level treats the combination of linear and non-linear parts as a black box and searches for the optimal configuration of parameters such as the degree of vectorization, the combination of matrix formats, and the sizes of the thread groups used during parallel execution on the GPU. Moreover, we introduce constraints that reduce the execution time of the optimization procedure. Finally, we select three different types of components that are representative of computational tasks in power systems and apply our optimization approach to these kernels. The computational performance is compared with an unoptimized baseline and with implementations based on sparse linear algebra libraries; the results show that our optimization leads to better performance and more efficient memory utilization.
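The second optimization level described above is essentially an auto-tuning search over a configuration space. The following is a minimal sketch, not the authors' implementation, of what such a black-box search could look like: it enumerates candidate combinations of vectorization degree, matrix format, and thread-group size, prunes configurations with a simple constraint to shorten the search, times the kernel as a black box, and keeps the fastest configuration. The names `run_kernel`, `MAX_WORKGROUP_SIZE`, and the candidate value lists are hypothetical placeholders.

```python
# Hedged sketch of a black-box parameter search over GPU kernel configurations.
# All parameter ranges and helper names are illustrative assumptions.
import itertools
import time

MAX_WORKGROUP_SIZE = 256          # assumed device limit used as a pruning constraint


def run_kernel(vec_degree, matrix_format, wg_size):
    """Placeholder for launching the component kernel with a given configuration."""
    time.sleep(0.001)             # stands in for the measured GPU execution


def autotune(vec_degrees=(1, 2, 4, 8),
             formats=("CSR", "ELL", "CSR+ELL"),
             wg_sizes=(32, 64, 128, 256)):
    best_cfg, best_time = None, float("inf")
    for vec, fmt, wg in itertools.product(vec_degrees, formats, wg_sizes):
        # Constraint: skip configurations the device cannot execute efficiently,
        # which reduces the optimization procedure's execution time.
        if wg > MAX_WORKGROUP_SIZE:
            continue
        start = time.perf_counter()
        run_kernel(vec, fmt, wg)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_cfg, best_time = (vec, fmt, wg), elapsed
    return best_cfg, best_time


if __name__ == "__main__":
    config, runtime = autotune()
    print(f"best configuration: {config}, runtime: {runtime:.6f} s")
```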