Tuning and Optimization for a Variety of Many-Core Architectures Without Changing a Single Line of Implementation Code Using the Alpaka Library

Alexander Matthes, Erik Zenker, Benjamin Worpitz, Michael Bussmann, Axel Huebl, René Widera

doi:10.1007/978-3-319-67630-2_36

Open Access

Abstract

We present an analysis of optimizing the performance of a single C++11 source code using the Alpaka hardware abstraction library. For this we use the general matrix multiplication (GEMM) algorithm to show that compilers can optimize Alpaka code effectively when key parameters of the algorithm are tuned. We do not intend to rival existing, highly optimized DGEMM versions, but merely choose this example to show that Alpaka allows for platform-specific tuning with a single source code. In addition, we analyze the optimization potential available with vendor-specific compilers when confronted with the heavily templated abstractions of Alpaka. We specifically test the code on bleeding-edge architectures such as Nvidia's Tesla P100, Intel's Knights Landing (KNL) and Haswell architectures, and IBM's Power8 system. On some of these we are able to reach almost 50% of the peak floating-point performance using the aforementioned means. When adding compiler-specific #pragmas we are able to reach 5 TFLOP/s on a P100 and over 1 TFLOP/s on a KNL system.
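The idea of tuning key parameters without touching the implementation can be illustrated with a minimal sketch in plain C++11. It does not use the Alpaka API and is not the authors' kernel. The macro GEMM_TILE and the functions tiled_gemm and naive_gemm are hypothetical names introduced here for illustration only. The tile edge length is a template argument, so it could be chosen per platform at compile time (for example with -DGEMM_TILE=32) while the kernel source stays unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tuning knob (not part of Alpaka): override per platform with
// -DGEMM_TILE=<value> instead of editing the kernel below.
#ifndef GEMM_TILE
#define GEMM_TILE 16
#endif

// Computes C = alpha * A * B + beta * C for square, row-major n x n matrices.
// The tile edge length is a compile-time parameter, so the compiler can
// specialize and unroll the inner loops for each chosen value.
template <std::size_t Tile>
void tiled_gemm(std::size_t n, double alpha, const std::vector<double>& a,
                const std::vector<double>& b, double beta,
                std::vector<double>& c)
{
    static_assert(Tile > 0, "tile edge must be positive");
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);

    // Apply beta once up front, then accumulate alpha * A * B tile by tile.
    for (double& x : c) {
        x *= beta;
    }

    for (std::size_t i0 = 0; i0 < n; i0 += Tile) {
        const std::size_t i1 = std::min(i0 + Tile, n);
        for (std::size_t k0 = 0; k0 < n; k0 += Tile) {
            const std::size_t k1 = std::min(k0 + Tile, n);
            for (std::size_t j0 = 0; j0 < n; j0 += Tile) {
                const std::size_t j1 = std::min(j0 + Tile, n);
                // Work on one Tile x Tile block; std::min handles ragged edges.
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t k = k0; k < k1; ++k) {
                        const double aik = alpha * a[i * n + k];
                        for (std::size_t j = j0; j < j1; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}

// Untiled reference used only to check the tiled version.
inline void naive_gemm(std::size_t n, double alpha,
                       const std::vector<double>& a,
                       const std::vector<double>& b, double beta,
                       std::vector<double>& c)
{
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = alpha * sum + beta * c[i * n + j];
        }
    }
}
```

A caller would instantiate the kernel as tiled_gemm<GEMM_TILE>(n, alpha, a, b, beta, c), so switching the tile size for a different architecture is a build-flag change rather than a source change. Which tile size performs best on a given machine is an empirical question that this sketch does not answer.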
