Reliability Evaluation of LU Decomposition on GPU-Accelerated System-on-Chip Under Proton Irradiation

Jose M Badia, Almudena Lindoso, Luis Entrena, Jose A Belloch, German Leon, Mario Garcia-Valderas, Yolanda Morilla

doi:10.1109/tns.2022.3155820

Abstract

Graphics processing units (GPUs) have become a basic accelerator in both high-performance nodes and low-power systems-on-chip (SoCs). They provide massive data parallelism and very high performance per watt. However, their reliability in harsh environments is an important concern, especially for safety-critical applications. In this article, we evaluate the influence of the parallelization strategy on the reliability of lower–upper (LU) decomposition on a GPU-accelerated SoC under proton irradiation. Specifically, we compare a memory-bound and a compute-bound implementation of the decomposition on a K20A GPU embedded in a Tegra K1 (TK1) SoC. We leverage the GPU and CPU clock frequencies both to highlight the radiation sensitivity of the GPU on which the benchmark runs and to have both algorithms solve problems of the same size under the same radiation dose. Results show that more intensive use of the GPU's resources increases the cross section. We also observed that most of the radiation-induced errors cause the operating system, and even the reboot process, to hang. Finally, we present a preliminary study of error propagation in the LU decomposition algorithms.

Full Text