An On-Line Testing Technique for the Scheduler Memory of a GPGPU

Stefano Di Carlo,Josie E Rodriguez Condia,Matteo Sonza Reorda

doi:10.1109/access.2020.2968139

Open Access

https://doi.org/10.1109/access.2020.2968139

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 9
License type: CC BY 4.0

Affiliation: Polytechnic University of Turin

Abstract

The highly parallel processing capabilities and relatively low power consumption of General Purpose Graphics Processing Units (GPGPUs) have been crucial factors in their widespread use in multiple fields, such as multimedia and high-performance computing applications. Nowadays, more demanding areas, such as automotive, employ GPGPU devices where safety and reliability are mandatory design constraints. Nevertheless, the structural complexity, the transistor density, and the implementation in the latest silicon technologies make it challenging to meet safety and reliability requirements. In these technologies, wear-out and aging may significantly increase the occurrence of permanent faults during the operational lifetime. Moreover, these faults may cause unacceptable misbehavior during the execution of an application. These constraints require new methods for in-field fault detection that verify the integrity and correct behavior of the device throughout its operational life. This work proposes a technique to generate functional self-test programs targeting the detection of permanent static faults in the memory of the warp scheduler of a GPGPU. The proposed technique translates fault primitives, which represent the effect of faults in a memory cell, into self-test functions and programs composed of a sequence of operations that excite the fault in the memory and propagate its effects to a visible location, thus detecting its presence. We focused on the memory in the warp scheduler because it is a crucial module for the device operation. Furthermore, this memory is present in each Streaming Multiprocessor (SM) of a GPGPU. Experimental results validating the method were gathered using the NVIDIA Visual Profiler and the Nsight Debugger on an NVIDIA GeForce GTX GPU platform, together with a structural fault simulator. The CUDA programming environment was used to implement the test procedures.
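The idea of turning a fault primitive into a sequence of operations that excites the fault and then observes it can be illustrated with a small software model. The sketch below is not the authors' tool and does not model a GPGPU: the `Cell` class, the fault names (`sa0`, `sa1`, `tf_up`, `tf_down`), and the `TEST_PATTERNS` table are assumptions introduced only for illustration, covering the classic stuck-at and transition faults of a single memory bit.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical one-bit memory cell with an optional injected static fault."""
    value: int = 0
    fault: str = "none"  # "none", "sa0", "sa1", "tf_up", "tf_down"

    def write(self, bit: int) -> None:
        if self.fault == "sa0":
            self.value = 0          # stuck-at-0: every write leaves 0
        elif self.fault == "sa1":
            self.value = 1          # stuck-at-1: every write leaves 1
        elif self.fault == "tf_up" and self.value == 0 and bit == 1:
            pass                    # the 0 -> 1 transition fails
        elif self.fault == "tf_down" and self.value == 1 and bit == 0:
            pass                    # the 1 -> 0 transition fails
        else:
            self.value = bit

    def read(self) -> int:
        if self.fault == "sa0":
            return 0
        if self.fault == "sa1":
            return 1
        return self.value


# Assumed test patterns: "wX" writes X, "rX" reads and expects X.
# Each pattern first excites the fault, then reads to make it visible.
TEST_PATTERNS = {
    "sa0": ["w1", "r1"],
    "sa1": ["w0", "r0"],
    "tf_up": ["w0", "w1", "r1"],
    "tf_down": ["w1", "w0", "r0"],
}


def detects(pattern: list[str], fault: str) -> bool:
    """Return True if any read in the pattern returns an unexpected value."""
    cell = Cell(fault=fault)
    for op in pattern:
        bit = int(op[1])
        if op[0] == "w":
            cell.write(bit)
        elif cell.read() != bit:
            return True
    return False
```

In this model, `detects(TEST_PATTERNS["tf_up"], "tf_up")` reports a detection, while the same pattern applied to a fault-free cell does not. On a real device the reads and writes of the scheduler memory are not directly accessible to software, which is why the paper maps them to self-test functions instead.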

Highlights

The General Purpose Graphics Processing Units (GPGPUs) are well-known processing solutions for data-intensive applications, such as those in the multimedia and the High-Performance Computing (HPC) fields, due to their parallel processing capabilities and the relatively reduced power consumption
The Streaming Multiprocessor (SM) is the main module inside a GPGPU, and it is optimized to process the same instruction on multiple data sources employing internal execution units (CUDA cores)
The Warp Program Counter (WPC) test programs employ a constant amount of shared variables independently of the Scheduler Controller (SC) memory size. This constant amount can be explained considering that the techniques for testing the WPC parameter are more straightforward than those employed to evaluate the Thread Active-Mask (TAM) field, including the warp selection mechanism to stop the operation of the dispatchers

Summary

INTRODUCTION

General Purpose Graphics Processing Units (GPGPUs) are well-known processing solutions for data-intensive applications, such as those in the multimedia and High-Performance Computing (HPC) fields, due to their parallel processing capabilities and relatively low power consumption. This work proposes a method to develop self-test procedures targeting the detection of faults in the memory of the Scheduler Controller (SC) of a GPGPU. Fault primitives (FPs) are used to extract the corresponding test patterns (TPs), i.e., the sequences of read and write operations. These TPs map into high-level self-test routines or functions for the GPGPU, which are combined into test programs. The same mapping and translation process is applied to March elements, yielding self-test routines that provide the same fault detection coverage as the original March elements. In the end, this method can translate any element of a March algorithm targeting the status memory of the SC into a self-test procedure.
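To make the notion of a March element concrete, the following sketch runs the well-known March C- algorithm on a simulated bit-addressable memory with injected stuck-at faults. It is a generic software illustration, not the paper's self-test routines: the `FaultyMemory` class, the `run_march` helper, and the tuple encoding of March elements are hypothetical names chosen for this example, and the direct `read`/`write` calls stand in for the indirect program-level operations that would be needed on the actual SC memory.

```python
# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
# Each element is (address order, list of operations).
MARCH_C_MINUS = [
    ("up", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up", ["r0"]),
]


class FaultyMemory:
    """Hypothetical one-bit-per-address memory with optional stuck-at cells."""

    def __init__(self, size, stuck_at=None):
        self.size = size
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}  # address -> forced value (0 or 1)

    def write(self, addr, bit):
        # A stuck-at cell ignores the written value.
        self.cells[addr] = self.stuck_at.get(addr, bit)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def run_march(mem, elements):
    """Apply each March element in order; return the sorted failing addresses."""
    failing = set()
    for order, ops in elements:
        if order == "up":
            addresses = range(mem.size)
        else:
            addresses = range(mem.size - 1, -1, -1)
        for addr in addresses:
            for op in ops:
                bit = int(op[1])
                if op[0] == "w":
                    mem.write(addr, bit)
                elif mem.read(addr) != bit:
                    failing.add(addr)  # read returned an unexpected value
    return sorted(failing)
```

For example, `run_march(FaultyMemory(8, {3: 1}), MARCH_C_MINUS)` flags address 3, because the first `r0` at that address observes the stuck value. Each `(order, ops)` tuple corresponds to one March element, which is the unit the summary describes as being translated into a self-test routine.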

BACKGROUND

METHODS

Findings

CONCLUSION
