Abstract
The highly parallel processing capabilities and low power consumption of General Purpose Graphics Processing Units (GPGPUs) have been crucial factors in their widespread use in multiple fields, such as multimedia and high-performance computing applications. Nowadays, more demanding areas, such as automotive, employ GPGPU devices where safety and reliability are mandatory design constraints. Nevertheless, the structural complexity, the transistor density, and the implementation in the latest silicon technologies make it challenging to meet safety and reliability requirements. In these technologies, wear-out and aging may significantly increase the occurrence of permanent faults during the operational lifetime. Moreover, these faults may cause unacceptable misbehavior during the execution of an application. These constraints require devising new methods for in-field fault detection that verify the integrity and correct behavior of the device during its whole operational life. This work proposes a technique to generate functional self-test programs targeting the detection of permanent static faults in the memory of the warp scheduler of a GPGPU. The proposed technique translates fault primitives, which represent the effect of faults in a memory cell, into self-test functions and programs composed of a sequence of operations that excite the fault in the memory and propagate its effects to a visible location, thus detecting its presence. We focused on the memory in the warp scheduler because it is a crucial module for the device operation. Furthermore, this memory is present in each Streaming Multiprocessor (SM) of a GPGPU. Experimental results to validate the method were gathered with the NVIDIA Visual Profiler and the Nsight Debugger on an NVIDIA GeForce GTX GPU platform, together with a structural fault simulator. The CUDA programming environment was used to implement the test procedures.
Highlights
General Purpose Graphics Processing Units (GPGPUs) are well-known processing solutions for data-intensive applications, such as those in the multimedia and High-Performance Computing (HPC) fields, due to their parallel processing capabilities and relatively low power consumption.
The Streaming Multiprocessor (SM) is the main module inside a GPGPU, and it is optimized to process the same instruction on multiple data sources using internal execution units (CUDA cores).
The Warp Program Counter (WPC) test programs employ a constant number of shared variables, independently of the Scheduler Controller (SC) memory size. This constant number can be explained by the fact that the techniques for testing the WPC parameter are more straightforward than those employed to evaluate the Thread Active-Mask (TAM) field, which include the warp selection mechanism to stop the operation of the dispatchers.
Summary
General Purpose Graphics Processing Units (GPGPUs) are well-known processing solutions for data-intensive applications, such as those in the multimedia and High-Performance Computing (HPC) fields, due to their parallel processing capabilities and relatively low power consumption. This work proposes a method to develop self-test procedures targeting the detection of faults in the memory of the Scheduler Controller (SC) of a GPGPU. Fault primitives (FPs) are used to extract the corresponding test patterns (TPs), i.e., the sequences of read and write operations. These TPs map into high-level self-test routines or functions for the GPGPU, generating test programs. The same mapping and translation process is applied to March elements, producing self-test routines that provide the same fault detection coverage as the original March elements. In the end, this method can translate any element of a March algorithm targeting the status memory of the SC into a self-test procedure.
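To make the idea of a March element as a sequence of read and write operations more concrete, the following is a minimal software sketch, not the authors' GPGPU routines. It models a bit-wide memory in plain Python, injects a stuck-at fault, and runs the standard MATS+ March algorithm over it. The names `FaultyMemory`, `MATS_PLUS`, and `run_march` are hypothetical and introduced only for illustration; the actual method maps such operations onto CUDA self-test functions that act on the SC memory indirectly.

```python
class FaultyMemory:
    """Hypothetical bit-wide memory model with an optional stuck-at fault."""

    def __init__(self, size, stuck_addr=None, stuck_value=0):
        self.size = size
        self.cells = [0] * size
        self.stuck_addr = stuck_addr    # address of the faulty cell, or None
        self.stuck_value = stuck_value  # value the faulty cell always returns

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        # A stuck-at cell ignores whatever was written to it.
        if addr == self.stuck_addr:
            return self.stuck_value
        return self.cells[addr]


# MATS+ expressed as March elements: (address order, list of operations).
# Each operation is ("w", value) for a write or ("r", expected) for a read.
MATS_PLUS = [
    ("up", [("w", 0)]),               # any order: write 0
    ("up", [("r", 0), ("w", 1)]),     # ascending: read 0, write 1
    ("down", [("r", 1), ("w", 0)]),   # descending: read 1, write 0
]


def run_march(memory, algorithm):
    """Apply each March element and return the sorted failing addresses."""
    failing = set()
    for order, ops in algorithm:
        if order == "up":
            addresses = range(memory.size)
        else:
            addresses = range(memory.size - 1, -1, -1)
        for addr in addresses:
            for kind, value in ops:
                if kind == "w":
                    memory.write(addr, value)
                elif memory.read(addr) != value:
                    # The read observed a value other than the expected one.
                    failing.add(addr)
    return sorted(failing)
```

In this model, `run_march(FaultyMemory(8), MATS_PLUS)` returns an empty list, while `run_march(FaultyMemory(8, stuck_addr=3, stuck_value=1), MATS_PLUS)` returns `[3]`: the read in the second element expects 0 at address 3 and observes 1. On a real device the memory of the SC is not directly addressable, so each read and write would have to be realized through program behavior, as the summary describes.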