Abstract
The exponentially increasing occurrence of soft errors makes the optimization of reliability, performance, hardware area, and power consumption one of the main concerns in modern embedded processors. Since the design cost of hardware techniques aimed at improving the reliability of microprocessors is quite expensive for resource-constrained embedded systems, software-level fault tolerance mechanisms have been proposed as an attractive solution for soft error threats. However, many software-level redundancy-based schemes are accompanied by considerable performance overhead, which is not acceptable for many embedded applications. In this work, we have introduced an ultra-low-cost soft error protection scheme for embedded applications, which works based on source-code analysis and identifying critical variables. After identification, these vital variables are adequately protected by placing runtime checks at critical points of execution. Our experimental results based on several applications demonstrate that the proposed scheme can mitigate the failure rate by 47% with negligible performance degradation.
Highlights
One of the primary sources of unreliability in modern processors is transient faults or soft errors
To the software level [2] in contemporary microprocessors, it has been projected that the number of transient faults that can bypass the masking walls and change the final program output will increase by 30× as technology scales down from 45 to 16 nm technology [3,4]
In order to mitigate the problem of soft errors, researchers have proposed fault-tolerant mechanisms operating at several levels, including circuit level [5,6], microarchitectural level [2], and software level [7,8,9,10,11,12,13,14]
Summary
One of the primary sources of unreliability in modern processors is transient faults or soft errors. In order to mitigate the problem of soft errors, researchers have proposed fault-tolerant mechanisms operating at several levels, including circuit level [5,6], microarchitectural level [2], and software level [7,8,9,10,11,12,13,14]. Soft errors or transient faults are one of the main reasons that can cause hardware malfunctions and jeopardize the functional safety of an application. Background radiation such as high-energy neutrons and protons are considered as the primary source of soft errors. Soft errors were considered as a reliability challenge for high-altitude
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have