Abstract

As semiconductor technologies scale down to deep sub-micron dimensions, transient faults will soon become a critical reliability concern. Traditional high-end solutions are prohibitively expensive and therefore unacceptable for the mainstream commodity market. This paper presents FTPIPE, a hybrid software/hardware solution that provides sufficient fault coverage at an affordable overhead for single-threaded programs running on commodity systems. FTPIPE leverages existing exception mechanisms, with minor modifications, to handle exception-causing faults, and focuses on tolerating silent data corruptions by combining compile-time analysis with selective instruction replication in a modern superscalar pipeline extended with minimal hardware. Existing solutions based on instruction replication detect faults by synchronous checks only; FTPIPE instead uses a novel hybrid synchronous/asynchronous check method for the replicated instructions. With a synchronous check, validation of a replicated instruction's result must finish before the instruction is committed, whereas an asynchronous check carries no such requirement. This hybrid approach improves performance without degrading fault coverage. An evaluation using nine programs from the MiBench benchmark suite shows that FTPIPE tolerates 89.8% of transient faults at a modest performance overhead of 20.1%.
