Abstract

As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that affect program correctness. Fault tolerance is therefore increasingly important in computing systems. Fault tolerance techniques have two major concerns: a) improving system reliability by detecting transient errors and b) reducing performance overhead. In this study, we propose a configurable fault tolerance technique that targets both high reliability and low performance overhead for multimedia applications. The basic principle is configurability: different degrees of fault tolerance are applied to different parts of the source code of a multimedia application. First, an initial analysis is performed at the source code level to identify the critical statements. Second, a fault injection process combined with a statistical analysis is used to validate this partition at a given confidence level. Finally, checksum-based fault tolerance and instruction duplication are applied to the critical statements, while no fault tolerance mechanism is applied to the non-critical parts. Performance experiments demonstrate that our configurable fault tolerance technique yields significant performance gains compared with duplicating all instructions. The fault coverage of the scheme is also evaluated. Fault injection results show that about 90% of outputs are correct at the application level, with a runtime overhead of only 20%.
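
The idea of protecting only critical statements can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`duplicated`, `checksum`, `scale_block`), the additive checksum, and the choice of which statements count as critical are all assumptions made for illustration. Here the loop bound is treated as critical and computed twice, the input block is guarded by a checksum, and the per-pixel arithmetic is left unprotected, on the assumption that a corrupted pixel is tolerable at the application level.

```python
from typing import Callable, List, Sequence


class TransientFaultDetected(Exception):
    """Raised when redundant results or checksums disagree."""


def duplicated(compute: Callable[[], int]) -> int:
    """Run a critical computation twice and compare the two results."""
    first = compute()
    second = compute()
    if first != second:
        raise TransientFaultDetected("duplicated results differ")
    return first


def checksum(values: Sequence[int]) -> int:
    """Additive checksum modulo 2**32 (an illustrative choice)."""
    return sum(values) & 0xFFFFFFFF


def scale_block(pixels: List[int], gain: int) -> List[int]:
    """Scale pixel values, protecting only the statements assumed critical."""
    # Assumed critical: the loop bound, protected by duplication.
    n = duplicated(lambda: len(pixels))
    # Assumed critical: the input block, protected by a checksum.
    expected = checksum(pixels)

    out: List[int] = []
    for i in range(n):
        # Assumed non-critical: pixel arithmetic, left unprotected.
        out.append(min(255, pixels[i] * gain))

    # Re-check the input block after the loop has finished.
    if checksum(pixels) != expected:
        raise TransientFaultDetected("input checksum mismatch")
    return out
```

Calling `scale_block([10, 100, 200], 2)` runs the duplicated length computation and both checksum passes once each, so the added cost does not grow with the unprotected work inside the loop body beyond the two linear checksum scans.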
