Abstract

The objective of this research was to develop new, cost-effective techniques for fault tolerance in multicomputer architectures. The requirements for high performance and fault tolerance are seemingly contradictory: parallel architectures and algorithms developed for high performance attempt to achieve maximum utilization of each processor, while fault tolerance requires redundant computations and checks to ensure that the results of the [...] applied to highly parallel multicomputer architectures. Our unique approach to achieving fault tolerance in multicomputer parallel architectures is to use an algorithm-based fault tolerance (ABFT) technique, an on-line, system-level method for detecting faults, followed by a system-level approach to reconfiguration and recovery of the parallel processor system.
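The abstract does not say which computations or encodings the ABFT technique covers. As an illustration only, the sketch below uses checksum-encoded matrix multiplication, a commonly cited example of algorithm-based fault tolerance, to show how a small amount of redundant computation can detect and locate a faulty result on-line. The function names, the integer test data, and the injected fault are all assumptions made for this example and are not taken from the report.

```python
# Minimal ABFT sketch (assumed example, not the report's implementation):
# checksum-encoded matrix multiplication on small integer matrices.

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add_checksum_row(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def add_checksum_column(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]

def find_inconsistencies(c_full):
    """Return (bad_rows, bad_cols) whose data no longer match the checksums."""
    n = len(c_full) - 1      # number of data rows
    m = len(c_full[0]) - 1   # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    return bad_rows, bad_cols

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# The product of the encoded operands carries row and column checksums.
c_full = matmul(add_checksum_row(a), add_checksum_column(b))
clean_report = find_inconsistencies(c_full)   # ([], [])

# Simulate a fault in one processor's result (hypothetical corruption).
c_full[1][0] += 7
fault_report = find_inconsistencies(c_full)   # ([1], [0])
```

The intersection of the inconsistent row and column locates the corrupted element, which is the kind of information a system-level reconfiguration and recovery step could act on. With floating-point data, the equality checks would need a tolerance.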
