Fault-secure algorithms for multiple-processor systems

Prithviraj Banerjee, Jacob A. Abraham

doi:10.1145/773453.808196

Abstract

In this paper we describe techniques for achieving fault secureness at low cost in multiple-processor systems. In order to do this we consider the relationships between algorithms, parallel architectures, and fault tolerance. The concept of fault-secure algorithms, described in this paper, involves the application of the ideas of fault tolerance at the system level to high-performance multiple-processor algorithms to make the results of the computation reliable. Algorithms are classified into broad classes called paradigms, which are determined exclusively by the communication patterns of the processors. Fault-secure techniques are presented for three powerful paradigms: the multiplex, the recursive combination, and the multiplex-demultiplex paradigms. The basic idea used in the design of fault-tolerant algorithms is that the algorithms operate on encoded input data and produce encoded output data, such that the overhead in time and number of processors is not high. This technique is distinguished by three characteristics: the encoding of the data used by the algorithm, the redesign of the algorithm to operate on the encoded data, and the distribution of the computation steps in the algorithm among the computation units.
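The idea of operating on encoded input and producing encoded output can be illustrated with a minimal sketch. The abstract does not say which encoding is used, so the column-checksum encoding below is an assumption chosen for illustration, and the names `encode_rows`, `matvec`, and `checked_matvec` are hypothetical. The sketch appends a checksum row to a matrix, multiplies the encoded matrix by a vector, and checks that the last output element equals the sum of the others; a single corrupted output element (standing in for one faulty processor) breaks that equality and is detected.

```python
def encode_rows(matrix):
    """Append a checksum row whose entries are the column sums (assumed encoding)."""
    checksum = [sum(column) for column in zip(*matrix)]
    return [list(row) for row in matrix] + [checksum]


def matvec(matrix, vector):
    """Plain matrix-vector product; each row could be assigned to one processor."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def checked_matvec(matrix, vector, fault=None):
    """Multiply on encoded data and verify the encoding of the output.

    fault is an optional (index, delta) pair added to one output element
    to simulate an erroneous result from a single processor.
    """
    result = matvec(encode_rows(matrix), vector)
    if fault is not None:
        index, delta = fault
        result[index] += delta
    *data, checksum = result
    # The output is validly encoded only if its checksum element equals the sum
    # of the data elements. Exact comparison is safe here because the example
    # uses integers; floating-point data would need a tolerance.
    if sum(data) != checksum:
        raise ValueError("checksum mismatch: fault detected")
    return data


if __name__ == "__main__":
    print(checked_matvec([[1, 2], [3, 4]], [5, 6]))
    try:
        checked_matvec([[1, 2], [3, 4]], [5, 6], fault=(0, 1))
    except ValueError as error:
        print(error)
```

In this sketch the extra cost is one additional row of work plus one summation, which is the kind of modest overhead in time and processors that the abstract refers to. The sketch only detects an error; it does not locate or correct it.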

Full Text