Fabrication-Time and Run-Time Fault-Tolerant Array Processors Using Single-Track Switches

S. Y. Kung, C. W. Chang, S. N. Jean

doi:10.1007/978-1-4615-6799-8_25

Abstract

This paper addresses the important issue of fault tolerance in arrays with a large number of processors. An array grid model based on single-track switches is adopted. A single-track design requires less hardware overhead and is less exposed to faults in the switches themselves. More significantly, we establish a necessary and sufficient condition for the reconfigurability of the array. This condition is the theoretical footing for two reconfiguration algorithms: one adopts global control for (fabrication-time) yield enhancement, and the other is a distributed scheme for (run-time) reliability improvement. For the fabrication-time reconfiguration algorithm, the task can be reformulated as a maximum independent set problem, and an existing graph-theoretic algorithm is adopted to solve it. The simulations conducted indicate that the algorithm is computationally very efficient; it is therefore also well suited to compile-time fault tolerance. In contrast, for the real-time (run-time) reconfiguration algorithm, a distributed method is more suitable for (asynchronous) array processors. The algorithm has several important features: (1) it is executed in a distributed fashion by the processor elements (PEs); (2) no global information is required by the individual PEs; (3) the time overhead for reconfiguration is independent of the array size; (4) transient faults are handled by retries or by deactivating and later reactivating the temporarily failed PE. Based on simulations, we evaluate the performance of the algorithms and the tradeoffs between fault-tolerance capability and hardware complexity for various spare-PE distributions.
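
To make the maximum independent set formulation concrete, the following minimal sketch selects a set of mutually non-adjacent vertices from a small graph. It is an illustration under stated assumptions, not the algorithm the paper adopts: the abstract does not say which graph-theoretic algorithm is used or how the graph is built from the array, so the minimum-degree greedy heuristic, the function name `greedy_independent_set`, and the four-vertex example graph are all hypothetical. The heuristic returns a maximal independent set, which is not guaranteed to be a maximum one in general.

```python
def greedy_independent_set(adjacency):
    """Return an independent set via the minimum-degree greedy heuristic.

    `adjacency` maps each vertex to the set of its neighbours (assumed
    symmetric). This is a generic heuristic, not the paper's algorithm.
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {v: set(nbrs) for v, nbrs in adjacency.items()}
    chosen = set()
    while remaining:
        # Pick the vertex with the fewest remaining neighbours;
        # ties are broken by the sorted order of the labels.
        v = min(sorted(remaining), key=lambda u: len(remaining[u]))
        chosen.add(v)
        # v and all of its neighbours are no longer selectable.
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return chosen


def is_independent(adjacency, vertices):
    """Check that no two selected vertices share an edge."""
    return all(adjacency[v].isdisjoint(vertices) for v in vertices)


# Hypothetical graph: a path a - b - c - d. An edge is assumed to mean
# "these two choices cannot both be used in the same reconfiguration".
example = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}
result = greedy_independent_set(example)
print(sorted(result))
```

On this path graph the heuristic selects two of the four vertices, which happens to be the maximum; on larger graphs an exact or better-bounded method would be needed to guarantee optimality.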
