Abstract

This paper addresses fault tolerance in arrays with a large number of processors. We adopt an array grid model based on single-track switches. A single track requires less hardware overhead and is less exposed to faults in the switches themselves. More significantly, the model lets us establish a necessary and sufficient condition for the reconfigurability of the array. This condition is the theoretical footing for two reconfiguration algorithms: one uses global control for fabrication-time yield enhancement, and the other is a distributed scheme for run-time reliability improvement. For the fabrication-time algorithm, the task can be reformulated as a maximum independent set problem, which is solved effectively with an existing graph-theoretic algorithm. Simulations indicate that this algorithm is computationally very efficient, so it is also well suited to compile-time fault tolerance. For the real-time algorithm, in contrast, a distributed method is more suitable for (asynchronous) array processors. The real-time algorithm has several important features: (1) it is executed in a distributed manner by the processor elements (PEs); (2) the individual PEs require no global information; (3) the time overhead for reconfiguration is independent of the array size; (4) transient faults are handled by retries or by deactivating and later reactivating the temporarily failed PE. Based on simulations, we evaluate the performance of both algorithms and the tradeoffs between fault-tolerance capability and hardware complexity for various spare PE distributions.
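
The maximum independent set reformulation mentioned above can be illustrated with a small sketch. This is not the paper's algorithm: the conflict rule (two candidate compensation paths conflict when they serve the same faulty PE or share a spare PE or track segment), the function names, and the greedy minimum-degree heuristic are all assumptions chosen for illustration.

```python
from itertools import combinations


def build_conflict_graph(paths):
    """Build a conflict graph over candidate compensation paths.

    paths maps a path id to (faulty_pe_id, set_of_resources_used).
    Assumed rule: two paths conflict if they repair the same faulty PE
    or share any resource (a spare PE or a track segment).
    """
    adj = {p: set() for p in paths}
    for a, b in combinations(paths, 2):
        fault_a, res_a = paths[a]
        fault_b, res_b = paths[b]
        if fault_a == fault_b or res_a & res_b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_independent_set(adj):
    """Heuristic: repeatedly pick a minimum-degree vertex, drop its neighbors."""
    remaining = {v: set(n) for v, n in adj.items()}
    chosen = []
    while remaining:
        v = min(remaining, key=lambda x: (len(remaining[x]), x))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for u in remaining:
            remaining[u] -= removed
    return chosen


def reconfigure(paths):
    """Return one conflict-free path per faulty PE, or None if the
    heuristic fails to cover every faulty PE."""
    chosen = greedy_independent_set(build_conflict_graph(paths))
    faults = {fault for fault, _ in paths.values()}
    covered = {paths[p][0] for p in chosen}
    return chosen if covered == faults else None


# Hypothetical instance: faulty PEs f1, f2; spares sA, sB; tracks t1..t3.
example = {
    "f1->sA": ("f1", {"sA", "t1"}),
    "f1->sB": ("f1", {"sB", "t2"}),
    "f2->sA": ("f2", {"sA", "t3"}),
}
print(reconfigure(example))
```

In this reading, the array is repaired when the independent set contains one path for every faulty PE. Because the greedy step is only a heuristic, a `None` result does not prove that the array is unreconfigurable; an exact maximum independent set solver would be needed for that.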
