Abstract

The traditional approach to fault-tolerant computation has relied on modular hardware redundancy. Although universal and simple, modular redundancy is inherently expensive and inefficient. By exploiting particular structural features of a computational architecture or an algorithm, arithmetic codes and the more recently developed algorithm-based fault tolerance (ABFT) techniques introduce “analytical redundancy” and offer more efficient fault coverage, at the cost of narrower applicability and harder design. In this paper, we extend a variety of results and constructive procedures, developed in previous work for computations that take place in an abelian group, to the more general setting of computations in semigroups. We demonstrate possible encodings for semigroup operations of interest and use our extension to design concurrent error detection and correction schemes for group and semigroup machines. The method provides insight into the role of decomposition in fault-tolerant algebraic machines and yields a general, hardware-independent characterization of concurrent error detection and correction in finite semiautomata. We also demonstrate that extending this approach to other dynamic systems, with specific hardware implementations and failure modes, lets us obtain fault-tolerant architectures systematically. Specifically, we apply these techniques to linear time-invariant dynamic systems and to Petri net models of discrete event systems.
