Abstract

In many challenging applications, the environmental conditions that determine the fault-tolerance requirements imposed on computer systems change dynamically. When significant changes occur in environmental conditions or in the condition of internal computing resources, the set of effective fault-tolerance mechanisms changes as well. The purpose of adaptive fault tolerance (AFT) is to meet dynamically and widely changing fault-tolerance requirements by efficiently and adaptively using a limited and dynamically changing pool of redundant processing resources. This paper attempts to define the notion of AFT in a reasonably concrete form, identify the major technical issues that must be resolved for practical realization of AFT, and illustrate some feasible approaches to resolving them. After discussing the basic concept and the major research issues, the paper examines in some detail one important case of AFT management: adapting to a change in the environment from the soft-real-time mode to the hard-real-time mode.
