Fault tolerance in cloud computing environment: A systematic survey

Moin Hasan, Major Singh Goraya

doi:10.1016/j.compind.2018.03.027

Abstract

Fault tolerance is among the most important issues in delivering reliable cloud services. It is difficult to implement because of the cloud's dynamic service infrastructure, complex configurations, and numerous interdependencies. Extensive research effort continues to be devoted to implementing fault tolerance in the cloud. Implementing a fault tolerance policy in the cloud requires not only specific knowledge of its application domain but also a comprehensive analysis of the background and the prevalent techniques. Some recent surveys attempt to consolidate the fault tolerance architectures and approaches proposed for cloud environments, but they appear limited in some respects. This paper gives a systematic and comprehensive account of the different fault types, their causes, and the fault tolerance approaches used in the cloud. It presents a broad survey of fault tolerance frameworks in terms of their basic approaches, the faults they address, and other key features. A comparative analysis of the surveyed frameworks is also included. For the first time, a quantified view of the applicability of fault tolerance frameworks is presented, based on an analysis of the frameworks cited in this paper and those included in recently published major surveys. It is observed that checkpoint-restart and replication-oriented fault tolerance techniques are primarily used to target crash faults in the cloud.

Full Text