Abstract
Fault tolerance is one of the most important and frequently reported issues in cloud computing. Implementing Fault Tolerance (FT) in the cloud is challenging because of the diverse architecture and the complex interrelationships among system resources. The primary objective of this article is to critically review and analyze fault-tolerant models together with two related aspects, load balancing and scheduling, which are pressing needs today and were not adequately addressed in recent related surveys. We present a systematic and comparative analysis of these hybrid models, highlighting their limitations across different parameters, cases, and scenarios. Our analysis reveals that Proactive, Reactive, and Resilient approaches are commonly used to address system failure in the cloud. We also found that a thorough study of intelligent fault tolerance approaches, also known as resilient fault tolerance, aimed at determining their efficacy over conventional approaches has been overlooked. Additionally, the survey includes a discussion section that presents an in-depth analysis of hybrid fault-tolerant approaches with respect to how they handle different faults and parameters. To illustrate the reviewed observations, a detailed statistical analysis has been conducted and presented graphically, both to provide insight into the study and to highlight directions for further research in this area. Our analysis covers the critical role of these hybrid fault-tolerant models in achieving high availability and reliability in emerging computing systems, thereby providing valuable insights for future researchers in the field. We also provide a broad roadmap that charts strategies for addressing the cloud challenges discussed. The study provides valuable contributions to the field.