Abstract

Achieving high fault tolerance is one of the major obstacles to a new era of high-serviceability cloud computing, because fault tolerance plays a key role in ensuring cloud serviceability. In most current clouds, the two most common fault tolerance strategies are checkpointing, the process of saving application states, and replication, the process of replicating hot data, usually to stable storage. However, deciding when, where, and how often to insert checkpoints or replicate hot data remains a challenge that has been ignored in clouds. This paper gives definitions of fault, error, and failure in a cloud and puts forward HSCR, a high-serviceability model based on checkpointing and replication strategies. The work includes: (1) analyzing the mathematical relationship between different failure rates and the two fault tolerance strategies, namely checkpointing and data replication; and (2) building a high-serviceability checkpointing fault tolerance model and a high-serviceability replication fault tolerance model, then combining the two models to maximize serviceability and meet the service level objectives (SLOs). Experimental results demonstrate that HSCR has high potential: it provides efficient fault tolerance enhancements, significant improvement in cloud serviceability, and strong SLO satisfaction.
