A mobile grid (MG) consists of interconnected mobile devices that are used for high-performance computing. Fault tolerance is an important property of mobile computational grid systems, as it underpins high arrangement reliability and fast recovery from failures. Because the failure of a resource can be fatal to task execution, a fault tolerance service is essential for meeting QoS requirements in an MG. The faults that occur in an MG include link failure, node failure, task failure, and limited bandwidth. These failures result in the loss of computational results and data; detecting them allows better utilisation of resources and timely notification of the user. Many algorithms and techniques have been proposed for failure handling in traditional grids. The authors propose a checkpointing-based failure handling technique intended to improve arrangement reliability and failure recovery time in an MG network. Experiments were conducted on a grid built from ubiquitously available Android-based mobile phones.