Taxonomy for the Analysis of Fault Tolerant Agents Replica Performance in Grid

Sunil Gavaskar Paleti, S Rao

doi:10.1109/acct.2014.87

Abstract

One of the major challenges to the wide use of grid environments is fault tolerance and fault avoidance. Existing checkpointing schemes provide a way to detect faults and recover from them. Fault tolerance, communication, efficiency, and reliability are important requirements in a grid environment. In general, replicas are used either proactively (i.e., failure is considered before a job is scheduled) or post-actively (i.e., the job failure is handled after it has occurred). In our proposed model, replicas are used as centralized replicas and as local replicas within agents. In this paper we address the problem of how to use a set of replicated distributed objects in the context of scheduled jobs, task execution time (t), and the search cost of replicas, using agents and objects. The main objective of the work is to propose centralized and local replicas with a varying number of agents and with failure rate parameters such as the number of tasks, the mean time between failures, and the increase in the number of agents when a fault occurs. The model treats the replicas as objects with different states and considers task requirements such as replica availability, processing capability, and the mapping of agent replicas to tasks. These requirements are dynamic and are applied during processing on the basis of the RFOH (Resource Fault Occurrence History). The proposed model is an efficient solution with respect to the agents' computing capabilities and high availability when resource usage, task execution, and mean time to failure are taken into account. The paper also presents an analytical study of the reliability of the proposed model.
