Abstract

Using queueing theory, closed-form solutions are obtained for the response time of a fault-tolerant network of processors based on the primary site approach. In this approach, fault tolerance is achieved by replicating the primary's services at multiple nodes. All requests are sent to the primary, which periodically checkpoints its status on the backup nodes. If the primary fails, one of the backups takes over as primary. Two mechanisms for repairing faulty nodes are considered: delayed repair and immediate repair. Besides being in closed form, the analytical results presented in this paper have several other advantages over those of previous work. First, for the immediate repair case, there is no need to solve a set of recursive equations. Second, the results reveal many of the characteristics of the system. We study the effect of the checkpointing rate on the system response time and derive a closed-form solution for the optimum checkpointing rate, which minimizes the system response time.
