Abstract

Imperfect coverage (IPC) occurs when a malicious component failure causes extensive damage due to inadequate fault detection, fault location or fault recovery. Common-cause failures (CCF) are multiple dependent component failures within a system due to a shared root cause. Both IPC and CCF can exist in distributed computer systems (DCS), where they can contribute significantly to the overall system unreliability and complicate the reliability analysis. In this study, we propose an efficient approach to the reliability analysis of DCS with both IPC and CCF. The proposed methodology decouples the effects of IPC and CCF from the combinatorics of the solution. The resulting approach can therefore be used with the computationally efficient binary decision diagram (BDD) based method for the reliability analysis of DCS. We provide a concrete analysis of an example DCS to illustrate the application and advantages of our approach. Because it considers IPC and CCF, our approach can evaluate a wider class of DCS than existing approaches. Because of the nature of the BDD and the separation of IPC and CCF from the solution combinatorics, our approach is computationally efficient and easy to implement, so it can be readily applied to the accurate reliability analysis of large-scale DCS subject to IPC and CCF. DCS without IPC or CCF appear as special cases of our approach.

Highlights

  • A distributed computer system (DCS) is a collection of interconnected independent computers that appears to its users as a single coherent system[1]

  • Because failure to consider imperfect coverage (IPC) in the reliability analysis leads to overestimated system reliability, considerable research has been performed on IPC in the reliability analysis of fault-tolerant systems[2,3,4,5,6,7], but only a few of these methods[5,7] are applicable to DCS, and their complexity can increase rapidly as the size of the DCS, i.e., the number of hosts and links, increases

  • A file spanning tree (FST) is defined as a spanning tree that connects the root node, i.e., the host running the program under consideration, to other nodes such that its vertices hold all the required resources for successful execution of the program


Summary

INTRODUCTION

A distributed computer system (DCS) is a collection of interconnected independent computers (hosts) that appears to its users as a single coherent system[1]. The existing methods did not consider IPC and CCF in a DCS, and they share a restrictive assumption that a single elementary common cause (CC) leads to simultaneous failures of all components of a system. We seek to address some of these limitations by developing a model for the reliability analysis of DCS subject to CCF that allows for multiple CCs, which can affect different subsets of system components and can occur statistically dependently.

Terms used: binary decision diagram (BDD), common cause (CC), common-cause event (CCE), common-cause failure (CCF), common-cause group, distributed computer system (DCS), distributed program reliability (DPR), distributed program unreliability, distributed system reliability, file spanning tree (FST), imperfect coverage (IPC), imperfect coverage model (IPCM), minimal file spanning tree, reduced ordered BDD.

A file spanning tree (FST) is defined as a spanning tree that connects the root node, i.e., the host running the program under consideration, to other nodes such that its vertices hold all the required resources for successful execution of the program. We assume that the three exit probabilities of the IPCM for each component, namely transient restoration (r), permanent coverage (c) and single point of failure (s), are given as fixed probabilities.
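The FST definition above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function name `is_file_spanning_tree`, the host labels `H1` to `H4`, the file labels `f1` to `f3` and the data layout are all hypothetical, chosen only for illustration. The sketch checks that a candidate set of links forms a tree containing the root host and that the hosts on that tree jointly hold every required file.

```python
from collections import defaultdict, deque


def is_file_spanning_tree(tree_edges, root, files_at, required_files):
    """Check a candidate FST (hypothetical helper, not from the paper).

    tree_edges: iterable of (host, host) links in the candidate tree
    root: host running the program under consideration
    files_at: dict mapping host -> set of files stored on that host
    required_files: set of files the program needs
    """
    # Build an undirected adjacency map from the candidate links.
    adj = defaultdict(set)
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    vertices = set(adj) | {root}

    # A tree on |V| vertices has exactly |V| - 1 distinct edges.
    distinct_edges = {frozenset(edge) for edge in tree_edges}
    if len(distinct_edges) != len(vertices) - 1:
        return False

    # Every vertex must be reachable from the root (breadth-first search).
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    if seen != vertices:
        return False

    # The hosts on the tree must jointly hold all required files.
    held = set().union(*(files_at.get(v, set()) for v in vertices))
    return set(required_files) <= held


# Hypothetical four-host example: the program runs on H1 and needs f1 and f2.
files_at = {"H1": {"f1"}, "H2": {"f2"}, "H3": {"f3"}, "H4": {"f2"}}
required = {"f1", "f2"}

print(is_file_spanning_tree([("H1", "H2")], "H1", files_at, required))
print(is_file_spanning_tree([("H1", "H3")], "H1", files_at, required))
print(is_file_spanning_tree([("H1", "H3"), ("H3", "H4")], "H1", files_at, required))
```

In this example the first and third candidates qualify as FSTs, while the second does not because neither H1 nor H3 holds f2. The third candidate is an FST but not a minimal one, since the link to H2 alone already supplies the missing file.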

PROBLEM STATEMENT
SEPARABLE AND EFFICIENT DPR ANALYSIS
ACCE i φ
Based on the CCE space we developed and the
CONCLUSION
