Abstract

Many timing constraint (or real-time) distributed systems, such as real-time database systems, are now being used in safety critical applications. However, they are subject to system failures caused by the malfunction of underlying network components. Without the helps of network experts or sophisticated management tools, most users cannot resolve these network problems by themselves. Sometimes, worse, it is usually prohibited to use these management tools, e.g. the ‘ping’ command, for the security sake. Accordingly, we develop a management system to automate network fault identification merely based on the analysis of the abnormal events from the monitored timing constraint distributed system. In this system, a fault identification framework is designed to identify automatically faulty network elements by using a two-level fault propagation model which combines Timing Constraint Petri nets with an alarm clustering mechanism. In addition, the concepts of redundant/ringleader alarms and innocent network elements are also introduced into the framework to obtain an effective diagnosis. At last, the management system is implemented according to the framework to demonstrate the performance of our fault identification.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call