High availability (HA) is an increasingly important requirement in a growing number of domains. It is mandatory for critical systems, such as networking and communications systems, that cannot afford downtime. Such systems often monitor the state of crucial services and produce huge amounts of execution trace data in which functional and non-functional log entries are intertwined, making them hard to separate and analyze. Dynamic analysis aims to capture and analyze the run-time behavior of a system based on its execution traces. In this paper, we apply dynamic analysis to retrieve and analyze HA scenarios from system execution traces. Our approach aims to help analysts understand and report on how a highly available system detects and recovers from failures. As a proof of concept, we selected the Hot Standby Router Protocol (HSRP) to demonstrate the applicability of our approach. We empirically evaluated the effectiveness of our technique using four real-world case studies of IP networks running HSRP. The results show that HA scenarios were successfully retrieved and analyzed. Moreover, our prototype tool, HAAnalyzer, effectively unveiled the behavioral and temporal HA errors that were seeded in the execution traces.