In this paper, we propose a High Availability Open Shortest Path First (HA-OSPF) router that consists of two OSPF router modules, one active and one standby, to support a high availability network. First, we use a continuous-time Markov chain (CTMC) to analyze the steady-state availability of an HA-OSPF router with one active router and N standby routers (the 1 + N redundancy model). Then, taking the failure detection and recovery rate into account, the analytic results show that the HA-OSPF router with the 1 + 1 redundancy model (one active and one standby) is the preferred model for enhancing router availability. We also show that carrier-grade HA-OSPF router availability (i.e., five-nines availability) can be achieved under an appropriate combination of the router module failure rate (λ), repair rate (μ), and failure detection and recovery rate (δ). Since there is a lack of research on integrating the redundancy model, link state information backup, and failure detection and recovery, we propose a high availability management middleware (HAM middleware) framework that integrates these three elements. The HAM middleware consists of an Availability Management Framework (AMF) service, a Checkpoint service, and a Failure Manager. It supports health checks, state information exchange, and failure detection and recovery. Each HA-OSPF router is designed to run a Linux operating system, the HAM middleware, and an OSPF process. We have implemented the HA-OSPF router on a PC-based system. Experimental results show that the failure detection and recovery times of the proposed PC-based HA-OSPF router are reduced by 98.76% for a software failure and by 91.45% for a hardware failure, compared with those of an industry-standard approach, the Virtual Router Redundancy Protocol (VRRP).
In addition, we have implemented the HA-OSPF router on an Advanced Telecom Computing Architecture (ATCA) platform, which provides an industry-standard modular architecture for efficient, flexible, and reliable router design. Based on our ATCA-based platform, with 1/δ = 217 ms for a software failure and 1/δ = 1066 ms for a hardware failure, and on the router module data obtained from Cisco, 1/λ = 7 years and 1/μ = 4 hours, the availabilities of the proposed ATCA-based HA-OSPF router are 99.99999905% for a software failure and 99.99999867% for a hardware failure. The experimental results therefore show that both the proposed ATCA-based and PC-based HA-OSPF routers with the 1 + 1 redundancy model can easily meet the five-nines carrier-grade availability requirement.
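
To make the relationship between λ, μ, δ, and availability concrete, the following is a minimal Python sketch of a steady-state calculation for a 1 + 1 redundancy model. The four-state chain used here is an assumption for illustration and is not taken from the paper: both modules up; failover in progress after an active-module failure (service down, left at rate δ); one module up while the other is repaired; both modules down, with a single repair at rate μ. The function name `ha_unavailability`, the state names, and the 365-day year are likewise hypothetical choices. Under these assumptions, the computed availabilities come out close to the figures quoted above.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year


def ha_unavailability(mtbf_hours, mttr_hours, failover_seconds):
    """Steady-state unavailability of an assumed 1 + 1 CTMC.

    Assumed transitions (rates per hour):
      both_up  -> failover : lam   (active module fails, service is down)
      both_up  -> one_up   : lam   (standby module fails, no outage)
      failover -> one_up   : delta (failure detected, standby takes over)
      one_up   -> both_up  : mu    (failed module repaired)
      one_up   -> none_up  : lam   (remaining module fails)
      none_up  -> one_up   : mu    (one module repaired)
    """
    lam = 1.0 / mtbf_hours              # module failure rate
    mu = 1.0 / mttr_hours               # module repair rate
    delta = 3600.0 / failover_seconds   # detection and recovery rate

    # Solve the balance equations relative to the both_up state.
    p_both_up = 1.0
    p_failover = lam / delta            # lam * p_both_up = delta * p_failover
    p_one_up = 2.0 * lam / mu           # 2 * lam * p_both_up = mu * p_one_up
    p_none_up = p_one_up * lam / mu     # lam * p_one_up = mu * p_none_up

    total = p_both_up + p_failover + p_one_up + p_none_up
    # Service is down only in the failover and none_up states.
    return (p_failover + p_none_up) / total


# Parameters quoted in the abstract: 1/lam = 7 years, 1/mu = 4 hours.
for label, failover_s in (("software", 0.217), ("hardware", 1.066)):
    u = ha_unavailability(7 * HOURS_PER_YEAR, 4.0, failover_s)
    print(f"{label} failure: availability = {100.0 * (1.0 - u):.8f}%")
```

In this sketch the unavailability is approximately λ/δ + 2(λ/μ)², so with the quoted parameters the double-failure term dominates for a software failure, while the failover term becomes comparable for the slower hardware-failure recovery.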