Abstract

Software rejuvenation is a proactive fault management technique that counteracts aging phenomena in continuously running software systems. To mitigate such phenomena, rejuvenation periodically stops the running software as a preventive measure, cleans its internal state (garbage collection, flushing of operating system kernel tables, defragmentation, and reinitialization of internal data structures), and then restarts it. This paper considers a two-unit series software system in which each software component can experience both soft and hard failures. A hard failure is recovered by a hardware reboot, whereas a soft failure is recovered by software rejuvenation. Additionally, rejuvenation is initiated proactively when a software component transitions into a degraded, failure-prone state. The paper introduces the concept of smart rejuvenation, which uses the system downtime caused by a hard failure in one component to rejuvenate the other component at the same time. The evolution of the entire system over time is modeled by a semi-Markov process. The aim of this work is twofold: first, to identify the rejuvenation policy for each software component that optimizes the availability and operational cost of the entire system, and second, to examine whether smart rejuvenation can improve these measures.
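As a rough illustration of how steady-state availability can be obtained from a semi-Markov model, the sketch below uses the standard result that the long-run fraction of time spent in state i is proportional to the stationary probability of the embedded Markov chain multiplied by the mean sojourn time in that state. The four-state single-component model, the transition probabilities, and the sojourn times are all hypothetical values chosen for illustration; they are not the two-unit model or the parameters of this paper.

```python
# Hypothetical single-component model (NOT the paper's two-unit system).
# States: 0 = robust (up), 1 = degraded (up),
#         2 = rejuvenation (down), 3 = hardware reboot (down).
UP_STATES = {0, 1}

# Embedded Markov chain: P[i][j] is the probability that the next state is j
# given the current state is i. All values are assumed for illustration.
P = [
    [0.0, 0.9, 0.0, 0.1],  # robust -> degraded, or hard failure
    [0.0, 0.0, 0.7, 0.3],  # degraded -> rejuvenation, or hard failure
    [1.0, 0.0, 0.0, 0.0],  # rejuvenation -> robust
    [1.0, 0.0, 0.0, 0.0],  # reboot -> robust
]

# Mean sojourn time in each state, in hours (assumed values).
MEAN_SOJOURN = [100.0, 20.0, 0.5, 2.0]


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 for the embedded chain."""
    n = len(P)
    # Row i encodes sum_j pi_j * P[j][i] - pi_i = 0.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [0.0] * n
    # Replace the last (redundant) balance equation with the normalization.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    return solve_linear(A, b)


def steady_state_availability(P, mean_sojourn, up_states):
    """Long-run fraction of time spent in the up states."""
    pi = stationary_distribution(P)
    weighted = [p * m for p, m in zip(pi, mean_sojourn)]
    return sum(weighted[i] for i in up_states) / sum(weighted)


availability = steady_state_availability(P, MEAN_SOJOURN, UP_STATES)
print(f"steady-state availability: {availability:.6f}")
```

Under these assumptions, different rejuvenation policies would correspond to different transition probabilities and sojourn times, so candidate policies could be compared by recomputing the availability for each one.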