The triple modular redundancy (TMR) fault-tolerance mechanism provides almost perfect fault masking and therefore has great potential to enhance the reliability of real-time systems. However, because multiple copies of each task are executed concurrently, it sharply increases system energy consumption. This work studies how to minimize the energy consumption of parallel applications that use TMR on heterogeneous multi-core platforms. First, the heterogeneous earliest finish time algorithm is improved, and an algorithm that extends the execution time of the copies is designed according to the application's deadline constraints and reliability requirements. Second, based on the properties of TMR, an algorithm for minimizing the execution overhead of the third copy (MEOTC) is designed. Finally, to account for how tasks actually execute at run time, an online energy management (OEM) method is proposed. The proposed algorithms were compared with the state-of-the-art AFTSA algorithm, and the results show significant differences in energy consumption. Specifically, for light fault detection, the energy consumption of MEOTC and OEM was 80% and 72%, respectively, compared with AFTSA. For heavy fault detection, the energy consumption of MEOTC and OEM was 61% and 55%, respectively, compared with AFTSA.
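To make the fault-masking idea behind TMR concrete, the following is a minimal sketch of generic majority voting over three task copies. It is not the MEOTC or OEM algorithm from this work. The function names are hypothetical, and the reliability formula assumes independent copy failures and a perfect voter. The `third_copy_needed` helper illustrates one commonly used TMR property (the third copy only matters when the first two disagree); whether MEOTC exploits exactly this property is an assumption here.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value agreed on by at least two of three copies, else None."""
    if len(results) != 3:
        raise ValueError("TMR expects exactly three results")
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None


def tmr_reliability(r: float) -> float:
    """Probability that at least two of three copies succeed.

    Assumes each copy succeeds independently with probability r
    and that the voter itself never fails.
    """
    return 3 * r**2 - 2 * r**3


def third_copy_needed(first: Hashable, second: Hashable) -> bool:
    """Hypothetical helper: the third copy is decisive only if the first two differ."""
    return first != second
```

Under these assumptions, a per-copy reliability of 0.9 yields a TMR reliability of 3(0.9)^2 - 2(0.9)^3 = 0.972, which shows why masking a single faulty copy improves reliability at the cost of executing extra copies.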