Multiprocessor systems-on-chip (MPSoCs) are at the core of most modern systems. Most applications of these heterogeneous MPSoCs involve critical systems, so fault tolerance and reliability have become essential. Task replication is a technique for achieving fault tolerance, and it can also help reduce the schedule length by increasing locality. When each task is replicated more than once, replication introduces an upper and a lower bound on the makespan of each schedule. If a fault occurs during execution, the expected makespan lies between the lower bound and the upper bound, depending on when and where the fault occurred. This research introduces a new performance parameter, the weighted average makespan, calculated as the average of the lower and upper makespan bounds weighted by the probability of occurrence of each. Two algorithms are presented for fault-tolerant scheduling based on directed acyclic graphs: a list scheduling algorithm and a simulated annealing method that optimizes the weighted average makespan. The simulation results show that the techniques can improve the schedule length and increase system reliability without compromising performance.
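As an illustration of the weighted average makespan described above, the following is a minimal sketch, not the authors' implementation. It assumes the parameter is the probability-weighted mean of the two bounds, normalized by the sum of the two probabilities. The function name, argument names, and numbers are hypothetical.

```python
def weighted_average_makespan(lower, upper, p_lower, p_upper):
    """Probability-weighted mean of the lower and upper makespan bounds.

    Assumption: p_lower and p_upper are the probabilities of the schedule
    finishing at the lower bound (e.g. no fault) and at the upper bound
    (e.g. worst-case fault). They are normalized here so they need not
    sum exactly to 1.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if p_lower < 0 or p_upper < 0:
        raise ValueError("probabilities must be non-negative")
    total = p_lower + p_upper
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    return (p_lower * lower + p_upper * upper) / total


# Hypothetical numbers, chosen only for illustration.
wam = weighted_average_makespan(lower=100.0, upper=140.0,
                                p_lower=0.75, p_upper=0.25)
print(wam)
```

Under these assumptions the result always lies between the two bounds, which matches the abstract's statement that the expected makespan falls between the lower and upper bound.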