Field-programmable gate arrays are susceptible to radiation-induced single event upsets. These are commonly dealt with using triple modular redundancy (TMR) and module-based configuration memory error recovery (MER). By triplicating components and voting on their outputs, TMR helps localise configuration memory errors, and by reconfiguring faulty components, MER swiftly corrects them. However, the order in which TMR voters are checked inevitably impacts the overall system reliability. In this study, the authors outline an approach for computing the reliability of TMR–MER systems that consist of finitely many components. They demonstrate that system reliability is improved when the more vulnerable components are checked more frequently than when they are checked in round-robin order. They propose a genetic algorithm for finding a voter checking schedule that maximises the reliability of TMR–MER systems. Results indicate that the mean time to failure (MTTF) of these systems can be increased by up to 400% when variable-rate voter checking (VRVC) is used instead of round robin. They show that VRVC achieves 15–23% increase in MTTF with a 10× reduction in checking frequency to reduce system power. They also found that VRVC detects errors 44% faster on average than round robin.
Read full abstract