As an important component of space electronic systems, static random-access memory (SRAM)-based field-programmable gate arrays (FPGAs) are inevitably affected by single-event upsets caused by space radiation, including single-cell upsets (SCUs) and multiple-cell upsets (MCUs). In particular, as the technology scales down, the probability of MCUs increases significantly; however, the analysis of MCUs remains an open challenge. Based on common cause failure theory and continuous-time Markov chains, this article proposes a new hybrid model to quantify the combined effect of MCUs and SCUs on mitigation design modes that use different combinations of strategies, such as triple modular redundancy (TMR), partitioning, and scrubbing. Synchronization and interleaving modeling techniques are used to construct the concurrent behavior of an FPGA system under different mitigation strategies in the presence of MCUs. In addition, we introduce a partition factor to quantitatively explain how MCUs act on adjacent partitions. The proposed method is demonstrated using TMR with scrubbing and partition strategies. The results verify that MCUs reduce system reliability and availability, and that the scrubbing and partition strategies are effective against MCUs. More importantly, the proposed method reveals the limitations of mitigation strategies under different MCU occurrence probabilities. In summary, the analysis and discussion presented in this article can provide useful insights for designers who select and optimize design patterns for a system operating in a dynamic and complex radiation environment.
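To make the modeling idea concrete, the following is a minimal sketch of a three-state continuous-time Markov chain for a TMR design with scrubbing. It is an illustration under stated assumptions, not the hybrid model proposed in the article. The names `lam` (upset rate per module), `mu` (scrubbing rate), and `beta` (the fraction of upset events that are MCUs corrupting two modules at once, in the spirit of a beta-factor common cause model) are hypothetical, and partitions and the partition factor are not modeled.

```python
# Illustrative 3-state CTMC for TMR with scrubbing (assumed model, not the article's).
# States: 0 = all three modules healthy,
#         1 = one module upset (TMR still masks the error),
#         F = two or more modules upset (system failed).
# Assumed transitions:
#   0 -> 1 at rate 3*lam*(1 - beta)   an SCU hits one module
#   0 -> F at rate 3*lam*beta         an MCU hits two modules at once
#   1 -> F at rate 2*lam              any upset in a remaining healthy module
#   1 -> 0 at rate mu                 scrubbing repairs the upset module
#   F -> 0 at rate mu                 scrubbing restores the system (availability only)


def tmr_mttf(lam: float, mu: float, beta: float) -> float:
    """Mean time to failure from state 0, treating F as absorbing.

    First-step analysis gives
        T0 = 1/(3*lam) + (1 - beta) * T1
        T1 = 1/(2*lam + mu) + mu/(2*lam + mu) * T0
    which is solved for T0 below. Requires lam > 0.
    """
    out1 = 2 * lam + mu  # total exit rate of state 1
    numerator = 1 / (3 * lam) + (1 - beta) / out1
    denominator = 1 - (1 - beta) * mu / out1
    return numerator / denominator


def tmr_availability(lam: float, mu: float, beta: float) -> float:
    """Steady-state probability of being in state 0 or 1 (requires mu > 0)."""
    # Balance at state 0: pi0 * 3*lam = mu * (pi1 + piF) = mu * (1 - pi0)
    pi0 = mu / (3 * lam + mu)
    # Balance at state 1: pi1 * (2*lam + mu) = pi0 * 3*lam * (1 - beta)
    pi1 = pi0 * 3 * lam * (1 - beta) / (2 * lam + mu)
    return pi0 + pi1


if __name__ == "__main__":
    lam, mu = 1e-3, 1e-1  # hypothetical rates, per hour
    for beta in (0.0, 0.1, 0.3):
        print(beta, tmr_mttf(lam, mu, beta), tmr_availability(lam, mu, beta))
```

In this sketch, increasing `beta` lowers both the mean time to failure and the steady-state availability, while increasing `mu` raises them. With `beta = 0` and `mu = 0` the mean time to failure reduces to the classic TMR value of 5/(6·`lam`).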