Abstract

In satellite systems, triple modular redundancy (TMR) with three interconnected CPUs is widely used to improve tolerance to single-event upsets (SEUs) and single-event transients (SETs). Fault Detection, Isolation and Recovery (FDIR) functionality is also used to improve the robustness of the system: it isolates a faulty CPU and switches to a redundant CPU automatically. However, FDIR does not work correctly in the following cases. First, an SEU or SET may cause an unnecessary link occupation on the SpaceWire network, in which case the voting mechanism and the fault detection mechanism work incorrectly because of the communication failure. Second, it is difficult for the master CPU to classify the cause of a fault that combines more than one failure mode. This paper proposes a novel FDIR method to address the problems described above. The proposed method masks the output signals of the SpaceWire interface with the error signal output by the voter, which enables the system to reset the link and report faults automatically. Furthermore, the CPUs send each other a signal obtained by applying an exclusive-OR (XOR) operation to their calculation results and a Timecode. This mechanism improves the granularity of fault classification. Finally, this paper evaluates, by computer simulation, the recovery time of the system under a double fault that includes link occupation. The simulation results show that the proposed method recovers the system as quickly as a method that uses only a timeout mechanism.
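
The abstract gives no implementation detail, so the following Python sketch is only an illustration of the three ideas it names: majority voting across three CPUs, masking an interface output with the voter's error signal, and XOR-combining a result with a Timecode. All function names, the active-high error convention, the masked value of 0, and the classification rule (comparing against the current and previous Timecode) are assumptions made for this example, not the method of the paper.

```python
from collections import Counter


def majority_vote(values):
    """Vote over three CPU outputs.

    Returns (voted_value, error_flags), where error_flags[i] is True
    when CPU i disagrees with the majority. If no two outputs agree,
    voted_value is None and every flag is True.
    """
    value, count = Counter(values).most_common(1)[0]
    if count < 2:
        return None, [True] * len(values)
    return value, [v != value for v in values]


def mask_output(data, error):
    """Assumed masking rule: force the output to 0 while error is set."""
    return 0 if error else data


def encode(result, timecode):
    """Combine a calculation result with a Timecode using XOR."""
    return result ^ timecode


def classify(signal, expected_result, current_timecode, previous_timecode):
    """Hypothetical classification of a received XOR-encoded signal.

    "ok"             : decodes correctly with the current Timecode
    "stale_timecode" : decodes correctly only with the previous Timecode
    "value_fault"    : decodes correctly with neither
    """
    if signal ^ current_timecode == expected_result:
        return "ok"
    if signal ^ previous_timecode == expected_result:
        return "stale_timecode"
    return "value_fault"


if __name__ == "__main__":
    voted, errors = majority_vote([5, 5, 7])
    print(voted, errors)
    print(mask_output(0xAB, errors[2]))
    print(classify(encode(0x1234, 9), 0x1234, 10, 9))
```

In this sketch, a CPU whose result is correct but whose Timecode lags by one tick is reported separately from a CPU whose result is wrong, which is one plausible way an XOR with a Timecode could make fault classification finer-grained.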
