Abstract

This paper presents a platform and design approach for radiation-tolerant deep learning acceleration on SRAM-based 20nm Kintex UltraScale™ FPGAs, targeting both terrestrial and high-radiation environments. The platform is suitable for deep neural network (DNN) implementations, with an emphasis on image classification, and includes solutions that mitigate both radiation-induced Single Event Functional Interrupts (SEFIs) and corruptions of the network datapath. To mitigate SEFIs, the platform combines Xilinx’s Deep Learning Processing Unit (DPU) IP, a Triple Modular Redundancy (TMR) MicroBlaze soft processor IP, and the Soft Error Mitigation (SEM) IP. To mitigate single event effects in the datapath, a technique known as Fault Aware Training (FAT) was applied. Test results are presented from a high-energy proton beam (> 60 MeV) experiment using the ResNet-18 Convolutional Neural Network (CNN) for image classification. The Single Event Upset (SEU) rate, the system-level SEFI rate, and the neural network classification/datapath performance are compared between the radiation-tolerant platform and a standard, non-mitigated approach. Results show that datapath classification errors dominate the system response (90%) compared with SEFIs (10%). Relative to standard, non-mitigated training techniques, the radiation-tolerant platform using fault aware training shows substantial improvements in overall system response: the overall single event cross-section was halved, and misclassification errors were reduced by 40%. In addition, datapath events with a classification accuracy degradation larger than 5% were completely mitigated. The implemented solutions reduced the SEFI rate by a factor of 100, and it can be reduced further by optimizing the physical separation between TMR modules.
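
As an illustration of the voting principle that underlies TMR in general (not the TMR MicroBlaze IP itself, whose internals are not described here), the following minimal Python sketch shows a bitwise 2-of-3 majority vote masking a single upset in one of three redundant copies. The names `tmr_vote` and `flip_bit` and the 8-bit example word are hypothetical and chosen only for illustration.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


# Hypothetical 8-bit value held identically in three redundant modules.
golden = 0b10110010

# One copy suffers an upset in bit 5; the other two copies are intact.
upset = flip_bit(golden, 5)

# The voter masks the single corrupted copy.
voted_single = tmr_vote(golden, upset, golden)

# If two copies are corrupted in the same bit, the vote follows the error.
voted_double = tmr_vote(upset, upset, golden)
```

The second case shows why a single event that affects two redundant modules at once defeats the voter, which is consistent with the abstract's remark that physical separation between TMR modules matters.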
