Abstract

In the past decade, there has been a push for deployment of commercial-off-the-shelf (COTS) avionics due in part to cheaper costs and the desire for more performance. Traditional radiation-hardened processors are expensive and only provide limited processing power. With smaller mission budgets and the need for more computational power, low-cost and highperformance COTS solutions become more attractive for these missions. Due to the computational capacity enhancements provided by COTS technology, machine-learning and computer-vision applications are now being deployed on modern space missions. However, COTS electronics are highly susceptible to radiation environments. As a result, reliability in the underlying computations becomes a concern. Matrix multiplication is used in machine-learning and computer-vision applications as the main computation for decisions, making it a critical part of the application. Therefore, the large time and memory footprint of the matrix multiplication in machine-learning and computer-vision applications makes them even more susceptible to single-event upsets. In this paper, algorithm-based fault tolerance (ABFT) is investigated to mitigate silent data errors in machine learning and computer vision. ABFT is a methodology of data error detection and correction using information redundancy contained in separate data structures from the primary data. In matrix multiplication, ABFT consists of storing checksum data in vectors separate from the matrix to use for error detection and correction. Fault injection into a matrix-multiplication kernel was performed prior to irradiation. Irradiation was then performed on the kernel under wide-spectrum neutrons at Los Alamos Neutron Science Center to observe the mitigation effects of ABFT. Fault injections targeted towards the general-purpose registers show a $48 \times$ reduction in data errors using data-error mitigation with ABFT with a negligible change in run-time. Cross-section results from irradiation show a $5.3 \times$ improvement in reliability of using ABFT as opposed to no mitigation with a $> 99.9999$ confidence level. The results of this experiment demonstrate that ABFT is a viable solution for run-time error correction in matrix multiplication for machine-learning and computer-vision applications in future spacecraft.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.