Convolutional Neural Network (CNN) operators, mostly based on mathematical linear computations, are of vital importance to developing CNN-based software. Existing studies reveal that these operators are prone to floating-point precision problems (FPPs). In a CNN-based application, such problems can be propagated and result in catastrophic consequences. Thus, it is highly desired to detect and repair the FPPs in CNN operators. Considering the FPPs in CNN operators are mainly caused by accumulated floating-point errors and diverse floating-point tensors instead of wrong codes or bad implementations, it requires much time cost and is difficult to tackle these FPPs. In this paper, we propose the first method for the automated detection and repair of FPPs in CNN operators from the perspective of floating-point tensors. To generate diverse tensors with floating-point numbers, we design two levels of mutation rules, namely computation-level mutation and input-level mutation, containing a total of five mutation methods. To detect the FPPs caused by the accumulated floating-point errors, our method uses a weight matrix to guide the progressive mutation. To repair the detected FPPs, our method transforms the error-prone floating-point tensors based on the mathematical rewriting of the floating-point linear computational properties without destroying the original computation. Experimental results show that our methods can detect and repair FPPs in CNN operators effectively and efficiently and could reduce 93.32% to 100% of the FPPs in CNN operators. We conduct a case study on six different widely-used CNN models and confirm that the proposed FPP method is generalizable and effective across a variety of tasks and architectures. Our detection and repair method offers an intuitive way to handle FPPs during development, allowing users to continue building and fine-tuning their models without being slowed down by numerical precision errors. We believe that our method could open up a new way to enhance the quality of CNN operators and CNN-based software.
Read full abstract