An attribution-based pruning method for real-time mango detection with YOLO network

Rui Shi, Tianxing Li, Yasushi Yamaguchi

doi:10.1016/j.compag.2020.105214

Abstract

Real-time fruit detection and localization in orchards are essential for agronomic applications such as yield estimation, yield mapping, and automated harvesting. Traditional detection methods based on hand-crafted feature extractors are difficult to adapt to the complicated variations in real orchard environments. Modern deep neural networks (DNNs) achieve high detection performance but need high-performance computing units for inference, which is not practical for typical farms and orchards. To reduce the computation cost of DNNs, we propose a generalized attribution method for pruning detection networks, which can then be easily finetuned to accurately detect mangoes in real time. By designing channel and spatial masks to generalize the attribution method, the convolutional kernels that are strongly correlated with a specific target output in the original YOLOv3-tiny network can be identified. The uncorrelated kernels are then pruned along the channel dimension, layer by layer. Before finetuning the pruned network, anchor sizes, data augmentation, and learning rate decay were adapted for mango detection. The experimental results show that the proposed pruning method can identify the highly target-related convolutional kernels and that the finetuned network provides better mango detection performance than the original. The resulting network, a scale- and rotation-invariant mango detection network, achieved an F1-score of 0.944 with 2.6 GFLOPs (giga floating-point operations). Compared to the finetuned network without pruning, the computation of our network was reduced by 68.7% while the accuracy increased by 0.4%. Compared to a state-of-the-art network trained with the same mango dataset, the computation was reduced by 83.4% with only about 2.4% loss in accuracy. The proposed pruning method can strip a sub-network from a large-scale detection network to meet the real-time requirements of low-power processors for mobile devices, for example an ARM Cortex-A8, which performs around 4.0 GFLOPS (giga floating-point operations per second). The trained network and test code are available for comparative studies.
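
The percentages in the abstract can be related back to absolute costs with a little arithmetic. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, the baseline costs are only what the stated reductions imply (they are not reported in the abstract), and the frame-rate figure assumes the processor sustains its quoted peak throughput, which real workloads rarely do.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# All function names are illustrative; only the constants come from the text.


def implied_baseline_gflops(pruned_gflops: float, reduction: float) -> float:
    """Cost of the reference network implied by a fractional reduction.

    If pruned = baseline * (1 - reduction), then
    baseline = pruned / (1 - reduction).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return pruned_gflops / (1.0 - reduction)


def idealized_frames_per_second(network_gflops: float,
                                processor_gflops_per_s: float) -> float:
    """Upper bound on frame rate, ASSUMING peak throughput is fully sustained."""
    return processor_gflops_per_s / network_gflops


PRUNED_GFLOPS = 2.6          # cost of the pruned network (from the abstract)
REDUCTION_VS_UNPRUNED = 0.687  # 68.7% reduction vs. finetuned, unpruned network
REDUCTION_VS_SOTA = 0.834      # 83.4% reduction vs. state-of-the-art network
CORTEX_A8_GFLOPS_PER_S = 4.0   # approximate throughput quoted in the abstract

unpruned = implied_baseline_gflops(PRUNED_GFLOPS, REDUCTION_VS_UNPRUNED)
sota = implied_baseline_gflops(PRUNED_GFLOPS, REDUCTION_VS_SOTA)
fps_bound = idealized_frames_per_second(PRUNED_GFLOPS, CORTEX_A8_GFLOPS_PER_S)

print(f"implied unpruned cost: {unpruned:.1f} GFLOPs")
print(f"implied state-of-the-art cost: {sota:.1f} GFLOPs")
print(f"idealized frame-rate bound: {fps_bound:.1f} frames/s")
```

Note the unit distinction the abstract relies on: GFLOPs counts operations per inference, while GFLOPS is a rate, so dividing the second by the first gives inferences per second.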

Full Text