Abstract

Traditional pest detection methods are difficult to apply in complex forestry environments because of their low accuracy and speed. To address this issue, this paper proposes the YOLOv4_MF model. YOLOv4_MF uses MobileNetv2 as the feature extraction backbone and replaces standard convolution with depthwise separable convolution to reduce the number of model parameters. In addition, a coordinate attention mechanism is embedded in MobileNetv2 to enhance feature information. A symmetric structure consisting of a three-layer spatial pyramid pooling is presented, and an improved feature fusion structure is designed to fuse the target information. For the loss function, focal loss is used instead of cross-entropy loss to strengthen the network's learning of small targets. The experimental results show that the YOLOv4_MF model has 4.24% higher mAP, 4.37% higher precision, and 6.68% higher recall than the YOLOv4 model, while the size of the proposed model is reduced to 1/6 of that of YOLOv4. Moreover, in a comparison with several state-of-the-art algorithms on the COCO dataset, the proposed algorithm achieved 38.62% mAP.
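The parameter saving from depthwise separable convolution can be illustrated with a short calculation. The sketch below is not taken from the paper: the function names and the example layer shape (3×3 kernel, 256 input channels, 512 output channels) are assumptions chosen only for illustration, and bias terms are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weight count of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction_ratio(kernel_size: int, in_channels: int, out_channels: int) -> float:
    """Separable / standard weight ratio; algebraically 1/out_channels + 1/kernel_size**2."""
    return (depthwise_separable_params(kernel_size, in_channels, out_channels)
            / standard_conv_params(kernel_size, in_channels, out_channels))


if __name__ == "__main__":
    # Hypothetical layer shape, not a layer from YOLOv4_MF.
    k, c_in, c_out = 3, 256, 512
    print(standard_conv_params(k, c_in, c_out))
    print(depthwise_separable_params(k, c_in, c_out))
    print(round(reduction_ratio(k, c_in, c_out), 4))
```

For a 3×3 kernel the ratio approaches 1/9 as the number of output channels grows, which is why this substitution shrinks convolution-heavy backbones. The overall 1/6 model size reported above also depends on the other architectural changes, so this calculation alone does not reproduce that figure.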

Highlights

  • Forestry is crucial in national defense construction, industrial and agricultural production, daily life, and national economic construction [1]

  • In YOLOv3, a top-down feature pyramid network (FPN) is used as the feature fusion structure of the network to transfer high-level semantic information to the lower layers

  • The BA module consists of the Weighted Bi-directional Feature Pyramid Network (BiFPN) [41] and the Adaptive Spatial Feature Fusion (ASFF) [42]


Summary

Introduction

Forestry is crucial in national defense construction, industrial and agricultural production, daily life, and national economic construction [1]. At present, automated pest detection methods can be divided into two main categories: sensor-based methods [2,3,4,5] and visual image-based methods [6,7,8,9,10,11,12,13,14,15,16]. With the development of deep learning, automated feature extraction based on convolutional neural networks can extract rich information from images [17]. Deep learning-based approaches still face two challenges: (1) small targets are difficult to detect; (2) models deployed on mobile or embedded devices must balance recognition performance against being lightweight. Focal loss was used for the classification and confidence losses to make the model more accurate in recognizing small targets.
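To make the role of focal loss concrete, the following is a minimal sketch of the commonly used binary focal loss next to plain cross-entropy. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and the defaults `alpha=0.25` and `gamma=2.0` are widely used values rather than settings reported in this summary.

```python
import math

EPS = 1e-7  # clamp to keep log() finite; an implementation detail assumed here


def binary_cross_entropy(p: float, y: int) -> float:
    """Cross-entropy for one prediction p = P(class 1) and label y in {0, 1}."""
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha and gamma are assumed defaults, not values taken from the paper.
    """
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)**gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy, hard = 0.9, 0.1  # confident-correct vs. confident-wrong prediction for y = 1
    print(binary_cross_entropy(easy, 1), binary_focal_loss(easy, 1))
    print(binary_cross_entropy(hard, 1), binary_focal_loss(hard, 1))
```

With `gamma = 0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy. Larger `gamma` concentrates the gradient on hard examples, which is the property the authors rely on for small targets.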

Related Work
YOLOv4 Network Model
Backbone
Inverse
Attention Mechanism
Coordinate Information Embedding
Attention Generation
Multi-Scale Feature Fusion
Composition of the BiFPN
Composition of ASFF
Loss Function
Experimental Results and Analysis
Evaluation
Performance Comparison
Conclusions
