Weeds are one of the main hazards affecting the yield and quality of rice. In farmland ecosystems, weeds compete with rice for resources such as light, water, soil nutrients and space, and also promote plant disease by providing habitats for pests, reducing rice yield. Herbicide spraying is the most frequently used control method because it is cheaper than manual weeding and effectively controls a wide variety of weeds. Precision spraying uses intelligent technology to suppress weed growth at specific locations and to limit herbicide consumption, which not only lowers cost and environmental risk but also increases the economic value of agricultural products. The goal of this work is to use a deep learning model for weed detection in rice crop images, achieving accurate real-time detection at low hardware cost so that the approach can be widely used in practice. For this purpose, an object detection dataset of rice and weeds is established through on-site photography and web image crawling; it contains rice and eight categories of weeds. However, the overlap between crops and weeds poses a great challenge to weed detection. To overcome this challenge, this paper proposes WeedDet, a model based on RetinaNet that improves the feature extraction ability of the backbone, the feature pyramid network and the detection head, respectively, to handle the complex information in the image. The feature pyramid and detection head are also lightened to increase detection speed. The specific methods are as follows. First, we propose Det-ResNet, which modifies the initial convolution structure of ResNet to reduce the loss of detailed information and improve the extraction of fine textures. In addition, the improved feature pyramid network fuses the features extracted by Det-ResNet more efficiently without loss of accuracy. Second, we propose the Efficient Retina Head (ERetina-Head), which uses thin feature maps (feature maps with a small number of channels) and large separable convolutions, saving memory and computation while producing more powerful feature maps during training and inference. Finally, we combine SmoothL1 loss and GIoU loss to compute the regression loss, and Varifocal loss is used to unify the classification and quality estimation branches during training. Our network achieves an mAP of 94.1% at a frame rate of 24.3 fps, which is 5.5% mAP and 5.6 fps higher than the RetinaNet baseline. Experiments show that the frame rate of WeedDet (24.3 fps) is second only to YOLOv3 (26.3 fps), while its mAP is 9.9% higher than that of YOLOv3 (0.941 vs. 0.842) and higher than that of the other compared models, verifying the effectiveness and efficiency of the model.
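The abstract does not detail how Det-ResNet modifies ResNet's initial convolution structure. As a hedged illustration of the general idea of preserving fine detail in the stem, the following PyTorch sketch replaces ResNet's single 7x7 stride-2 convolution with stacked 3x3 convolutions (a common "deep stem" variant); the module name, channel widths and layer count are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class DetailPreservingStem(nn.Module):
    """Hypothetical detail-preserving ResNet stem (illustrative only).

    Standard ResNet opens with a single 7x7 stride-2 convolution; stacking
    3x3 convolutions instead is one common way to retain fine texture
    information in the earliest layers."""

    def __init__(self, in_channels: int = 3, out_channels: int = 64):
        super().__init__()
        mid = out_channels // 2
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, mid, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_channels, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output keeps the same 4x total downsampling as the standard stem.
        return self.pool(self.stem(x))
```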
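Similarly, the exact structure of ERetina-Head is not given in the abstract. The sketch below illustrates only the stated ingredients, thin feature maps and large separable convolutions, as a hypothetical lightweight RetinaNet-style head: pyramid features are first reduced to a narrow channel width, then refined by a small tower of large-kernel depthwise-separable convolutions before the per-anchor classification and box-regression outputs. The head width (96), kernel size (5) and tower depth are assumptions; num_classes = 9 follows from the dataset (rice plus eight weed categories), and nine anchors per location is the RetinaNet default.

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise-separable convolution: a depthwise k x k convolution
    followed by a pointwise 1x1 convolution, cutting parameters and FLOPs
    relative to a dense k x k convolution."""

    def __init__(self, channels: int, kernel_size: int = 5):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.norm(self.pointwise(self.depthwise(x))))


class ThinSeparableHead(nn.Module):
    """Hypothetical lightweight detection head in the spirit of ERetina-Head.

    Pyramid features are reduced to a thin channel width, refined by
    large-kernel separable convolutions, then mapped to per-anchor
    classification and box-regression outputs."""

    def __init__(self, in_channels: int = 256, head_channels: int = 96,
                 num_classes: int = 9, num_anchors: int = 9,
                 tower_depth: int = 2):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, head_channels, 1)  # thin the maps
        self.tower = nn.Sequential(
            *[SeparableConv2d(head_channels) for _ in range(tower_depth)])
        self.cls_out = nn.Conv2d(head_channels, num_anchors * num_classes,
                                 3, padding=1)
        self.reg_out = nn.Conv2d(head_channels, num_anchors * 4, 3, padding=1)

    def forward(self, feature: torch.Tensor):
        x = self.tower(self.reduce(feature))
        return self.cls_out(x), self.reg_out(x)
```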
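The loss design uses published formulations and can be sketched more directly. In the sketch below, SmoothL1 is applied to the encoded box deltas, GIoU loss to the decoded boxes, and Varifocal loss weights positives by an IoU-aware quality target while focally down-weighting negatives. The 1:1 weighting of the two regression terms is an assumption; the abstract only states that they are combined.

```python
import torch
import torch.nn.functional as F

def giou_loss(pred, target, eps=1e-7):
    """GIoU loss for boxes in (x1, y1, x2, y2) format, summed over boxes."""
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=eps)
    # Smallest box enclosing both prediction and target.
    wh_c = (torch.max(pred[:, 2:], target[:, 2:]) -
            torch.min(pred[:, :2], target[:, :2])).clamp(min=0)
    area_c = (wh_c[:, 0] * wh_c[:, 1]).clamp(min=eps)
    giou = iou - (area_c - union) / area_c
    return (1.0 - giou).sum()

def varifocal_loss(logits, target_score, alpha=0.75, gamma=2.0):
    """Varifocal loss: positives (target_score > 0, an IoU-aware quality
    target) are weighted by that score; negatives are focally down-weighted."""
    p = torch.sigmoid(logits)
    weight = torch.where(target_score > 0, target_score, alpha * p.pow(gamma))
    bce = F.binary_cross_entropy_with_logits(logits, target_score,
                                             reduction="none")
    return (weight * bce).sum()

def regression_loss(pred_deltas, target_deltas, pred_boxes, target_boxes,
                    l1_weight=1.0, giou_weight=1.0):
    """Combined regression loss: SmoothL1 on encoded deltas plus GIoU on
    decoded boxes. The 1:1 weighting here is an assumption."""
    l1 = F.smooth_l1_loss(pred_deltas, target_deltas, reduction="sum")
    return l1_weight * l1 + giou_weight * giou_loss(pred_boxes, target_boxes)
```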