Abstract

Apple detection in complex orchard environments is an important research problem for yield estimation. Although convolutional neural networks (CNNs) have been widely used for object detection, they have certain limitations. A major one is their inductive biases of locality and scale invariance, which make it difficult to capture global and long-range dependencies. In this work, we propose Faster RFormer, which builds on Faster R-CNN by replacing its backbone network with a shifted-window Transformer, fusing features from different stages, and introducing an enhanced smooth loss function. To validate the reliability of the model, we created an apple detection dataset called AD-2023. The proposed method achieves 0.692 mAP, 0.796 AP@0.75 and 0.941 AP@0.5, surpassing existing algorithms. More importantly, our study not only offers a reliable way to address the shortcomings of CNNs for detection in complex environments, but also establishes a new benchmark for apple detection, showing potential for broader applications in agricultural automation and robotics.
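The three reported metrics presumably follow the usual detection conventions. AP@0.5 and AP@0.75 count a prediction as correct only when its intersection-over-union (IoU) with a still-unmatched ground-truth box reaches 0.5 or 0.75. mAP, in COCO usage, averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05, which is why it sits below both single-threshold values. The abstract does not state which evaluation toolkit was used. What follows is therefore only a minimal single-image, single-class sketch of those conventions, not the authors' evaluation code. The function names are hypothetical, and the sketch uses all-point interpolation, whereas the official COCO evaluator uses 101-point interpolation, so results can differ slightly.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], gts: List[Box], iou_thr: float) -> float:
    """AP at one IoU threshold for one class (hypothetical helper, all-point interpolation)."""
    preds = sorted(preds, key=lambda p: p[0], reverse=True)  # highest score first
    matched = [False] * len(gts)
    tp = []
    for _, box in preds:
        # Greedily match to the best-overlapping ground truth not yet claimed.
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if not matched[j]:
                overlap = iou(box, g)
                if overlap > best:
                    best, best_j = overlap, j
        if best_j >= 0 and best >= iou_thr:
            matched[best_j] = True
            tp.append(1)
        else:
            tp.append(0)  # false positive: no match or IoU too low

    precisions, recalls, cum_tp = [], [], 0
    for rank, hit in enumerate(tp, start=1):
        cum_tp += hit
        precisions.append(cum_tp / rank)
        recalls.append(cum_tp / len(gts) if gts else 0.0)

    # Make precision non-increasing from right to left (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def coco_style_map(preds: List[Tuple[float, Box]], gts: List[Box]) -> float:
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)
```

For example, a single prediction covering the top half of a 10 × 10 ground-truth box has IoU exactly 0.5. It therefore scores AP = 1.0 at the 0.5 threshold and 0.0 at 0.75, and its COCO-style mAP is 0.1, because only one of the ten thresholds accepts it.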
