Accurate identification of wheat heads is critical for assessing production and managing agricultural fields effectively. A major challenge lies in detecting small and overlapping wheat spikes against complex backgrounds. Deep learning approaches, in particular Convolutional Neural Networks (CNNs), have been applied to this problem, but conventional CNNs often struggle to capture global interdependencies because of their focus on local features and their limited scale invariance. To overcome these limitations, this study proposes a Transformer-based Feature Learning Network (FLTrans-Net) that learns discriminative features while suppressing background noise. The method comprises three key components: a Multi-Scale Fusion Block, a Spatial Attention Block, and a Lightweight RetinaNet Detection Block. The Multi-Scale Fusion Block integrates high- and low-level features to extract multi-scale representations, which are essential for detecting small wheat heads in cluttered scenes. The Spatial Attention Block uses transformer encoders to highlight salient object characteristics and thereby improve detection performance. The Lightweight RetinaNet Detection Block then exploits the informative features produced by the preceding blocks to perform detection. Experimental results demonstrate FLTrans-Net's effectiveness for wheat head detection in challenging field environments and indicate its potential for real-time deployment on resource-constrained devices. Overall, FLTrans-Net offers both efficacy and efficiency in handling complex agricultural landscapes.
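For intuition, the sketch below illustrates how a pipeline of this kind could be wired in PyTorch: a fusion block that merges two feature scales, a transformer encoder applied over spatial positions, and a RetinaNet-style classification/regression head. The abstract does not specify the actual layers, so every module name, channel width, and hyperparameter here is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch only: FLTrans-Net's internals are not given in the abstract;
# all module names, channel widths, and layer choices below are assumptions.
import torch
import torch.nn as nn


class MultiScaleFusionBlock(nn.Module):
    """Fuses a low-resolution (semantically rich) map with a high-resolution map."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, high_res: torch.Tensor, low_res: torch.Tensor) -> torch.Tensor:
        # Upsample the coarse map, concatenate with the fine map, then mix with a 3x3 conv.
        low_up = self.upsample(low_res)
        return self.fuse(torch.cat([high_res, low_up], dim=1))


class SpatialAttentionBlock(nn.Module):
    """Transformer encoder over spatial positions to emphasize salient regions."""

    def __init__(self, channels: int = 256, num_layers: int = 2, num_heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C): one token per spatial position
        tokens = self.encoder(tokens)           # self-attention across all positions
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class LightweightDetectionHead(nn.Module):
    """RetinaNet-style head: per-anchor class logits and box regression offsets."""

    def __init__(self, channels: int = 256, num_anchors: int = 9, num_classes: int = 1):
        super().__init__()
        self.cls_head = nn.Conv2d(channels, num_anchors * num_classes, 3, padding=1)
        self.box_head = nn.Conv2d(channels, num_anchors * 4, 3, padding=1)

    def forward(self, x: torch.Tensor):
        return self.cls_head(x), self.box_head(x)


if __name__ == "__main__":
    # Toy feature maps standing in for backbone outputs at two scales.
    high_res = torch.randn(1, 256, 32, 32)
    low_res = torch.randn(1, 256, 16, 16)
    fused = MultiScaleFusionBlock()(high_res, low_res)
    attended = SpatialAttentionBlock()(fused)
    cls_logits, box_deltas = LightweightDetectionHead()(attended)
    print(cls_logits.shape, box_deltas.shape)  # (1, 9, 32, 32), (1, 36, 32, 32)
```

In this sketch the attention operates on every spatial position of the fused map; a lightweight design as described in the abstract would likely restrict the token count or encoder depth further to suit resource-constrained devices.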