Abstract

Classifying tobacco leaves by stalk position and color is a prerequisite for accurate tobacco pricing and subsequent fine blending. Tobacco grades are similar in color, contour, and texture, which limits the classification accuracy of current tobacco recognition methods, since these focus on either local or global feature extraction rather than both. To improve feature fusion, we propose FSWPNet, a tobacco classification model based on pyramid feature fusion with shifted window self-attention. First, because the tobacco leaf occupies a large proportion of the original image, a standard pooling module cannot make full use of the overall appearance information. We therefore design the Serial Maximum Spatial Pyramid Pooling (SMSPP) module, which aggregates multi-scale leaf apex features by connecting multiple max pooling layers in series. Second, to fuse local and global tobacco features, the Shifted Window Transformer Bottleneck (STRB) module is integrated into the feature pyramid fusion architecture. In the window encoder, window self-attention focuses on low-level local semantic features, while shifted window self-attention models long-range dependencies and captures macroscopic tobacco features such as color, contour, and texture through cross-window connections. Because tobacco classification requires the joint assessment of multiple kinds of feature information, the Spatial and Channel Mixed Multi-Layer Perceptron (SCMix MLP) is used to blend intra-patch positional features with inter-patch spatial features. Finally, to handle hard samples with similar features, FSWPNet adopts a DIoU Non-Maximum Suppression (DIoU-NMS) strategy, which takes into account the diagonal of the minimum enclosing rectangle when eliminating redundant low-scoring regression boxes. The FSWPNet model is deployed and tested on tobacco grading equipment.
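The serial max pooling idea behind SMSPP can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the kernel size, the number of stages, the stride-1 "same" pooling, and the decision to keep every intermediate map are assumptions made for illustration, and the function names are hypothetical.

```python
def max_pool_same(fmap, k=5):
    """Stride-1 max pooling over a 2D list of numbers.

    The window is clipped at the borders, so the output keeps the
    input's height and width (assumed behavior, not from the paper).
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                fmap[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def serial_max_pyramid(fmap, k=5, stages=3):
    """Apply max pooling `stages` times in series.

    Returns the input plus every intermediate map, one per scale.
    Keeping all of them for later fusion is an assumption here.
    """
    outputs = [fmap]
    current = fmap
    for _ in range(stages):
        current = max_pool_same(current, k)
        outputs.append(current)
    return outputs
```

Chaining small kernels enlarges the receptive field step by step: two serial 3x3 poolings cover the same neighborhood as a single 5x5 pooling, which is why a serial arrangement can provide several scales while reusing one small kernel.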
Experimental results show that FSWPNet achieves the best performance, with an average classification accuracy of 75.8%, an inference time of 12 ms, and a model size of 16.8 MB. The proposed model can effectively classify 10 categories of tobacco leaves, providing an important theoretical basis and prerequisite for tobacco acquisition and fine processing.
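The DIoU-NMS step mentioned in the abstract can be sketched as follows. This is a minimal illustration of the generally known DIoU formulation (IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box), not the authors' code. The `(x1, y1, x2, y2)` box format, the function names, and the 0.5 threshold are assumptions.

```python
def diou(a, b):
    """Distance-IoU between two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Plain IoU from the intersection and union areas.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest rectangle enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2

    return iou - d2 / c2 if c2 > 0 else iou


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses boxes whose DIoU with a kept box
    exceeds `threshold` (0.5 is an assumed value). Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return keep
```

Compared with plain IoU, the center-distance penalty lowers the overlap score of boxes whose centers are far apart, so two heavily overlapping boxes around different objects are less likely to suppress each other.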
