Abstract

Pattern classification has always been essential in computer vision. Transformer paradigm having attention mechanism with global receptive field in computer vision improves the efficiency and effectiveness of visual object detection and recognition. The primary purpose of this article is to achieve the accurate ripeness classification of various types of fruits. We create fruit datasets to train, test, and evaluate multiple Transformer models. Transformers are fundamentally composed of encoding and decoding procedures. The encoder is to stack the blocks, like convolutional neural networks (CNN or ConvNet). Vision Transformer (ViT), Swin Transformer, and multilayer perceptron (MLP) are considered in this paper. We examine the advantages of these three models for accurately analyzing fruit ripeness. We find that Swin Transformer achieves more significant outcomes than ViT Transformer for both pears and apples from our dataset.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.