Abstract

Image-recognition applications in everyday life, scientific research, and industry, such as target recognition, autonomous driving, and medical image diagnosis, rely mainly on a variety of large, high-performing models, from the original Convolutional Neural Network (CNN) to the many variants of classical models proposed since. In this paper, we take the task of classifying cat and dog image datasets as an example and compare the efficiency and accuracy of the CNN, the Vision Transformer (ViT), and the Swin Transformer side by side. We train each model for 25 epochs and record its accuracy and time consumption separately. Comparing epoch counts and wall-clock time, we find that the CNN takes the least total time, followed by the Swin Transformer, while ViT takes the most; measured in epochs, ViT converges fastest and the Swin Transformer slowest. ViT achieves the highest training accuracy, followed by the Swin Transformer, with the CNN lowest, and the validation accuracies follow the same ordering. In short, ViT is the most accurate but the slowest overall, whereas the CNN is the fastest but the least accurate; the Swin Transformer, which combines ideas from CNNs and ViT, is the most complex of the three but offers a good balance of the two. ViT in particular is a promising model that deserves further research and exploration in the computer vision field.
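The abstract gives no implementation details, so the following is only a minimal sketch of the benchmark it describes, assuming a PyTorch/timm environment, an ImageFolder-style cat/dog dataset under a hypothetical data/ directory, and representative model variants (resnet18, vit_base_patch16_224, swin_tiny_patch4_window7_224); none of these names are specified in the paper.

import time
import timm
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical dataset layout: data/train/<class>/*.jpg, data/val/<class>/*.jpg
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])
train_ds = datasets.ImageFolder("data/train", transform=tfm)
val_ds = datasets.ImageFolder("data/val", transform=tfm)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=32)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Representative architectures; the paper does not name exact variants.
model_names = {
    "CNN": "resnet18",
    "ViT": "vit_base_patch16_224",
    "Swin": "swin_tiny_patch4_window7_224",
}

def accuracy(model, loader):
    """Fraction of correctly classified samples in a loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

for label, name in model_names.items():
    model = timm.create_model(name, pretrained=False, num_classes=2).to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    total_train_time = 0.0
    for epoch in range(25):  # 25 epochs per model, as in the paper
        model.train()
        start = time.time()
        for x, y in train_dl:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        total_train_time += time.time() - start  # exclude evaluation from timing
        print(f"{label} epoch {epoch + 1}: "
              f"train acc {accuracy(model, train_dl):.3f}, "
              f"val acc {accuracy(model, val_dl):.3f}")
    print(f"{label}: total training time {total_train_time:.1f}s")

Timing only the training steps gives the total-time comparison, while the per-epoch accuracy log supports the epochs-to-convergence comparison the abstract reports.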
