Abstract

Food image classification is an important research direction in computer vision and machine learning. However, it remains challenging when foods have similar shapes but different nutritional values. To address this problem, this paper proposes AlsmViT, a high-accuracy food image classification method that combines data augmentation and feature enhancement in a vision transformer. By distinguishing foods that look alike but differ nutritionally, the method is expected to help people better manage their diet and improve their health. Our approach incorporates Augmentplus, LayerScale, and a multi-layer perceptron (MLP) mechanism for local feature enhancement. We train and validate our models on the public datasets Food-101 and Vireo Food-172, where the AlsmViT-L model reaches validation accuracies of 95.17% and 94.29%, respectively. Compared with other state-of-the-art self-supervised methods, our proposed method achieves higher accuracy on food image classification tasks.
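As a rough illustration of one named component, the sketch below shows the general LayerScale idea: the output of a residual branch is multiplied by a learnable per-channel factor that starts near zero, so each block begins close to the identity. The abstract does not say how LayerScale is wired into AlsmViT, so the function names, the toy branch, and the initial value of 1e-4 are assumptions for illustration only, not the authors' implementation.

```python
from typing import Callable, List

Vector = List[float]


def layer_scale_residual(
    x: Vector,
    branch: Callable[[Vector], Vector],
    gamma: Vector,
) -> Vector:
    """Return x + gamma * branch(x), with gamma applied per channel."""
    out = branch(x)
    if not (len(x) == len(out) == len(gamma)):
        raise ValueError("x, branch(x), and gamma must have the same length")
    return [xi + gi * oi for xi, gi, oi in zip(x, gamma, out)]


def toy_branch(x: Vector) -> Vector:
    """Hypothetical stand-in for an attention or MLP sub-layer."""
    return [2.0 * v for v in x]


dim = 4
init_value = 1e-4            # assumed small initial scale, not taken from the paper
gamma = [init_value] * dim   # one scale per channel; learnable in a real model
x = [1.0, -2.0, 0.5, 3.0]
y = layer_scale_residual(x, toy_branch, gamma)
```

With a small `gamma`, `y` stays close to `x`. In a trained network the per-channel scales would be updated by gradient descent along with the other weights.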
