High accuracy food image classification via vision transformer with data augmentation and feature augmentation

Xinle Gao, Zhiyong Xiao, Zhaohong Deng

doi:10.1016/j.jfoodeng.2023.111833

Abstract

Food image classification is an important research direction in computer vision and machine learning. However, it faces great challenges when dealing with foods that have similar shapes but different nutritional values. To address this problem, this paper proposes AlsmViT, a high-accuracy food image classification method based on the vision transformer with data augmentation and feature enhancement. The method can accurately distinguish foods with similar shapes but different nutritional values, which is expected to help people better manage their diet and improve their health. Our approach incorporates Augmentplus, LayerScale, and multi-layer perceptron mechanisms for local feature enhancement. Our models are trained and validated on the public datasets Food-101 and Vireo Food-172, on which the AlsmViT-L model reaches validation accuracies of 95.17% and 94.29%, respectively. Compared with other state-of-the-art self-supervised methods, our proposed method exhibits higher accuracy in food image classification tasks.
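To make the LayerScale component named above concrete, the following is a minimal sketch of the general idea as it is commonly described for deep vision transformers: the output of a residual branch is multiplied by a learnable per-channel scale before being added back to the input. It is not the AlsmViT implementation. The function names, the toy branch, and the initial scale value of 1e-4 are assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
from typing import Callable, List

Vector = List[float]


def layer_scale_residual(
    x: Vector, branch: Callable[[Vector], Vector], gamma: Vector
) -> Vector:
    """Return x + gamma * branch(x), with gamma applied per channel."""
    out = branch(x)
    if not (len(x) == len(out) == len(gamma)):
        raise ValueError("x, branch(x) and gamma must have the same length")
    return [xi + g * oi for xi, g, oi in zip(x, gamma, out)]


def toy_branch(x: Vector) -> Vector:
    """Hypothetical stand-in for an attention or MLP sub-block."""
    return [2.0 * v for v in x]


dim = 4
# Assumed small initial value; in training, gamma would be a learnable parameter.
gamma = [1e-4] * dim
token = [1.0, -2.0, 0.5, 3.0]
scaled = layer_scale_residual(token, toy_branch, gamma)
```

With a small initial scale, each block starts close to the identity mapping, so the residual branch contributes only a small perturbation at first and its influence is learned per channel.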

Full Text