Abstract

This paper presents a novel approach to enhancing image classification performance by incorporating the Ultra-Lightweight Subspace Attention Module (ULSAM) mechanism into the lightweight MobileViT architecture. The resulting MobileViT+ULSAM model achieved strong accuracy on the ISIA Food-500 dataset while keeping computational cost and parameter count close to those of the original MobileViT. Moreover, MobileViT+ULSAM outperforms other lightweight models such as MobileNetV2 and LCNet. Ablation experiments confirmed the effectiveness of ULSAM in improving classification accuracy and its ability to consistently benefit multiple model structures. Additionally, this paper explored an even more lightweight variant that significantly reduces FLOPs and parameter count while maintaining the same model performance. Overall, this research provides a practical and resource-efficient approach for improving image classification performance in a range of deep learning applications.
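The core idea of ULSAM is to split a feature map's channels into subspaces and learn a separate spatial attention map for each, which re-weights that subspace's features and is added back as a residual. The sketch below illustrates this mechanism in NumPy; it is not the paper's implementation — the real module derives each attention map from a depthwise convolution, max pooling, and a pointwise convolution, which are replaced here by a simple channel mean so the example stays self-contained.

```python
import numpy as np

def subspace_attention(x, num_subspaces):
    """Illustrative sketch of subspace attention (ULSAM-style).

    x: feature map of shape (channels, height, width).
    Channels are split into `num_subspaces` groups; each group gets its
    own spatial attention map (softmax over all spatial positions) that
    re-weights the group's features, with a residual connection.
    NOTE: the channel mean below is a stand-in for the module's learned
    depthwise/pointwise convolution stack.
    """
    c, h, w = x.shape
    assert c % num_subspaces == 0, "channels must divide evenly into subspaces"
    group = c // num_subspaces
    out = np.empty_like(x)
    for i in range(num_subspaces):
        sub = x[i * group:(i + 1) * group]           # (group, h, w)
        logits = sub.mean(axis=0)                    # stand-in for conv stack
        attn = np.exp(logits - logits.max())
        attn = attn / attn.sum()                     # softmax over h*w positions
        # Re-weight the subspace by its attention map, then add the residual.
        out[i * group:(i + 1) * group] = sub * attn + sub
    return out
```

Because attention is computed per subspace, the extra parameters and FLOPs grow with the number of subspaces rather than with the full channel count, which is what keeps the module compatible with lightweight backbones such as MobileViT.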
