Abstract

Recognizing the ingredients composition for given food images facilitates the estimation of nutrition facts, which is crucial to various health relevant applications. Nevertheless, ingredient recognition is a multi-label long-tailed classification problem, where each image may contain multiple labels and the class distributions are highly imbalanced. Most existing approaches leverage off-the-shelf Convolutional Neural Networks (CNN) for multi-label ingredient recognition, overlooking the long-tailed issue, which results in low accuracy for tail ingredient categories. To address this problem, this paper proposes a dynamic Mixup (D-Mixup) approach, aiming to dynamically augment minority ingredients, in order to boost the recognition performance for tail ingredient categories. Specifically, our D-Mixup approach dynamically selects two training images based on the predictions of the previous training epoch, and generates a new synthetic image to train the recognition network. In this way, the training samples of tailed classes can be dynamically enlarged and better discriminative representations can be learnt for rare classes. Extensive experiments on both VIREO Food-172 dataset and UEC Food-100 dataset demonstrate the effectiveness of the proposed D-Mixup method.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call