State Recognition of Food Images Using Deep Features

Gianluigi Ciocca, Paolo Napoletano, Giovanni Micali

doi:10.1109/access.2020.2973704


Open Access


Abstract

State recognition of food images is a recent topic that is gaining considerable interest in the Computer Vision community. Recently, researchers presented a dataset of food images at different states that, unfortunately, included no information about the food category. In practical food monitoring applications, it is important to be able to recognize a peeled tomato rather than a generic peeled item. To this end, we introduce a new dataset containing 20 different food categories, taken from fruits and vegetables, at 11 different states ranging from solid and sliced to creamy paste. We experiment with the most common Convolutional Neural Network (CNN) architectures on three different recognition tasks: food categories, food states, and both food categories and states. Since a lack of labeled data is common in practical applications, we exploit deep features extracted from CNNs combined with Support Vector Machines (SVMs) as an alternative to end-to-end classification. We also compare deep features with several hand-crafted features. These experiments confirm that deep features outperform hand-crafted features on all three classification tasks, regardless of the food category or food state considered. Finally, we test the generalization capability of the best-performing deep features on another, publicly available dataset of food states. This last experiment shows that the features extracted from a CNN trained on our proposed dataset achieve performance quite close to that of the state-of-the-art method. This confirms that our deep features are robust with respect to data never seen by the CNN.
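The three recognition tasks named in the abstract can all be derived from a single (category, state) annotation per image. The sketch below illustrates one possible way to do that; the integer encoding, the names `Labels`, `make_labels`, and `split_joint`, and the assumption that every category and state pair may occur are hypothetical and are not taken from the paper. Only the counts (20 categories, 11 states) come from the abstract, so 220 is at most an upper bound on the number of joint classes.

```python
from typing import NamedTuple, Tuple

N_CATEGORIES = 20  # number of food categories stated in the abstract
N_STATES = 11      # number of food states stated in the abstract


class Labels(NamedTuple):
    """Targets for the three tasks (hypothetical encoding, not the paper's)."""
    category: int  # target for the food-category task
    state: int     # target for the food-state task
    joint: int     # target for the combined category-and-state task


def make_labels(category: int, state: int) -> Labels:
    """Build all three task targets from one (category, state) annotation."""
    if not 0 <= category < N_CATEGORIES:
        raise ValueError(f"category must be in [0, {N_CATEGORIES})")
    if not 0 <= state < N_STATES:
        raise ValueError(f"state must be in [0, {N_STATES})")
    # Row-major index: each category owns a contiguous block of N_STATES ids.
    return Labels(category, state, category * N_STATES + state)


def split_joint(joint: int) -> Tuple[int, int]:
    """Recover (category, state) from a joint class id."""
    if not 0 <= joint < N_CATEGORIES * N_STATES:
        raise ValueError("joint id out of range")
    return divmod(joint, N_STATES)
```

With this encoding, a prediction on the joint task can be scored against the category task or the state task alone by decoding it with `split_joint`, which makes the three tasks directly comparable on the same images.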

Highlights

Over the last few years, one of the most active topics in the Computer Vision community has been image understanding for object recognition [1], [2].
We started our investigation with several questions: Can we recognize foods across different states? Can we recognize a food state independently of its identity? How robust are end-to-end Convolutional Neural Networks (CNNs)? How robust are CNN-based features compared with hand-crafted features? Our experiments show that, with the proper network, we can obtain robust features that can be used for different food-related classification tasks.
There is no significant advantage in combining hand-crafted features with learned ones. It seems that the state recognition problem is more approachable with CNN-based features than the food classification problem.

Summary

Introduction

Over the last few years, one of the most active topics in the Computer Vision community has been image understanding for object recognition [1], [2]. Within this context, automatic food analysis [3]–[7] is one application scenario that has recently received great attention. Accurate tracking of daily nutrition intake helps people maintain a healthy weight and is important for treating and controlling food-related health problems such as obesity and diabetes. This has been accomplished by examining manually recorded daily logs.