Abstract

In this paper, we propose a novel method for inferring plate regions in food images without any pixel-wise annotation. We synthesize plate segmentation masks from the difference between the visualizations of two food image classifiers: a food category classifier and a food/non-food classifier. Using Class Activation Mapping (CAM), one of the basic visualization techniques for CNNs, the food category classifier highlights food regions that exclude the plate, while the food/non-food classifier highlights food regions that include the plate. By exploiting the difference between the food regions estimated by the two classifiers, we demonstrate that plate regions can be estimated without any pixel-wise annotation, and we propose an approach that uses the resulting plate segmentation to boost the accuracy of weakly-supervised food segmentation. In experiments, we show the effectiveness of the proposed approach by evaluating and comparing weakly-supervised segmentation accuracy. The proposed approach improved an image-level weakly-supervised segmentation method in the food domain and outperformed a well-known bounding box-level weakly-supervised segmentation method.
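The core idea, taking the difference between the two CAM-derived food regions, can be illustrated with a minimal sketch. It assumes the two CAMs have already been computed and resized to the same resolution. The function names, the min-max normalization, and the fixed threshold of 0.5 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def normalize_cam(cam):
    """Min-max scale a CAM to [0, 1]; a constant map becomes all zeros."""
    cam = np.asarray(cam, dtype=np.float64)
    span = cam.max() - cam.min()
    if span == 0:
        return np.zeros_like(cam)
    return (cam - cam.min()) / span

def plate_mask_from_cams(category_cam, food_nonfood_cam, threshold=0.5):
    """Return (food_mask, plate_mask) as boolean arrays.

    threshold is a hypothetical value chosen for illustration only.
    """
    # Region highlighted by the food category classifier: food without plate.
    food = normalize_cam(category_cam) >= threshold
    # Region highlighted by the food/non-food classifier: food plus plate.
    food_with_plate = normalize_cam(food_nonfood_cam) >= threshold
    # Plate = pixels in the larger region that are not in the food region.
    plate = food_with_plate & ~food
    return food, plate

# Toy 5x5 example: food at the center, plate in the surrounding ring.
category_cam = np.zeros((5, 5))
category_cam[2, 2] = 1.0
food_nonfood_cam = np.zeros((5, 5))
food_nonfood_cam[1:4, 1:4] = 1.0

food, plate = plate_mask_from_cams(category_cam, food_nonfood_cam)
print(int(food.sum()), int(plate.sum()))
```

In this toy case the food mask covers the single center pixel and the plate mask covers the eight pixels around it. On real CAMs the threshold, and any smoothing or refinement of the masks, would need to be tuned.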
