Using Variational Multi-View Learning for Classification of Grocery Items

Marcus Klasson, Hedvig Kjellström, Cheng Zhang

doi:10.2139/ssrn.3588894


Open Access



Abstract

An essential task for computer vision-based assistive technologies is to help visually impaired people recognize objects in constrained environments, for instance, food items in a grocery store. In this paper, we introduce a novel dataset of natural images of grocery items (fruits, vegetables, and packaged products), where all images were taken inside grocery stores to resemble an actual shopping scenario. In addition to the natural images, we download an iconic image and a text description of each item, which can be used to construct better representations of the grocery items. We select a multi-view generative model called Variational Canonical Correlation Analysis (VCCA), which efficiently combines the different sources of information about the items into a single lower-dimensional representation. In the experiments, we show that utilizing the additional information with VCCA yields higher accuracy in classifying grocery items than standard image classifiers that use only the natural images. By visualizing the latent representations, we observe that the iconic images help to construct representations that are separated by the visual differences of the items, while the text descriptions enable the model to distinguish between visually similar items by their different ingredients and flavors. Moreover, we investigate a variant of VCCA called VCCA-private, which separates the shared and private information of the different data views. We verify that VCCA-private can separate variations in image backgrounds and in the structure of text sentences from the shared representation, enabling more accurate classification of grocery items in their natural environment.
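For orientation, the objective that VCCA-style models typically maximize can be sketched as follows. This is the generic two-view form from the VCCA literature, not necessarily the exact objective used in this paper. The symbols are assumptions introduced here for illustration: x and y denote two views of an item (for example, a natural image and its text description), z the shared latent variable, and h_x, h_y the private latent variables of VCCA-private.

```latex
% Assumed notation (not from the abstract): x, y are two views, z is the
% shared latent variable, h_x and h_y are view-specific private latents.

% VCCA: a single shared latent z, inferred from view x only.
\mathcal{L}_{\text{VCCA}}(x, y)
  = \mathbb{E}_{q(z \mid x)}\bigl[\log p(x \mid z) + \log p(y \mid z)\bigr]
  - \mathrm{KL}\bigl(q(z \mid x) \,\|\, p(z)\bigr)

% VCCA-private: each view also gets its own private latent variable.
\mathcal{L}_{\text{private}}(x, y)
  = \mathbb{E}_{q(z \mid x)\, q(h_x \mid x)\, q(h_y \mid y)}
      \bigl[\log p(x \mid z, h_x) + \log p(y \mid z, h_y)\bigr]
  - \mathrm{KL}\bigl(q(z \mid x) \,\|\, p(z)\bigr)
  - \mathrm{KL}\bigl(q(h_x \mid x) \,\|\, p(h_x)\bigr)
  - \mathrm{KL}\bigl(q(h_y \mid y) \,\|\, p(h_y)\bigr)
```

In this reading, the shared latent z is the single lower-dimensional representation used for classification, while the private latents are free to absorb view-specific variation such as image backgrounds or sentence structure.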
