Abstract

An essential task for computer vision-based assistive technologies is to help visually impaired people to recognize objects in constrained environments, for instance, recognizing food items in grocery stores. In this paper, we introduce a novel dataset with natural images of groceries—fruits, vegetables, and packaged products—where all images have been taken inside grocery stores to resemble a shopping scenario. Additionally, we download iconic images and text descriptions for each item that can be utilized for better representation learning of groceries. We select a multi-view generative model, which can combine the different item information into lower-dimensional representations. The experiments show that utilizing the additional information yields higher accuracies on classifying grocery items than only using the natural images. We observe that iconic images help to construct representations separated by visual differences of the items, while text descriptions enable the model to distinguish between visually similar items by different ingredients.
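
The multi-view generative model referred to here is variational canonical correlation analysis (VCCA) applied to three views of each item: the natural photo taken in the store, an iconic product image, and a text description. As a rough illustration of how such a model combines views, the sketch below (PyTorch; feature and latent dimensions are illustrative assumptions, not the authors' exact architecture) encodes the natural-image features into a shared latent z and reconstructs all three views from z, so z must retain information that is useful for every view; a classifier can then be trained on z.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewVCCA(nn.Module):
    def __init__(self, natural_dim=2048, iconic_dim=2048, text_dim=300, z_dim=64):
        super().__init__()
        # Encoder q(z | natural-image features) parameterizing a Gaussian over z
        self.enc = nn.Sequential(nn.Linear(natural_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, z_dim)
        self.logvar = nn.Linear(512, z_dim)
        # One decoder per view, all conditioned on the same shared latent z
        self.dec_natural = nn.Linear(z_dim, natural_dim)
        self.dec_iconic = nn.Linear(z_dim, iconic_dim)
        self.dec_text = nn.Linear(z_dim, text_dim)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        recons = (self.dec_natural(z), self.dec_iconic(z), self.dec_text(z))
        return recons, mu, logvar

def elbo_loss(recons, targets, mu, logvar):
    # Sum of per-view reconstruction errors plus the KL term of the ELBO.
    recon = sum(F.mse_loss(r, t) for r, t in zip(recons, targets))
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

Because the iconic-image and text decoders are only needed during training, classification at test time requires nothing beyond the natural image taken by the user.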

Highlights

  • In recent years, computer vision-based assistive technologies have been developed for supporting people with visual impairments

  • We investigate whether the classification performance on grocery items in natural images can be improved by separating the view-specific variations of the additional views from the shared latent space, using a variant of variational CCA (VCCA) called VCCA-private (see the sketch after this list)

  • We show how iconic images can be used to enhance the interpretability of the classification, as illustrated by Klasson et al. [2]
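
A rough sketch of the private-latent idea behind VCCA-private, again in PyTorch with illustrative layer sizes and latent dimensions (our assumptions, not the paper's exact architecture): each view keeps a private latent u that absorbs its view-specific variation, while the shared latent z, which every decoder also receives, is left to capture what the views have in common.

import torch
import torch.nn as nn

class PrivateEncoder(nn.Module):
    # Gaussian encoder q(u | view) for one view's private latent.
    def __init__(self, view_dim, u_dim=10):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(view_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, u_dim)
        self.logvar = nn.Linear(256, u_dim)

    def forward(self, v):
        h = self.body(v)
        mu, logvar = self.mu(h), self.logvar(h)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample

class PrivateDecoder(nn.Module):
    # Reconstructs one view from the shared latent z concatenated with that view's
    # private latent u; z carries the item identity shared across views, while u
    # absorbs nuisances such as background clutter in the natural photos.
    def __init__(self, z_dim, u_dim, view_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + u_dim, 256), nn.ReLU(),
                                 nn.Linear(256, view_dim))

    def forward(self, z, u):
        return self.net(torch.cat([z, u], dim=-1))

Keeping the private latents small relative to z is one way to discourage them from absorbing class-relevant information; classification is still performed from z alone.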

Introduction

Computer vision-based assistive technologies have been developed for supporting people with visual impairments. Such technologies exist in the form of mobile applications, e.g., Microsoft’s Seeing AI (https://www.microsoft.com/en-us/seeing-ai/) and Aipoly Vision (https://www.aipoly.com/), and as wearable artificial vision devices, e.g., Orcam MyEye (https://www.orcam.com/en/), Transsense (https://www.transsense.ai/), and the Sound of Vision system.[1] These products can support people with visual impairments in many different situations, such as reading text documents, describing the user’s environment, and recognizing people the user may know. We focus on an application that is essential for assistive vision, namely visual support when shopping for grocery items, considering a large range of edible items including fruits, vegetables, and refrigerated products, e.g., milk and juice packages. Similar items are usually stacked next to each other, so that they can easily be misplaced into neighboring shelves.
