Abstract
An essential task for computer vision-based assistive technologies is to help visually impaired people recognize objects in constrained environments, such as food items in a grocery store. In this paper, we introduce a novel dataset of natural images of grocery items (fruits, vegetables, and packaged products), where all images were taken inside grocery stores to resemble an actual shopping scenario. In addition to the natural images, we download an iconic image and a text description of each item, which can be used to construct better representations of the grocery items. We select a multi-view generative model called Variational Canonical Correlation Analysis (VCCA), which efficiently combines the different sources of information about each item into a single lower-dimensional representation. In the experiments, we show that using the additional information with VCCA yields higher accuracy in classifying grocery items than standard image classifiers that use only the natural images. By visualizing the latent representations, we observe that the iconic images help to construct representations that are separated by the visual differences of the items, while the text descriptions enable the model to distinguish between visually similar items by their different ingredients and flavors. Moreover, we investigate a variant of VCCA called VCCA-private, which separates the shared and private information of the different data views. We verify that VCCA-private can separate variations in image backgrounds and in the structure of text sentences from the shared representation, enabling more accurate classification of grocery items in their natural environment.