VISTA: Vision Transformer enhanced by U-Net and Image Colorfulness Frame Filtration for Automatic Retail Checkout

Md. Istiak Hossain Shihab, Nabeel Mohammed, Hasib Zunair, Labiba Kanij Rupty, Nazia Tasnim

doi:10.1109/cvprw56347.2022.00359


Open Access

https://doi.org/10.1109/cvprw56347.2022.00359


Abstract

Multi-class product counting and recognition identifies product items from images or videos for automated retail checkout. The task is challenging because of real-world conditions: occlusions where product items overlap, fast movement on the conveyor belt, strong similarity in the overall appearance of the items being scanned, novel products, and the negative impact of misidentifying items. Further, there is a domain bias between the training and test sets: the provided training dataset consists of synthetic images, whereas the test set videos contain foreign objects such as hands and trays. To address these issues, we propose to segment and classify individual frames from a video sequence. The segmentation method consists of a unified single product item and hand segmentation followed by entropy masking to address the domain bias problem. The multi-class classification method is based on Vision Transformers (ViT). To identify the frames with target objects, we utilize several image processing methods and propose a custom metric to discard frames that do not contain any product items. Combining all these mechanisms, our best system achieves 3rd place in the AI City Challenge 2022 Track 4 with an F1 score of 0.4545. Code will be available at https://github.com/istiakshihab/automated-retail-checkout-aicity22.
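
The frame filtration idea can be illustrated with a minimal sketch. The abstract does not define the custom metric, so the code below substitutes the widely used Hasler-Süsstrunk colorfulness measure as a stand-in; the function names `colorfulness` and `keep_frames` and the threshold of 20.0 are hypothetical choices for illustration and are not taken from the paper or its repository.

```python
import numpy as np


def colorfulness(frame: np.ndarray) -> float:
    """Hasler-Süsstrunk colorfulness of an RGB frame shaped (H, W, 3).

    Assumption: this is a stand-in for the paper's custom metric,
    which the abstract does not specify.
    """
    rgb = frame.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                    # red-green opponent channel
    yb = 0.5 * (r + g) - b        # yellow-blue opponent channel
    std_root = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return float(std_root + 0.3 * mean_root)


def keep_frames(frames, threshold=20.0):
    """Return indices of frames whose colorfulness exceeds the threshold.

    The threshold is a hypothetical value; in practice it would be
    tuned on frames known to show an empty tray.
    """
    return [i for i, f in enumerate(frames) if colorfulness(f) > threshold]


# A uniform gray frame (empty scene) and a frame that is half red, half blue.
gray = np.full((8, 8, 3), 128, dtype=np.uint8)
colorful = np.zeros((8, 8, 3), dtype=np.uint8)
colorful[:, :4, 0] = 255   # left half pure red
colorful[:, 4:, 2] = 255   # right half pure blue

kept = keep_frames([gray, colorful])
```

Under these assumptions, only the second frame is kept, since a uniform gray frame has zero variation in both opponent channels. Frames that pass such a filter would then go on to the segmentation and ViT classification stages described above.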
