Abstract

Visual tracking and attribute estimation related to age or gender information of multiple person entities in a scene are mature research topics with the advent of deep learning techniques. However, when it comes to indoor images such as video sequences of retail consumers, data are not always adequate or accurate enough to essentially train effective models for consumer detection and tracking under various adverse factors. This in turn affects the quality of recognizing age or gender for those detected instances. In this work, we introduce two novel datasets: Consumers comprises 145 video sequences compliant to personal information regulations as far as facial images are concerned and BID is a set of cropped body images from each sequence that can be used for numerous computer vision tasks. We also propose an end-to-end framework which comprises CNNs as object detectors, LSTMs for motion forecasting of the tracklet association component in a sequence, along with a multi-attribute classification model for apparent demographic estimation of the detected outputs, aiming to capture useful metadata of consumer product preferences. Obtained results on tracking and age/gender prediction are promising with respect to reference systems while they indicate the proposed model’s potential for practical consumer metadata extraction.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.