Abstract
Deep learning has revolutionized computer vision and image processing. Its ability to extract compact image representations has taken the person re-identification (re-id) problem to a new level. In most cases, however, researchers focus on developing new approaches to extract richer image representations for the re-id task. Extra information about the images is rarely taken into account because traditional person re-id datasets usually do not provide it. Nevertheless, research in multimodal machine learning has demonstrated that utilizing information from different sources leads to better performance. In this work, we demonstrate how the person re-id problem can benefit from multimodal data. We used a UAV drone to collect and label a new person re-id dataset composed of pedestrian images and their attributes. We manually annotated this dataset with attributes, and in contrast to recent research, we do not use a deep network to classify them. Instead, we employ the continuous bag-of-words (CBOW) model to extract word embeddings from the text descriptions and fuse them with features extracted from the images. A deep neural decision forest then classifies the pedestrians. Extensive experiments on the collected dataset demonstrate the effectiveness of the proposed model.
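To make the described pipeline concrete, below is a minimal sketch of its three stages: CBOW word embeddings from attribute descriptions, fusion with image features, and classification by a differentiable decision forest. It assumes gensim and PyTorch; the attribute phrases, feature dimensions, and the simplified soft forest (a stand-in for a full deep neural decision forest, without its alternating leaf/split optimization) are illustrative, not the authors' exact configuration.

```python
# Minimal sketch of the multimodal re-id pipeline (assumed libraries: gensim, PyTorch).
# All data, dimensions, and hyperparameters below are illustrative placeholders.
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# --- 1. CBOW word embeddings from attribute descriptions (toy data) ---
descriptions = [
    ["male", "short", "hair", "red", "backpack"],
    ["female", "long", "hair", "white", "shirt"],
]
w2v = Word2Vec(descriptions, vector_size=32, window=3, min_count=1, sg=0)  # sg=0 -> CBOW

def embed_text(tokens, model):
    """Average the CBOW vectors of the tokens into one fixed-size text feature."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size, dtype=np.float32)

# --- 2. Fuse the text embedding with an image feature (e.g. from a CNN backbone) ---
image_feat = np.random.randn(128).astype(np.float32)        # placeholder CNN feature
text_feat = embed_text(descriptions[0], w2v).astype(np.float32)
fused = torch.from_numpy(np.concatenate([image_feat, text_feat]))  # shape (160,)

# --- 3. Simplified differentiable decision forest for identity classification ---
class SoftTree(nn.Module):
    def __init__(self, in_dim, depth, n_classes):
        super().__init__()
        self.depth = depth
        n_inner, n_leaves = 2 ** depth - 1, 2 ** depth
        self.route = nn.Linear(in_dim, n_inner)             # one sigmoid gate per inner node
        self.leaf_logits = nn.Parameter(torch.randn(n_leaves, n_classes))

    def forward(self, x):
        gates = torch.sigmoid(self.route(x))                # (B, n_inner)
        mu = x.new_ones(x.size(0), 1)                       # probability mass at the root
        for level in range(self.depth):
            begin = 2 ** level - 1                          # first inner node at this level
            g = gates[:, begin:begin + 2 ** level]
            mu = torch.cat([mu * g, mu * (1 - g)], dim=1)   # route mass to the children
        return mu @ torch.softmax(self.leaf_logits, dim=1)  # (B, n_classes)

class SoftForest(nn.Module):
    def __init__(self, in_dim, n_trees, depth, n_classes):
        super().__init__()
        self.trees = nn.ModuleList(SoftTree(in_dim, depth, n_classes) for _ in range(n_trees))

    def forward(self, x):
        # forest prediction = mean of the per-tree class distributions
        return torch.stack([t(x) for t in self.trees]).mean(dim=0)

forest = SoftForest(in_dim=fused.numel(), n_trees=5, depth=4, n_classes=10)
probs = forest(fused.unsqueeze(0))                          # (1, 10) identity probabilities
```

Averaging the CBOW vectors yields a fixed-size text feature regardless of description length, and concatenation is the simplest fusion strategy; each leaf of a soft tree receives probability mass equal to the product of the sigmoid routing decisions along its path, so the whole forest is trainable end to end by gradient descent.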