Abstract

Intelligently tracking objects with varied shapes, colors, lighting conditions, and backgrounds is extremely useful in many HCI applications, such as human body motion capture, hand gesture recognition, and virtual reality (VR) games. However, accurately tracking different objects in uncontrolled environments remains a tough challenge due to potentially dynamic object parts, varied lighting conditions, and complex backgrounds. In this work, we propose a novel semantically-aware object tracking framework, the key to which is a weakly-supervised learning paradigm that optimally transfers video-level semantic tags to individual regions. More specifically, given a set of training video clips, each associated with multiple video-level semantic tags, we first propose a weakly-supervised learning algorithm to transfer the semantic tags to video regions. Its core is a multiple-instance learning (MIL) (Zhong et al., 2020) [1]-based manifold embedding algorithm that maps all video regions into a semantic space in which the video-level semantic tags are well encoded. Afterward, each video region is represented by its semantic feature combined with its appearance feature, and we design a multi-view learning algorithm to optimally fuse these two types of features. Based on the fused feature, we learn a probabilistic Gaussian mixture model to predict the target probability of each candidate window, and the window with the maximal probability is output as the tracking result. Comprehensive comparative results on a challenging pedestrian tracking task as well as a human hand gesture recognition task demonstrate the effectiveness of our method. Moreover, visualized tracking results show that non-rigid objects with moderate occlusions are well localized by our method.
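
As a rough illustration of the final step described above, the sketch below shows how a Gaussian mixture model fitted on fused (semantic + appearance) features could score candidate windows and select the most probable one. It uses scikit-learn's GaussianMixture; names such as fused_target_features, candidate_features, and candidate_windows are illustrative placeholders and not part of the paper.

```python
# Minimal sketch, assuming fused per-region features are already available as
# NumPy arrays. The exact GMM configuration (number of components, covariance
# type) is an assumption for illustration only.
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_target_model(fused_target_features: np.ndarray,
                     n_components: int = 3) -> GaussianMixture:
    """Fit a Gaussian mixture over fused features sampled from the target."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full",
                          random_state=0)
    gmm.fit(fused_target_features)
    return gmm


def select_tracking_window(gmm: GaussianMixture,
                           candidate_features: np.ndarray,
                           candidate_windows: list):
    """Score each candidate window under the target model and return the
    window with the maximal probability as the tracking result."""
    log_probs = gmm.score_samples(candidate_features)  # one log-likelihood per candidate
    best = int(np.argmax(log_probs))
    return candidate_windows[best], log_probs[best]
```

In practice, fit_target_model would be run on fused features from confirmed target regions, and select_tracking_window would be called on each new frame's candidate windows.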
