Abstract
In a crowded and cluttered environment, identifying a particular person is a challenging problem. Current identification approaches cannot handle such dynamic environments. In this paper, we tackle the problem of identifying and tracking a person of interest in a crowded environment using egocentric and third-person view videos. We propose a novel method (Visual-GPS) that identifies, tracks, and localizes the person capturing the egocentric video through joint analysis of imagery from both videos. The output of our method is the bounding box of the target person detected in each frame of the third-person view, together with the person's 3D metric trajectory. At first glance, the views of the two cameras appear quite different; this paper offers insight into how they are correlated. Our proposed method exploits several different cues: in addition to RGB images, we take advantage of both body motion and action features to correlate the two views. We track and localize the person by finding the most correlated individual in the third-person view. Furthermore, the target person's 3D trajectory is recovered from the mapping between the 2D and 3D body joints. Our experiments confirm the effectiveness of the proposed ETVIT network, showing an 18.32% improvement in detection accuracy over the baseline methods.