Abstract

Most modern approaches for video-based multiple people tracking rely on human appearance to exploit similarities between person detections. Consequently, tracking accuracy degrades if this kind of information is not discriminative or if people change apparel. In contrast, we present a method to fuse video information with additional motion signals from body-worn inertial measurement units (IMUs). In particular, we propose a neural network to relate person detections with IMU orientations, and formulate a graph labeling problem to obtain a tracking solution that is globally consistent with the video and inertial recordings. The fusion of visual and inertial cues provides several advantages. The association of detection boxes in the video and IMU devices is based on motion, which is independent of a person's outward appearance. Furthermore, inertial sensors provide motion information irrespective of visual occlusions. Hence, once detections in the video are associated with an IMU device, intermediate positions can be reconstructed from corresponding inertial sensor data, which would be unstable using video only. Since no dataset exists for this new setting, we release a dataset of challenging tracking sequences, containing video and IMU recordings together with ground-truth annotations. We evaluate our approach on our new dataset, achieving an average IDF1 score of 91.2%. The proposed method is applicable to any situation that allows one to equip people with inertial sensors.
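The abstract states that a neural network relates person detections with IMU orientations. The following sketch only illustrates that idea under our own assumptions (the class name MotionIMUAffinity, the feature dimensions, and the GRU encoders are not taken from the paper): a short window of image-space motion features and a window of IMU orientation readings are each encoded and mapped to a compatibility score.

    # Minimal sketch (assumed architecture, not the authors' network): score how well a
    # detection's motion over a short window matches an IMU's orientation stream.
    import torch
    import torch.nn as nn

    class MotionIMUAffinity(nn.Module):
        def __init__(self, det_dim=4, imu_dim=4, hidden=64):
            super().__init__()
            # det_dim: per-frame detection motion features (e.g. box-centre velocity, size change)
            # imu_dim: per-frame IMU orientation, e.g. a unit quaternion
            self.det_enc = nn.GRU(det_dim, hidden, batch_first=True)
            self.imu_enc = nn.GRU(imu_dim, hidden, batch_first=True)
            self.head = nn.Sequential(
                nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
            )

        def forward(self, det_seq, imu_seq):
            # det_seq: (batch, T, det_dim), imu_seq: (batch, T, imu_dim)
            _, h_det = self.det_enc(det_seq)
            _, h_imu = self.imu_enc(imu_seq)
            logit = self.head(torch.cat([h_det[-1], h_imu[-1]], dim=-1))
            return torch.sigmoid(logit)  # probability that the detections and the IMU belong to the same person

Pairwise scores of this kind could then serve as edge costs in the graph labeling problem mentioned above; the paper's actual formulation and network are described in the full text.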

Highlights

  • Multiple people tracking (MPT) in image sequences has been an active field of research for decades

  • This work introduces a novel extension to the common multiple people tracking problem: combining video information with measurements from body-worn inertial measurement units (IMUs), which we call Video Inertial Multiple People Tracking (VIMPT)

  • An interesting characteristic of VIMPT is that video-based trajectories of objects equipped with an IMU have to be assigned to the respective IMU devices (a simplified assignment sketch follows this list)
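Given pairwise costs between video trajectories and IMU devices (for instance from a learned affinity), the trajectory-to-device assignment mentioned in the last highlight can be illustrated as a linear assignment problem. This is a simplified, assumed formulation rather than the paper's global graph labeling; the function assign_tracks_to_imus and the example costs are hypothetical.

    # Simplified sketch: assign each video trajectory to at most one IMU device by
    # minimising the total association cost (lower cost = better motion agreement).
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def assign_tracks_to_imus(cost):
        """cost[i, j]: dissimilarity between video trajectory i and IMU device j."""
        rows, cols = linear_sum_assignment(cost)
        return list(zip(rows.tolist(), cols.tolist()))

    # Three trajectories, two IMU-equipped people: one trajectory stays unassigned.
    cost = np.array([[0.9, 0.2],
                     [0.8, 0.7],
                     [0.1, 0.6]])
    print(assign_tracks_to_imus(cost))  # [(0, 1), (2, 0)]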


Summary

Introduction

Multiple people tracking (MPT) in image sequences has been an active field of research for decades. A crucial part of this strategy is to derive a measure of whether two detections belong to the same person or not. This involves a motion or appearance model. Most motion models assume low and constant velocities, which holds for pedestrians only within a short temporal window [12]. Another, complementary, strategy is to model relations between detections based on appearance information. A major advantage of appearance information over motion models is that it allows relating detections that are temporally far apart. This facilitates re-identification of people even after long-term occlusions or if they temporarily fall out of the camera view.
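To make the constant-velocity assumption above concrete, a minimal motion affinity between a short track and a new detection could look as follows. This sketch is not from the paper; the function name and the Gaussian width sigma (in pixels) are assumptions.

    # Minimal constant-velocity motion affinity: extrapolate the last observed velocity
    # and penalise candidate detections by their distance to the predicted position.
    import numpy as np

    def constant_velocity_affinity(p_prev, p_curr, p_candidate, dt_hist, dt_pred, sigma=20.0):
        """p_prev, p_curr: box centres (x, y) at two past frames separated by dt_hist.
        p_candidate: box centre observed dt_pred later. Returns an affinity in (0, 1]."""
        velocity = (np.asarray(p_curr, float) - np.asarray(p_prev, float)) / dt_hist
        p_pred = np.asarray(p_curr, float) + velocity * dt_pred
        err = np.linalg.norm(np.asarray(p_candidate, float) - p_pred)
        return float(np.exp(-(err ** 2) / (2.0 * sigma ** 2)))

    # High affinity for a detection close to the extrapolated position one frame ahead.
    print(constant_velocity_affinity((100, 50), (110, 50), (121, 51), dt_hist=1, dt_pred=1))

Because the prediction error grows with the extrapolation horizon, such affinities are reliable only over short temporal windows, which is exactly why appearance cues or, in this work, inertial measurements are needed to bridge longer gaps.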

