Abstract

This paper presents a method for multiple object tracking (MOT) in video streams. The method incorporates the prediction of the physical locations of people into a tracking-by-detection paradigm. We predict the trajectories of people on an estimated ground plane and apply a learning-based network to extract appearance features across frames. The method transforms detected object locations from image space to an estimated ground space to refine the tracking trajectories. This transformed space allows objects detected from multi-view images to be associated under a single coordinate system. In addition, pedestrians that are occluded in image space can be well separated on the rectified ground plane, where their motion models are estimated. The effectiveness of the method is evaluated on different datasets through extensive comparisons with state-of-the-art techniques. Experimental results show that the proposed method improves MOT performance in terms of the number of identity switches (IDSW) and fragmentations (Frag).
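The core geometric step described above, mapping detections from image space to an estimated ground plane, is commonly realized with a planar homography. The sketch below is a minimal illustration of that idea and is not the paper's implementation: the homography matrix H and the example foot-point coordinates are hypothetical values chosen only to make the snippet runnable.

```python
import numpy as np

def image_to_ground(points_px, H):
    """Map image-space points (N, 2) to ground-plane coordinates using a
    3x3 homography H (assumed to be pre-estimated, e.g., from calibration)."""
    pts_h = np.hstack([points_px, np.ones((len(points_px), 1))])  # homogeneous coords
    ground_h = (H @ pts_h.T).T                                    # apply homography
    return ground_h[:, :2] / ground_h[:, 2:3]                     # normalize by last coord

# Hypothetical image-to-ground homography and two detected foot points (pixels).
H = np.array([[0.02, 0.00, -5.0],
              [0.00, 0.03, -8.0],
              [0.00, 0.00,  1.0]])
feet_px = np.array([[320.0, 470.0],
                    [610.0, 455.0]])
print(image_to_ground(feet_px, H))  # approximate ground-plane positions
```

Once detections from one or more views are expressed in this shared ground coordinate system, motion models (e.g., constant-velocity filters) can be run per pedestrian and associated with the appearance features mentioned in the abstract.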
