This paper tackles automated scene understanding by tracking and predicting the paths of multiple humans, using a new methodology that relies on data from a single, fixed camera monitoring the environment. Our main idea is to build goal-oriented prior motion models that can drive both the tracking and the path prediction algorithms, based on a coarse-to-fine modeling of the target goal. To implement this idea, we use a dataset of training video sequences with associated ground-truth trajectories, from which we hierarchically extract a set of key locations. These key locations may correspond to exit/entrance zones in the observed scene, or to crossroads where trajectories often change direction abruptly. A simple heuristic allows us to associate the ground-truth trajectories piecewise with the key locations, and we use these data to learn one statistical motion model per key location, based on the variations of the trajectories in the training data and on a regularizing prior over the models' spatial variations. We illustrate how to use these motion priors within an interacting multiple model scheme for target tracking and path prediction, and finally evaluate the methodology with experiments on datasets commonly used to compare tracking algorithms.