Abstract

Counting pedestrians in surveillance video has become an important task in recent years. Regression-based approaches have shown satisfactory performance in estimating the number of pedestrians in a crowd. However, most of these approaches treat a surveillance video as a sequence of independent frames, so temporal information is lost. To address this issue, this paper proposes a semisupervised method that exploits temporal consistency across a continuous sequence of unlabeled frames. In addition to temporal consistency, the method exploits spatial consistency: the pedestrian counts of the subgroups, or subblobs, should sum to the total number of pedestrians, or the ground truth. Both consistencies are incorporated as regularization terms in the objective function. The experimental results show that the proposed technique is more robust and can be trained with relatively few labeled frames (e.g., ten frames).
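As a rough illustration of how consistency terms can enter a regression objective, the sketch below fits a linear count regressor with two added quadratic penalties. It is not the paper's formulation, which the abstract does not spell out. The linear model, the squared-difference form of both penalties, the closed-form solver, and all names (fit_consistent_ridge, X_seq, X_blob, X_sub_sum, lam, alpha, beta) are assumptions made for this example. The temporal penalty discourages predicted counts from jumping between consecutive unlabeled frames; the spatial penalty pushes the prediction for a blob toward the sum of the predictions for its subblobs.

```python
import numpy as np

def fit_consistent_ridge(X_lab, y_lab, X_seq, X_blob, X_sub_sum,
                         lam=1e-3, alpha=1.0, beta=1.0):
    """Minimise, over the weight vector w (hypothetical objective):

        ||X_lab w - y_lab||^2                      (fit on the few labeled frames)
      + lam   * ||w||^2                            (ridge term)
      + alpha * sum_t (x_{t+1}.w - x_t.w)^2        (temporal consistency, unlabeled)
      + beta  * sum_b (x_b.w - s_b.w)^2            (spatial consistency, unlabeled)

    X_seq     : features of consecutive unlabeled frames, in time order.
    X_blob    : features of whole blobs.
    X_sub_sum : for each blob, the sum of its subblobs' feature vectors,
                so that s_b.w equals the sum of the subblob predictions.
    """
    d = X_lab.shape[1]
    # Rows are x_{t+1} - x_t, so A @ w is the frame-to-frame change in prediction.
    A = np.diff(X_seq, axis=0)
    # Rows are x_b - s_b, so B @ w is the blob-vs-subblob-sum disagreement.
    B = X_blob - X_sub_sum
    # Every term is quadratic in w, so the minimiser solves a linear system.
    H = X_lab.T @ X_lab + lam * np.eye(d) + alpha * (A.T @ A) + beta * (B.T @ B)
    return np.linalg.solve(H, X_lab.T @ y_lab)
```

Setting alpha and beta to zero recovers ordinary ridge regression on the labeled frames alone, which makes it easy to check how much the unlabeled frames change the fit.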
