Abstract

Head detection in real-world videos is a classical research problem in computer vision. Detecting heads in videos is more challenging than in a single image because of nuisances commonly observed in natural videos, including arbitrary poses, appearances, and scales. Head detection is generally treated as a special case of object detection in a single image; however, the performance of object detectors deteriorates in unconstrained videos. In this paper, we propose a temporal consistency model (TCM) that enhances the performance of a generic object detector by integrating the spatio-temporal information that exists among successive frames of a video. Our model takes the detections of a generic detector as input and improves mean average precision (mAP) by recovering missed detections and suppressing false positives. We evaluate the proposed framework on four challenging datasets: HollywoodHeads, Casablanca, BOSS, and PAMELA. Experimental evaluation shows that employing the TCM improves performance, and we demonstrate both qualitatively and quantitatively that the proposed framework obtains significant improvements over other methods.

Highlights

  • Pedestrian detection is gaining much attention from the research community

  • We show that a generic detector can perform well compared with a specific detector when it is integrated with the proposed temporal consistency model (TCM)

  • We propose a novel model that recovers missed detections and suppresses false positives by leveraging the temporal consistency that exists among successive frames of a video
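
The general idea of temporal consistency can be illustrated with a minimal sketch. This is not the TCM proposed in the paper, whose details are not given in this summary. It is a simplified stand-in that assumes a one-frame neighbourhood, box matching by intersection over union (IoU), and an IoU threshold of 0.5. The names `iou`, `has_match`, and `enforce_temporal_consistency` are hypothetical and introduced only for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def has_match(box: Box, boxes: List[Box], thr: float) -> bool:
    """True if any box in `boxes` overlaps `box` with IoU >= thr."""
    return any(iou(box, other) >= thr for other in boxes)


def enforce_temporal_consistency(
    frames: List[List[Box]], thr: float = 0.5  # thr is an assumed value
) -> List[List[Box]]:
    """Illustrative stand-in, NOT the paper's TCM.

    Suppress: drop a detection with no overlapping detection in the
    previous or the next frame.
    Recover: if a box in the previous frame overlaps a box in the next
    frame but nothing matches in the current frame, insert their average.
    """
    n = len(frames)
    result = []
    for t, boxes in enumerate(frames):
        prev_boxes = frames[t - 1] if t > 0 else []
        next_boxes = frames[t + 1] if t < n - 1 else []

        # Suppress isolated detections (likely false positives).
        kept = [
            b for b in boxes
            if has_match(b, prev_boxes, thr) or has_match(b, next_boxes, thr)
        ]

        # Recover detections that the neighbours agree on.
        for p in prev_boxes:
            for q in next_boxes:
                if iou(p, q) >= thr:
                    mid = tuple((pc + qc) / 2 for pc, qc in zip(p, q))
                    if not has_match(mid, kept, thr):
                        kept.append(mid)
        result.append(kept)
    return result


if __name__ == "__main__":
    head = (10, 10, 50, 50)
    spurious = (200, 200, 240, 240)
    # The head is missed in frame 2, where a spurious box appears instead.
    frames = [[head], [head], [spurious], [head], [head]]
    for t, boxes in enumerate(enforce_temporal_consistency(frames)):
        print(t, boxes)
```

In this toy sequence the isolated box in the middle frame is dropped and the head is restored from its neighbours. A one-frame window is a deliberate simplification: a detection missed in two consecutive frames would not be recovered, and a single-frame input would lose all of its detections.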



Introduction

Pedestrian detection is gaining much attention from the research community and has numerous applications in the surveillance domain, such as tracking [1, 2], anomaly detection [3, 4], congestion detection [5, 6], and behavior analysis [7, 8]. Most existing methods rely on face and pedestrian detection for tracking, counting, and behavior analysis. Although pedestrian and face detection algorithms have gained much popularity, detecting people in complex scenes remains challenging. Pedestrian detection relies on detecting the whole pedestrian, which is not possible because of a number of problems that arise in an unconstrained video environment. Given these limitations, face and pedestrian detection methods cannot be employed in complex scenes. To detect pedestrians in complex scenes, the head is the only visible and reliable clue
