Abstract

In computer science, contextual information can be used both to reduce computation and to increase accuracy. This paper discusses how it can be exploited for people surveillance in very cluttered environments, in terms of both perspective (i.e., weak scene calibration) and the appearance of the objects of interest (i.e., relevance feedback on the training of a classifier). These techniques are applied to a pedestrian detector that uses a LogitBoost classifier, appropriately modified to work with covariance descriptors, which lie on Riemannian manifolds. On each detected pedestrian, a similar classifier is employed to obtain a precise localization of the head. Two algorithmic novelties are proposed for this step: polar image transformations, which better exploit the circular appearance of the head, and multispectral image derivatives, which capture not only luminance but also chrominance variations. The complete approach has been tested on the surveillance of a construction site to detect workers who are not wearing a hard hat; in such scenarios, scene complexity and dynamics are very high, making pedestrian detection a real challenge.

Highlights

  • Research in computer vision and pattern recognition is often challenged by two conflicting goals: strong requirements on the accuracy of results, which should ideally reproduce or even improve on the outcome of the human vision system, and tight constraints on response time.

  • Just as human visual perception is correlated with contextual belief or knowledge, computer vision models gain from considering contextual information.

  • We propose to exploit contextual information in two ways: (i) a relevance feedback (RF) strategy, which enriches the pedestrian detection phase by replacing the final stages of a cascade of classifiers with new stages trained on positive and negative samples that are automatically extracted from the specific context only; (ii) a weak scene calibration, which roughly estimates the scene perspective in order to discard out-of-scale detections.
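
As an illustration of point (ii), the following is a minimal sketch of how a rough perspective estimate could be used to discard out-of-scale detections. It is not the paper's method: it assumes, purely for illustration, that the expected pedestrian height in pixels grows linearly with the image row of the feet, and that a few reference (row, height) pairs are available. The names `Detection`, `fit_height_model`, `filter_out_of_scale`, and the `tolerance` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    x: float       # left edge of the bounding box, in pixels
    y_foot: float  # image row of the bottom edge (feet), in pixels
    height: float  # bounding-box height, in pixels


def fit_height_model(samples):
    """Least-squares fit of height = a * y_foot + b.

    `samples` is a list of (y_foot, height) reference pairs.
    The linear model is an assumption of this sketch.
    """
    n = len(samples)
    mean_y = sum(y for y, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    var_y = sum((y - mean_y) ** 2 for y, _ in samples)
    if var_y == 0:
        raise ValueError("samples must span at least two image rows")
    cov = sum((y - mean_y) * (h - mean_h) for y, h in samples)
    a = cov / var_y
    b = mean_h - a * mean_y
    return a, b


def filter_out_of_scale(detections, model, tolerance=0.3):
    """Keep detections whose height is within `tolerance` (relative)
    of the height the model expects at their foot row."""
    a, b = model
    kept = []
    for det in detections:
        expected = a * det.y_foot + b
        if expected <= 0:
            # The model predicts no visible pedestrian at this row.
            continue
        if abs(det.height - expected) <= tolerance * expected:
            kept.append(det)
    return kept


# Hypothetical reference pairs: (foot row, observed pedestrian height).
samples = [(100.0, 40.0), (200.0, 80.0), (300.0, 120.0)]
model = fit_height_model(samples)

candidates = [
    Detection(10, 150, 62),   # plausible size for this row
    Detection(50, 150, 140),  # far too tall for this row
    Detection(90, 280, 110),  # plausible size for this row
]
kept = filter_out_of_scale(candidates, model)
```

With these made-up numbers, the second candidate is rejected because its height is more than twice what the fitted model expects at that row. A real scene would likely need a more careful perspective model and a tolerance tuned on site data.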


Summary

Introduction

Research in computer vision and pattern recognition is often challenged by two conflicting goals: strong requirements on the accuracy of results, which should ideally reproduce or even improve on the outcome of the human vision system, and tight constraints on response time. For this reason, it is always helpful to exploit contextual information provided as prior or additional knowledge (learned either before or at run time). Just as human visual perception is correlated with contextual belief or knowledge (e.g., a moving object in a soccer field is unlikely to be associated with a bike), computer vision models gain from considering contextual information. With these premises, this paper discusses how to include and model contextual information in a generic framework for people surveillance, applied to large and complex open areas such as construction sites.
