Abstract

Video surveillance systems usually rely on manually annotated focus areas to constrain automatic video analysis tasks. Although manual annotation simplifies several stages of the analysis, it hinders the scalability of the developed solutions and may cause operational problems in scenarios recorded with multiple moving cameras (MMCs). To tackle these problems, an automatic method for the cooperative extraction of areas of interest (AoIs) is proposed. Each captured frame is segmented into regions with semantic roles using a state-of-the-art method. Semantic evidence from different time instants, cameras, and points of view is then spatio-temporally aligned on a common ground plane. Experimental results on widely used datasets recorded with multiple but static cameras suggest that this process yields broader and more accurate AoIs than those manually defined in the datasets. Moreover, the proposed method naturally determines the projection of obstacles and functional objects in the scene, paving the way towards systems focused on the automatic analysis of human behavior. To our knowledge, this is the first study to address this problem, as evidenced by the lack of publicly available MMC benchmarks. To address this gap, we also provide a new MMC dataset with associated semantic scene annotations.
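The core step described above, aligning per-frame semantic evidence on a common ground plane and fusing it into AoIs, can be illustrated with a minimal sketch. It assumes per-camera image-to-plane homographies are already available (e.g., from calibration or registration) and that each frame has been segmented into a binary "ground" mask; the function name, the voting threshold, and the use of OpenCV warping are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the fusion step: binary semantic masks from
# several frames/cameras are warped onto a shared ground-plane grid via
# per-camera homographies and fused by majority vote. All names and
# thresholds are illustrative assumptions.
import numpy as np
import cv2

def accumulate_aoi(masks, homographies, plane_shape, vote_ratio=0.5):
    """Fuse binary semantic masks observed by several cameras/frames.

    masks        : list of HxW uint8 arrays (1 = pixel labelled as ground).
    homographies : list of 3x3 arrays mapping image pixels to plane coords.
    plane_shape  : (height, width) of the common ground-plane grid.
    vote_ratio   : fraction of observations required to keep a plane cell.
    """
    votes = np.zeros(plane_shape, dtype=np.float32)
    seen = np.zeros(plane_shape, dtype=np.float32)
    for mask, H in zip(masks, homographies):
        ones = np.ones_like(mask, dtype=np.float32)
        # Warp both the mask and a visibility map, so plane cells that a
        # camera never observes are not penalised in the vote.
        votes += cv2.warpPerspective(mask.astype(np.float32), H,
                                     plane_shape[::-1])
        seen += cv2.warpPerspective(ones, H, plane_shape[::-1])
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(seen > 0, votes / seen, 0.0)
    return (ratio >= vote_ratio).astype(np.uint8)  # binary AoI on the plane
```

Under these assumptions, accumulating evidence over time and across viewpoints makes the extracted AoI robust to per-frame segmentation errors and transient occlusions, which is consistent with the cooperative alignment the abstract describes.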
