Abstract

Video surveillance systems usually rely on manually annotated focus areas to constrain automatic video analysis tasks. Although manual annotation simplifies several stages of the analysis, it hinders the scalability of the developed solutions and may cause operational problems in scenarios recorded with multiple moving cameras (MMCs). To tackle these problems, an automatic method for the cooperative extraction of areas of interest (AoIs) is proposed. Each captured frame is segmented into regions with semantic roles using a state-of-the-art method. Semantic evidence from different time instants, cameras, and points of view is then spatio-temporally aligned on a common ground plane. Experimental results on widely used datasets recorded with multiple but static cameras suggest that this process yields broader and more accurate AoIs than those manually defined in the datasets. Moreover, the proposed method naturally determines the projection of obstacles and functional objects in the scene, paving the way towards systems focused on the automatic analysis of human behavior. To our knowledge, this is the first study to address this problem, as evidenced by the lack of publicly available MMC benchmarks. To address this gap, we also provide a new MMC dataset with associated semantic scene annotations.
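The core step described above, aligning per-frame semantic evidence on a common ground plane and fusing it into AoIs, can be illustrated with a minimal sketch. It assumes per-camera image-to-plane homographies are already available (e.g., from calibration or registration) and that each frame has been segmented into a binary "ground" mask; the function name, the voting threshold, and the use of OpenCV warping are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the fusion step: binary semantic masks from
# several frames/cameras are warped onto a shared ground-plane grid via
# per-camera homographies and fused by majority vote. All names and
# thresholds are illustrative assumptions.
import numpy as np
import cv2

def accumulate_aoi(masks, homographies, plane_shape, vote_ratio=0.5):
    """Fuse binary semantic masks observed by several cameras/frames.

    masks        : list of HxW uint8 arrays (1 = pixel labelled as ground).
    homographies : list of 3x3 arrays mapping image pixels to plane coords.
    plane_shape  : (height, width) of the common ground-plane grid.
    vote_ratio   : fraction of observations required to keep a plane cell.
    """
    votes = np.zeros(plane_shape, dtype=np.float32)
    seen = np.zeros(plane_shape, dtype=np.float32)
    for mask, H in zip(masks, homographies):
        ones = np.ones_like(mask, dtype=np.float32)
        # Warp both the mask and a visibility map, so plane cells that a
        # camera never observes are not penalised in the vote.
        votes += cv2.warpPerspective(mask.astype(np.float32), H,
                                     plane_shape[::-1])
        seen += cv2.warpPerspective(ones, H, plane_shape[::-1])
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(seen > 0, votes / seen, 0.0)
    return (ratio >= vote_ratio).astype(np.uint8)  # binary AoI on the plane
```

Under these assumptions, accumulating evidence over time and across viewpoints makes the extracted AoI robust to per-frame segmentation errors and transient occlusions, which is consistent with the cooperative alignment the abstract describes.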
