Abstract

Saliency detection simulates human perception in locating crucial regions, enabling further processing for many practical applications. Although saliency prediction for conventional 2D images and videos has been well developed, prediction on 360° content remains challenging. For each pixel in an equirectangular frame, the true surrounding pixels are determined by its spherical coordinates. Therefore, conventional convolution may introduce inaccuracies when attempting to simulate how humans perceive their surrounding environment. This paper proposes a novel spherical convolutional network for 360° video saliency prediction in which the kernel is defined as a spherical cap. During convolution, instead of using neighboring pixels with a regular arrangement in equirectangular projection coordinates, the convolutional patches are resampled to preserve the spherical perspective of the spherical signal. Our model is trained and tested on a dataset of 104 360° videos containing dynamic sports content. The proposed spherical convolutional network is evaluated using the Pearson correlation coefficient (CC) and Kullback-Leibler divergence (KLD). Our experiments demonstrate the effectiveness of the proposed spherical convolution method for 360° video saliency detection using a spherical U-Net model. Further analysis of the proposed system is also presented in this study.
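
To illustrate the idea of spherical-cap sampling described above, the following is a minimal sketch (not the authors' implementation) of how the convolutional patch for one output pixel of an equirectangular frame could be gathered along a spherical cap. The function name, the cap radius, and the number of samples are illustrative assumptions; the resulting patch would then be combined with learned kernel weights as in an ordinary convolution.

```python
# Sketch: gather a spherical-cap neighbourhood for one pixel of an
# equirectangular frame (H x W). Assumptions: latitude in [-pi/2, pi/2],
# longitude in [-pi, pi]; cap radius and sample count are free parameters.
import numpy as np

def spherical_cap_patch(frame, row, col, cap_radius=np.radians(5), n_samples=9):
    """Return pixel values sampled on a spherical cap centred at (row, col)."""
    H, W = frame.shape[:2]
    # Spherical coordinates of the kernel centre.
    lat0 = (0.5 - (row + 0.5) / H) * np.pi           # latitude
    lon0 = ((col + 0.5) / W - 0.5) * 2.0 * np.pi     # longitude

    # One sample at the centre plus a ring of samples at angular distance
    # cap_radius, spaced evenly in bearing around the centre direction.
    bearings = np.linspace(0.0, 2.0 * np.pi, n_samples - 1, endpoint=False)
    values = [frame[row, col]]
    for b in bearings:
        # Great-circle "destination point" formula: move cap_radius radians
        # from (lat0, lon0) along bearing b on the unit sphere.
        lat = np.arcsin(np.sin(lat0) * np.cos(cap_radius)
                        + np.cos(lat0) * np.sin(cap_radius) * np.cos(b))
        lon = lon0 + np.arctan2(np.sin(b) * np.sin(cap_radius) * np.cos(lat0),
                                np.cos(cap_radius) - np.sin(lat0) * np.sin(lat))
        # Project back to equirectangular pixel indices (nearest neighbour),
        # wrapping longitude and clamping latitude at the poles.
        r = int(np.clip((0.5 - lat / np.pi) * H, 0, H - 1))
        c = int(((lon / (2.0 * np.pi) + 0.5) % 1.0) * W) % W
        values.append(frame[r, c])
    return np.stack(values)   # shape: (n_samples, ...) to be weighted by the kernel
```

In this sketch the patch geometry follows great circles on the sphere rather than the pixel grid, so the effective receptive field stays the same size near the poles instead of stretching as it would in the equirectangular plane, which is the motivation stated in the abstract.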
