Abstract

Recently, over-the-top live gaming content video (GCV) services have contributed significantly to overall internet traffic. Consequently, there is a growing demand for GCV quality assessment (GCVQA) to maintain service quality. Although recent literature has proposed a few GCVQA methods, they mainly extract spatial features and perform temporal fusion separately, which limits their performance by neglecting joint spatiotemporal feature learning; this learning is crucial for GCV, which typically shares spatial and temporal characteristics across frames. To address this, we propose a novel GCVQA model that focuses on spatiotemporal feature learning for GCV. First, we employ a multi-task self-supervised spatiotemporal pyramid convolutional neural network (STP-CNN) to extract short-term spatiotemporal quality feature representations (STQFRs) of GCVs. The STP-CNN extracts multiscale spatiotemporal features from multiple frames at various temporal scales in a pyramid fashion, enabling it to learn diverse spatiotemporal cues dynamically. Subsequently, we propose a differential Transformer model that processes all short-term STQFRs within a GCV, extracting global spatiotemporal features to assess overall GCV quality. To evaluate the effectiveness of the proposed method, we conducted experiments on four GCVQA datasets. The results demonstrate that our method outperforms existing approaches in predicting the perceived quality of GCVs.
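To illustrate the pyramid idea mentioned above, the following is a minimal, hypothetical sketch of multiscale temporal sampling: a clip is resampled at progressively coarser frame rates so that each level covers the same time span at a different temporal resolution. The function name and the stride-doubling scheme are assumptions for illustration only; the paper's STP-CNN defines its own pyramid construction and learned features.

```python
import numpy as np

def temporal_pyramid(frames, num_levels=3):
    """Illustrative multiscale temporal sampling (not the paper's exact scheme).

    frames: array of shape (T, H, W, C).
    Level k keeps every 2**k-th frame, so each level spans the same clip
    at half the temporal resolution of the previous one.
    """
    return [frames[:: 2 ** k] for k in range(num_levels)]

# Toy 16-frame clip of 64x64 RGB frames.
clip = np.zeros((16, 64, 64, 3), dtype=np.float32)
levels = temporal_pyramid(clip, num_levels=3)
print([lvl.shape[0] for lvl in levels])  # frame counts per pyramid level
```

In a full model, each pyramid level would be fed to a shared CNN branch so that fine- and coarse-scale motion cues are learned jointly.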
