Abstract
In this paper, we propose a semantic segmentation-based static video stitching method that reduces parallax and misalignment distortion in sports stadium scenes with dynamic foreground objects. First, the video frame pairs to be stitched are divided into segments of different classes through semantic segmentation. Region-based stitching is then performed on matched segment pairs, under the assumption that segments of the same semantic class lie on the same plane. Second, to prevent the stitching quality from degrading for plain or noisy videos, the homography for each matched segment pair is estimated using temporally consistent feature points. Finally, the stitched video frame is synthesized by stacking the stitched matched segment pairs and the foreground segments onto the reference frame plane in descending order of area. The performance of the proposed method is evaluated by comparing the subjective quality, geometric distortion, and pixel distortion of video sequences stitched using the proposed and conventional methods. The proposed method is shown to reduce parallax and misalignment distortion in segments with plain texture or large parallax, and to significantly reduce geometric distortion and pixel distortion compared with conventional methods.
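The final synthesis step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `composite_by_area`, the `(image, mask)` layer format, and the reading of "descending order of area" as "paint the largest segment first, so that smaller segments end up on top" are all assumptions made for this example. Each layer is assumed to be already warped onto the reference frame plane.

```python
import numpy as np


def composite_by_area(canvas, layers):
    """Paint warped segments onto a copy of the canvas, largest area first.

    canvas: (H, W, 3) array for the reference frame plane.
    layers: list of (image, mask) pairs; image is (H, W, 3) and mask is an
            (H, W) boolean array marking the pixels the segment covers.
    Assumption: larger segments are painted first, so smaller segments
    (for example foreground objects) overwrite them where they overlap.
    """
    out = canvas.copy()
    # Sort by covered pixel count, largest first.
    ordered = sorted(layers, key=lambda layer: int(layer[1].sum()), reverse=True)
    for image, mask in ordered:
        out[mask] = image[mask]
    return out
```

With this ordering, the result does not depend on the order in which the segments are passed in, only on their areas.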
Highlights
With the development of information and communications technology (ICT) such as 5G and artificial intelligence, and with changes to the content creation environment, there is a growing demand for immersive media [1,2,3,4], which refers to media that convey information for all the senses in a scene to maximize immersion and presence for user satisfaction.
For the training of the semantic segmentation tool, DeepLab, the training image dataset in Figure 5 was constructed; the dataset consists of 380 images captured at different locations and with different viewing angles in a sports stadium located on a university campus.
We proposed a semantic segmentation-based video stitching method to reduce parallax and misalignment distortion between cameras.
Summary
With the development of information and communications technology (ICT) such as 5G and artificial intelligence, and with changes to the content creation environment, there is a growing demand for immersive media [1,2,3,4], which refers to media that convey information for all the senses in a scene to maximize immersion and presence for user satisfaction. Only the rotational transform component is present in the extrinsic parameters of the cameras; the translational component is absent or small enough to be ignored. In such a setting, parallax distortion rarely occurs in a wide-angle panorama or a 360-degree stitched image. The ground plane, consisting of a grass field and a running track, may not provide a sufficient number of feature points for homography estimation. This may cause misalignment distortion in the stitched ground region. The proposed method can reduce the quality degradation of stitched videos by reducing the parallax distortion around foreground objects, which draw strong visual attention.
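The statement above that rotation-only camera pairs rarely show parallax can be checked numerically. The sketch below assumes an ideal pinhole model with both cameras sharing one intrinsic matrix K; the helper names (`rot_y`, `project`, `rotation_homography`) are hypothetical and are not taken from the paper. Under these assumptions, a pure rotation R maps pixels through the single homography H = K R K^-1 regardless of scene depth, while a nonzero translation makes the mapping depth dependent.

```python
import numpy as np


def rot_y(theta):
    """Rotation matrix about the camera y axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project(K, R, t, X):
    """Project 3D point X (first-camera coordinates) into a camera with pose (R, t)."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]


def rotation_homography(K, R):
    """Homography induced by a pure rotation between two views sharing K."""
    return K @ R @ np.linalg.inv(K)


def same_ray_points(K, pixel, depths):
    """3D points at the given depths along the ray through a first-camera pixel."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    return [d * ray for d in depths]
```

For example, two points at depths 2 and 50 along the same viewing ray project to the same pixel in the second camera when `t` is zero, but to different pixels once `t` is nonzero; that depth-dependent offset is the parallax that a single homography cannot remove.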