Abstract

In this paper, we propose a semantic segmentation-based static video stitching method that reduces parallax and misalignment distortion in sports stadium scenes with dynamic foreground objects. First, the video frame pairs to be stitched are divided into segments of different classes through semantic segmentation. Region-based stitching is then performed on matched segment pairs, under the assumption that segments of the same semantic class lie on the same plane. Second, to prevent the stitching quality from degrading for plain or noisy videos, the homography for each matched segment pair is estimated using temporally consistent feature points. Finally, the stitched video frame is synthesized by stacking the stitched matched segment pairs and the foreground segments onto the reference frame plane in descending order of area. The performance of the proposed method is evaluated by comparing the subjective quality, geometric distortion, and pixel distortion of video sequences stitched using the proposed and conventional methods. The proposed method is shown to reduce parallax and misalignment distortion in segments with plain texture or large parallax, and to significantly improve geometric distortion and pixel distortion compared to conventional methods.
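The final synthesis step described above can be illustrated with a minimal sketch. It assumes that each layer has already been warped to the reference frame plane and comes with a boolean validity mask, and it reads "descending order of area" as "paste the largest segment first, so that smaller segments end up on top". The function name `composite_by_area` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def composite_by_area(canvas, layers):
    """Paste warped layers onto a copy of canvas, largest mask first.

    layers: list of (image, mask) pairs already warped to the reference
    frame plane; mask is a boolean H x W array marking valid pixels.
    (Assumed data layout, chosen only for this illustration.)
    """
    out = canvas.copy()
    # Sort by mask area in descending order, so that smaller segments
    # are pasted later and remain visible on top of larger ones.
    ordered = sorted(layers, key=lambda layer: int(layer[1].sum()), reverse=True)
    for image, mask in ordered:
        out[mask] = image[mask]
    return out

# Toy example: a large "ground" layer and a small "player" layer.
h, w = 4, 6
canvas = np.zeros((h, w), dtype=np.uint8)

ground = np.full((h, w), 1, dtype=np.uint8)
ground_mask = np.ones((h, w), dtype=bool)

player = np.full((h, w), 2, dtype=np.uint8)
player_mask = np.zeros((h, w), dtype=bool)
player_mask[1:3, 2:4] = True

# The input order is deliberately "wrong"; sorting by area fixes it.
panorama = composite_by_area(canvas, [(player, player_mask), (ground, ground_mask)])
```

In this toy case the small player layer survives on top of the ground layer regardless of the order in which the layers are passed in.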

Highlights

  • With the development of information and communications technology (ICT) such as 5G and artificial intelligence and changes to the content creation environment, there is a growing demand for immersive media [1,2,3,4], which refers to a medium that conveys information of all types of senses in the scene to maximize immersion and presence for user satisfaction

  • For the training of the semantic segmentation tool, DeepLab, the training image dataset in Figure 5 was constructed; the dataset consists of 380 images captured at different locations with different viewing angles in a sports stadium located on a university campus

  • We proposed a semantic segmentation-based video stitching method to reduce parallax and misalignment distortion between cameras


Summary

Introduction

With the development of information and communications technology (ICT) such as 5G and artificial intelligence, and with changes to the content creation environment, there is a growing demand for immersive media [1,2,3,4], which refers to a medium that conveys information for all the senses in a scene to maximize immersion and presence for user satisfaction. Only the rotational transform component is present in the extrinsic parameters of the cameras; the translational component is absent or small enough to be ignored. In such a setting, parallax distortion rarely occurs in a wide-angle panorama or a 360-degree stitched image. The ground plane, consisting of a grass field and a running track, may not provide a sufficient number of feature points for homography estimation. This may cause misalignment distortion in the stitched ground region. The proposed method can reduce the quality degradation of stitched videos by reducing the parallax distortion around foreground objects, which draw great visual attention.
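The remark about rotation-only extrinsic parameters can be checked numerically. The sketch below relies on the standard pinhole-camera result that, for two cameras sharing a center, pixels are related by the single homography H = K R K^{-1} regardless of scene depth, whereas a nonzero translation makes the mapping depth dependent (parallax). The intrinsic matrix `K`, the 10-degree rotation, the pixel location, and the baseline are illustrative assumptions, and both cameras are assumed to share the same intrinsics.

```python
import numpy as np

# Assumed pinhole intrinsics, shared by both cameras (illustrative values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
K_inv = np.linalg.inv(K)

# Assumed relative rotation: 10 degrees about the vertical (y) axis.
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])

# Homography induced by a pure rotation between the two views.
H = K @ R @ K_inv

def transfer_error(depth, t, pixel=(400.0, 260.0)):
    """Pixel distance between the true projection in camera 2 and the
    prediction H @ x1, for a 3D point at the given depth along the ray
    through `pixel` in camera 1."""
    x1 = np.array([pixel[0], pixel[1], 1.0])
    X = depth * (K_inv @ x1)          # 3D point in camera-1 coordinates
    x2 = K @ (R @ X + t)              # true projection into camera 2
    x2 = x2[:2] / x2[2]
    x2_pred = H @ x1                  # rotation-only prediction
    x2_pred = x2_pred[:2] / x2_pred[2]
    return float(np.linalg.norm(x2 - x2_pred))

no_translation = np.zeros(3)
baseline = np.array([0.5, 0.0, 0.0])  # assumed 0.5-unit sideways offset

err_rotation_only = transfer_error(2.0, no_translation)
err_near = transfer_error(2.0, baseline)
err_far = transfer_error(20.0, baseline)
```

With zero translation the transfer error vanishes up to floating-point precision at any depth. With a nonzero baseline, the error is larger for the nearer point, which is the parallax that a single global homography cannot absorb.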

Related Works
Seam-Based Stitching
Multiple Homography-Based Stitching
Semantic Segmentation-Based Static Video Stitching
Semantic Segmentation and Matching
Homography Estimation for Matched Segment Pairs
Panoramic Video Frame Synthesis Based on Segment-Based Stitching
Experimental Environments
Region-Based Stitching Results
Evaluation of Subjective Quality of Stitched Videos
Objective Quality Evaluation of Stitched Videos
Conclusions
