Abstract

We present a novel concept of audio–visual object removal in 360-degree videos, in which a target object in a 360-degree video is removed in both the visual and auditory domains synchronously. Previous methods have focused solely on the visual aspect of object removal using video inpainting techniques, resulting in videos in which the sounds corresponding to the removed objects unnaturally remain. We propose a solution that incorporates directional information acquired during the video inpainting process into the audio removal process. More specifically, our method identifies the sound corresponding to the visually tracked target object and then synthesizes a three-dimensional sound field by subtracting the identified sound from the sound field of the input 360-degree video. We conducted a user study showing that our multi-modal object removal, supporting both visual and auditory domains, could significantly improve the virtual reality experience, and that our method could generate sufficiently synchronous, natural, and satisfactory 360-degree videos.

Introduction

360-degree videos, or spherical panoramic videos, have become popular among end-users thanks to consumer-level 360-degree cameras [9,19] as well as video-sharing platforms that support 360-degree videos [4,7]. We propose a novel concept of audio–visual object removal in 360-degree videos, in which a user-specified object in the target 360-degree video is removed in both the visual and auditory domains synchronously (see Fig. 1). The key idea is to effectively incorporate information acquired from the video inpainting process into the audio removal process. This multi-modal approach can reduce mismatches between the visual and auditory domains, which we expect leads to a better virtual reality experience. Our method combines the processed visual and auditory information to produce a resulting 360-degree video in which synchronized audio–visual object removal is achieved. To validate this concept, we captured multiple test scenes under varied conditions and conducted a user study using these test scenes and our implementation. The study indicated that the multi-modal approach could offer better experiences than single-modal (i.e., visual-only and audio-only) approaches.
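To make the direction-guided subtraction concrete, the following is a minimal sketch, not the authors' implementation. It assumes the 360-degree video carries first-order ambisonic (FOA) audio and that the video inpainting stage yields the tracked object's direction per frame; the function names (`encode_foa`, `remove_directional_sound`) and the simple beamform-and-subtract strategy are illustrative assumptions.

```python
import numpy as np

def encode_foa(azimuth: float, elevation: float) -> np.ndarray:
    """First-order ambisonic encoding gains (ACN order W, Y, Z, X; SN3D)
    for a plane wave arriving from (azimuth, elevation) in radians."""
    return np.array([
        1.0,                                  # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),  # Y: left-right
        np.sin(elevation),                    # Z: up-down
        np.cos(azimuth) * np.cos(elevation),  # X: front-back
    ])

def remove_directional_sound(foa: np.ndarray, azimuth: float,
                             elevation: float, strength: float = 1.0) -> np.ndarray:
    """Attenuate sound arriving from the tracked direction.

    foa      -- (4, num_samples) FOA signal block.
    strength -- 1.0 removes the estimated object sound completely.
    """
    g = encode_foa(azimuth, elevation)
    # Matched beamformer, normalized so that a plane wave from the
    # target direction is recovered with unit gain; this estimates
    # the object's sound from the ambisonic mixture.
    object_sound = (g / np.dot(g, g)) @ foa
    # Re-encode the estimate at the same direction and subtract it,
    # suppressing the object while leaving the rest of the field intact.
    return foa - strength * np.outer(g, object_sound)

# Usage: for each video frame, feed the direction reported by the
# visual tracker, e.g.
#   cleaned = remove_directional_sound(foa_block, az_t, el_t)
```

In practice such a system would likely operate on short time-frequency blocks so that only the object's spectral content is subtracted, but the geometric idea of steering at the visually tracked direction and removing the re-encoded estimate is the same.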
