Abstract

We propose a novel binauralization method that is robust to camera rotation. Because binaural audio gives the listener a 3D sensation, it can enhance the immersive experience of a video. Researchers have explored generating binaural audio from monaural audio to deepen the experience of already captured videos without special recording devices. However, binauralization of real-world videos can be challenging because of camera rotation: when the camera rotates, both the sound sources and the background move in the video, which makes it difficult to predict the exact sound source positions. To tackle this problem, we propose a training data generation pipeline for binauralization that uses 360° videos. From the 360° videos, we generate monocular videos and binaural audio with camera rotation for training. Additionally, we construct a new binauralization framework that performs multi-task learning with camera localization. The camera localization task predicts the camera rotation and thereby helps the binauralization. Experimental results show that our method can binauralize videos with camera rotation.
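To make the effect of camera rotation concrete, the following is a minimal sketch, not the pipeline described above. It assumes an equirectangular 360° frame whose column index increases with azimuth, and it considers only yaw rotation. All function names are hypothetical. Under these assumptions, a camera yaw shifts the frame content circularly and changes the azimuth at which a fixed sound source appears relative to the camera.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(source_azimuth, camera_yaw):
    """Azimuth of a fixed source as seen from a camera rotated by camera_yaw.

    Assumption: both angles are in degrees and share the same world frame.
    """
    return wrap_degrees(source_azimuth - camera_yaw)


def yaw_to_column_shift(camera_yaw, frame_width):
    """Number of columns that corresponds to a yaw rotation.

    Assumption: the frame is equirectangular, so 360 degrees spans
    the full frame width.
    """
    return round(camera_yaw / 360.0 * frame_width) % frame_width


def rotate_equirect_row(row, camera_yaw):
    """Simulate a yaw rotation on one row of an equirectangular frame.

    Content at azimuth a moves to azimuth a - camera_yaw, which is a
    circular shift toward lower column indices.
    """
    shift = yaw_to_column_shift(camera_yaw, len(row))
    return row[shift:] + row[:shift]


if __name__ == "__main__":
    # A source at 30 degrees, seen from a camera that has turned 90 degrees.
    print(relative_azimuth(30, 90))
    # An 8-column toy row (45 degrees per column) after a 90 degree yaw.
    print(rotate_equirect_row(list(range(8)), 90))
```

In this toy setting the same shift applies to the picture and to the apparent source direction, which is why knowing the camera rotation can help locate the sound source.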
