Abstract
Considering both the audio and visual modalities helps in understanding a video, but when harsh environmental interference or packet loss corrupts the signal, automatically compensating for the missing audio or visual content is challenging. We propose a dynamic cross-modal visual-audio mutual generation model (VAMG) comprising four generation paths: audio-to-visual conversion, visual-to-audio conversion, audio self-generation, and visual self-generation. VAMG jointly optimizes modal reconstruction and adversarial constraints, effectively addressing structural alignment and signal compensation in incomplete videos. We verify the effectiveness of the model through instrument-oriented and pose-oriented cross-modal audio-visual mutual generation experiments on the Sub-URMP (sub-University of Rochester Musical Performance) dataset.
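As a rough illustration of the four generation paths and the joint reconstruction-plus-adversarial objective described in the abstract, the PyTorch sketch below wires per-modality encoders and decoders around a shared latent space. All names (VAMGSketch, generator_loss), feature dimensions, network shapes, and the loss weighting lam are illustrative assumptions, not the authors' actual architecture or training procedure.

import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 128  # assumed shared latent dimensionality

def mlp(in_dim, out_dim):
    # Toy stand-in for the real convolutional encoders/decoders.
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

class VAMGSketch(nn.Module):
    def __init__(self, audio_dim=64, visual_dim=256):
        super().__init__()
        self.enc_a = mlp(audio_dim, LATENT)   # audio encoder
        self.enc_v = mlp(visual_dim, LATENT)  # visual encoder
        self.dec_a = mlp(LATENT, audio_dim)   # audio decoder
        self.dec_v = mlp(LATENT, visual_dim)  # visual decoder

    def forward(self, audio, visual):
        za, zv = self.enc_a(audio), self.enc_v(visual)
        return {
            "a2v": self.dec_v(za),  # audio-to-visual conversion
            "v2a": self.dec_a(zv),  # visual-to-audio conversion
            "a2a": self.dec_a(za),  # audio self-generation
            "v2v": self.dec_v(zv),  # visual self-generation
        }

def generator_loss(out, audio, visual, disc_a, disc_v, lam=0.1):
    # L1 reconstruction over all four paths (paired audio-visual data
    # assumed, as in Sub-URMP), plus non-saturating adversarial terms
    # from per-modality discriminators; lam is an assumed weighting.
    rec = (F.l1_loss(out["a2a"], audio) + F.l1_loss(out["v2v"], visual)
           + F.l1_loss(out["v2a"], audio) + F.l1_loss(out["a2v"], visual))
    adv = -(torch.log(torch.sigmoid(disc_a(out["v2a"])) + 1e-8).mean()
            + torch.log(torch.sigmoid(disc_v(out["a2v"])) + 1e-8).mean())
    return rec + lam * adv

# Usage with random tensors standing in for spectrogram/frame features.
model = VAMGSketch()
disc_a, disc_v = mlp(64, 1), mlp(256, 1)  # stand-in discriminators
audio, visual = torch.randn(8, 64), torch.randn(8, 256)
loss = generator_loss(model(audio, visual), audio, visual, disc_a, disc_v)
loss.backward()

In this reading, the two cross-modal paths provide the signal compensation for a corrupted or missing modality, while the two self-generation paths regularize each modality's representation; the shared latent space is what enforces structural alignment between them.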