A Facial Feature and Lip Movement Enhanced Audio-Visual Speech Separation Model.

Guizhu Li, Mengnan Sun, Xuefeng Liu, Bing Zheng, Min Fu

doi:10.3390/s23218770

Open Access

https://doi.org/10.3390/s23218770

Journal: Sensors (Basel, Switzerland)
Publication Date: Oct 27, 2023
License type: CC BY 4.0

Affiliation: Ocean University of China, Qingdao University of Science and Technology

#Cocktail Party Problem #Facial Feature

Abstract

The cocktail party problem can be addressed more effectively by leveraging both the visual and audio information of the speaker. This paper proposes a method that improves audio separation using two visual cues: facial features and lip movement. First, residual connections are introduced in the audio separation module to extract detailed features. Second, because the video stream contains information other than the face that has minimal correlation with the audio, an attention mechanism is employed in the face module to focus on the crucial information. Third, the loss function incorporates audio-visual similarity to take full advantage of the relationship between the audio and visual streams. Experimental results on the public VoxCeleb2 dataset show that the proposed model significantly improves SDR, PESQ, and STOI, most notably with a 4 dB improvement in SDR.
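
To make the reported SDR gain and the similarity-aware loss more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest energy-ratio definition of SDR (evaluation toolkits often use a more elaborate decomposition) and a cosine-based similarity term; the abstract does not specify either formulation. The names `sdr_db`, `cosine_similarity`, and `similarity_penalty` are hypothetical.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float],
           eps: float = 1e-12) -> float:
    """Simple signal-to-distortion ratio in dB (assumed definition):
    10 * log10(||s||^2 / ||s - s_hat||^2)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    distortion_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


def cosine_similarity(a: Sequence[float], b: Sequence[float],
                      eps: float = 1e-12) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def similarity_penalty(audio_emb: Sequence[float],
                       visual_emb: Sequence[float]) -> float:
    """Hypothetical loss term (an assumption, not taken from the paper):
    near 0 for aligned embeddings, near 2 for opposite ones."""
    return 1.0 - cosine_similarity(audio_emb, visual_emb)


if __name__ == "__main__":
    # Synthetic clean signal and a synthetic separation error.
    clean = [math.sin(0.1 * n) for n in range(1000)]
    error = [0.1 * math.cos(0.37 * n) for n in range(1000)]

    baseline = [s + e for s, e in zip(clean, error)]
    # Scaling the error amplitude by 10^(-4/20) raises SDR by about 4 dB.
    scale = 10 ** (-4 / 20)
    improved = [s + scale * e for s, e in zip(clean, error)]

    print("baseline SDR (dB):", sdr_db(clean, baseline))
    print("improved SDR (dB):", sdr_db(clean, improved))
    print("penalty, aligned embeddings:", similarity_penalty([1.0, 0.0], [1.0, 0.0]))
```

Under this simple definition, a 4 dB gain in SDR corresponds to the residual distortion energy shrinking by a factor of 10^0.4, roughly 2.5.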
