Abstract

Research on single-modal recognition has made some progress using speech signals, ECG signals, facial expressions, body posture, and other physiological signals. However, the human brain draws on diverse sources of information, and any single modality is inherently uncertain, so the accuracy of single-modal recognition remains limited. Building multimodal recognition frameworks that combine several modalities has therefore become an effective way to improve performance. With the rise of multimodal machine learning, multimodal information fusion has become an active research area, and audio-visual fusion is its most widely studied direction. Audio-visual fusion has been applied successfully to problems such as emotion recognition, multimedia event detection, biometrics, and speech recognition. This paper first gives a brief introduction to multimodal machine learning. It then summarizes the development and current state of audio-visual fusion in several major application areas, and finally discusses prospects for future research.
