Abstract

The human brain perceives its surroundings through multiple sensory organs and integrates these multi-sensory perceptions into a comprehensive understanding. Inspired by this synaesthetic capability, multi-modal cognitive computing endows machines with multi-sensory abilities and has become a key to general artificial intelligence. With the explosion of multi-modal data such as images, video, text, and audio, a large number of methods have been developed to address this topic. However, the theoretical basis of multi-modal cognitive computing remains unclear. From the perspective of information theory, this paper establishes an information transmission model to profile the cognitive process. Based on the theory of information capacity, this study finds that multi-modal cognitive computing helps machines extract more information, thereby unifying multi-modal cognitive computing research under a common theoretical basis. The development of typical tasks, including multi-modal correlation, cross-modal generation, and multi-modal collaboration, is then reviewed and discussed. Finally, focusing on the opportunities and challenges faced by multi-modal cognitive computing, several potential directions are discussed in depth and some open questions are raised.
