Abstract

By integrating haptic signals into existing audio-visual multimedia applications, multi-modal services have emerged that can deliver an individual's immersive experience. Cross-modal communications have arisen to support such multi-modal services. However, faced with the collaborative transmission and comprehensive processing requirements of audio, visual, and haptic signals, as well as their influence on the user's immersive experience, research on cross-modal communications is still in its infancy and must tackle many technical challenges. Given the great successes and powerful capabilities of artificial intelligence (AI), it can be expected to underpin cross-modal communications. In this article, we address key issues in cross-modal communications using AI techniques. Specifically, we first adopt the federated learning paradigm to solve the sparse data collection and privacy protection problems in the immersive experience description of multi-modal services. Then, we resort to the reinforcement learning paradigm to construct a joint optimization framework for caching, communication, and computation, realizing collaborative transmission of audio, visual, and haptic streams. Finally, we use the transfer learning paradigm to extract, transfer, and fuse knowledge, semantics, and characteristics across modalities, recovering corrupted signals and improving rendering effects at the receiver. Experimental results validate the effectiveness of the proposed AI-enabled cross-modal communication strategies.
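To make the first step concrete, the sketch below illustrates the federated averaging idea the abstract alludes to: each user fits a local quality-of-experience model on private multi-modal features, and only model weights (never raw experience data) are aggregated at the server. The model form, feature names, and client setup are hypothetical illustrations, not taken from the article.

```python
# Minimal federated-averaging sketch (hypothetical QoE model and features, not from the article).
# Each client trains locally on its private (audio, visual, haptic) feature data;
# the server only sees and averages model weights, which is what protects user privacy.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.01, epochs=5):
    """One client's local training: a few gradient steps on its private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # squared-error gradient
        w -= lr * grad
    return w

# Hypothetical clients: (audio, visual, haptic) feature vectors with a scalar QoE label.
clients = []
for _ in range(5):
    X = rng.normal(size=(40, 3))
    y = X @ np.array([0.5, 0.3, 0.2]) + 0.05 * rng.normal(size=40)
    clients.append((X, y))

global_w = np.zeros(3)
for rnd in range(20):                             # communication rounds
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    global_w = np.average(local_ws, axis=0, weights=sizes)   # FedAvg-style aggregation

print("aggregated QoE model weights:", np.round(global_w, 3))
```

Because only parameters cross the network, sparse per-user experience data can still be pooled into a shared model, which is the property the abstract's privacy-protection claim relies on.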
