Abstract

Micro-videos have gained popularity on social media platforms because they provide a rich medium for real-time storytelling. Although micro-videos are naturally characterized by several modalities, it remains difficult to develop a flexible multimodal representation learning framework that integrates complementary and consistent information when the set of missing modalities is uncertain. To better address incomplete modalities in multimodal micro-video classification, in this paper we propose a self-supervised deep multimodal adversarial network (SDMAN) that learns comprehensive and robust micro-video representations. Specifically, we first introduce a parallel multi-head attention (MHA) encoding module that simultaneously learns representations of the complete modality grouping and of the incomplete modality groupings. We then present a multimodal self-supervised cycle generative adversarial network module, in which multiple generative adversarial networks transfer information from the complete modality grouping to the incomplete groupings, so that complementarity and consistency are mutually promoted among the modalities. Finally, experiments on a large-scale micro-video dataset demonstrate that SDMAN outperforms state-of-the-art methods.
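To make the attention-based encoding over modality groupings concrete, the following is a minimal, self-contained sketch in plain Python. It is not the paper's implementation: the embeddings are toy vectors, a single attention head with no learned projections stands in for the full MHA module, and the "incomplete grouping" is simply a subset of modalities (here, the acoustic modality is assumed missing). All names and values are illustrative assumptions.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors.

    A single-head simplification of multi-head attention: each query
    attends over all keys, and the output is the attention-weighted
    combination of the value vectors.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical 4-dimensional embeddings for three modalities.
visual   = [1.0, 0.0, 0.5, 0.2]
acoustic = [0.3, 1.0, 0.1, 0.4]
textual  = [0.2, 0.5, 1.0, 0.1]

# Complete grouping uses all modalities; an incomplete grouping
# (acoustic assumed missing) is encoded in parallel by the same module.
complete   = [visual, acoustic, textual]
incomplete = [visual, textual]

rep_complete   = attention(complete, complete, complete)
rep_incomplete = attention(incomplete, incomplete, incomplete)
```

In the paper's framework, the representation of the complete grouping would then serve as the supervision signal that the cycle-GAN module transfers to the incomplete groupings; the sketch above only illustrates the parallel encoding step.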
