Abstract

With the rapid growth of short video platforms, users increasingly share their social streams online and strengthen their social connections. To better understand user preferences, personality analysis has attracted growing attention. Unlike single-modality data such as text or images, which can hardly uncover one's personality traits comprehensively, personality analysis on short videos proves to be much more accurate, but also more challenging because of the large gap between incompatible data modalities. The key problem is how to disentangle the complexity of multimodal data to find its consistency and uniqueness. In this article, we propose a novel video analysis framework for personality detection with visual, acoustic, and textual neural networks. Specifically, to enhance the model's sensitivity to personality cues, we first propose three deep learning channels to learn modality-specific features. The framework not only extracts each modality's features but also learns time-varying patterns via a temporal alignment network. To identify the consistency and uniqueness across modalities, we propose to maximize the similarity of the common information learned by a shared neural network across modalities and to enlarge the distance between the exclusive information learned by the private networks of different modalities. Extensive experiments on a real-world dataset demonstrate that our model outperforms existing baselines.
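The shared/private decomposition described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: all module names, feature dimensions, and loss choices (cosine similarity to pull shared features together, a soft orthogonality penalty to push private features away) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPrivateModel(nn.Module):
    """One encoder shared across modalities plus a private encoder per modality."""
    def __init__(self, dims, hid=128):
        super().__init__()
        # Project each modality to a common width before the shared encoder.
        self.project = nn.ModuleDict({m: nn.Linear(d, hid) for m, d in dims.items()})
        self.shared = nn.Sequential(nn.Linear(hid, hid), nn.ReLU())
        self.private = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(hid, hid), nn.ReLU()) for m in dims})

    def forward(self, feats):
        out = {}
        for m, x in feats.items():
            h = self.project[m](x)
            out[m] = (self.shared(h), self.private[m](h))
        return out

def similarity_loss(shared):
    """Pull shared representations of different modalities together (cosine)."""
    mods = list(shared.values())
    loss = 0.0
    for i in range(len(mods)):
        for j in range(i + 1, len(mods)):
            loss = loss + (1 - F.cosine_similarity(mods[i], mods[j], dim=-1)).mean()
    return loss

def difference_loss(shared, private):
    """Push each modality's private features away from its shared features."""
    loss = 0.0
    for m in shared:
        s = F.normalize(shared[m], dim=-1)
        p = F.normalize(private[m], dim=-1)
        loss = loss + (s * p).sum(dim=-1).pow(2).mean()
    return loss

# Usage with random visual/acoustic/textual features of different widths.
dims = {"visual": 512, "acoustic": 128, "text": 768}
model = SharedPrivateModel(dims)
feats = {m: torch.randn(8, d) for m, d in dims.items()}
reps = model(feats)
shared = {m: r[0] for m, r in reps.items()}
private = {m: r[1] for m, r in reps.items()}
loss = similarity_loss(shared) + difference_loss(shared, private)
loss.backward()
```

The two losses play the roles named in the abstract: the similarity term aligns the common information across modalities, while the difference term keeps each private network's output distinct from the shared representation.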
