Abstract

In recent years, social image representation has achieved significant results in multilabel classification and cross-modal retrieval. However, modeling content information alone can yield less accurate embeddings, because social images carry both multimodal content and social relationships between images. The goal of social image representation learning is to learn low-dimensional, dense, and sequential representations of socially networked multimodal data; such representations also facilitate many practical applications, and multiview representation learning in particular aids tasks such as cross-view classification. Since social images often contain link information in addition to multimodal content, relying on data content alone may produce suboptimal multiview representations. Social images are, however, typically rich in metadata, and this metadata can be used to turn images and their textual descriptions into links between content items, which are ultimately handled together with the multimodal content.
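To make the motivation concrete, the minimal sketch below (an illustration under assumed data, not the paper's method) shows one way metadata can induce links: images sharing a user tag are connected, and each image's representation mixes its own fused multimodal content with that of its linked neighbors. All data, dimensions, the shared-tag linking rule, and the one-step neighborhood smoothing are hypothetical choices for exposition.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 4 social images, each with an image-feature vector,
# a text-feature vector, and user-supplied tags (the social metadata).
n, d_img, d_txt = 4, 8, 6
img_feats = rng.normal(size=(n, d_img))
txt_feats = rng.normal(size=(n, d_txt))
tags = [{"beach", "sunset"}, {"beach"}, {"city"}, {"sunset", "city"}]

# Turn metadata into links: connect images that share at least one tag.
A = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        if tags[i] & tags[j]:
            A[i, j] = A[j, i] = 1.0

# Fuse the two content modalities by concatenation (a simple multiview choice).
content = np.concatenate([img_feats, txt_feats], axis=1)

# One step of neighborhood smoothing: each embedding averages its own content
# with the mean content of its linked neighbors, so the final representation
# reflects both the multimodal content and the social link structure.
deg = A.sum(axis=1, keepdims=True)
neighbor_mean = np.divide(A @ content, deg, out=np.zeros_like(content), where=deg > 0)
embeddings = 0.5 * content + 0.5 * neighbor_mean

print(embeddings.shape)  # (4, 14): one dense vector per social image

In this toy setup, content-only embeddings would ignore that, for example, the two beach-tagged images are socially related; the link-aware step pulls related images closer in the embedding space, which is the intuition behind combining content with link information.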
