Abstract
Content curation social networks (CCSNs), such as Pinterest and Huaban, are interest-driven and content-centric. On CCSNs, a user's interests are represented by a set of boards, and a board is composed of various pins. A pin is an image with a description. All entities, such as users, boards, and categories, can therefore be represented as sets of pins, which makes it possible to implement entity representation and the corresponding recommendations in a uniform representation space built from pins. Furthermore, many pins are re-pinned from other users, and each pin's re-pin sequence is recorded on the network. In this paper, a framework is proposed that learns a multimodal joint representation of pins, covering text representation, image representation, and multimodal fusion. Image representations are extracted from a multilabel convolutional neural network. The multiple labels of pins are obtained automatically from the category distributions in their re-pin sequences, taking advantage of the network structure of CCSNs. Text representations are obtained with the word2vec tool. The two modalities are fused with a multimodal deep Boltzmann machine. On the basis of the pin representation, different recommendation tasks are implemented, including recommending pins or boards to users, recommending thumbnails to boards, and recommending categories to boards. Experimental results on a dataset from Huaban demonstrate that the multimodal joint representation of pins captures user interests, and that it outperforms unimodal representations across the different recommendation tasks. Further experiments validate the effectiveness of the proposed recommendation methods.
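The automatic labeling step described above can be illustrated with a minimal sketch: given the categories that users along a pin's re-pin sequence assigned to it, keep every category whose share of the distribution exceeds a cutoff. The function name and the threshold value are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def multilabels_from_repin_tree(repin_categories, threshold=0.2):
    """Derive multiple labels for a pin's image from the category
    distribution over its re-pin sequence.

    repin_categories: categories assigned by users along the re-pin tree.
    threshold: minimum share of the distribution (illustrative value).
    """
    counts = Counter(repin_categories)
    total = sum(counts.values())
    # Keep every category that accounts for at least `threshold` of the tree.
    return sorted(c for c, n in counts.items() if n / total >= threshold)

labels = multilabels_from_repin_tree(
    ["travel", "travel", "photography", "travel", "art"]
)
# travel (0.6), photography (0.2), and art (0.2) all pass the 0.2 cutoff.
```

Such multi-label targets can then supervise the multilabel CNN that produces the image representations.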
Highlights
Content curation social networks (CCSNs) are booming social networks where users demonstrate, collect, and organize multimedia content
We crawled data used in experiments from Huaban, a typical Chinese CCSN
We propose a framework for multimodal joint representation learning of pins on CCSNs
Summary
Content curation social networks (CCSNs) are booming social networks where users demonstrate, collect, and organize multimedia content. The problem can be broken down into two questions: how to represent a given pin effectively, and how to implement the different tasks with the obtained representation. On the basis of the characteristics of CCSNs, an easy-to-accomplish annotation method is proposed that automatically labels images by the category distributions on the re-pin trees of the corresponding pins. The network was fine-tuned with these labels, which significantly enhances the capability of the image representation. We designed a framework that combines deep features of images and texts into a joint representation, maintaining both the information consistent across modalities and the specific characteristics of each modality. On this basis, a uniform recommendation scheme was designed for the different tasks on CCSNs. The experimental results demonstrate that the proposed multimodal representation is more effective than representations learned from unimodal information, and that the proposed method performs better than existing multimodal representation learning methods on multiple recommendation tasks.
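The uniform recommendation scheme rests on the idea that every entity (user, board, category) is a set of pins in the same joint space. A common way to realize this, sketched below under the assumption that entities are aggregated by mean pooling and ranked by cosine similarity (both are assumptions for illustration, not necessarily the paper's exact choices):

```python
import numpy as np

def entity_vector(pin_vectors):
    """Represent an entity (user, board, or category) as the mean of the
    joint representations of its pins (mean pooling is an assumption)."""
    return np.mean(pin_vectors, axis=0)

def recommend(query_vec, candidate_vecs, top_k=2):
    """Rank candidate entities by cosine similarity to the query entity."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per candidate
    return np.argsort(-scores)[:top_k]  # indices of the best matches

# Toy usage: a user represented by two pin vectors, three candidate boards.
user = entity_vector(np.array([[1.0, 0.0], [0.0, 1.0]]))
boards = np.array([[1.0, 1.0], [1.0, 0.0], [-1.0, 0.0]])
ranking = recommend(user, boards)
```

Because all entities live in one space, the same two functions cover pin-to-user, board-to-user, thumbnail-to-board, and category-to-board recommendation; only the choice of query and candidate sets changes.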