Abstract

Content curation social networks (CCSNs), such as Pinterest and Huaban, are interest-driven and content-centric. On CCSNs, user interests are represented by a set of boards, and a board is composed of various pins. A pin is an image with a description. All entities, such as users, boards, and categories, can be represented as sets of pins. Therefore, entity representation and the corresponding recommendations can be implemented in a uniform representation space derived from pins. Furthermore, many pins are re-pinned from other users, and these re-pin sequences are recorded on CCSNs. In this paper, a framework that learns the multimodal joint representation of pins, covering text representation, image representation, and multimodal fusion, is proposed. Image representations are extracted from a multilabel convolutional neural network. The multiple labels of each pin are obtained automatically from the category distributions in its re-pin sequence, a strategy that benefits from the network structure of CCSNs. Text representations are obtained with the word2vec tool. The two modalities are fused with a multimodal deep Boltzmann machine. On the basis of the pin representation, different recommendation tasks are implemented, including recommending pins or boards to users, recommending thumbnails to boards, and recommending categories to boards. Experimental results on a dataset from Huaban demonstrate that the multimodal joint representation of pins captures user interest information. Furthermore, the proposed multimodal joint representation outperforms unimodal representations across the different recommendation tasks. Experiments were also performed to validate the effectiveness of the proposed recommendation methods.
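The abstract's central idea is that every entity (user, board, category) is a set of pins, so all recommendation tasks reduce to similarity search in one pin representation space. Below is a minimal sketch of that scheme, assuming mean pooling over pin vectors and cosine similarity as the ranking metric; the function names, the 128-dimensional vectors, and the random stand-in data are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    return v / (np.linalg.norm(v) + eps)

def entity_representation(pin_vectors):
    """A user, board, or category is a set of pins; a simple entity
    vector is the mean of its pins' joint representations."""
    return np.mean(pin_vectors, axis=0)

def recommend_pins(user_pins, candidate_pins, top_k=5):
    """Rank candidate pins by cosine similarity to the user's
    interest vector (mean of the user's pin representations)."""
    user_vec = l2_normalize(entity_representation(user_pins))
    cands = np.stack([l2_normalize(p) for p in candidate_pins])
    scores = cands @ user_vec
    return np.argsort(-scores)[:top_k], scores

# Usage with random stand-ins for learned joint pin representations:
rng = np.random.default_rng(0)
user_pins = rng.normal(size=(20, 128))    # 20 pins, 128-d joint vectors
candidates = rng.normal(size=(100, 128))  # 100 candidate pins
top_idx, scores = recommend_pins(user_pins, candidates)
print(top_idx, scores[top_idx])
```

Because boards, users, and categories all live in the same space under this scheme, the other tasks listed in the abstract (board, thumbnail, and category recommendation) follow the same pattern with different query and candidate sets.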

Highlights

  • Content curation social networks (CCSNs) are booming social networks where users showcase, collect, and organize their multimedia content

  • We crawled the experimental data from Huaban, a typical Chinese CCSN

  • We propose a framework for multimodal joint representation learning of pins on CCSNs

Summary

Introduction

Content curation social networks (CCSNs) are booming social networks where users showcase, collect, and organize their multimedia content. The problem addressed in this paper can be broken down into two questions: how to represent a given pin effectively, and how to implement the different tasks with the obtained representation. On a CCSN, each pin carries a category Ci assigned by the corresponding user. On the basis of this characteristic, an easy-to-accomplish annotation method is proposed to automatically label images by the category distributions on the re-pin trees of the corresponding pins, as sketched below. The main contributions are as follows:

  • The multilabel convolutional neural network was fine-tuned with these automatically obtained labels, which significantly enhances the capability of image representation.

  • We designed a framework that combines deep features of images and texts into a joint representation, maintaining both the consistent information and the specific characteristics of the different modalities. On this basis, a uniform recommendation scheme was designed for the different tasks on CCSNs.

  • The experimental results demonstrate that the proposed multimodal representation is more effective than representations learned from unimodal information, and that the proposed method outperforms existing multimodal representation learning methods on multiple recommendation tasks.
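The annotation method summarized above labels an image from the category distribution over its re-pin tree. The sketch below shows one plausible realization, assuming each re-pin event contributes the category of the board it lands on and that any category exceeding a fixed share of the distribution becomes a positive label; the threshold value, function name, and category vocabulary are hypothetical, since the excerpt does not specify the exact rule.

```python
from collections import Counter

def multilabel_from_repins(repin_categories, categories, threshold=0.1):
    """Derive a multilabel target for a pin's image from the category
    distribution over its re-pin sequence.

    repin_categories: one board category per (re-)pin event of the image.
    categories: the fixed category vocabulary, defining label order.
    threshold: hypothetical minimum share for a category to become a label.
    """
    counts = Counter(repin_categories)
    total = sum(counts.values())
    dist = {c: counts.get(c, 0) / total for c in categories}
    labels = [1 if dist[c] >= threshold else 0 for c in categories]
    return labels, dist

# Example: an image re-pinned onto boards with these categories
categories = ["food", "travel", "design", "fashion"]
events = ["food", "food", "travel", "food", "design"]
labels, dist = multilabel_from_repins(events, categories)
print(labels)  # [1, 1, 1, 0]: three categories exceed the 0.1 share
```

Labels of this kind fall out of the re-pin network for free, which is what makes the annotation easy to accomplish: no manual labeling is needed before fine-tuning the multilabel CNN.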

Related Work
Multimodal Joint Representation of Pins
Image Representation
Text Representation
Multimodal Fusion
Implementation of Recommendations for Different Tasks
Pin Recommendation
Board Thumbnail Recommendation
Board Category Recommendation
Board and User Recommendation
Experiments and Results
Datasets and Implementation Details
Analysis of Interests Represented by Pins
Board Recommendation
Conclusions