Abstract

Content curation social networks (CCSNs), on which users share their interests through images and accompanying text descriptions, are a rapidly growing class of social networks. To fully exploit user-generated content for user interest analysis on CCSNs, we propose a framework that learns multimodal joint representations of pins. First, images are automatically annotated with category distributions, which benefit from the characteristics of the network and reflect users' interests. Image representations are then extracted from an intermediate layer of a multilabel convolutional neural network (CNN) fine-tuned on these distributions, and text representations are obtained with a trained Word2Vec model. Finally, a multimodal deep Boltzmann machine (DBM) is trained to fuse the two modalities. Experiments on a dataset collected from Huaban demonstrate that fine-tuning the CNN with category distributions instead of single categories as labels significantly improves the quality of the image representations, and that the multimodal joint representations outperform either unimodal representation.
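To illustrate the fine-tuning step, the sketch below shows one way to train a CNN with per-image category distributions as soft labels, minimizing the cross-entropy against the full distribution rather than a single class index, and then taking an intermediate (penultimate) layer as the image representation. This is a minimal sketch under assumed choices (a ResNet-50 backbone, 32 categories, SGD settings); the paper's exact architecture and hyperparameters are not given in the abstract.

```python
# Hypothetical sketch: fine-tuning a CNN with category distributions as soft labels,
# then using an intermediate layer as the image representation.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_CATEGORIES = 32  # assumption: number of CCSN board categories

# Start from a pretrained backbone and replace the classifier head.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
feature_dim = backbone.fc.in_features
backbone.fc = nn.Linear(feature_dim, NUM_CATEGORIES)

def soft_label_loss(logits, target_dist):
    """Cross-entropy between predicted log-probabilities and a target
    category distribution (soft labels) instead of a one-hot class."""
    log_probs = torch.log_softmax(logits, dim=1)
    return -(target_dist * log_probs).sum(dim=1).mean()

optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, category_dists):
    """images: (B, 3, 224, 224); category_dists: (B, NUM_CATEGORIES), rows sum to 1."""
    optimizer.zero_grad()
    logits = backbone(images)
    loss = soft_label_loss(logits, category_dists)
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def image_representation(images):
    """Use the pooled penultimate layer as the image representation,
    which would later be fused with text features (e.g., by a multimodal DBM)."""
    trunk = nn.Sequential(*list(backbone.children())[:-1])  # drop the final fc layer
    feats = trunk(images)        # (B, feature_dim, 1, 1)
    return feats.flatten(1)      # (B, feature_dim)
```

The soft-label loss reduces to standard cross-entropy when the distribution is one-hot, so the single-category baseline described in the abstract is a special case of this formulation.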
