Abstract

Recommending hashtags for micro-videos is challenging for two reasons: 1) a micro-video is a unity of multiple modalities, including the visual, acoustic, and textual modalities; how to effectively extract features from these modalities and use them to represent the micro-video is therefore of great significance; 2) micro-videos usually convey moods and feelings, which may provide crucial cues for recommending proper hashtags, yet most existing works have not considered the sentiment of media data for hashtag recommendation. In this paper, the senTiment enhanced multi-mOdal Attentive haShtag recommendaTion (TOAST) model is proposed for micro-video hashtag recommendation. Unlike previous hashtag recommendation models, which consider only content features, TOAST further incorporates the sentiment features of each modality to improve the recommendation performance on sentiment hashtags (e.g., #funny, #sad). Specifically, the multi-modal content features and the multi-modal sentiment features are modeled by a self-attention-based content common space learning branch and a sentiment common space learning branch, respectively. Furthermore, the varying importance of the multi-modal sentiment and content features is dynamically captured via an attention neural network according to their consistency with the hashtag semantic embedding. Extensive experiments on a real-world dataset demonstrate the effectiveness of the proposed method compared with baseline methods, and the findings may provide new insights for future developments in micro-video hashtag recommendation.
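The attention-based weighting described above can be illustrated with a minimal sketch (not the authors' code; the scaled dot product as the consistency score and the function name `attentive_fusion` are assumptions): each modality's feature vector, already projected into a common space, is scored against the hashtag semantic embedding, and a softmax over the scores gives the dynamic importance weights used to fuse the modalities.

```python
# Hypothetical sketch of attention-weighted multi-modal fusion, where each
# modality's weight reflects its consistency (here: scaled dot product)
# with a hashtag semantic embedding. Not the paper's actual implementation.
import numpy as np

def attentive_fusion(modal_feats: np.ndarray, hashtag_emb: np.ndarray):
    """modal_feats: (M, d), one row per modality (e.g., visual, acoustic,
    textual, from the content or sentiment branch); hashtag_emb: (d,).
    Returns the fused (d,) representation and the (M,) attention weights."""
    d = modal_feats.shape[1]
    scores = modal_feats @ hashtag_emb / np.sqrt(d)  # consistency scores
    scores -= scores.max()                           # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over modalities
    fused = weights @ modal_feats                    # attention-weighted sum
    return fused, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))   # three modality vectors in a common space
tag = rng.normal(size=8)          # hashtag semantic embedding
fused, w = attentive_fusion(feats, tag)
```

Because the weights are recomputed per hashtag embedding, a modality that agrees more with a given hashtag contributes more to the fused representation for that hashtag.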

Highlights

  • Nowadays, watching micro-videos for leisure and entertainment has gained tremendous user enthusiasm

  • RQ1 Can our proposed TOAST approach outperform other state-of-the-art competitors? Do the proposed self-attention mechanism and hashtag embedding, as well as the capture of multi-modal sentiment and content feature importance, contribute to our model's performance?

  • The number (i.e., λ) of randomly sampled negative hashtags for each micro-video and the dimensions of the sentiment common space and the content common space are all hyperparameters in our work
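The negative sampling mentioned above can be sketched as follows (an illustration under assumptions: the function name and the uniform-sampling choice are not from the paper): for each micro-video, λ hashtags are drawn at random from the hashtag vocabulary, excluding the video's positive hashtags.

```python
# Illustrative sketch of per-video negative hashtag sampling; names and the
# uniform sampling strategy are assumptions, not the authors' specification.
import random

def sample_negatives(positive_tags: set, vocab: list, lam: int, seed=None) -> list:
    """Draw lam distinct hashtags from vocab that are not positives."""
    rng = random.Random(seed)
    candidates = [t for t in vocab if t not in positive_tags]
    return rng.sample(candidates, min(lam, len(candidates)))

vocab = ["#funny", "#sad", "#travel", "#food", "#music", "#dance"]
negs = sample_negatives({"#funny"}, vocab, lam=3, seed=42)
```

Varying λ trades off training signal against computation, which is why it is treated as a tunable hyperparameter.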



Introduction

Nowadays, watching micro-videos for leisure and entertainment has gained tremendous user enthusiasm. Micro-video platforms and apps, such as Vine, Snapchat, Kuaishou, and Douyin, have seen unprecedented growth in recent years. Hashtags, which emerged as a labeling mechanism on these social platforms, are words or unspaced phrases prefixed with the character ``#''. By virtue of emphasizing the topics and the crucial information within posts, hashtags provide a highly feasible paradigm for organizing and retrieving such content. Although some methods [1]–[7] have been proposed to recommend hashtags for texts, images, or microblogs, they are not feasible for micro-videos, because these models are tailored to their own domains and the structure of a micro-video differs from that of text and images.

