Abstract

Emoticons are widely used to express users' feelings in social media, blogs, and instant messaging. However, the emoticon dictionaries from which users select are large, making it difficult to find an emoticon that matches the content of a message. In this paper, we propose a method that supports emoticon selection by reordering the 167 unique emoticons in the emoticon dictionary using models pre-trained on large Japanese corpora. We evaluated whether adapting a pre-trained model to our emoticon recommendation system achieves better results than merely learning surface patterns of text and emoticons. We collected sets of Japanese sentences and emoticons from the Internet, used pre-trained models (i.e., Word2vec, ELMo, and BERT) trained on large Japanese textual data, and applied deep learning techniques such as BiLSTM and fine-tuning. We confirmed that fine-tuning BERT on our data achieved the best recommendation accuracy of 52.98%, placing the correct emoticon within the top 25 (top 15%) of the recommended emoticons. Moreover, we confirmed our intuition that widely used Wikipedia-based pre-trained models are not the best choice for emoticon recommendation.
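The reported 52.98% figure is a top-k accuracy: a recommendation counts as correct when the gold emoticon appears among the 25 highest-scored of the 167 candidates. As a minimal sketch (the scoring model, data, and function names here are hypothetical, not from the paper), such a metric can be computed from per-example candidate scores like this:

```python
# Hypothetical sketch of top-k recommendation accuracy over a 167-emoticon dictionary.
# In the paper the scores would come from a fine-tuned classifier (e.g. BERT);
# here they are random placeholders.
import random

NUM_EMOTICONS = 167  # size of the emoticon dictionary in the paper
TOP_K = 25           # the paper reports accuracy within the top 25 (~top 15%)

def top_k_accuracy(score_lists, gold_labels, k=TOP_K):
    """Fraction of examples whose gold emoticon is among the k highest-scored."""
    hits = 0
    for scores, gold in zip(score_lists, gold_labels):
        # rank candidate indices by descending score and check the top k
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(gold_labels)

# Placeholder evaluation on random scores; a real run would use model outputs.
random.seed(0)
scores = [[random.random() for _ in range(NUM_EMOTICONS)] for _ in range(200)]
labels = [random.randrange(NUM_EMOTICONS) for _ in range(200)]
print(round(top_k_accuracy(scores, labels), 3))
```

With uniformly random scores this hovers around 25/167 ≈ 0.15, which is what makes the paper's 52.98% a meaningful improvement over chance.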
