Abstract

Inferring users' occupational categories from user-generated content has become an important problem in user profiling and in applications such as personalised recommendation systems, driven by the rapid growth of online social media use. Although previous work has demonstrated that language features extracted from social media content can effectively predict users' occupations, work that addresses the reliance on time-consuming, expensive expert knowledge and the low prediction performance is fairly limited and mostly based on English-language social platforms. In this paper, we first investigate the relationship between users' language usage in their Chinese blogs and their occupations, employing tools to extract quantitative features related to users' psychological states and social relationships. We then propose a novel content-aware hierarchical model, T-LSTM, for user occupation prediction. It consists of two main parts: a word-level Transformer encoder layer, which addresses the problem that the importance of individual words in users' texts is otherwise neglected, and a blog-level bidirectional LSTM layer, which exploits the temporal information of blogs to obtain user representations. Experimental results on a real-world Chinese social media dataset that we collected show that the proposed model greatly outperforms the baseline methods for occupation prediction, and verify the effectiveness of its components as well as the robustness of the model.

