Abstract

Profession is an important social attribute of people. It plays a crucial role in commercial services such as personalized recommendation and targeted advertising. In practice, profession information is usually unavailable due to privacy and other reasons. In this article, we explore the task of identifying user professions according to their behaviors in social media. The task confronts the following challenges that make it non-trivial: how to incorporate heterogeneous information of user behaviors, how to effectively utilize both labeled and unlabeled data, and how to exploit community structure. To address these challenges, we present a framework called Profession Identification in Social Media. It takes advantage of both personal information and community structure of users in the following aspects: (1) We present a cascaded two-level classifier with heterogeneous personal features to measure the confidence of users belonging to different professions. (2) We present a multi-training process to take advantages of both labeled and unlabeled data to enhance classification performance. (3) We design a profession identification method synthetically considering the confidences from personal features and community structure. We collect a real-world dataset to conduct experiments, and experimental results demonstrate the significant effectiveness of our method compared with other baseline methods. By applying prediction on large-scale users, we also analyze characteristics of microblog users, finding that there are significant diversities among users of different professions in demographics, social network structures, and linguistic styles.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call