Abstract

Social media is a source of data for many purposes: advertising, social research, and recruiting. Usually, however, we are limited to readily available, structured information such as age, gender, education, and occupation. To extract more complex, implicit features, we have to work with unstructured data such as texts related to a user. We present a case of complex user analysis in social media using textual data: detecting parents on social networks. Our approach works not with content generated by a user, but with content the user showed interest in, either implicitly (the user liked it) or explicitly (the user subscribed to a group where the content was published). In this paper, we compare classification methods for the task of parent detection on social media. Given a user's likes and other information, the goal is to estimate the probability that the user already has one or more children. This task is an example of positive-unlabeled learning: data from social networks and media may contain explicit signals of a user's parenthood, but the absence of such signals gives no grounds to conclude the opposite. It can be treated as a case of look-alike modelling or, in other words, a one-class classification problem. We propose a retrospective approach that exploits data from social media to make it possible to build a binary classifier. We compare the two approaches and conclude that the retrospective approach, although it requires more effort to implement, may yield better results. This approach may be useful in similar tasks with a look-alike problem statement.
