Abstract

Social media allow web users to create and share content pertaining to different subjects, exposing their activities, opinions, feelings and thoughts. In this context, online social media has attracted the interest of data scientists seeking to understand behaviours and trends, whilst collecting statistics for social sites. One potential application for these data is personality prediction, which aims to understand a user’s behaviour within social media. Traditional personality prediction relies on users’ profiles, their status updates, the messages they post, etc. Here, a personality prediction system for social media data is introduced that differs from most approaches in the literature, in that it works with groups of texts, instead of single texts, and does not take users’ profiles into account. Also, the proposed approach extracts meta-attributes from texts and does not work directly with the content of the messages. The set of possible personality traits is taken from the Big Five model and allows the problem to be characterised as a multi-label classification task. The problem is then transformed into a set of five binary classification problems and solved by means of a semi-supervised learning approach, due to the difficulty in annotating the massive amounts of data generated in social media. In our implementation, the proposed system was trained with three well-known machine-learning algorithms, namely a Naïve Bayes classifier, a Support Vector Machine, and a Multilayer Perceptron neural network. The system was applied to predict the personality of Tweets taken from three datasets available in the literature, and resulted in an approximately 83% accurate prediction, with some of the personality traits presenting better individual classification rates than others.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.