Abstract

A social media user's timeline contains text, images, audio, and video. This large, unstructured data can be processed with techniques such as text processing or image processing. In this study, processed text data is used to classify the personality of Twitter users according to the DISC framework. From an initial collection of 292 users, we semi-automatically filtered the set to keep only personal accounts that post in Indonesian. To observe and assess a user's personality from the word choices in their tweets, we built keyword vocabularies corresponding to the DISC framework and theory. We ran four experimental scenarios, varying whether the keywords and text data are stemmed and whether the keyword frequency calculation is weighted. Weighting the keywords by level, with the weight values currently used in the calculation, does not improve the results, and neither does stemming: the best results come from the unstemmed, unweighted scenario. This study is preliminary research toward an automatic profiling system that combines Natural Language Processing and Machine Learning approaches.
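The four scenarios (stemmed or not, weighted or not) can be illustrated with a minimal sketch of keyword frequency scoring. This is not the study's implementation: the vocabulary entries, their levels, the function names `score_user` and `classify`, and the choice of the highest-scoring category as the predicted label are all assumptions made for illustration. The stemmer is passed in as a plain callable, so no particular Indonesian stemming library is assumed.

```python
from collections import Counter

# Hypothetical vocabularies mapping keyword -> level. The words and levels
# are placeholders, not the vocabulary built in the study.
KEYWORDS = {
    "D": {"menang": 2, "tegas": 1},
    "I": {"seru": 2, "teman": 1},
    "S": {"sabar": 2, "tenang": 1},
    "C": {"teliti": 2, "data": 1},
}


def score_user(tweets, keywords=KEYWORDS, weighted=False, stem=None):
    """Return one keyword-frequency score per DISC category.

    weighted: multiply each keyword count by its level (assumed scheme).
    stem: optional callable applied to both tweet tokens and keywords.
    """
    normalize = stem if stem is not None else (lambda word: word)
    # Naive whitespace tokenization; punctuation is not stripped.
    tokens = Counter(
        normalize(token)
        for tweet in tweets
        for token in tweet.lower().split()
    )
    scores = {}
    for label, vocab in keywords.items():
        total = 0
        for word, level in vocab.items():
            count = tokens[normalize(word)]
            total += count * (level if weighted else 1)
        scores[label] = total
    return scores


def classify(scores):
    """Pick the category with the highest score (ties go to the first)."""
    return max(scores, key=scores.get)


tweets = ["Saya menang hari ini", "tegas dan tegas"]
unweighted = score_user(tweets)
weighted = score_user(tweets, weighted=True)
print(unweighted, weighted, classify(unweighted))
```

Toggling `weighted` and passing or omitting `stem` gives the four combinations described above, so the scenarios can be compared on the same set of users.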
