Abstract

Purpose Gathering knowledge regarding personality traits has long been of interest to academics and researchers in psychology and computer science. Analyzing profile data from personal social media accounts reduces data collection time, as this method does not require users to fill in any questionnaires. A pure natural language processing (NLP) approach can give decent results, and previous studies have shown that its reliability can be improved by combining it with machine learning.

Design/methodology/approach Cleaning the dataset and extracting relevant potential features, as assessed by psychological experts, are essential steps, because Indonesians tend to mix formal words, non-formal words, slang and abbreviations when writing social media posts. For this article, raw data were derived from a predefined dominance, influence, stability and conscientiousness (DISC) quiz website, returning 316,967 tweets from 1,244 Twitter accounts, filtered to include only personal and Indonesian-language accounts. Using a combination of NLP techniques and machine learning, the authors aim to develop a better approach and a more robust model, especially for the Indonesian language.

Findings The authors find that employing a SMOTETomek re-sampling technique and hyperparameter tuning boosts the model’s performance on formalized datasets by 57% (as measured through the F1-score).

Originality/value The process of cleaning the dataset and extracting relevant potential features, as assessed by psychological experts, is essential because Indonesian people tend to mix formal words, non-formal words, slang words and abbreviations when writing tweets. Organic data derived from a predefined DISC quiz website yielded 1,244 Twitter accounts and 316,967 tweets.

Highlights

  • Personality is what distinguishes individual humans from each other and defines their tendencies in their reactions and actions

  • We aim to develop a better approach, combining natural language processing (NLP) techniques and machine learning to form a more robust model

  • The most challenging part of the study was the textual feature preparation, which we handled by using NLP techniques to clean and preprocess the data
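As a rough illustration of the kind of cleaning described in the highlights, the following minimal Python sketch lowercases a tweet, strips URLs and mentions, and maps a few informal Indonesian tokens to formal equivalents. The `SLANG_TO_FORMAL` dictionary and the `normalize_tweet` function are hypothetical names introduced here for illustration only; the lexicon and preprocessing pipeline actually used in the study are not specified in this excerpt.

```python
import re

# Hypothetical, illustrative mapping of informal tokens to formal words.
# This is an assumption for the example, not the lexicon used in the study.
SLANG_TO_FORMAL = {
    "yg": "yang",
    "gak": "tidak",
    "ga": "tidak",
    "dgn": "dengan",
    "utk": "untuk",
    "krn": "karena",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def normalize_tweet(text: str) -> str:
    """Return a lowercased, cleaned tweet with known slang formalized."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_LETTER_RE.sub(" ", text)  # drop digits, punctuation and symbols
    tokens = text.split()
    # Replace tokens found in the mapping; leave all others unchanged.
    return " ".join(SLANG_TO_FORMAL.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    print(normalize_tweet("@budi Aku gak tahu yg mana, cek https://t.co/abc #bingung"))
```

A dictionary lookup keeps the step transparent and easy to extend, although a real system would need a far larger lexicon and probably context-aware handling of ambiguous abbreviations.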


Summary

Introduction

Personality is what distinguishes individual humans from each other and defines their tendencies in their reactions and actions. Analysis of individuals’ personal social media accounts offers a promising approach, as this method does not require users to complete any questionnaires, thereby reducing the time required and increasing credibility. Social media usage is increasing every day, and a huge amount of textual and visual data is uploaded to the Internet daily [1]. For such tasks, Twitter and Facebook are two of the most popular social media platforms, as they provide accessible application programming interfaces (APIs) that can be used in conjunction with external testing applications for corpus and data collection [2]. Twitter hosts a tremendous number of tweets from users, especially in Indonesia, which is beneficial for personality profiling, as data volume has been shown to correlate positively with profiling accuracy [4].

