MBTI Personality Prediction Using Machine Learning and SMOTE for Balancing Data Based on Statement Sentences

Gregorius Ryan, Pricillia Katarina, Derwin Suhartono

doi:10.3390/info14040217

Open Access

https://doi.org/10.3390/info14040217

Journal: Information
Publication Date: Apr 3, 2023
Citations: 6
License type: CC BY 4.0

Affiliation: Binus University

Abstract

The rise of social media as a platform for self-expression and self-understanding has led to increased interest in using the Myers–Briggs Type Indicator (MBTI) to explore human personalities. However, more research is needed on how alternative word-embedding techniques, machine learning algorithms, and imbalanced-data handling techniques can improve MBTI personality-type prediction. Our research investigated the efficacy of these techniques, using the Word2Vec model to obtain a vector representation of the words in the corpus. We implemented several machine learning approaches: logistic regression, linear support vector classification, stochastic gradient descent, random forest, the extreme gradient boosting (XGBoost) classifier, and the CatBoost classifier. In addition, we used the synthetic minority oversampling technique (SMOTE) to address the issue of imbalanced data. The results showed that our approach achieved a relatively high F1 score (between 0.7383 and 0.8282, depending on the model chosen) for predicting and classifying MBTI personality. Furthermore, we found that using SMOTE improved the selected models’ performance (F1 score between 0.7553 and 0.8337), indicating that a machine learning approach integrated with Word2Vec and SMOTE can predict and classify MBTI personality well, thus enhancing the understanding of MBTI.
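
To make the oversampling step concrete, the following is a minimal, standard-library sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a real minority sample and one of its k nearest minority neighbours. It is an illustration only and not the authors' implementation; the function names `smote` and `balance`, the binary-label setup, and the toy feature vectors are assumptions introduced here. In practice the feature vectors would be document embeddings (for example, derived from Word2Vec), and a maintained library implementation would normally be preferred.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    `minority` is a list of equal-length feature vectors (lists of floats).
    Hypothetical helper for illustration; not the paper's code.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class of a binary problem until both classes match."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    n_new = max(n_majority - len(minority), 0)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data: four majority samples (label 0) and two minority samples (label 1).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y, minority_label=1)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Because the synthetic points lie on line segments between existing minority samples, oversampling should be applied to the training split only, so that no interpolated copy of a test sample influences the reported F1 score.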

Full Text