Abstract

Introversion and extraversion are personality traits that describe how people interact with others. Each trait has its own advantages and disadvantages, and people who know their personality type can use them to their benefit. This study compares and evaluates several machine learning models and dataset balancing methods for predicting introversion-extraversion personality from the results of a survey conducted by the Open-Source Psychometrics Project. The dataset was balanced using three balancing methods, and fifteen questions were selected as features based on their correlation with the personality self-identification result. The resulting datasets were used to train several supervised machine learning models. For the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and SMOTE with Edited Nearest Neighbor cleaning (SMOTE-ENN) datasets, the best model was Random Forest, with 10-fold cross-validation accuracies of 95.5%, 95.3%, and 71.0%, respectively. On the original dataset, the best model was the Support Vector Machine (SVM), with a 10-fold cross-validation accuracy of 73.5%. These results indicate that oversampling was the most effective balancing approach for improving model performance, whereas the hybrid oversampling-undersampling method did not significantly improve performance. Furthermore, tree-based models such as Random Forest and Decision Tree improved substantially with data balancing, while the other models, with the exception of SVM, showed no significant gain. These findings suggest that further study of hybrid balancing methods and other classification models is needed to improve personality classification performance.
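The abstract does not state how the feature selection or the oversampling was implemented. The following is a minimal, dependency-free sketch of the two preprocessing ideas it names, correlation-based feature ranking and SMOTE-style oversampling. It assumes numeric (for example, 1-5 Likert) answers and an integer-coded self-identification label. The function names (`pearson`, `top_k_features`, `smote`, `balance_with_smote`) and the toy data are hypothetical and do not come from the study. ADASYN and the ENN cleaning step of SMOTE-ENN are not shown.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no linear relationship to the label
    return cov / (sx * sy)


def top_k_features(rows, labels, k):
    """Column indices of the k features with the largest |correlation| to the label."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


def smote(samples, n_new, k=5, seed=0):
    """Create n_new synthetic samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen sample
    and one of its k nearest neighbors (Euclidean) from the same class.
    """
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples of the class")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        neighbors = sorted(others, key=lambda s: math.dist(base, s))[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # uniform position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_with_smote(rows, labels, k=5, seed=0):
    """Oversample every smaller class up to the size of the largest class."""
    counts = {c: labels.count(c) for c in set(labels)}
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for c in sorted(counts):
        n_new = target - counts[c]
        if n_new == 0:
            continue
        members = [r for r, y in zip(rows, labels) if y == c]
        out_rows += smote(members, n_new, k=k, seed=seed)
        out_labels += [c] * n_new
    return out_rows, out_labels


if __name__ == "__main__":
    # Hypothetical 1-5 answers to three survey questions; label 1 is the minority.
    rows = [[1, 3, 5], [2, 3, 4], [1, 4, 5], [2, 2, 4], [1, 3, 4],
            [2, 4, 5], [1, 2, 5], [5, 3, 1], [4, 2, 2], [5, 4, 1]]
    labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    keep = top_k_features(rows, labels, k=2)  # the study kept fifteen questions
    reduced = [[r[j] for j in keep] for r in rows]
    X, y = balance_with_smote(reduced, labels, k=2)
    print(keep, y.count(0), y.count(1))  # prints: [0, 2] 7 7
```

The middle question is uncorrelated with the toy label and is dropped, and the minority class is oversampled from three to seven rows. For real use, the imbalanced-learn library provides `SMOTE`, `ADASYN`, and `SMOTEENN` resamplers. When accuracy is estimated with 10-fold cross-validation, resampling is usually applied to the training folds only, so that synthetic points derived from test samples do not leak into training.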
