Abstract
This study aims to develop a Natural Language Processing (NLP)-based feature extraction algorithm optimized for classifying personality types in adolescents. The proposed algorithm, TF-IDF + N-Gram Z, combines Term Frequency-Inverse Document Frequency (TF-IDF) with the N-Gram Z technique to improve the feature representation of the analyzed text. TF-IDF weights each word by how often it occurs in a document, discounted by how many documents contain it, while N-Gram Z adds context by capturing sequences of consecutive words. The dataset consists of 3,200 sentences written by adolescent respondents in a survey designed to explore aspects of their personality. After feature extraction, three Naïve Bayes variants are applied for classification: Multinomial Naïve Bayes, Bernoulli Naïve Bayes, and Complement Naïve Bayes. Each variant suits a different kind of feature data: Bernoulli Naïve Bayes models binary (presence/absence) features, whereas the Multinomial and Complement variants model count-like or weighted features. The results show that the combined TF-IDF + N-Gram Z algorithm produces highly representative features, as reflected in high classification performance: the Multinomial and Complement Naïve Bayes variants each achieved 98% accuracy. These findings contribute to the development of NLP-based methods for detecting adolescent personality. Combining TF-IDF + N-Gram Z features with Naïve Bayes classifiers yields high accuracy, and the approach could be applied in practice in psychology and adolescent education.
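The pipeline described above (TF-IDF weighting over word n-grams, followed by a Naïve Bayes classifier) can be illustrated with a minimal, dependency-free sketch. The abstract does not define what distinguishes "N-Gram Z" from ordinary n-grams, so this sketch assumes plain contiguous word unigrams and bigrams, and it uses the smoothed idf formula ln((1 + N) / (1 + df)) + 1. All function names, the class name, and the toy sentences and labels are hypothetical illustrations, not the study's code or data. Only the multinomial variant is shown; Bernoulli Naïve Bayes would instead use binary term-presence features, and Complement Naïve Bayes would estimate each class's term weights from the documents outside that class.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lower-case, whitespace-tokenize, and emit contiguous word n-grams of
    length 1..n_max (assumption: plain n-grams stand in for "N-Gram Z")."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_idf(docs, n_max=2):
    """Smoothed idf: ln((1 + N) / (1 + df)) + 1, where df counts documents."""
    df = Counter(term for d in docs for term in set(ngrams(d, n_max)))
    n_docs = len(docs)
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf, n_max=2):
    """Raw term count times idf; n-grams not seen in training are dropped."""
    tf = Counter(ngrams(doc, n_max))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


class MultinomialNB:
    """Multinomial Naive Bayes over fractional TF-IDF weights, Laplace alpha."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for v in vectors for t in v}
        counts = Counter(labels)
        self.log_prior = {c: math.log(counts[c] / len(labels)) for c in self.classes}
        # Sum the TF-IDF weight of every term per class.
        totals = {c: defaultdict(float) for c in self.classes}
        for vec, y in zip(vectors, labels):
            for t, w in vec.items():
                totals[y][t] += w
        # Smoothed log-likelihood of each vocabulary term given each class.
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((totals[c][t] + self.alpha) / denom)
                               for t in self.vocab}
        return self

    def predict(self, vec):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in vec.items() if t in self.vocab)
        return max(self.classes, key=score)


# Hypothetical toy data, not the study's 3,200-sentence survey corpus.
train = ["i enjoy quiet time alone with a book",
         "i prefer to stay home and think alone",
         "i love going to parties with many friends",
         "i enjoy meeting new people at parties"]
labels = ["introvert", "introvert", "extrovert", "extrovert"]

idf = fit_idf(train)
model = MultinomialNB().fit([tfidf(d, idf) for d in train], labels)
print(model.predict(tfidf("i like time alone at home", idf)))    # → introvert
print(model.predict(tfidf("i love parties with friends", idf)))  # → extrovert
```

Bigrams such as "time alone" become features of their own, which is the kind of word-order context the n-gram component is meant to add on top of single-word TF-IDF weights.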