Comparison of the fastText and Bag of Words Word Representation Methods Using Turkish Reviews of Touristic Places

Muhammed Çağrı Aksu, Ersin Karaman

doi:10.31590/ejosat.776629

Abstract

With the growing number and use of social media platforms, people increasingly share their experiences of products they have bought and places they have visited. Given the volume of this data, the shared reviews and experiences are likely to contain information that is meaningful to institutions and companies. It is therefore important to improve the methods used to extract such information and to know which method performs better. This study compares the classification performance of two word representation methods used in sentiment analysis, bag of words and fastText, on Turkish reviews of touristic places. It also measures whether two preprocessing steps, reducing words to their roots and negating words, contribute to classification performance. Both two-class (positive, negative) and three-class (positive, negative, neutral) sentiment analysis were performed, and six data sets were created for the comparison. The data sets were first classified in WEKA with the bag of words representation, using the Naive Bayes (NB), Multinomial Naive Bayes (MNB), k-Nearest Neighbor (k-NN) and Support Vector Machines (SVM) algorithms, which are frequently used in text mining. Once results for all data sets had been obtained with bag of words, the fastText tests were run using the fastText library for Python.
All classifications were evaluated with 10-fold cross-validation, and the f-score of each was recorded. Bag of words classified more successfully than fastText in two-class sentiment analysis, whereas fastText classified more successfully than bag of words in three-class sentiment analysis. Reducing words to their roots and negating words had neither a positive nor a negative effect on classification with fastText, but made a minor contribution to classification with bag of words. In two-class sentiment analysis, the best result was an f-score of 0.91, obtained by the SVM model with the bag of words representation. In three-class sentiment analysis, the best result was an f-score of 0.78, obtained by the model built with the fastText representation.
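
As an illustration of two ideas the abstract relies on, the following minimal sketch builds bag of words count vectors and computes an f-score from true and predicted labels. It is not the authors' pipeline (the study used WEKA and the fastText library). The function names, the whitespace tokenizer, the toy English reviews and the macro averaging across classes are all assumptions made for this example; the abstract does not state how per-class f-scores were averaged.

```python
from collections import Counter


def bag_of_words(docs):
    """Map each document to a count vector over a shared, sorted vocabulary.

    Assumption: plain whitespace tokenization. Note that str.lower() is not
    Turkish-aware ("I" becomes "i", not "ı"), so real Turkish text would
    need locale-specific lowercasing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def f_score(y_true, y_pred, label):
    """F-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_score(y_true, y_pred):
    """Unweighted mean of per-class f-scores (averaging choice is assumed)."""
    labels = sorted(set(y_true))
    return sum(f_score(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Invented toy reviews, for illustration only.
    vocab, vectors = bag_of_words(["great view great staff", "bad view"])
    print(vocab)    # ['bad', 'great', 'staff', 'view']
    print(vectors)  # [[0, 2, 1, 1], [1, 0, 0, 1]]
    print(macro_f_score(["pos", "pos", "neg", "neg"],
                        ["pos", "neg", "neg", "neg"]))
```

Word order is discarded in this representation: each review becomes a vector of word counts, which is the input a classifier such as SVM or Naive Bayes would receive. In a 10-fold cross-validation setup, a score like the one above would be computed on the held-out fold of each of the ten splits.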
