Abstract

Sentiment analysis is one of the most frequently used applications of Natural Language Processing (NLP); it classifies the polarity of reviews expressed at the aspect, sentence or document level. Many businesses and organizations use this technique to improve production, as well as employee and service efficiency. However, the user reviews in our study were unstructured and contained spelling errors, which makes classification difficult for both human readers and machines. To address this problem, supervised Machine Learning (ML) algorithms can be applied to the extracted data so that polarity is categorized into a positive, negative or neutral class. In this research, we compared nine ML algorithms to determine the most suitable one for sentiment polarity classification of customer reviews in Thai, which is a low-resource language. The dataset was collected manually from two online travel agencies (Agoda.com and Booking.com) and consists of reviews written in Thai. We employed 11 preprocessing steps to clean the data and handle the large amount of noise. Next, the Delta TF-IDF, TF-IDF, N-Gram and Word2Vec techniques were applied to convert the text reviews into vectors, which were then processed with the different ML algorithms to classify sentiment polarity and allow a fair comparison. All ML algorithms were evaluated with ten-fold cross-validation, comparing recall, precision, F1-score and accuracy. The experimental results show that the Support Vector Machine (SVM) using the Delta TF-IDF technique was the best ML algorithm for polarity classification of Thai-language hotel reviews, with the highest accuracy of 89.96%. The results of this research can serve as a tool for small and medium-sized enterprises that need sentiment analysis of Thai-language text in the hotel domain.
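To make the Delta TF-IDF weighting mentioned above more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: it follows one common formulation in which a term's count is multiplied by the difference between its IDF in the positive class and its IDF in the negative class. The add-one smoothing, the function name `delta_tfidf`, the sign convention and the toy English tokens are all assumptions made for illustration; real Thai reviews would first need word segmentation and the preprocessing steps described in the abstract.

```python
import math
from collections import Counter


def delta_tfidf(docs, labels):
    """Compute Delta TF-IDF weights for pre-tokenized documents.

    docs   -- list of token lists (already segmented and cleaned)
    labels -- "pos" or "neg" for each document
    Returns one {term: weight} dict per document.
    Assumption: add-one smoothing avoids division by zero for terms
    that occur in only one class.
    """
    n_pos = sum(1 for y in labels if y == "pos")
    n_neg = sum(1 for y in labels if y == "neg")

    # Document frequency of each term, counted separately per class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == "pos" else df_neg).update(set(tokens))

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            idf_pos = math.log2((n_pos + 1) / (df_pos[term] + 1))
            idf_neg = math.log2((n_neg + 1) / (df_neg[term] + 1))
            # Sign convention (assumed): terms concentrated in positive
            # documents get negative weights, and vice versa.
            vec[term] = count * (idf_pos - idf_neg)
        vectors.append(vec)
    return vectors


# Toy, hypothetical data standing in for segmented hotel reviews.
docs = [
    ["clean", "room", "good"],
    ["good", "staff"],
    ["dirty", "room"],
    ["bad", "staff", "dirty"],
]
labels = ["pos", "pos", "neg", "neg"]
vectors = delta_tfidf(docs, labels)
```

In this sketch, a term that is spread evenly across both classes (such as "room") receives a weight of zero, while class-specific terms receive weights of large magnitude. This is what distinguishes the scheme from plain TF-IDF, which ignores class labels. The resulting sparse vectors could then be passed to a classifier such as an SVM and evaluated with ten-fold cross-validation.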

Highlights

  • Customer reviews are an important source of information for many companies, as they can help improve product and service quality

  • We focused on the problem of polarity classification in sentiment analysis, covering positive and negative classes, for Thai-language reviews in the hotel domain

  • The results show that the continuous bag of words (CBOW) method achieved better performance than skip-gram


Introduction

Customer reviews are an important source of information for many companies, as they can help improve product and service quality. For hotel businesses in particular, reviews are among the most important assets: the hidden insights they contain can be used to improve core business functions such as satisfaction, security, product, location and comfort. However, customers must spend considerable time reading and analysing long reviews to classify the sentiment polarity manually (Sungsri & Ua-apisitwong, 2017), and understanding such reviews can prove difficult for both people and machines. To solve this problem, sentiment analysis is employed to interpret customer reviews and classify their polarity as positive, neutral or negative (Rathee et al., 2018).

