Abstract
Hotel booking services, offered as websites or online applications, let consumers post reviews giving their assessment of a hotel. However, the sheer number of reviews makes it impractical for users to read and filter all of them. Sentiment analysis can address this by classifying reviews as positive or negative. This study examines the application of the n-gram and Naive Bayes methods to sentiment classification. The research comprises six phases: (1) collecting hotel review data from the TripAdvisor.com website; (2) preprocessing the data through data cleaning and case folding; (3) tokenizing with the n-gram method, using unigrams, bigrams, and trigrams; (4) weighting terms with the Term Frequency-Inverse Document Frequency (TF-IDF) method; (5) classifying hotel reviews as positive or negative with the Naive Bayes method; and (6) evaluating the algorithm's performance with a confusion matrix, which yields precision, recall, accuracy, and error rate. Naive Bayes classification with unigram tokenization achieved a precision of 94%, recall of 100%, accuracy of 97%, and error rate of 3%. Bigram tokenization achieved a precision of 89%, recall of 94%, accuracy of 92%, and error rate of 8%. Trigram tokenization achieved a precision of 52%, recall of 80%, accuracy of 58%, and error rate of 42%. Based on the accuracy results, it can be concluded that unigram tokenization performs better than the other tokenization methods.
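Phases (2), (3), and (6) can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the cleaning rule (keep only letters and whitespace), the sample sentence, and the confusion-matrix counts are all assumptions chosen for illustration, and the TF-IDF weighting and Naive Bayes steps are omitted.

```python
import re

def preprocess(text):
    """Case folding plus an assumed cleaning rule: keep letters and spaces only."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def metrics(tp, fp, fn, tn):
    """Derive the four reported measures from binary confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0; otherwise the division fails.
    """
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
    }

# Hypothetical review text, not taken from the study's data set.
tokens = preprocess("The room was VERY clean!")
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Hypothetical counts, not the study's confusion matrix.
scores = metrics(tp=8, fp=2, fn=0, tn=10)

print(unigrams)
print(bigrams)
print(trigrams)
print(scores)
```

Longer n-grams produce fewer and more specific features per review, and the error rate is simply the complement of accuracy, which matches how the figures above pair up (for example, 97% accuracy with 3% error rate).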