Abstract

Hotel booking service providers, in the form of websites or online applications, offer features that let consumers post reviews assessing a hotel. However, the large number of available reviews makes it impractical for users to read and filter all of them. Sentiment analysis can address this problem by classifying reviews as having positive or negative sentiment. This study examines the application of n-gram and Naive Bayes methods to sentiment classification. The research stages are: (1) collecting hotel review data from the TripAdvisor.com website; (2) preprocessing the data through data cleaning and case folding; (3) tokenizing with the n-gram method, using unigrams, bigrams, and trigrams; (4) weighting words with the Term Frequency-Inverse Document Frequency (TF-IDF) method; (5) classifying each hotel review as positive or negative with the Naive Bayes method; and (6) evaluating the algorithm's performance with a confusion matrix, which yields precision, recall, accuracy, and error rate. Naive Bayes with unigrams achieved a precision of 94%, recall of 100%, accuracy of 97%, and error rate of 3%. With bigrams, it achieved a precision of 89%, recall of 94%, accuracy of 92%, and error rate of 8%. With trigrams, it achieved a precision of 52%, recall of 80%, accuracy of 58%, and error rate of 42%. Based on accuracy, unigram tokenization performs better than the other tokenization methods.
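Steps (2) and (3) can be sketched as follows. The abstract does not give the exact cleaning rules, so this minimal sketch assumes that cleaning keeps only ASCII letters and whitespace. The function names `preprocess` and `ngrams` are hypothetical. The code applies case folding and cleaning, then splits a review into word unigrams, bigrams, or trigrams.

```python
import re

def preprocess(review: str) -> str:
    """Case folding plus cleaning (assumed rule: keep only ASCII letters and spaces)."""
    text = review.lower()                      # case folding
    text = re.sub(r"[^a-z\s]", " ", text)      # cleaning: drop digits, punctuation, symbols
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace

def ngrams(text: str, n: int) -> list[str]:
    """Split cleaned text into word n-grams (n=1 unigram, n=2 bigram, n=3 trigram)."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

clean = preprocess("Great location, friendly staff!!")
print(clean)              # great location friendly staff
print(ngrams(clean, 1))   # ['great', 'location', 'friendly', 'staff']
print(ngrams(clean, 2))   # ['great location', 'location friendly', 'friendly staff']
print(ngrams(clean, 3))   # ['great location friendly', 'location friendly staff']
```

Longer n-grams produce fewer features per review, and each one tends to recur in fewer reviews than a single word does.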
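Step (4) weights each n-gram with TF-IDF. The abstract does not say which TF-IDF variant was used, so this sketch assumes the common form `w = tf * log(N / df)`. Here `tf` is the raw count of a term in one review, `N` is the number of reviews, and `df` is the number of reviews that contain the term. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each tokenized review as tf * log(N / df) (assumed variant)."""
    n_docs = len(docs)
    # Document frequency: number of reviews in which each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this review
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

weights = tf_idf([["clean", "room"], ["dirty", "room"]])
print({t: round(w, 3) for t, w in weights[0].items()})   # {'clean': 0.693, 'room': 0.0}
```

A term that appears in every review gets `log(1) = 0`, so it contributes nothing to distinguishing reviews. Terms specific to fewer reviews receive larger weights.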
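For step (5), the abstract names Naive Bayes but not the variant. This sketch assumes multinomial Naive Bayes with Laplace (add-alpha) smoothing. That variant accepts either raw n-gram counts or TF-IDF weights as term weights. The names `train_nb` and `predict_nb` are hypothetical, and the four toy reviews are illustrative rather than taken from the study's dataset.

```python
import math
from collections import defaultdict

def train_nb(features, labels, alpha=1.0):
    """Multinomial Naive Bayes over weighted term features with add-alpha smoothing."""
    classes = sorted(set(labels))
    vocab = {term for f in features for term in f}
    # Log prior: share of training reviews in each class.
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    # Total feature weight (count or TF-IDF) of each term within each class.
    totals = {c: defaultdict(float) for c in classes}
    for f, c in zip(features, labels):
        for term, w in f.items():
            totals[c][term] += w
    # Smoothed log likelihood of each vocabulary term given the class.
    log_lik = {}
    for c in classes:
        denom = sum(totals[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((totals[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik

def predict_nb(model, feature):
    """Return the class with the highest log posterior; terms unseen in training are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(w * log_lik[c][t] for t, w in feature.items() if t in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

train_docs = [
    {"clean": 1, "friendly": 1}, {"great": 1, "clean": 1},
    {"dirty": 1, "rude": 1}, {"dirty": 1, "noisy": 1},
]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, {"clean": 1, "great": 1}))   # positive
print(predict_nb(model, {"dirty": 1, "noisy": 1}))   # negative
```

Log probabilities are summed instead of multiplying raw probabilities. This avoids numeric underflow when a review contains many n-grams.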
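Step (6) derives the four reported metrics from the confusion matrix. This sketch assumes that "positive" is the positive class, since the abstract does not say which class is treated as positive. From the counts TP, FP, FN, and TN:

- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- accuracy = (TP + TN) / total
- error rate = (FP + FN) / total = 1 - accuracy

The helper names `confusion_counts` and `evaluate` are hypothetical, and the label lists are toy data, not the study's results.

```python
def confusion_counts(actual, predicted, positive="positive"):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn

def evaluate(tp, fp, fn, tn):
    """Precision, recall, accuracy and error rate from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,   # equals 1 - accuracy
    }

actual    = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "positive", "positive", "negative", "positive"]
print(evaluate(*confusion_counts(actual, predicted)))
# {'precision': 0.75, 'recall': 1.0, 'accuracy': 0.8, 'error_rate': 0.2}
```

Because the error rate is the complement of accuracy, each pair of reported accuracy and error-rate figures sums to 100% (97% and 3%, 92% and 8%, 58% and 42%).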
