Abstract

Hand-crafted feature engineering is a labor-intensive and costly task. In this paper, we apply Word2Vec as an alternative to hand-crafted features for sentiment analysis of Indonesian-language hotel reviews. To obtain the best sentiment-analysis performance, we evaluate three Word2Vec parameters: the model architecture, the evaluation method (the training algorithm used for the output layer), and the vector dimension. The evaluation was carried out on our proposed domain-specific corpus of hotel reviews, which consists of 2,500 Indonesian-language reviews (1,250 positive and 1,250 negative). The results show that the highest accuracy is obtained with the following combination of parameters: the Skip-gram architecture, the Hierarchical Softmax evaluation method, and a vector dimension of 100. The Skip-gram model yields the highest accuracy because it represents rarely appearing words well, as is the case in sentiment analysis. Hierarchical Softmax gives better results because training uses a binary tree to represent every word in the vocabulary, with leaf nodes representing rare words, so that rarely appearing words inherit the vector representations of the nodes along their path. Furthermore, to obtain optimal accuracy, the vector dimension and the amount of data should be increased together.
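To make the Hierarchical Softmax explanation concrete, the following is a minimal pure-Python sketch of that output layer, not the authors' implementation. Words are leaves of a Huffman tree built from frequency counts, so frequent words sit near the root and rare words sit deeper. The probability of a word given a hidden vector `h` is the product of sigmoid decisions at the internal nodes on its path, and each internal node has its own vector that is shared by every word below it. The toy Indonesian word counts, the dimension of 4 (the best-performing dimension reported above is 100), the random vectors, and the branch convention are all assumptions for illustration.

```python
import heapq
import itertools
import math
import random


def build_huffman_codes(counts):
    """Build a Huffman tree over word counts.

    Returns {word: (path, code)}: `path` lists the internal-node ids visited
    from the root, `code` the 0/1 branch taken at each of those nodes.
    """
    tie = itertools.count()  # tie-breaker so heapq never compares node tuples
    heap = [(c, next(tie), ("leaf", w)) for w, c in counts.items()]
    heapq.heapify(heap)
    inner_id = itertools.count()
    # Repeatedly merge the two least frequent nodes, so rare words end up deeper.
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        node = ("inner", next(inner_id), left, right)
        heapq.heappush(heap, (c1 + c2, next(tie), node))
    root = heap[0][2]

    codes = {}
    stack = [(root, [], [])]
    while stack:
        node, path, code = stack.pop()
        if node[0] == "leaf":
            codes[node[1]] = (path, code)
        else:
            _, idx, left, right = node
            stack.append((left, path + [idx], code + [0]))
            stack.append((right, path + [idx], code + [1]))
    return codes


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hs_probability(word, h, codes, inner_vecs):
    """P(word | h) as a product of binary decisions along the word's tree path."""
    path, code = codes[word]
    p = 1.0
    for node, bit in zip(path, code):
        score = sum(a * b for a, b in zip(h, inner_vecs[node]))
        # Assumed convention: branch 0 has probability sigmoid(score).
        p *= sigmoid(score) if bit == 0 else 1.0 - sigmoid(score)
    return p


# Hypothetical toy counts: frequent words ("hotel", "kamar") vs rare ones ("kotor", "bau").
counts = {"hotel": 120, "kamar": 90, "bersih": 40, "nyaman": 35, "kotor": 6, "bau": 3}
codes = build_huffman_codes(counts)

random.seed(0)
dim = 4
# A binary tree with V leaves has V - 1 internal nodes, each with its own vector.
inner_vecs = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(len(counts) - 1)]
h = [random.uniform(-0.5, 0.5) for _ in range(dim)]  # stand-in for a hidden/context vector

for w in counts:
    print(w, "depth", len(codes[w][0]), "p=%.3f" % hs_probability(w, h, codes, inner_vecs))
total = sum(hs_probability(w, h, codes, inner_vecs) for w in counts)
print("sum of probabilities: %.6f" % total)  # sums to 1 up to floating-point rounding
```

With these counts, "hotel" sits one level below the root while "kotor" and "bau" sit five levels deep. Each prediction costs a number of sigmoid evaluations proportional to the tree depth rather than the vocabulary size, and the probabilities still form a proper distribution over all words.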
