Abstract

We create a novel Indonesian sentiment lexicon, SentIL (Sentiment Indonesian Lexicon), with an automatic pipeline that builds sentiment seed words, adds new words from slang, emoticons, a given dictionary, and a sentiment corpus, and finally tunes the sentiment values with a tagged sentiment corpus. The pipeline begins by taking seed words from WordNet Bahasa that are mapped to sentiment values from the English SentiWordNet. The seed words are enriched by combining a dictionary-based method, which uses words’ synonyms and antonyms, with a corpus-based method, which uses word embeddings for word similarity trained on positive and negative sentiment corpora from online marketplace reviews and Twitter data. The valence score of each lexicon entry is recalculated based on its relative occurrence in the corpus. We also add some popular slang words and emoticons to enrich the lexicon. Our experiments show that the proposed method increases the lexicon size by a factor of 3.5 and reaches an accuracy of 80.9% on online reviews and 95.7% on Twitter data, outperforming other published and available Indonesian sentiment lexicons.
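To make the valence recalculation step more concrete, here is a minimal Python sketch of one way a score could be derived from a word's relative occurrence in positive and negative corpora. The abstract does not state the actual formula, so the normalized difference used here, the function names `count_tokens` and `relative_valence`, and the toy review sentences are all illustrative assumptions rather than the method used for SentIL.

```python
from collections import Counter


def count_tokens(docs):
    """Count whitespace-separated, lowercased tokens across documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def relative_valence(word, pos_counts, neg_counts, pos_total, neg_total):
    """Assumed scoring rule (not taken from the paper): the normalized
    difference of the word's relative frequency in the positive and
    negative corpora, giving a value in [-1, 1]."""
    p = pos_counts[word] / pos_total if pos_total else 0.0
    n = neg_counts[word] / neg_total if neg_total else 0.0
    if p + n == 0:
        return 0.0  # word unseen in both corpora: treat as neutral
    return (p - n) / (p + n)


# Hypothetical toy corpora standing in for tagged review and Twitter data.
pos_docs = ["barang bagus sekali", "bagus dan cepat"]
neg_docs = ["barang jelek", "jelek dan lambat sekali"]

pos_counts = count_tokens(pos_docs)
neg_counts = count_tokens(neg_docs)
pos_total = sum(pos_counts.values())
neg_total = sum(neg_counts.values())

for w in ["bagus", "jelek", "barang"]:
    print(w, relative_valence(w, pos_counts, neg_counts, pos_total, neg_total))
```

Under this assumed rule, a word that appears only in positive text scores 1, a word that appears only in negative text scores -1, and a word that is equally frequent in both scores 0. A real pipeline would likely need smoothing and a minimum-frequency threshold, which this sketch omits.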
