Abstract

A novel Indonesian sentiment lexicon, SentIL (Sentiment Indonesian Lexicon), is created with an automatic pipeline that generates sentiment seed words, adds new words from slang, emoticons, a given dictionary, and a sentiment corpus, and finally tunes the sentiment values against a tagged sentiment corpus. The pipeline begins by taking seed words from WordNet Bahasa and mapping them to sentiment values from the English SentiWordNet. The seed words are then enriched by combining a dictionary-based method, which uses each word's synonyms and antonyms, with a corpus-based method, which uses word embeddings for word similarity trained on positive and negative sentiment corpora drawn from online marketplace reviews and Twitter data. The valence score of each lexicon entry is recalculated based on its relative occurrence in the corpus. We also add some popular slang words and emoticons to enrich the lexicon. Our experiments show that the proposed method increases the number of lexicon entries 3.5-fold and improves accuracy, reaching 80.9% on online reviews and 95.7% on Twitter data, which is better than other published and available Indonesian sentiment lexicons.
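The abstract does not give the formula used to recalculate valence from relative occurrence, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a word's score is the normalized difference between its relative frequency in a positive corpus and in a negative corpus. The function name `relative_valence` and the toy review sentences are hypothetical and serve illustration only.

```python
from collections import Counter


def relative_valence(word, pos_counts, neg_counts, pos_total, neg_total):
    """Hypothetical valence in [-1, 1] from relative corpus frequency.

    Assumption (not from the paper): the score is the normalized difference
    between the word's rate in the positive and the negative corpus.
    """
    pos_rate = pos_counts[word] / pos_total if pos_total else 0.0
    neg_rate = neg_counts[word] / neg_total if neg_total else 0.0
    if pos_rate + neg_rate == 0:
        return 0.0  # word unseen in both corpora: treat as neutral
    return (pos_rate - neg_rate) / (pos_rate + neg_rate)


# Toy tagged corpora (illustrative only, not data from the paper).
pos_docs = ["barang bagus sekali", "bagus dan cepat"]
neg_docs = ["barang rusak", "lambat dan rusak sekali"]

# Count whitespace-separated tokens per sentiment class.
pos_counts = Counter(tok for doc in pos_docs for tok in doc.split())
neg_counts = Counter(tok for doc in neg_docs for tok in doc.split())
pos_total = sum(pos_counts.values())
neg_total = sum(neg_counts.values())

# Score every word seen in either corpus.
vocabulary = set(pos_counts) | set(neg_counts)
scores = {
    w: relative_valence(w, pos_counts, neg_counts, pos_total, neg_total)
    for w in vocabulary
}
```

Under this assumption, a word that appears only in positive reviews scores 1, a word that appears only in negative reviews scores -1, and a word that is equally frequent in both scores 0. A real pipeline would also need tokenization suited to Indonesian text and some smoothing for rare words.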
