Abstract

The presence of the word negation is able to change the polarity of the text if it is not handled properly it will affect the performance of the sentiment classification. Negation words in Indonesian are ‘tidak’, ‘bukan’, ‘belum’ and ‘jangan’. Also, there is a conjunction word that able to reverse the actual values, as the word ‘tetapi’, or ‘tapi’. Unigram has shortcomings in dealing with the existence of negation because it treats negation word and the negated words as separate words. A general approach for negation handling in English text gives the tag ‘NEG_’ for following words after negation until the first punctuation. But this may gives the tag to un-negated, and this approach does not handle negation and conjunction in one sentences. The rule-based method to determine what words negated by adapting the rules of Indonesian language syntactic of negation to determine the scope of negation was proposed in this study. With adapting syntactic rules and tagging “NEG_” using SVM classifier with RBF kernel has better performance results than the other experiments. Considering the average F1-score value, the performance of this proposed method can be improved against baseline equal to 1.79% (baseline without negation handling) and 5% (baseline with existing negation handling) for a dataset that all tweets contain negation words. And also for the second dataset that has the various number of negation words in document tweet. It can be improved against baseline at 2.69% (without negation handling) and 3.17% (with existing negation handling).

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.