Abstract

Hate speech is one of the most common cases on Twitter. It is limited to 280 characters in uploading tweets, resulting in many word variations and possible vocabulary mismatches. Therefore, this study aims to overcome these problems and build a hate speech detection system on Indonesian Twitter. This study uses 20,571 tweet data and implements the Feature Expansion method using Word2vec to overcome vocabulary mismatches. Other methods applied are Bag of Word (BOW) and Term Frequency-Inverse Document Frequency (TF-IDF) to represent feature values in tweets. This study examines two methods in the classification process, namely Support Vector Machine (SVM) and Random Forest (RF). The final result shows that the Feature Expansion method with TF-IDF weighting in the Random Forest classification gives the best accuracy result, which is 88,37%. The Feature Expansion method with TF-IDF weighting can increase the accuracy value from several tests in detecting hate speech and overcoming vocabulary mismatches.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.