Abstract

Cyberbullying has become a serious problem on Thai social media. For example, during the COVID-19 pandemic some Thai users posted hate speech targeting Myanmar workers in Thailand, which could fuel hate crime. Detecting cyberbullying on Thai social media is therefore urgent. The task is a text classification problem. Moreover, hate speech has ordered severity levels, yet many existing studies do not account for this ordering in their models. We therefore developed a Thai hate-speech classification method and trained it with various loss functions to detect such hate speech accurately. We evaluated the resulting models on a corpus of ordinal-imbalanced Thai text. The results indicated that the best model in terms of $F_1$-score was the one trained with a hybrid loss function combining an ordinal regression loss with the Pearson correlation coefficient (commonly used as a similarity function). It yielded an average $F_1$-score of 78.38%, significantly higher by 0.88% than the score achieved with a conventional loss function, and an average mean squared error of 0.2478, a 5.49% relative improvement. Thus, the proposed hybrid loss function improved the performance of the model.
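To make the idea of a hybrid loss concrete, the following is a minimal NumPy sketch of one way an ordinal regression loss could be combined with a Pearson correlation term. The abstract does not give the formulation, so everything here is an assumption for illustration rather than the authors' method: the cumulative-threshold encoding (a label with K severity levels becomes K-1 binary "is the level above threshold j" targets), the mixing weight `alpha`, and all function names are hypothetical.

```python
import numpy as np


def ordinal_targets(labels, num_classes):
    """Encode each label as K-1 binary targets: [y > 0, y > 1, ..., y > K-2]."""
    thresholds = np.arange(num_classes - 1)
    return (labels[:, None] > thresholds[None, :]).astype(float)


def ordinal_bce_loss(logits, labels):
    """Mean binary cross-entropy over the K-1 threshold tasks (assumed ordinal loss)."""
    num_classes = logits.shape[1] + 1
    targets = ordinal_targets(labels, num_classes)
    # Numerically stable BCE on logits: max(x, 0) - x * t + log(1 + exp(-|x|))
    per_task = (
        np.maximum(logits, 0)
        - logits * targets
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return per_task.mean()


def pearson_loss(logits, labels, eps=1e-8):
    """1 - Pearson r between a predicted severity score and the true level."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    scores = probs.sum(axis=1)  # expected number of thresholds exceeded
    s = scores - scores.mean()
    y = labels - labels.mean()
    r = (s * y).sum() / (np.sqrt((s ** 2).sum() * (y ** 2).sum()) + eps)
    return 1.0 - r


def hybrid_loss(logits, labels, alpha=0.5):
    """Weighted sum of the two terms; alpha is a hypothetical mixing weight."""
    return alpha * ordinal_bce_loss(logits, labels) + (1.0 - alpha) * pearson_loss(
        logits, labels
    )


# Toy batch: four examples, one per severity level 0..3 (K = 4, so 3 logits each).
labels = np.array([0, 1, 2, 3])
good_logits = (ordinal_targets(labels, 4) * 2 - 1) * 4.0  # confident and correct
bad_logits = -good_logits                                 # confident and reversed

print("hybrid loss, good predictions:", hybrid_loss(good_logits, labels))
print("hybrid loss, bad predictions: ", hybrid_loss(bad_logits, labels))
```

In this sketch the correlation term rewards predictions whose ranking follows the true severity order across the batch, while the threshold term penalizes each individual misclassification. That is one plausible reason such a combination could help on ordinal data, though the paper itself may combine the terms differently.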
