Abstract
Hate speech is characterized as a deliberate attack directed towards a group of people, motivated by aspects of the group's identity. The proliferation of hate speech has driven growing interest in automatic hate speech detection. However, most automatic hate speech detection tools are designed for high-resource languages such as English, which makes it difficult to detect hate speech in low-resource languages such as Filipino. Social media users in the Philippines predominantly communicate online in their native language or in a code-switched variety such as Taglish. This study seeks to determine the linguistic features that characterize hate speech in the Philippine setting. The study characterizes hate speech using three kinds of features: bilingual, part-of-speech, and psycho-linguistic features. Features were extracted using fastText, NLTK (Natural Language Toolkit), and LIWC (Linguistic Inquiry and Word Count) from an existing Filipino hate speech corpus collected during the 2016 Philippine Presidential Elections. Results show that hate speech in this dataset has significantly different features from non-hate speech. Specifically, the distinct features include language dominance, frequency of code-switching, frequency of parts-of-speech, and LIWC's summary variables and psychological processes. These features, which were shown to differ statistically between hate speech and non-hate speech, can be leveraged to augment existing hate speech detection models, particularly in low-resource language contexts.
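To make the bilingual features more concrete, the following is a minimal sketch of how language dominance and code-switching frequency could be computed once each token has a language label. The abstract does not give the study's exact definitions, so the function name `code_switch_stats`, the label strings `"tl"` and `"en"`, and both formulas are illustrative assumptions rather than the authors' method. In practice the labels might come from a language identifier such as fastText.

```python
from collections import Counter


def code_switch_stats(labels):
    """Summarize per-token language labels for one post.

    Assumed (hypothetical) definitions, not taken from the study:
      dominance   = share of tokens in the most frequent language
      switch_rate = share of adjacent token pairs whose labels differ
    """
    if not labels:
        return {"dominant": None, "dominance": 0.0, "switch_rate": 0.0}

    counts = Counter(labels)
    dominant, top = counts.most_common(1)[0]

    # Count positions where the language changes between neighbouring tokens.
    switches = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    boundaries = len(labels) - 1

    return {
        "dominant": dominant,
        "dominance": top / len(labels),
        "switch_rate": switches / boundaries if boundaries else 0.0,
    }


# Hypothetical labels for a four-token Taglish post.
stats = code_switch_stats(["tl", "tl", "en", "tl"])
```

Per-post values like these could then be compared between the hate speech and non-hate speech groups with a significance test of the reader's choosing.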