Abstract

Our previous study on the classification and detection of abusive language in the South African social media space showed that surface-level features combined with classical machine learning models are promising in terms of accuracy. However, the F1-score still leaves considerable room for improvement. This work therefore explores state-of-the-art approaches that have performed well in many hate speech detection tasks: neural embeddings (Word2Vec, Doc2Vec, GloVe) and transformer-based neural network models (BERT and mBERT). In the evaluation of classical machine learning algorithms, Word2Vec with a Support Vector Machine outperformed the previous models, as well as Doc2Vec and GloVe, in terms of F1-score. In the evaluation of neural networks, all the neural embedding and transformer models performed worse than the previous models in terms of F1-score. In conclusion, the strong performance of Word2Vec neural embeddings with classical machine learning algorithms, with a best F1-score of 0.62 and an accuracy of 0.86, shows their promise for isolating abusive language and hate speech in the South African social media space.

Keywords: Computational model; Neural models; South African abusive languages; Classification
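
The gap between an accuracy of 0.86 and a best F1-score of 0.62 can be made concrete with a small calculation. The sketch below is illustrative only and rests on two assumptions the abstract does not confirm: that abusive posts form a minority class, and that F1 is computed for the abusive (positive) class alone rather than as a macro or weighted average. The labels are hypothetical toy data, not the study's data. In this toy example, 4 of 20 posts are abusive; the classifier finds 2 of them and raises 1 false alarm, which still yields high accuracy but a noticeably lower F1.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of posts whose predicted label matches the gold label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive (assumed: abusive) class, i.e. the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = abusive, 0 = not abusive.
y_true = [1] * 4 + [0] * 16
# 2 abusive posts found, 2 missed, 1 clean post wrongly flagged.
y_pred = [1, 1, 0, 0] + [1] + [0] * 15

if __name__ == "__main__":
    print(f"accuracy={accuracy(y_true, y_pred):.2f} F1={f1_score(y_true, y_pred):.2f}")
    # prints: accuracy=0.85 F1=0.57
```

Because the majority (non-abusive) class dominates the correct predictions, accuracy stays high even when half of the abusive posts are missed, whereas F1 on the abusive class exposes those misses.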
