Abstract

Social media has become a major factor in people's lives, affecting both how they communicate and their psychological state. Its widespread use has given rise to new forms of violence, such as cyberbullying. Manual detection and reporting of violent texts in social media applications is challenging because of the growing number of users and the huge amount of data they generate. Automatic detection of violent texts is language-dependent and requires an approach that accounts for the unique features and structures of a specific language or dialect. Only a few studies have focused on the automatic detection and classification of violent texts in Arabic. This paper aims to build a two-level classifier model for Arabic violent texts. The first level classifies text as violent or non-violent. The second level classifies violent text as either cyberbullying or threatening. The dataset used to build the classifier models was collected from Twitter using specific keywords and trending hashtags in Saudi Arabia. Supervised machine learning is used to build two classifier models, one based on Support Vector Machine (SVM) and one on Naive Bayes (NB). Both models are trained under different experimental settings that vary the feature extraction method and whether stop-word removal is applied. The performance of the proposed SVM-based and NB-based models is compared. The SVM-based model outperforms the NB-based model, with F1 scores of 76.06% and 89.18% and accuracy scores of 73.35% and 87.79% for the first and second levels of classification, respectively.
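The two-level scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a small Naive Bayes classifier written from scratch over plain word counts, whitespace tokenization, and a handful of made-up English placeholder sentences instead of the Arabic Twitter dataset. The class names, label strings, and example texts are all assumptions introduced for illustration. The sketch only shows how a first-level violent/non-violent decision can gate a second-level cyberbullying/threatening decision.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. A real Arabic pipeline
    # would need normalization and possibly stop-word removal.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoLevelClassifier:
    """Level 1: violent vs non-violent. Level 2: cyberbullying vs threatening."""

    def fit(self, texts, labels):
        # Level 1 is trained on all samples with the labels collapsed to two classes.
        level1_labels = ["non-violent" if y == "non-violent" else "violent" for y in labels]
        self.level1 = NaiveBayes().fit(texts, level1_labels)
        # Level 2 is trained only on the violent samples, with their fine-grained labels.
        violent = [(t, y) for t, y in zip(texts, labels) if y != "non-violent"]
        self.level2 = NaiveBayes().fit([t for t, _ in violent], [y for _, y in violent])
        return self

    def predict(self, text):
        if self.level1.predict(text) == "non-violent":
            return "non-violent"
        return self.level2.predict(text)


# Hypothetical toy data (English placeholders, not the paper's dataset).
train = [
    ("you are stupid and ugly", "cyberbullying"),
    ("nobody likes you loser", "cyberbullying"),
    ("i will hurt you tonight", "threatening"),
    ("i will find you and hurt you", "threatening"),
    ("what a lovely sunny day", "non-violent"),
    ("the match was great today", "non-violent"),
]
model = TwoLevelClassifier().fit([t for t, _ in train], [y for _, y in train])

for sample in ["you are a stupid loser", "i will hurt you", "lovely sunny day today"]:
    print(sample, "->", model.predict(sample))
```

One consequence of this cascade design is that second-level errors can only occur on texts the first level has already flagged as violent, which is why the two levels are evaluated separately in the abstract. Swapping the `NaiveBayes` class for an SVM, or the word counts for another feature extraction method, would leave the `TwoLevelClassifier` structure unchanged.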
