Abstract

The aim of this research is to use sentiment analysis techniques on a large collected corpus to detect and classify anti-Islamic online content. Anti-Islamic websites have spread widely in the last decade, causing considerable hatred toward Muslim communities; many websites attack Islam and Muslims and insult the Messenger, blessings and peace be upon him. We gathered our own dataset from different sources into a large corpus and produced two English-language datasets (balanced and non-balanced). We describe the framework of our proposed methodology, which uses two approaches: the first is a supervised Machine Learning (ML) approach using a Support Vector Machine (SVM) model as the classifier and Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction; the second is a hybrid approach combining a lexicon-based dictionary and TF-IDF for feature extraction with the SVM algorithm. We conducted different experiments and compared the results. We first used TF-IDF at the word level and then improved the model using the tri-gram level. The experimental results show that the ML approach performs best on both datasets, reaching an accuracy of 97% on the non-balanced English dataset using SVM with tri-gram-level TF-IDF as feature extraction. Additionally, SVM with word-level TF-IDF also provides excellent results regardless of the type of dataset.

Keywords: Web text mining, Text analysis, Text classification, SVM, Sentiment analysis, Fake news, Hate speech, Toxicity detection
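
The difference between word-level and tri-gram-level TF-IDF can be illustrated with a minimal sketch. It covers the feature extraction step only; SVM training and the lexicon-based features are not shown. It rests on several assumptions that the abstract does not confirm: whitespace tokenization, an unsmoothed idf of log(N/df), and a reading of "tri-gram level" as word tri-grams. The function names `word_ngrams` and `tfidf` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of `text` as space-joined strings."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenization
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one {term: weight} dict per document over word n-grams.

    tf  = occurrences of the term in the document / terms in the document
    idf = log(N / df), where df is the number of documents containing the term
    (assumption: no smoothing or normalization is applied)
    """
    term_lists = [word_ngrams(doc, n) for doc in docs]

    # Document frequency: count each term once per document.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Neutral placeholder documents, not taken from the paper's corpus.
docs = [
    "the weather is nice today",
    "the weather is bad today",
    "the match is on today",
]

word_level = tfidf(docs, n=1)     # features are single words
trigram_level = tfidf(docs, n=3)  # features are three-word sequences
```

In this sketch a term that occurs in every document, such as "the", receives weight zero, while a tri-gram such as "weather is nice" keeps local word order that single-word features discard. Either set of weights could then serve as the input vector to a classifier such as an SVM.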

