Abstract

Sentiment analysis is used to analyse text and classify the attitude it expresses. We use advances in computing, in the form of Machine Learning (ML) and the Support Vector Machine (SVM) algorithm, to train a model on a dataset collected automatically through ArabiTools and the Twitter API. The dataset is labelled both automatically and manually to keep the detection of CyberBullying tweets reliable. Automatic labelling depends on the content of the tweet: if a tweet contains one or more CyberBullying words, it is labelled CyberBullying; if no word with an aggressive meaning is found, it is labelled NonCyberBullying. After data collection, several pre-processing techniques are applied: Normalization, Tokenization, the Light Stemmer, ArabicStemmerKhoja, and the Term Frequency-Inverse Document Frequency (TF-IDF) term weighting scheme. SVM, a supervised algorithm, is then applied using WEKA and Python. Three experiments were performed: one with WEKA using the Light Stemmer, a second with WEKA using ArabicStemmerKhoja, and a third with Python. The results show that WEKA classifies the text more accurately, while Python takes less time to build the model. WEKA with the Light Stemmer achieved an accuracy of 85.49% and took 352.51 seconds, WEKA with ArabicStemmerKhoja achieved 85.38% and took 212.12 seconds, and Python achieved 84.03% and took 142.68 seconds.
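The automatic labelling rule and the TF-IDF weighting described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon `BULLYING_WORDS` holds only the one example word the text names, the normalization rules are a common convention rather than the paper's exact ones, the Light Stemmer and Khoja stemming steps are omitted, and the helper names (`normalize`, `tokenize`, `auto_label`, `tfidf`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical seed lexicon: the paper's full list of CyberBullying words is not
# reproduced here; "عنصري" (racist) is the one example the text names.
BULLYING_WORDS = {"عنصري"}


def normalize(text):
    """Assumed Arabic normalization in the style of common light-stemming
    preprocessors: drop diacritics and tatweel, unify alef forms, and map
    alef maqsura -> ya and ta marbuta -> ha."""
    text = re.sub("[\u064B-\u0652\u0640]", "", text)  # harakat (U+064B..U+0652) and tatweel (U+0640)
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef with madda/hamza -> bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return text


def tokenize(text):
    """Normalize, then split on runs of Unicode word characters."""
    return re.findall(r"\w+", normalize(text))


def auto_label(tweet, lexicon=BULLYING_WORDS):
    """Rule from the abstract: one or more lexicon words -> CyberBullying."""
    words = {normalize(w) for w in lexicon}
    return "CyberBullying" if words & set(tokenize(tweet)) else "NonCyberBullying"


def tfidf(docs):
    """docs: list of token lists -> list of {term: weight} dicts.
    Uses raw count * ln(N / df); libraries such as scikit-learn use
    smoothed variants by default, so exact weights will differ."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


if __name__ == "__main__":
    tweets = ["انت عنصري", "صباح الخير"]  # toy examples: "you are racist", "good morning"
    print([auto_label(t) for t in tweets])  # → ['CyberBullying', 'NonCyberBullying']
    print(tfidf([tokenize(t) for t in tweets]))
```

Exact token matching keeps the rule easy to audit, but it misses inflected forms (for example a word with the "ال" prefix), which is one reason the abstract pairs automatic labelling with manual labelling and applies stemming before classification.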

Highlights

  • The Internet has revolutionized the way we live: a distance of thousands of kilometers is just a number, and a person can stay in contact with another person with the help of the Internet

  • Social media has greatly boosted Internet use; people can share their opinions on any topic on social media platforms such as Instagram [1], Twitter [2], Facebook [3], etc.

  • 1) Data Collection and Annotation: The data was collected using ArabiTools and the Twitter Application Programming Interface (API) via two different methods: random selection, where words are searched randomly, and query-oriented selection, where tweets are searched using specific keywords, namely the words most commonly used in Arabic CyberBullying, such as Racist (عنصري)


Summary

Introduction

The Internet has revolutionized the way we live: a distance of thousands of kilometers is just a number, and a person can stay in contact with another person with the help of the Internet. Some people agree with our point of view and some do not. As a result, we receive many aggressive and offensive comments on our tweets, sometimes even comments unrelated to the opinion we expressed; this creates a risk of CyberBullying, which means bullying a person by using the Internet and technology [6]. We therefore set out to find a solution to this modern problem. The Support Vector Machine (SVM) transforms the given data into a higher-dimensional feature space and finds an optimal hyperplane that separates the dataset so that the samples of one category lie on one side of the hyperplane and the samples of the other category lie on the other side [13].
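The hyperplane idea can be sketched with a small standard-library Python example. This is an assumption-laden illustration, not the authors' setup (the text does not say which Python library or kernel they used): it trains a linear SVM, i.e. it skips the mapping to a higher-dimensional space, using the Pegasos stochastic sub-gradient method on the hinge loss. The names `train_linear_svm` and `predict`, the regularization strength `lam`, and the toy data are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i))).
    X: list of feature lists, y: labels in {-1, +1}. A constant 1 feature is
    appended so the last weight acts as the bias (it is regularized too,
    a common simplification)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size used by Pegasos
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink w: the gradient step for the regularization term.
            w = [(1 - eta * lam) * wj for wj in w]
            # Points inside the margin push the hyperplane away (hinge sub-gradient).
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Side of the hyperplane w . [x, 1] = 0 on which x falls."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy 2-D data: +1 = "CyberBullying", -1 = "NonCyberBullying" (illustrative only).
    X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, p) for p in ([2.5, 2.5], [-2.5, -2.5])])  # → [1, -1]
```

In the paper's setting the feature lists would be TF-IDF vectors of tweets; a kernel SVM would replace the dot product with a kernel function to obtain the higher-dimensional feature space described above.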
