Abstract

Internet-based applications have become a habitual part of society, and they also open new avenues for online crime. Numerous cybercriminals operate across the platforms of the internet-based virtual world, carrying out cybercrimes according to preplanned agendas. As technology advances, cyberstalking, cyberbullying, and other forms of cyber harassment are growing on social media, email, and other online platforms. Cyberstalking uses internet-based technology to harass, intimidate, and undermine individuals online through a variety of approaches. To examine how feature selection strategies affect model performance, this paper proposes a machine learning-based cyberstalking detection model. The proposed model uses the Term Frequency-Inverse Document Frequency (TF-IDF) method to extract features, and three distinct approaches, among them TF-IDF + Chi-Square Test and TF-IDF + Information Gain, are used to select different numbers of relevant features. A Support Vector Machine (SVM) performs the classification. The impact of each feature selection approach on the various feature sets is assessed through the SVM classifier's performance. According to the experimental findings, the TF-IDF + Chi-Square Test approach outperformed the other approaches and improved the detection model's performance. The findings also show that TF-IDF + Chi-Square Test performs better than the other approaches when only a small set of relevant features is used.
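To make the feature selection step concrete, the following is a minimal pure-Python sketch of TF-IDF weighting followed by chi-square ranking of terms. It is an illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the smoothed IDF variant, the chi-square formulation over summed TF-IDF weights, the function names, and the four toy messages are all hypothetical choices made here. The SVM classification stage is omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows) of unnormalised TF-IDF weights.

    Assumption: smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    """
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def chi2_scores(rows, labels):
    """Chi-square score per feature.

    Compares the TF-IDF weight each class actually accumulates for a term
    (observed) with the weight expected if the term were independent of the
    class (class prior times the term's total weight).
    """
    classes = sorted(set(labels))
    n = len(rows)
    class_prob = {c: labels.count(c) / n for c in classes}
    scores = []
    for j in range(len(rows[0])):
        total = sum(row[j] for row in rows)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, y in zip(rows, labels) if y == c)
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(vocab, scores, k):
    """Return the k terms with the highest scores (ties broken alphabetically)."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy data: 1 = harassing message, 0 = benign message.
docs = [
    "i will find you",
    "i know where you live",
    "see you at lunch",
    "thanks for lunch today",
]
labels = [1, 1, 0, 0]

vocab, rows = tfidf(docs)
scores = chi2_scores(rows, labels)
selected = select_top_k(vocab, scores, 2)
print(selected)
```

In this toy set, terms that appear in several messages of only one class rank highest, while a term spread across both classes (here "you") scores lowest. The selected terms would then define the reduced feature vectors passed to a classifier such as an SVM.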
