In today's interconnected internet landscape, the dark web is a concealed space that hosts a wide range of illicit activities posing substantial cybersecurity risks. This study investigates threat attribution within the dark web environment, using a machine learning approach to bridge the gap between technical indicators and linguistic and behavioral insights. Through web crawling and data gathering, a dataset was compiled covering key variables such as attack motivation, method, web part, and threat actor. Principal Component Analysis was employed for feature selection, followed by classification with Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Random Forest (RF), and CatBoost. Precision, recall, and F1-score were used to assess each algorithm. Results indicate a notable prevalence of cybercrimes within the dark web, underscoring the need for cybersecurity strategies tailored to its unique challenges. The comparative analysis also shows that performance varies across the machine learning algorithms, with Multinomial Naive Bayes achieving the highest accuracy. This research contributes to advancing threat attribution techniques for the dark web, with the ultimate aim of strengthening cybersecurity defenses and mitigating future cyber threats.
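As a rough illustration of the evaluation metrics named above, the following minimal sketch computes per-class precision, recall, and F1-score for a multi-class threat-actor classifier and then macro-averages them. It is not the study's evaluation code: the abstract does not state which averaging scheme was used, so macro-averaging is an assumption here, and the function names and class labels (`hacktivist`, `criminal`, `state`) are hypothetical placeholders.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted this class, but it was wrong
            fn[truth] += 1   # missed the true class
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical threat-actor labels, for illustration only.
y_true = ["hacktivist", "criminal", "criminal", "state", "hacktivist", "criminal"]
y_pred = ["hacktivist", "criminal", "state", "state", "criminal", "criminal"]

scores = per_class_scores(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(scores)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro: precision={macro_p:.2f} recall={macro_r:.2f} f1={macro_f1:.2f}")
```

Macro-averaging weights every class equally, which can matter when some threat actors are rare in the data. A support-weighted average would be an equally plausible reading of the abstract.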