Abstract

The strong anonymity and hard-to-trace mechanisms of the dark web provide shelter for illegal activities. Illegal content on the dark web is diverse and frequently updated. Traditional dark web classification relies on supervised training over large collections of web pages. However, collecting enough illegal dark web content is difficult and manually labeling web pages is time-consuming, and both remain challenges for current research. In this paper, we propose a method for classifying illegal activities on the dark web. Instead of relying on a massive dark web training set, we select laws and regulations related to each type of illegal activity to train machine learning classifiers. For five categories (pornography, drugs, weapons, hackers, and counterfeit credit cards), we select relevant legal documents from the United States Code for supervised training and run a classification experiment on illegal content that we collected from the real dark web. The results show that, combining TF-IDF feature extraction with a Naive Bayes classifier, we achieve an accuracy of 0.935 in the experimental environment. Our approach allows researchers and law enforcement to check whether a dark web corpus contains such illegal activities, based on the laws relevant to the categories they care about, so that potentially illegal websites can be detected and monitored in a timely manner. Because neither a large training set nor expert-provided seed keywords are needed, this classification method offers another way to define illegal activities on the dark web. It may also help explore and discover new types of illegal activities on the dark web.
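To make the pipeline concrete, the following is a minimal sketch of the idea described in the abstract: fit TF-IDF weights and a multinomial Naive Bayes classifier on legal text, then label unseen page text. It is an illustration under stated assumptions, not the authors' implementation. The training snippets are short placeholders written for this example (they are not quotations from the United States Code), only three of the five categories are shown, and the names `TRAIN`, `tokenize`, `fit`, and `predict` are hypothetical. For brevity the sketch uses a smoothed idf and Laplace smoothing and skips vector normalization.

```python
import math
import re
from collections import Counter, defaultdict

# Placeholder snippets standing in for statute text (assumption: invented
# for illustration, NOT quoted from the United States Code).
TRAIN = {
    "drugs": [
        "unlawful to manufacture distribute or dispense a controlled substance",
        "possession of a controlled substance with intent to distribute",
    ],
    "weapons": [
        "unlawful to import manufacture or deal in firearms or ammunition without a license",
        "transfer or possession of a machinegun or unregistered firearm",
    ],
    "counterfeit_cards": [
        "knowingly produces uses or traffics in counterfeit access devices with intent to defraud",
        "fraud in connection with unauthorized credit card account numbers",
    ],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit(train):
    """Learn idf weights and per-class log-likelihoods from labeled texts."""
    docs = [(label, tokenize(t)) for label, texts in train.items() for t in texts]
    n_docs = len(docs)

    # Document frequency: number of training documents containing each term.
    df = Counter()
    for _, tokens in docs:
        df.update(set(tokens))
    # Smoothed idf, so that no term receives a zero weight.
    idf = {w: math.log((1 + n_docs) / (1 + c)) + 1 for w, c in df.items()}

    # Sum TF-IDF weights per class; these act as fractional "counts"
    # for the multinomial Naive Bayes model.
    weights = defaultdict(lambda: defaultdict(float))
    class_docs = Counter()
    for label, tokens in docs:
        class_docs[label] += 1
        for w, tf in Counter(tokens).items():
            weights[label][w] += tf * idf[w]

    vocab = sorted(idf)
    model = {"idf": idf, "log_prior": {}, "log_lik": {}}
    for label in class_docs:
        total = sum(weights[label].values())
        model["log_prior"][label] = math.log(class_docs[label] / n_docs)
        # Laplace (add-one) smoothing over the vocabulary.
        model["log_lik"][label] = {
            w: math.log((weights[label].get(w, 0.0) + 1.0) / (total + len(vocab)))
            for w in vocab
        }
    return model


def predict(model, text):
    """Return the label with the highest posterior score for the text."""
    # Terms never seen in the legal training text are ignored.
    tf = Counter(w for w in tokenize(text) if w in model["idf"])
    scores = {}
    for label, prior in model["log_prior"].items():
        log_lik = model["log_lik"][label]
        scores[label] = prior + sum(
            count * model["idf"][w] * log_lik[w] for w, count in tf.items()
        )
    return max(scores, key=scores.get)


model = fit(TRAIN)
print(predict(model, "buy cocaine controlled substance shipped worldwide"))
```

In this sketch a page is scored only on terms that also occur in the legal text, which is why no labeled dark web pages or seed keywords are required at training time. Pages that share no vocabulary with any statute fall back to the class prior, so a real system would likely need a rejection threshold or an explicit "other" category.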
