Although the Dark web was originally used for maintaining privacy-sensitive communication for business or intelligence services for defence, government and business organizations, fighting against censorship and blocked content, later, the advantage of technologies behind the Dark web were abused by criminals to conduct crimes which involve drug dealing to the contract of assassinations in a widespread manner. Since the communication remains secure and untraceable, criminals can easily use dark web service via The Onion Router (TOR), can hide their illegal motives and can conceal their criminal activities. This makes it very difficult to monitor and detect cybercrimes over the dark web. With the evolution of machine learning, natural language processing techniques, computational big data applications and hardware, there is a growing interest in exploiting dark web data to monitor and detect criminal activities. Due to the anonymity provided by the Dark Web, the rapid disappearance and the change of the uniform resource locators (URLs) of the resources, it is not as easy to crawl the Drak web and get the data as the usual surface web which limits the researchers and law enforcement agencies to analyse the data. Therefore, there is an urgent need to study the technology behind the Dark web, its widespread abuse, its impact on society and the existing systems, to identify the sources of drug deal or terrorism activities. In this research, we analysed the predominant darker sides of the world wide web (WWW), their volumes, their contents and their ratios. We have performed the analysis of the larger malicious or hidden activities that occupy the major portions of the Dark net; tools and techniques used to identify cybercrimes which happen inside the dark web. We applied a systematic literature review (SLR) approach on the resources where the actual dark net data have been used for research purposes in several areas. From this SLR, we identified the approaches (tools and algorithms) which have been applied to analyse the Dark net data, the key gaps as well as the key contributions of the existing works in the literature. In our study, we find the main challenges to crawl the dark web and collect forum data are: scalability of crawler, content selection trade off, and social obligation for TOR crawler and the limitations of techniques used in automatic sentiment analysis to understand criminals’ forums and thereby monitor the forums. From the comprehensive analysis of existing tools, our study summarizes the most tools. However the forum topics rapidly change as their sources changes; criminals inject noises to obfuscate the forum’s main topic and thus remain undetectable. Therefore supervised techniques fail to address the above challenges. Semi-supervised techniques would be an interesting research direction.
Read full abstract