"When the Code becomes a Crime Scene" Towards Dark Web Threat Intelligence with Software Quality Metrics

G Cascavilla,G Catolino,F Ebert,W J Van Den Heuvel,D.A Tamburri

doi:10.1109/icsme55016.2022.00055

Abstract

The increasing growth of illegal online activities in the so-called dark web—that is, the hidden collective of internet sites only accessible by a specialized web browsers—has challenged law enforcement agencies in recent years with sparse research efforts to help. For example, research has been devoted to supporting law enforcement by employing Natural Language Processing (NLP) to detect illegal activities on the dark web and build models for their classification. However, current approaches strongly rely upon the linguistic characteristics used to train the models, e.g., language semantics, which threatens their generalizability. To overcome this limitation, we tackle the problem of predicting illegal and criminal activities—a process defined as threat intelligence—on the dark web from a complementary perspective—that of dark web code maintenance and evolution— and propose a novel approach that uses software quality metrics and dark website appearance parameters instead of linguistic characteristics. We performed a preliminary empirical study on 10.367 web pages and collected more than 40 code metrics and website parameters using sonarqube. Results show an accuracy of up to 82% for predicting the three types of illegal activities (i.e., suspicious, normal, and unknown) and 66% for detecting 26 specific illegal activities, such as drugs or weapons trafficking. We deem our results can influence the current trends in detecting illegal activities on the dark web and put forward a completely novel research avenue toward dealing with this problem from a software maintenance and evolution perspective.

Full Text