Supervised ranking approach to identify influential websites in the darknet

Mhd Wesam Al Nabki, Eduardo Fidalgo, Enrique Alegre, Deisy Chaves

doi:10.1007/s10489-023-04671-9

Open Access

Journal: Applied Intelligence	Publication Date: Jul 4, 2023
Citations: 1	License type: CC BY 4.0

Affiliation: University of Leon

Abstract

The anonymity and high security of the Tor network allow it to host a significant amount of criminal activity. Some Tor domains attract more traffic than others because they offer better products or services to their customers. Detecting the most influential domains in Tor can help detect serious criminal activities. Therefore, in this paper, we present a novel supervised ranking framework for detecting the most influential domains. Our approach represents each domain with 40 features extracted from five sources (text, named entities, HTML markup, network topology, and visual content) and uses them to train a learning-to-rank (LtR) scheme that sorts the domains based on user-defined criteria. We experimented on a subset of 290 manually ranked drug-related websites from Tor and obtained the following results. First, among the explored LtR schemes, the listwise approach outperforms the other benchmarked methods with an NDCG of 0.93 for the top-10 ranked domains. Second, we showed quantitatively that our framework surpasses link-based ranking techniques. Third, we observed that using only the user-visible text feature achieves performance comparable to using all the features, with a decrease of 0.02 in NDCG@5. The proposed framework might support law enforcement agencies in detecting the most influential domains related to possible suspicious activities.
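
The NDCG@k figures quoted above can be made concrete with a small sketch. The abstract does not state which gain formulation the authors use, so the code below assumes the common linear-gain variant with a log2 position discount; the function names and the relevance grades are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items (linear gain, assumed)."""
    # Position i (0-based) is discounted by log2(i + 2), so the top item has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k normalised by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical ground-truth relevance grades, listed in the order a ranker returned the domains.
predicted_order = [3, 2, 3, 0, 1, 2]
score = ndcg_at_k(predicted_order, k=5)
print(f"NDCG@5 = {score:.3f}")
```

A score of 1.0 means the ranker reproduced the ideal ordering within the top k positions; lower values indicate that highly relevant domains were placed too far down the list.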

Full Text