Abstract
Dark web vendor identification can be seen as an authorship aliasing problem, aiming to determine whether different accounts on different markets belong to the same real-world vendor, in order to locate cybercriminals involved in dark web market transactions. Existing open-source datasets for dark web marketplaces are outdated and cannot simulate real-world situations, while data labeling methods are difficult and suffer from issues such as inaccurate labeling and limited cross-market research. The problem of identifying vendors’ multiple identities on the dark web involves a large number of categories and a limited number of samples, making it difficult to use traditional multiclass classification models. To address these issues, this paper proposes a metric learning-based method for dark web vendor identification, collecting product data from 21 currently active English dark web marketplaces and using a multi-dimensional feature extraction method based on product titles, descriptions, and images. Using pseudo-labeling technology combined with manual labeling improves data labeling accuracy compared to previous labeling methods. The proposed method uses a Siamese neural network with metric learning to learn the similarity between vendors and achieve the recognition of vendors’ multiple identities. This method achieved better performance with an average F1-score of 0.889 and an accuracy rate of 97.535% on the constructed dataset. The contributions of this paper lie in the proposed method for collecting and labeling data for dark web marketplaces and overcoming the limitations of traditional multiclass classifiers to achieve effective recognition of vendors’ multiple identities.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.