Abstract
As regulation of the surface web becomes more stringent, criminals are gradually turning to darknet markets for illicit operations. Moderating and studying the content of these marketplaces helps combat criminal activity in the darknet. However, to evade law-enforcement surveillance, criminals widely use jargon to disguise their conversations. Jargon terms give seemingly innocuous words cryptic, altered meanings, which creates a major challenge for criminal investigation. Current research on Chinese jargon detection focuses on keyword matching, but this approach cannot keep up with the rapid emergence of new jargon from various domains. To the best of our knowledge, we are the first to study Chinese jargon detection in darknet markets. Specifically, we design an unsupervised, cross-domain adaptation Chinese jargon detection framework (CJD-Framework) integrated with a pre-trained language model. First, we crawl six Chinese-language underground markets to build the first darknet corpus dataset (DC-dataset). Next, we propose a pre-trained model based on Chinese words to extract contextual embeddings for darknet words. Finally, relying on semantic similarity analysis, we construct a cross-corpus framework to identify Chinese jargon in the darknet effectively. Comprehensive experiments demonstrate the effectiveness and generalizability of the CJD-Framework over state-of-the-art models, with a detection accuracy of 91.5%. The darknet corpus dataset and the framework proposed in this research can provide data sources and ideas for future analysis of underground crime in darknet markets.
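The abstract does not spell out how the cross-corpus semantic similarity analysis is computed. As a minimal sketch of the general idea only, and not the authors' implementation, one could compare a word's embedding learned from the darknet corpus with its embedding learned from a benign reference corpus, and flag words whose cosine similarity falls below a threshold. The function names, the toy vectors, and the threshold of 0.5 are all hypothetical, and the sketch assumes both sets of embeddings already live in a shared vector space.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_jargon_candidates(darknet_vecs, reference_vecs, threshold=0.5):
    """Return (word, similarity) pairs whose cross-corpus similarity is below threshold.

    Assumption: both dictionaries map words to embeddings in the same,
    already aligned vector space. The threshold is an illustrative value.
    """
    candidates = []
    for word, dark_vec in darknet_vecs.items():
        ref_vec = reference_vecs.get(word)
        if ref_vec is None:
            continue  # word absent from the reference corpus, nothing to compare
        sim = cosine_similarity(dark_vec, ref_vec)
        if sim < threshold:
            candidates.append((word, sim))
    # Least similar (most suspicious) words first
    return sorted(candidates, key=lambda pair: pair[1])


# Hypothetical toy embeddings, invented purely for illustration
darknet_vecs = {"apple": [0.9, 0.1, 0.0], "ice": [0.0, 0.2, 0.9]}
reference_vecs = {"apple": [0.8, 0.2, 0.1], "ice": [0.9, 0.1, 0.1]}

candidates = flag_jargon_candidates(darknet_vecs, reference_vecs)
flagged_words = [word for word, _ in candidates]
```

In this toy data, "apple" is used similarly in both corpora and is not flagged, while "ice" points in a very different direction in the darknet vectors and is returned as a candidate. A real system would derive the vectors from contextual embeddings and would need to choose the threshold empirically.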