Abstract

In recent years, contactless fraud committed via telecommunication and the Internet has grown rapidly, while the rate at which such cases are solved remains much lower, mainly for two reasons. First, the risk factors of this new class of Internet and telecommunication fraud have not been comprehensively defined, so the problem itself is not well specified. Second, case information is mostly recorded in natural language and in very large volumes, and there is no automated, intelligent way to analyze it in depth and extract the risk factors. To support better analysis of Internet and telecommunication fraud and help solve more cases, we propose a new risk factor extraction system for this type of crime. Building on existing related research, we propose a novel risk factor extraction technique based on BERT. The technique handles multi-source, heterogeneous data when extracting risk factors along multiple dimensions, while significantly reducing the required computation resources and improving online serving performance. In our experiments, it reduces training time by 60%-70%, reduces computation resources during serving by 80%, and improves serving performance by a factor of 5. We also propose a method for setting the sample weight and the loss weight during model training according to data characteristics and data distribution, which significantly improves extraction precision. Adjusting the sample weight during training improves precision by 1.56%, and setting the loss weight during training improves precision by 1.63% compared with the baseline model.
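To make the weighting idea concrete, the following is a minimal Python sketch of one way sample weights and loss weights could enter a training objective. It is not the paper's method: the abstract does not say how the weights are derived, so the inverse-frequency scheme, the function names, and the task names (`fraud_type`, `payment_channel`) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight each sample by N / (K * count(label)),
    so that rare classes contribute more to the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def weighted_cross_entropy(probs, labels, sample_weights):
    """Weighted mean of -log p(true label) over all samples.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    total = sum(
        -w * math.log(p[y]) for p, y, w in zip(probs, labels, sample_weights)
    )
    return total / sum(sample_weights)


def combine_task_losses(task_losses, loss_weights):
    """Weighted sum of per-dimension losses (hypothetical multi-task setup)."""
    return sum(loss_weights[name] * loss for name, loss in task_losses.items())


if __name__ == "__main__":
    # Toy, imbalanced label set: class 1 is rare.
    labels = [0, 0, 0, 1]
    probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
    weights = inverse_frequency_weights(labels)
    loss_a = weighted_cross_entropy(probs, labels, weights)

    # Hypothetical extraction dimensions and loss weights.
    total = combine_task_losses(
        {"fraud_type": loss_a, "payment_channel": 0.5},
        {"fraud_type": 1.0, "payment_channel": 0.3},
    )
    print(weights, loss_a, total)
```

In this sketch the sample weights rebalance an imbalanced label distribution within one extraction dimension, while the loss weights control how much each dimension contributes to the overall objective. Both would normally be tuned on validation data.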
