The relevance of this study follows from the current state of telephone fraud. According to research by Kaspersky Lab, 71% of users encountered unwanted spam calls in the spring of 2022. The subject of the research is machine learning and deep learning technologies for recognizing emotions from the timbre of the voice. The authors examine in detail the following aspects: the creation of a labeled dataset; the conversion of WAV audio into a numerical form suitable for fast processing; machine learning methods for multiclass classification; and the construction and optimization of a neural network architecture for recognizing emotions in real time. A particular contribution of the study is a fast method for converting audio formats into numerical coefficients, which significantly increased data processing speed with practically no loss of informativeness. As a result, the machine learning models were trained quickly and efficiently. Notably, the authors designed a convolutional neural network architecture that reached a model training quality of up to 98%. The model turned out to be lightweight and served as the basis for training a model that recognizes emotions in real time. The results of the model's real-time operation were comparable to those of the trained model. The developed algorithms can be deployed by mobile operators or banks to combat telephone fraud. The article was prepared as part of the state assignment of the Government of the Russian Federation to the Financial University for 2022 on the topic "Models and methods of text recognition in anti-telephone fraud systems" (VTK-GZ-PI-30-2022).
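The abstract does not say which numerical coefficients the authors extract from the WAV files, so the following is only a minimal sketch of the general idea of turning audio into per-frame numbers. It assumes 16-bit PCM input and uses two simple stand-in features (log energy and zero-crossing rate); the function names, frame length, and hop size are hypothetical choices for illustration, not the authors' method.

```python
import io
import math
import struct
import wave


def read_wav_mono(source):
    """Read a 16-bit PCM WAV file and return (sample_rate, samples in [-1, 1))."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Keep only the first channel when the recording is multi-channel.
    mono = values[::channels]
    return rate, [v / 32768.0 for v in mono]


def frame_features(samples, frame_len=400, hop=160):
    """Return one [log_energy, zero_crossing_rate] pair per frame.

    These two features are illustrative assumptions; the paper's
    coefficients are not specified in the abstract.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append([math.log(energy + 1e-10), crossings / (frame_len - 1)])
    return features


def make_test_tone(freq=440.0, rate=16000, seconds=0.1):
    """Build an in-memory WAV with a sine tone (stand-in for a real recording)."""
    buffer = io.BytesIO()
    n = int(rate * seconds)
    data = [
        int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n)
    ]
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % n, *data))
    buffer.seek(0)
    return buffer


rate, samples = read_wav_mono(make_test_tone())
features = frame_features(samples)
```

Each row of `features` is a fixed-length numeric vector, so a sequence of rows could be fed to a multiclass classifier or stacked into a 2D array for a convolutional network. A real pipeline would likely use richer spectral coefficients than the two shown here.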