Abstract

The article is devoted to the development of tools for recognizing the emotional state of a speaker. The prospects of using neural networks to analyze fixed-duration fragments of a voice signal are shown. The need to adapt the architecture and parameters of the neural network model to the task of recognizing emotions by voice is established. The studies show that, for recognizing a speaker's emotions from voice fragments of fixed duration, it is advisable to use a two-layer perceptron whose input parameters are associated with the mel-cepstral coefficients characterizing each quasi-stationary fragment of the analyzed voice signal, and whose output parameters correspond to the recognizable emotions of the speaker. The feasibility of using a two-layer perceptron is confirmed by computer experiments. Directions for further research include determining the number of mel-cepstral coefficients sufficient to describe a single quasi-stationary fragment, and adapting the parameters of the two-layer perceptron to recognition conditions under the influence of various kinds of interference.
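The mapping described in the abstract — mel-cepstral coefficients of one quasi-stationary fragment at the input, emotion classes at the output of a two-layer perceptron — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the number of coefficients (13), the hidden-layer width (32), the number of emotion classes (4), and the activation functions are all assumptions chosen for the example.

```python
import numpy as np

# Assumed dimensions (not specified in the abstract):
N_MFCC = 13      # mel-cepstral coefficients per quasi-stationary fragment
N_HIDDEN = 32    # hidden-layer width of the two-layer perceptron
N_EMOTIONS = 4   # recognizable emotions of the speaker

rng = np.random.default_rng(0)

# Untrained weights for the two layers (input -> hidden -> output);
# in practice these would be fitted on labeled voice fragments.
W1 = rng.normal(0.0, 0.1, (N_MFCC, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_EMOTIONS))
b2 = np.zeros(N_EMOTIONS)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(mfcc_vector):
    """Map one fragment's mel-cepstral coefficients to emotion probabilities."""
    hidden = np.tanh(mfcc_vector @ W1 + b1)
    return softmax(hidden @ W2 + b2)

# One quasi-stationary fragment, represented here by a random MFCC vector
probs = forward(rng.normal(size=N_MFCC))
predicted_emotion = int(np.argmax(probs))
```

The output is a probability distribution over the assumed emotion classes; the class with the highest probability is taken as the recognized emotion for that fragment.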

Full Text

Paper version not known.
