Abstract

The article is devoted to the development of tools for recognizing the emotional state of a speaker. The prospects of using neural networks to analyze fixed-duration fragments of a voice signal are shown. The need to adapt the architecture and parameters of the neural network model to the task of recognizing emotions by voice is established. The studies show that, for recognizing a speaker's emotions from voice fragments of fixed duration, it is advisable to use a two-layer perceptron whose input parameters are associated with the mel-cepstral coefficients characterizing each quasi-stationary fragment of the analyzed voice signal, and whose output parameters correspond to the recognizable emotions of the speaker. The feasibility of using a two-layer perceptron is confirmed by computer experiments. Directions for further research include determining the number of mel-cepstral coefficients sufficient to describe a single quasi-stationary fragment, and adapting the parameters of the two-layer perceptron to recognition conditions under the influence of various kinds of interference.
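The mapping described in the abstract — mel-cepstral coefficients of one quasi-stationary fragment at the input, emotion classes at the output of a two-layer perceptron — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the number of coefficients (13), the hidden-layer width (32), the number of emotion classes (4), and the activation functions are all assumptions chosen for the example.

```python
import numpy as np

# Assumed dimensions (not specified in the abstract):
N_MFCC = 13      # mel-cepstral coefficients per quasi-stationary fragment
N_HIDDEN = 32    # hidden-layer width of the two-layer perceptron
N_EMOTIONS = 4   # recognizable emotions of the speaker

rng = np.random.default_rng(0)

# Untrained weights for the two layers (input -> hidden -> output);
# in practice these would be fitted on labeled voice fragments.
W1 = rng.normal(0.0, 0.1, (N_MFCC, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_EMOTIONS))
b2 = np.zeros(N_EMOTIONS)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(mfcc_vector):
    """Map one fragment's mel-cepstral coefficients to emotion probabilities."""
    hidden = np.tanh(mfcc_vector @ W1 + b1)
    return softmax(hidden @ W2 + b2)

# One quasi-stationary fragment, represented here by a random MFCC vector
probs = forward(rng.normal(size=N_MFCC))
predicted_emotion = int(np.argmax(probs))
```

The output is a probability distribution over the assumed emotion classes; the class with the highest probability is taken as the recognized emotion for that fragment.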

Full Text

Paper version not known.
