Abstract

Recognizing emotions in human speech has long been an exciting challenge for scientists. In our work, a parameterized feature vector is obtained by dividing each sentence into an emotionally loaded part and a part that carries only informational load, and this division is applied effectively in classification. The expressiveness of human speech is enhanced by the emotion it conveys. Several prosodic characteristics differentiate utterances, such as pitch, timbre, loudness, and vocal tone, and these allow speech to be categorized into several emotions. We supplement them with a new classification feature of speech: the division of a sentence into an emotionally loaded part and a purely informational part. A speech sample therefore changes when it is produced in different emotional contexts. Since a speaker's emotional state can be identified on the basis of the Mel scale, MFCC (Mel-frequency cepstral coefficients) is one suitable representation for studying the emotional aspects of a speaker's utterances. In this work, we implement a model that identifies several emotional states from MFCC features for two datasets, classifies emotions on the basis of those features, and compares the results across the datasets. Overall, this work implements a classification model based on dataset minimization, performed by taking the mean of the features, to improve the classification accuracy of different machine learning algorithms. In addition to the static analysis of the author's tonal portrait used in MFCC-based approaches, we propose a new method for the dynamic analysis of a phrase, treating it as a new linguistic-emotional entity pronounced by the same author. By ranking the Mel-scale features by importance, we are able to parameterize the vector coordinates to be processed by a parameterized KNN method. Speech recognition is a multi-level pattern recognition task.
Here, acoustic signals are analyzed and structured into a hierarchy of elements: words, phrases, and sentences. Each level of this hierarchy can impose temporal constraints, such as admissible word sequences or known pronunciation patterns, that reduce the number of recognition errors at the level below. Analyzing voice and speech dynamics is appropriate for improving both the machine perception and the machine generation of human speech, and lies within the capabilities of artificial intelligence. Emotion recognition results can be applied widely in e-learning platforms, vehicle on-board systems, medicine, etc.
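The dataset-minimization step described above (collapsing each utterance's frame-by-frame MFCC matrix to a single mean feature vector, then classifying with KNN) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the MFCC matrices here are synthetic random stand-ins (real work would extract them with a library such as librosa), and the KNN is a plain hand-rolled Euclidean vote rather than the parameterized variant the abstract refers to.

```python
import numpy as np

def utterance_vector(mfcc_frames):
    # Collapse an (n_frames, n_coeffs) MFCC matrix to one mean vector:
    # the "mean of features" minimization, one fixed-length vector per utterance.
    return mfcc_frames.mean(axis=0)

def knn_predict(train_vecs, train_labels, query_vec, k=3):
    # Plain Euclidean k-nearest-neighbours majority vote over utterance vectors.
    dists = np.linalg.norm(train_vecs - query_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical data: synthetic stand-ins for per-utterance MFCC matrices
# (40 frames x 13 coefficients each), two emotion classes.
rng = np.random.default_rng(0)
happy = [rng.normal(loc=1.0, size=(40, 13)) for _ in range(5)]
sad = [rng.normal(loc=-1.0, size=(40, 13)) for _ in range(5)]

X = np.stack([utterance_vector(m) for m in happy + sad])
y = ["happy"] * 5 + ["sad"] * 5

query = utterance_vector(rng.normal(loc=1.0, size=(40, 13)))
print(knn_predict(X, y, query))  # prints "happy" for this synthetic query
```

The mean over frames discards temporal ordering, which is exactly why the abstract's dynamic (phrase-level) analysis is proposed as a complement to this static summary.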
