Abstract

The goal of the article is to describe the methods and algorithms of the information technology, proposed by the authors, for removing emotional features from a speech signal. The article describes methods for extracting and transforming emotional features, as well as methods for classifying the speech signals obtained after the transformation. The task of removing emotions is considered successfully solved if, after the transformations, the emotion classifier assigns the speech signal to the class of emotionless (neutral) speech. The solution scheme consists of the following steps: 1) extracting informative features and establishing the boundaries of the conversion; 2) transforming the informative features of the input data; 3) checking the effectiveness of the transformation. The first transformation algorithm is PSOLA (Pitch-Synchronous OverLap and Add), which changes the fundamental frequency (pitch) of the signal by stretching and compressing it along the time axis. PSOLA can shift the pitch of a voice while retaining the formant positions, and thus the identity of the vowels. The basic idea is to stretch or compress the time axis at the pitch marks while the waveform of each segment stays unchanged. The second transformation, normalization of the dynamic range of the log energy, reduces the variation of energy caused by different background noise levels [2]. The input parameter of the energy normalization unit is the frame energy. The energy is replaced by its logarithm because this resembles the logarithmic perception of sound by the human auditory system and because the logarithm is less sensitive to sharp changes in energy level. To test the effectiveness of the transformation, a classifier for recognizing emotions in speech has to be built.
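The log-energy dynamic range normalization described above can be sketched as follows. The abstract does not state the exact normalization rule, so this minimal Python sketch assumes one: the loudest frame is kept fixed and the log-energy contour is compressed linearly so that its range does not exceed a target value. The 50 dB default, the frame sizes and the function names are illustrative assumptions, not the authors' parameters.

```python
import math

def frame_log_energies(samples, frame_len=400, hop=160, eps=1e-10):
    """Natural log of the energy (sum of squared samples) of each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # eps keeps log() finite for all-zero (silent) frames
        energies.append(math.log(sum(x * x for x in frame) + eps))
    return energies

def normalize_dynamic_range(log_e, max_range_db=50.0):
    """Compress a log-energy contour so its range is at most max_range_db.

    Assumed rule (illustrative): the maximum frame is kept fixed and every
    other frame is moved linearly towards it. Since 10*log10(E) = 10*ln(E)/ln(10),
    a range of D dB equals D*ln(10)/10 in natural-log units.
    """
    max_range = max_range_db * math.log(10.0) / 10.0
    e_max, e_min = max(log_e), min(log_e)
    span = e_max - e_min
    if span <= max_range:
        return list(log_e)  # already within the allowed range
    scale = max_range / span
    return [e_max - (e_max - e) * scale for e in log_e]
```

Frames quieter than the allowed range are lifted towards the loudest frame, so utterances recorded at different noise floors end up with comparable log-energy ranges.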
The input data for classification and recognition were emotional utterances, each represented by a set of features: the fundamental frequency, the energy, and the zero-crossing rate. The global features, namely the minimum, maximum, and mean of each of these sets of values, form the feature vector of an utterance. Classification used a Gaussian mixture model (a mixture of normal distributions). Using the constructed statistical model of the speech signal, the posterior probability that the utterance belongs to the emotionally neutral class was computed. The output speech signal meets the criterion of successful solution both formally (by the classification result) and perceptually, so the task of removing emotional features can be considered solved.
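A minimal sketch of the nine-dimensional utterance vector described above (minimum, maximum and mean of the F0, energy and zero-crossing-rate contours) follows. The abstract does not say how F0 is estimated, so a crude autocorrelation peak-picking estimator is assumed here; the frame sizes, the 60 to 400 Hz search range and all function names are illustrative, and unvoiced frames (F0 = 0) are simply included in the statistics.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_energy(frame):
    return sum(x * x for x in frame)

def autocorr_f0(frame, sr, f0_min=60.0, f0_max=400.0):
    """Assumed F0 estimator: lag of the largest autocorrelation in the search range."""
    lag_min = int(sr / f0_max)
    lag_max = min(int(sr / f0_min), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag if best_lag else 0.0  # 0.0 marks an unvoiced/silent frame

def global_features(samples, sr, frame_len=400, hop=160):
    """[min, max, mean] of the F0, energy and ZCR contours: a 9-element vector."""
    f0s, energies, zcrs = [], [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        f0s.append(autocorr_f0(frame, sr))
        energies.append(frame_energy(frame))
        zcrs.append(zero_crossing_rate(frame))
    vector = []
    for contour in (f0s, energies, zcrs):
        vector += [min(contour), max(contour), sum(contour) / len(contour)]
    return vector
```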
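Given an utterance feature vector, the posterior probability of the neutral class and the success criterion (the classifier assigns the signal to the neutral class) can be computed with Bayes' rule over per-class Gaussian mixtures. This sketch assumes diagonal covariances, per-class GMM parameters that have already been trained (for example with EM, not shown), and hypothetical class names and priors; log-sum-exp is used to avoid numerical underflow.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance normal distribution."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """log p(x | class) for a GMM given as a list of (weight, mean, var) components."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))  # log-sum-exp

def class_posteriors(x, models, priors):
    """P(class | x) for every class from its GMM likelihood and prior (Bayes' rule)."""
    scores = {c: math.log(priors[c]) + gmm_log_likelihood(x, g) for c, g in models.items()}
    top = max(scores.values())
    norm = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / norm for c, s in scores.items()}

def is_emotionless(x, models, priors, neutral="neutral"):
    """Success criterion: the neutral class has the highest posterior."""
    post = class_posteriors(x, models, priors)
    return max(post, key=post.get) == neutral, post[neutral]
```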

Highlights

  • The structure and algorithms of the information technology, proposed by the authors, for removing emotional features from a speech signal are presented

Summary

STRUCTURE OF THE INFORMATION TECHNOLOGY FOR REMOVING EMOTIONAL FEATURES FROM THE SPEECH SIGNAL

The structure and algorithms of the information technology proposed by the authors for removing emotional features from a speech signal are presented. The first of these algorithms is PSOLA (Pitch-Synchronous OverLap and Add), which changes the fundamental frequency of a signal by stretching and compressing it along the time axis [1]. PSOLA can be used to shift the fundamental frequency (F0) of a voice while preserving the formant positions and thus the identity of the vowels. In this case, for each synthesis pitch mark t_k, the first step of the synthesis algorithm presented above is modified to select the corresponding i-th analysis segment (identified by the time mark t_i).

Figure 1. PSOLA: result of application.
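The pitch-mark mapping described above can be illustrated with a minimal time-domain PSOLA (TD-PSOLA) sketch in Python. It assumes that the analysis pitch marks t_i are already known (pitch-mark detection is not shown), that each analysis segment spans two local pitch periods under a Hann window, and that, for a pure pitch change, each synthesis mark t_k takes the analysis segment whose mark is nearest to it. The function names are hypothetical.

```python
import math

def hann(n):
    """Symmetric Hann window of n points (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def synthesis_marks(marks, factor):
    """Pair each synthesis pitch mark t_k with the nearest analysis segment index i."""
    periods = [b - a for a, b in zip(marks, marks[1:])]
    periods.append(periods[-1])  # reuse the last period for the last mark
    pairs, t_k = [], float(marks[0])
    while t_k < marks[-1]:
        # Modified step 1: pick the analysis mark t_i closest to t_k (first one on ties).
        i = min(range(len(marks)), key=lambda j: abs(marks[j] - t_k))
        pairs.append((t_k, i))
        t_k += periods[i] / factor  # synthesis marks are 1/factor periods apart
    return pairs, periods

def td_psola(x, marks, factor):
    """Scale the pitch of x by `factor` (> 1 raises it) without changing duration."""
    y = [0.0] * len(x)
    pairs, periods = synthesis_marks(marks, factor)
    for t_k, i in pairs:
        p = periods[i]
        win = hann(2 * p + 1)  # two-period segment centred on t_i
        centre = int(round(t_k))
        for n, w in enumerate(win):
            src, dst = marks[i] - p + n, centre - p + n
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += w * x[src]  # overlap-add the unchanged segment at t_k
    return y
```

With factor=2.0 each analysis segment is reused about twice, so the synthesis marks are half a period apart and the pitch doubles while the duration stays the same. Because every segment is copied without resampling, its spectral envelope, and hence the formant positions, is preserved. This sketch applies no gain compensation when the factor differs from 1.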

Max log … Σh … the following re-estimation formulas are used: w′ …