Structure of information technology for the removal of emotional signs from a speech signal

O P Lutsenko, Yu M Ponomaryova

doi:10.15421/431805

Abstract

This article describes the methods and algorithms of an information technology, proposed by the authors, for removing emotional features from a speech signal. It covers methods for extracting and transforming emotional features, as well as methods for classifying the speech signals obtained after transformation. Emotion removal is considered successful if, after the transformations, an emotion classifier assigns the speech signal to the class of emotionless speech. The solution scheme consists of three steps: 1) extracting informative features and establishing the boundaries of the conversion; 2) transforming the informative features of the input data; 3) checking the effectiveness of the transformation. The first transformation algorithm is PSOLA (Pitch-Synchronous OverLap and Add), which changes the fundamental frequency (pitch) by stretching and compressing the signal along the time axis. PSOLA can shift the pitch of a voice while retaining the formant positions and thus the identity of the vowels. The basic idea is to stretch or compress the time axis at the pitch marks while leaving the waveform shape of each segment unchanged. The second transformation method, normalization of the dynamic range of the log-energy, reduces the spread of energy values caused by different levels of background noise [2]. The input parameter of the energy normalization unit is the frame energy. The energy is replaced by its logarithm because this resembles the logarithmic perception of sound by the human auditory system and because the logarithm is robust to sharp changes in the energy level. Testing the effectiveness of the transformation requires building a classifier that recognizes emotions in speech.
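The log-energy normalization step can be illustrated with a minimal sketch. The exact rule from [2] is not given in this abstract, so the code assumes a simple linear compression: the maximum log-energy is kept fixed and lower values are lifted until the minimum equals the maximum minus `target_range`. The function names, `target_range`, and `floor` are illustrative, not taken from the paper.

```python
import math

def frame_log_energies(signal, frame_len, hop, floor=1e-10):
    """Natural-log energy of each frame; `floor` guards against log(0)."""
    log_e = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame)
        log_e.append(math.log(max(energy, floor)))
    return log_e

def normalize_log_energy_range(log_e, target_range=4.0):
    """Compress the log-energy dynamic range to at most `target_range`.

    Assumed rule (not taken from the paper): the maximum is kept fixed and
    lower values are lifted linearly so that the minimum becomes
    max - target_range. Sequences already within the range are unchanged.
    `target_range` is expected to be non-negative.
    """
    e_max, e_min = max(log_e), min(log_e)
    target_min = e_max - target_range
    if e_min >= target_min:
        return list(log_e)
    scale = (target_min - e_min) / (e_max - e_min)
    return [e + scale * (e_max - e) for e in log_e]

# Usage: a quiet stretch followed by a loud one, four samples per frame.
quiet_then_loud = [0.01] * 4 + [1.0] * 4
log_e = frame_log_energies(quiet_then_loud, frame_len=4, hop=4)
print(normalize_log_energy_range(log_e, target_range=4.0))
```

Because only the low end of the range is lifted, loud frames keep their values while quiet frames, where background noise dominates, are pulled closer to them.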
The input data for classification and recognition were emotional utterances, each represented by a set of features: the fundamental frequency, the energy, and the zero-crossing rate. The global features (the minimum, maximum, and mean of each of these sets of values) form the feature vector of an utterance. A Gaussian mixture model (a mixture of normal distributions) was used for classification. Using the constructed statistical model of the speech signal, the posterior probability that an utterance belongs to the emotionally neutral class was calculated. The output speech signal meets the stated success criterion both formally (by the classification result) and perceptually, so the task of removing emotional features can be considered solved.
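The feature vector and the posterior computation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the fundamental frequency is estimated with a crude autocorrelation peak search (no voicing decision), the mixtures use diagonal covariances, and the posterior is taken over two hypothetical class models, `neutral_gmm` and `emotional_gmm`, each given as a `(weights, means, variances)` tuple. All function and parameter names are illustrative.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    pairs = zip(frame, frame[1:])
    return sum((a >= 0) != (b >= 0) for a, b in pairs) / (len(frame) - 1)

def autocorr_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Crude F0 estimate (Hz): lag of the autocorrelation peak in [f_min, f_max]."""
    lag_lo = int(sample_rate / f_max)
    lag_hi = min(int(sample_rate / f_min), len(frame) - 1)
    def acf(lag):
        return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
    return sample_rate / max(range(lag_lo, lag_hi + 1), key=acf)

def utterance_vector(signal, sample_rate, frame_len, hop):
    """9 global features: (min, max, mean) of F0, energy, zero-crossing rate."""
    tracks = ([], [], [])
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        tracks[0].append(autocorr_f0(frame, sample_rate))
        tracks[1].append(sum(x * x for x in frame))
        tracks[2].append(zero_crossing_rate(frame))
    vector = []
    for values in tracks:
        vector += [min(values), max(values), sum(values) / len(values)]
    return vector

def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of x under a diagonal-covariance Gaussian mixture."""
    logs = []
    for w, mean, var in zip(weights, means, variances):
        quad = sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                   for xi, m, v in zip(x, mean, var))
        logs.append(math.log(w) - 0.5 * quad)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def neutral_posterior(x, neutral_gmm, emotional_gmm, prior_neutral=0.5):
    """P(neutral | x) by Bayes' rule over two class-conditional mixtures."""
    log_n = math.log(prior_neutral) + gmm_log_likelihood(x, *neutral_gmm)
    log_e = math.log(1.0 - prior_neutral) + gmm_log_likelihood(x, *emotional_gmm)
    top = max(log_n, log_e)
    p_n, p_e = math.exp(log_n - top), math.exp(log_e - top)
    return p_n / (p_n + p_e)
```

Under the success criterion stated in the abstract, a transformed utterance would count as emotionless when `neutral_posterior` exceeds the posterior of the competing class; the two-class setup and the equal default prior are assumptions of this sketch.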

Highlights

The structure and algorithms of the information technology proposed by the authors for removing emotional features from a speech signal are presented
The goal of the article is to describe the methods and algorithms of the information technology, proposed by the authors, for removing emotional features from a speech signal
Methods for extracting and transforming emotional features, as well as methods for classifying the speech signals obtained after transformation, are described

Summary

STRUCTURE OF INFORMATION TECHNOLOGY FOR REMOVING EMOTIONAL FEATURES FROM A SPEECH SIGNAL

The structure and algorithms of the information technology proposed by the authors for removing emotional features from a speech signal are presented. The output speech signal meets the stated success criterion both formally (by the classification result) and perceptually, so the task of removing emotional features can be considered solved. The first of the algorithms is PSOLA (Pitch-Synchronous OverLap and Add), designed to change the fundamental frequency of the signal by stretching and compressing it along the time axis [1]. PSOLA can be used to shift the fundamental frequency (F0) of the voice while preserving the formant positions and thus the identity of the vowels. In this case, for each synthesis pitch mark t_k, the first step of the synthesis algorithm presented above is modified to select the corresponding i-th analysis segment (identified by its time mark t_i).

Figure 1. PSOLA: result of application.
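The selection of an analysis segment for each synthesis pitch mark can be sketched in a few lines. This is a simplified time-domain PSOLA under assumptions that do not come from the paper: the pitch period is constant and known in samples, the analysis marks t_i lie on a regular grid, and the "corresponding" segment for a synthesis mark t_k is taken to be the one with the nearest analysis mark. The function name and parameters are illustrative.

```python
import math

def psola_pitch_shift(signal, period, factor):
    """Shift the pitch of `signal` by `factor`, keeping its duration.

    Simplifying assumptions (not from the paper): the pitch period is
    constant and known (in samples), and analysis pitch marks sit on a
    regular grid. Real speech needs a pitch-mark detector instead.
    Requires len(signal) > 2 * period.
    """
    # Analysis pitch marks t_i, one per period, away from the signal edges.
    analysis_marks = list(range(period, len(signal) - period, period))
    # Hann window spanning two periods, centred on a pitch mark.
    window = [0.5 - 0.5 * math.cos(math.pi * j / period)
              for j in range(2 * period + 1)]
    out = [0.0] * len(signal)
    new_period = period / factor      # spacing of synthesis marks t_k
    t = float(analysis_marks[0])
    while t <= analysis_marks[-1]:
        t_k = int(round(t))
        # Choose the analysis segment whose mark t_i is closest to t_k.
        t_i = min(analysis_marks, key=lambda m: abs(m - t_k))
        # Overlap-add the windowed segment, unchanged in shape, at t_k.
        for j in range(-period, period + 1):
            out[t_k + j] += signal[t_i + j] * window[j + period]
        t += new_period
    return out
```

With `factor=1.0` the windows sum to one and the interior of the signal is reproduced; with other factors the segments keep their shape, which is what preserves the formant positions, while their spacing sets the new pitch. The sketch does not compensate for the gain change caused by denser or sparser overlap.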

Max log

Σh. The following re-estimation formulas are used: w'
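For orientation, the textbook expectation-maximization (EM) updates for a Gaussian mixture are given here. Whether the authors use exactly this form is an assumption, as is the reading of Σ_h as the covariance matrix of component h and of primed symbols as re-estimated values. Here x_n (n = 1, ..., N) are the feature vectors, H is the number of components, and w_h, μ_h, Σ_h are the weight, mean, and covariance of component h.

```latex
% Objective: log-likelihood of the training vectors under the mixture
\log L = \sum_{n=1}^{N} \log \sum_{h=1}^{H} w_h \, \mathcal{N}(x_n \mid \mu_h, \Sigma_h)

% E-step: responsibility of component h for vector x_n
\gamma_{n,h} = \frac{w_h \, \mathcal{N}(x_n \mid \mu_h, \Sigma_h)}
                    {\sum_{j=1}^{H} w_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

% M-step: re-estimated parameters
w'_h = \frac{1}{N} \sum_{n=1}^{N} \gamma_{n,h}, \qquad
\mu'_h = \frac{\sum_{n=1}^{N} \gamma_{n,h} \, x_n}{\sum_{n=1}^{N} \gamma_{n,h}}, \qquad
\Sigma'_h = \frac{\sum_{n=1}^{N} \gamma_{n,h} \, (x_n - \mu'_h)(x_n - \mu'_h)^{\top}}
                 {\sum_{n=1}^{N} \gamma_{n,h}}
```

Each iteration of these updates does not decrease the log-likelihood, which is why they are commonly used to fit mixture models of this kind.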

References

Journal: Actual problems of automation and information technology
Publication Date: Dec 1, 2018
License type: CC BY
