The evaluation of systems that claim to recognize emotions expressed by human beings is a contested and complex task: the early pioneers in this field gave the impression that these systems would eventually recognize a flash of anger, suppressed glee/happiness, momentary disgust or contempt, lurking fear, or sadness in someone's face or voice (Picard and Klein, 2002; Schuller et al., 2011). Emotion recognition systems are trained on 'labelled' databases: collections of video/audio recordings comprising images and voices of humans enacting a single emotional state. Machine learning programmes then regress the pixel distributions or waveforms against the labels. The system is then said to have learnt how to recognize and interpret human emotions, and is rated using information science metrics. These systems have been adopted at large for applications ranging from autistic spectrum communication to teaching and learning, and onwards to covert surveillance. The training databases depend upon human emotions recorded in ideal conditions: faces centrally located and looking at the camera, voices articulated through noise-cancelling microphones. Yet there are reports that such posed training data, racially skewed and gender-imbalanced, do not prepare these systems to cope with data-in-the-wild, and that expression-unrelated variations such as illumination, head pose, and identity bias (Li and Deng, 2020) can also impair their performance. Deployments of these systems tend to adopt one system or another, apply it to data collected outside laboratory conditions, and use the resulting classifications in subsequent processing. We have devised a testing method that helps to quantify the similarities and differences of facial emotion recognition (FER) and speech emotion recognition (SER) systems. We report on the development of a database comprising videos and soundtracks of 64 politicians and 7 government spokespersons (25 F, 46 M; 34 White Europeans, 19 East Asians, and 18 South Asians), ranging in age from 32 to 85 years; each of the 71 has on average three 180 s videos, for a total of 16.66 h of data. We have compared the performance of two FERs (Emotient and Affectiva) and two SERs (OpenSmile and Vokaturi) on our data by analysing the emotions reported by these systems on a frame-by-frame basis. We have analysed the directly observable head movements and the indirectly observable muscle movements of the face and of the vocal tract. There was marked disagreement in the emotions recognized, and the differences were exacerbated more for women than for men, and more for South and East Asians than for White Europeans. Levels of agreement and disagreement on both high-level features (i.e. emotion labels) and lower-level features (e.g. Euler angles of head movement) are shown. We show that inter-system disagreement may also be used as an effective response variable in reasoning about the data features that influence disagreement. We argue that the reliability of subsequent processing in approaches that adopt these systems may be enhanced by restricting action to cases where the systems agree within a given tolerance level. This paper may be considered a foray into the greater debate about so-called algorithmic (un)fairness and data bias in the development and deployment of machine learning systems, of which FERs and SERs are a good exemplar.
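The abstract describes comparing per-frame emotion labels across systems and acting only where systems agree within a tolerance. The sketch below is not the authors' code; it is a minimal illustration, under assumed data structures (a hypothetical `Frame` record holding a label and a confidence score per video frame), of how frame-level agreement and an "act only on agreement" filter might be computed.

```python
# Minimal illustrative sketch (not the paper's implementation): frame-by-frame
# agreement between two emotion recognition systems, plus a tolerance filter
# that keeps only frames where both systems agree with sufficient confidence.
# The Frame structure, field names, and the 0.6 threshold are assumptions.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    label: str         # emotion label reported by one system for this frame
    confidence: float  # that system's score for the label, assumed in [0, 1]


def agreement_rate(a: List[Frame], b: List[Frame]) -> float:
    """Fraction of aligned frames on which the two systems report the same label."""
    assert len(a) == len(b), "streams must be aligned frame-by-frame"
    if not a:
        return 0.0
    same = sum(fa.label == fb.label for fa, fb in zip(a, b))
    return same / len(a)


def consensus_frames(a: List[Frame], b: List[Frame],
                     min_conf: float = 0.6) -> List[Tuple[int, str]]:
    """Indices and labels of frames where both systems agree and both report
    at least min_conf confidence: a simple 'restrict action to agreement' policy."""
    out = []
    for i, (fa, fb) in enumerate(zip(a, b)):
        if fa.label == fb.label and min(fa.confidence, fb.confidence) >= min_conf:
            out.append((i, fa.label))
    return out


if __name__ == "__main__":
    # Toy per-frame outputs standing in for two systems' label streams.
    sys_a = [Frame("anger", 0.9), Frame("neutral", 0.7), Frame("joy", 0.4)]
    sys_b = [Frame("anger", 0.8), Frame("sadness", 0.6), Frame("joy", 0.9)]
    print(f"agreement rate: {agreement_rate(sys_a, sys_b):.2f}")  # 0.67
    print(f"consensus frames: {consensus_frames(sys_a, sys_b)}")  # [(0, 'anger')]
```

Frame-level agreement computed this way can also serve as the response variable the abstract mentions, e.g. regressed against demographic or signal-quality features to reason about what drives inter-system disagreement.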