Voice Database Research Articles

OBJECTIVES: There is currently a lack of objective treatment outcome measures for transgender individuals undergoing gender-affirming voice care. Recently, Bensoussan et al. developed an AI model that generates a voice femininity rating from a short voice sample recorded through a smartphone application. The purpose of this study was to examine the feasibility of using this model as a treatment outcome measure by comparing its performance with that of human listeners. Additionally, we examined the effect of two different training datasets on the model’s accuracy and performance when presented with external data.

METHODS: One hundred voice recordings from 50 cisgender males and 50 cisgender females were retrospectively collected from patients presenting at a university voice clinic for reasons other than dysphonia. The recordings were evaluated by expert and naïve human listeners, who rated each voice by how sure they were that it belonged to a female speaker (% voice femininity [R]). Human ratings were compared with ratings generated by (1) the AI model trained on a high-quality, low-quantity dataset (voices from the Perceptual Voice Quality Database; PVQD model), and (2) the AI model trained on a low-quality, high-quantity dataset (voices from the Mozilla Common Voice database; Mozilla model). Ambiguity scores were calculated as the absolute value of the difference between the rating and certainty (0% or 100%).

RESULTS: Both expert and naïve listeners achieved 100% accuracy in identifying voice gender based on a binary classification (female > 50% voice femininity [R]). In comparison, the Mozilla-trained model achieved 92% accuracy and the previously published PVQD model achieved 84% accuracy in determining voice gender (female > 50% AI voice femininity). While both AI models correlated with human ratings, the Mozilla-trained model showed a stronger correlation and lower overall rating ambiguity than the PVQD-trained model. The Mozilla model also appeared to handle pitch information in a similar way to human raters.

CONCLUSIONS: The AI model predicted voice gender with high accuracy compared with human listeners and has potential as a useful outcome measure for transgender individuals receiving gender-affirming voice training. The Mozilla-trained model outperformed the PVQD-trained model, suggesting that for binary classification tasks, the quantity of training data may influence the accuracy of voice AI models more than its quality.
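
The scoring described in this abstract (thresholding a 0-100% femininity rating at 50% for a binary label, computing accuracy, and correlating model ratings with listener ratings) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the ratings below are invented, and the abstract's ambiguity definition ("the absolute value of the difference between the rating and certainty (0 or 100%)") is read here as the distance to the nearer endpoint, which is an assumption.

```python
from math import sqrt
from statistics import mean

# Hypothetical data for illustration only (not from the study):
# % voice femininity ratings (0-100) for four voices.
truth = ["female", "female", "male", "male"]
human = [95, 80, 10, 30]   # e.g. mean listener ratings
model = [70, 45, 20, 55]   # e.g. AI model ratings


def binary_label(rating, threshold=50.0):
    """Binary rule from the abstract: female if the rating is > 50%."""
    return "female" if rating > threshold else "male"


def accuracy(ratings, labels, threshold=50.0):
    """Fraction of voices whose thresholded rating matches the true label."""
    hits = sum(binary_label(r, threshold) == t for r, t in zip(ratings, labels))
    return hits / len(labels)


def ambiguity(rating):
    """ASSUMPTION: distance from the nearer certainty endpoint (0% or 100%)."""
    return min(abs(rating - 0.0), abs(rating - 100.0))


def mean_ambiguity(ratings):
    """Average ambiguity over a set of ratings (lower = more decisive)."""
    return mean(ambiguity(r) for r in ratings)


def pearson(x, y):
    """Pearson correlation between two equal-length rating lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


print(accuracy(human, truth), accuracy(model, truth))   # prints: 1.0 0.5
print(mean_ambiguity(human), mean_ambiguity(model))     # prints: 16.25 35.0
print(round(pearson(human, model), 3))
```

Under this reading, ambiguity ranges from 0 (a fully certain rating) to 50 (a rating of exactly 50%), so a model can correlate well with listeners yet still be more ambiguous if its ratings cluster near the midpoint.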

In this paper we evaluate the hypothesis that automated methods for diagnosing voice disorders from speech recordings would benefit from contextual information found in continuous speech. Rather than basing a diagnosis on how disorders affect the average acoustic properties of the speech signal, the idea is to exploit the possibility that different disorders cause different acoustic changes within different phonetic contexts. Any differences in the pattern of effects across contexts would then provide additional information for discriminating between pathologies. We evaluate this approach in two complementary studies: the first uses a short phrase that is automatically annotated using a phonetic transcription; the second uses a long reading passage that is automatically annotated from text. The first study uses a single sentence recorded from 597 speakers in the Saarbrücken Voice Database to discriminate structural from neurogenic disorders. The results show that discrimination performance for these broad pathology classes improves from 59% to 67% unweighted average recall when classifiers are trained for each phone label and the results fused. Although the phonetic contexts improved discrimination, the overall sensitivity and specificity of the method seem insufficient for clinical application. We hypothesise that this is because of the limited contexts in the speech audio and the heterogeneous nature of the disorders. In the second study we address these issues by processing recordings of a long reading passage obtained from clinical recordings of 60 speakers with either spasmodic dysphonia or vocal fold paralysis. We show that discrimination performance increases from 80% to 87% unweighted average recall if classifiers are trained for each phone-labelled region and predictions fused. We also show that the sensitivity and specificity of a diagnostic test with this performance are similar to those of other diagnostic procedures in clinical use.
In conclusion, the studies confirm that the exploitation of contextual differences in the way disorders affect speech improves automated diagnostic performance, and that automated methods for phonetic annotation of reading passages are robust enough to extract useful diagnostic information.
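
The per-phone approach and the unweighted average recall (UAR) metric in this abstract can be illustrated with a small sketch. The abstract does not say how the per-phone outputs were fused, so this sketch assumes simple late fusion by averaging per-phone class scores, which is only one common choice; the phone labels, scores and label encoding are all hypothetical. UAR is the mean of the per-class recalls, and in a two-class test the two recalls are the sensitivity and specificity, so UAR equals their average.

```python
from statistics import mean

# Hypothetical encoding: 1 = spasmodic dysphonia, 0 = vocal fold paralysis.
# Each speaker maps a (hypothetical) phone label to the class-1 score from a
# classifier trained on that phone-labelled region; values are invented.
speakers = [
    {"a": 0.9, "s": 0.6, "n": 0.4},
    {"a": 0.3, "s": 0.7, "n": 0.8},
    {"a": 0.2, "s": 0.4, "n": 0.1},
    {"a": 0.7, "s": 0.6, "n": 0.5},
]
y_true = [1, 1, 0, 0]


def fuse(phone_scores):
    """ASSUMPTION: late fusion by averaging the per-phone class-1 scores."""
    return mean(list(phone_scores.values()))


def predict(phone_scores, threshold=0.5):
    """Speaker-level decision from the fused score."""
    return 1 if fuse(phone_scores) >= threshold else 0


def recall_of(cls, y_true, y_pred):
    """Recall for one class: correct predictions / speakers in that class."""
    idx = [i for i, t in enumerate(y_true) if t == cls]
    return sum(y_pred[i] == cls for i in idx) / len(idx)


def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls, so each class counts equally."""
    return mean([recall_of(c, y_true, y_pred) for c in sorted(set(y_true))])


y_pred = [predict(s) for s in speakers]
sensitivity = recall_of(1, y_true, y_pred)  # recall of the positive class
specificity = recall_of(0, y_true, y_pred)  # recall of the negative class
uar = unweighted_average_recall(y_true, y_pred)
print(y_pred, sensitivity, specificity, uar)  # prints: [1, 1, 0, 1] 1.0 0.5 0.75
```

Because UAR weights each class equally regardless of how many speakers it contains, it is less flattering than plain accuracy when one pathology class dominates the dataset.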

Articles published on Voice Database

Machine Learning Models With Hyperparameter Optimization for Voice Pathology Classification on Saarbrücken Voice Database.

Harnessing machine learning in diagnosing complex hoarseness cases

Optimized early fusion of handcrafted and deep learning descriptors for voice pathology detection and classification

How well do acoustic parameters correlate our perception of voice quality?

Multifeature Fusion Method with Metaheuristic Optimization for Automated Voice Pathology Detection

Tracking age-related changes in voice and speech production with Landmark-based analysis of speech.

Pathological voice classification using MEEL features and SVM-TabNet model

Using Vocal-Based Emotions as a Human Error Prevention System with Convolutional Neural Networks

AROA based Pre-trained Model of Convolutional Neural Network for Voice Pathology Detection and Classification

Automated Acoustic Evaluation of Voice Disorders: A Comprehensive Study on Parameter Analysis Using ANN

Diagnosis of pathological speech with streamlined features for long short-term memory learning

Vocal Tract Acoustic Measurements for Detection of Pathological Voice Disorders

Validation of an AI-assisted Treatment Outcome Measure for Gender-Affirming Voice Care: Comparing AI Accuracy to Listener’s Perception of Voice Femininity

PVGAN: A Pathological Voice Generation Model Incorporating a Progressive Nesting Strategy

Unraveling the complexities of pathological voice through saliency analysis

Validation of scrambling methods for vocal affect bursts

End-to-end deep learning classification of vocal pathology using stacked vowels.

Automated voice pathology discrimination from audio recordings benefits from phonetic analysis of continuous speech

A Smart Image Encryption Technology via Applying Personal Information and Speaker-Verification System

Multiple voice disorders in the same individual: Investigating handcrafted features, multi-label classification algorithms, and base-learners
