Abstract

The purpose of this study was to investigate the feasibility of using neck-surface acceleration signals to discriminate between modal, breathy and pressed voice. Voice data for five English single vowels were collected from 31 female native Canadian English speakers using a portable Neck Surface Accelerometer (NSA) and a condenser microphone. First, five clinically certified Speech Language Pathologists (SLPs) gave auditory-perceptual ratings of the audio recordings to categorize voice type. Intra- and inter-rater analyses were used to determine the SLPs’ reliability on the perceptual categorization task. Mixed-type samples were screened out, and congruent samples were kept for the subsequent classification task. Second, features such as spectral harmonics, jitter, shimmer and spectral entropy were extracted from the NSA data. Supervised learning algorithms were used to map feature vectors to voice type categories. A feature wrapper strategy was used to evaluate the contribution of each feature or feature combination to the classification between voice types. The results showed that the highest classification accuracy on a full set was 82.5%. The classification accuracy for breathy voice was notably higher (by approximately 12%) than for the other two voice types. Shimmer and spectral entropy were the metrics best correlated with classification accuracy.
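The feature wrapper strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not specify the search procedure, so the sketch assumes an exhaustive search over feature subsets, uses a leave-one-out 1-nearest-neighbour classifier as a stand-in for the study's classifiers, and runs on invented toy values rather than NSA measurements. The function names `loo_accuracy` and `wrapper_search` are hypothetical.

```python
from itertools import combinations


def loo_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature columns listed in `cols`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # hold out the sample being classified
            dist = sum((row[c] - other[c]) ** 2 for c in cols)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def wrapper_search(rows, labels, names):
    """Score every non-empty feature subset with the classifier itself
    (the 'wrapper' idea) and return (accuracy, subset) pairs, best first.
    Ties are broken in favour of smaller subsets."""
    scored = []
    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            acc = loo_accuracy(rows, labels, cols)
            scored.append((acc, tuple(names[c] for c in cols)))
    scored.sort(key=lambda item: (-item[0], len(item[1])))
    return scored


# Toy, invented feature values scaled to [0, 1]; NOT data from the study.
names = ["jitter", "shimmer", "spectral_entropy"]
rows = [
    (0.9, 0.50, 0.50), (0.1, 0.52, 0.48),   # modal
    (0.1, 0.90, 0.90), (0.9, 0.88, 0.92),   # breathy
    (0.5, 0.10, 0.10), (0.5, 0.12, 0.08),   # pressed
]
labels = ["modal", "modal", "breathy", "breathy", "pressed", "pressed"]

ranking = wrapper_search(rows, labels, names)
print(ranking[0])
```

Exhaustive search is only practical for a handful of features, since the number of subsets grows as 2^n. With larger feature sets, a greedy forward or backward search is the usual substitute.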

Highlights

  • Voice quality describes a wide range of multifaceted perceptual characteristics of human voice [1]. One of these characteristics is the voice type

  • Results suggested that the Support Vector Machine (SVM), K-Nearest Neighbours (KNN) and Neural Network (NN) classifiers were more sensitive to variation in the feature set than the other two classifiers, Decision Tree (DT) and Linear Discriminant (LD)

  • The spectral envelopes of the Neck Surface Accelerometer (NSA) signals were notably different for each voice type


Summary

Introduction

Voice quality describes a wide range of multifaceted perceptual characteristics of human voice [1]. One of these characteristics is the voice type. Modal and pressed voice have been viewed as points on a continuum of phonation defined by vocal fold contact area and open quotient [2]. Electroglottograph waveforms have shown that breathy voice features a small Vocal Fold (VF) contact area and a large open quotient, implying low laryngeal resistance [3]. Pressed voice displays the opposite trends [3]. Methods of voice type classification may broadly be subdivided into two categories:

