Abstract

Bio-acoustic properties of speech show evolving value in analyzing psychiatric illnesses. Obtaining a sufficient speech sample length to quantify these properties is essential, but the impact of sample duration on the stability of bio-acoustic features has not been systematically explored. We aimed to evaluate bio-acoustic features' reproducibility against changes in speech durations and tasks. We extracted source, spectral, formant, and prosodic features in 185 English-speaking adults (98 w, 87 m) for reading-a-story and counting tasks. We compared features at 25% of the total sample duration of the reading task to those obtained from non-overlapping randomly selected sub-samples shortened to 75%, 50%, and 25% of total duration using intraclass correlation coefficients. We also compared the features extracted from entire recordings to those measured at 25% of the duration and features obtained from 50% of the duration. Further, we compared features extracted from reading-a-story to counting tasks. Our results show that the number of reproducible features (out of 125) decreased stepwise with duration reduction. Spectral shape, pitch, and formants reached excellent reproducibility. Mel-frequency cepstral coefficients (MFCCs), loudness, and zero-crossing rate achieved excellent reproducibility only at a longer duration. Reproducibility of source, MFCC derivatives, and voicing probability (VP) was poor. Significant gender differences existed in jitter, MFCC first-derivative, spectral skewness, pitch, VP, and formants. Around 97% of features in both genders were not reproducible across speech tasks, in part due to the short counting task duration. In conclusion, bio-acoustic features are less reproducible in shorter samples and are affected by gender.
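The reproducibility comparison described above rests on intraclass correlation coefficients (ICCs) computed between a feature measured on one sample duration and the same feature measured on another. The abstract does not state which ICC form was used, so the sketch below assumes a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1). The function name `icc2_1` and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np


def icc2_1(measurements):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `measurements` has shape (n_speakers, k_conditions), e.g. one acoustic
    feature measured on the full recording and on a shortened sub-sample.
    The choice of ICC form is an assumption, not taken from the source.
    """
    x = np.asarray(measurements, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares for speakers (rows), conditions (columns), and residual.
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_error = ((x - grand_mean) ** 2).sum() - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical usage: a synthetic pitch-like feature for 185 speakers,
# measured on a full recording and on a noisier shortened sub-sample.
rng = np.random.default_rng(0)
full = rng.normal(150.0, 30.0, size=185)
short = full + rng.normal(0.0, 10.0, size=185)
icc_full_vs_short = icc2_1(np.column_stack([full, short]))
```

Repeating this calculation per feature and per duration would give one ICC for each feature and duration pair, which is the kind of table the reproducibility counts in the abstract summarize.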

Highlights

  • Human speech produces acoustic waves that carry information about the speaker’s gender, physiological condition, and pathophysiological state [1]

  • In our study, when men and women were analyzed separately, we found significant differences in the correlation analysis of some speech properties, including jitter, Mel-frequency cepstral coefficient (MFCC) deltas, spectral skewness (SS), pitch, voicing probability (VP), and formants, suggesting that the pattern of reliable markers may differ across genders

  • This study has examined the effect of speech duration and speech task on the reproducibility of bio-acoustic qualities

Introduction

Human speech produces acoustic waves that carry information about the speaker’s gender, physiological condition, and pathophysiological state [1]. These waves are generated when the mechanical vibration of the vocal folds, affected by aerodynamic factors, is converted into acoustic energy (the acoustic source signal). The ability to control articulatory and phonatory speech processes is affected by neuro-physiological changes in the brain associated with the speaker’s mental state. Such changes are encoded into acoustic speech signals and quantified through bio-acoustic qualities such as source, spectral, prosodic, and formant properties [5]–[7]. Formants are spectral peaks representing the vocal tract’s resonance frequencies and capture essential spectral characteristics for speech analysis [14].

