Abstract

This paper addresses voice disorder assessment. It proposes an original back-and-forth methodology that combines an automatic classification system with the knowledge of human experts (machine learning experts, phoneticians, and pathologists). The goal of this methodology is to improve the understanding of acoustic phenomena related to dysphonia. The automatic system was validated on a dysphonic corpus (80 female voices) rated by an expert jury according to the GRBAS perceptual scale. First, working in the frequency domain, the classification system showed the relevance of the 0–3000 Hz frequency band for classification based on the GRBAS scale. Then, an automatic phonemic analysis underlined the significance of consonants and, more surprisingly, of unvoiced consonants for the same classification task. Submitted to the human experts, these observations led to a manual analysis of unvoiced plosives, which highlighted a lengthening of voice onset time (VOT) with increasing dysphonia severity, a trend supported by a preliminary statistical analysis.

Highlights

  • Assessment of voice quality is a key issue both for establishing telecommunication standards and for the medical fields concerned with speech and voice disorders

  • Voice quality assessment is mainly addressed at the perceptual level using the Mean Opinion Score (MOS) scale [1] standardized by the International Telecommunication Union (ITU)

  • Abnormal voice onset time (VOT) has been studied in second-language learning [41], aphasia or apraxia of speech [42], dysarthria [43], stuttering [44], dysphagia [45], and spasmodic dysphonia [46]


Summary

Introduction

Assessment of voice quality is a key issue both for establishing telecommunication standards and for the medical fields concerned with speech and voice disorders. Although PESQ (and its extensions) is well suited to the telecommunication field, it requires parallel audio recordings, with and without noise disturbance, to evaluate voice quality. This constraint is impossible to satisfy in the medical/pathological area. Independently of this difference, it is worth noting that MOS/PESQ is estimated at the perceptual level and that there is no analytical description, at the acoustic or phonetic level, of the information that characterizes a given level of quality. Human subjective perception is used as the baseline (MOS), and an automatic approach (PESQ) is used to map signal differences onto the MOS scale.

