Abstract

Laryngeal high-speed videoendoscopy (LHSV) is an imaging technique that offers a new level of visualization quality for the vibratory activity of the vocal folds. However, most image analysis methods require interaction by medical personnel and access to ground-truth annotations to detect the vocal fold edges accurately. Our fully automatic method combines video and acoustic data recorded synchronously during laryngeal endoscopy. We show that the image segmentation algorithm for the glottal area can be optimized by matching the Fourier spectra of the pre-processed video to the spectra of the acoustic recording made during phonation of the sustained vowel /i:/. We verify our method on a set of LHSV recordings from subjects with normophonic voices and from patients with voice disorders due to glottal insufficiency. The computed geometric indices of the glottal area make it possible to discriminate between normal and pathologic voices: the median Open Quotient and Minimal Relative Glottal Area were 0.69 and 0.06, respectively, for healthy subjects and 1 and 0.35, respectively, for dysphonic subjects. These results were also validated by independent phoniatrician experts.
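To make the two reported indices concrete, the sketch below computes them from a glottal area waveform (one area value per video frame). The definitions used here are assumptions based on common usage, not taken from the paper: the Open Quotient is taken as the fraction of frames in which the glottis is open, and the Minimal Relative Glottal Area as the minimum area divided by the maximum area. The function names, the `closed_threshold` parameter, and the sample waveforms are hypothetical.

```python
def open_quotient(area, closed_threshold=0.0):
    """Fraction of frames in which the glottal area exceeds the threshold.

    Assumed definition: a frame counts as "open" when area > closed_threshold.
    """
    open_frames = sum(1 for a in area if a > closed_threshold)
    return open_frames / len(area)


def minimal_relative_glottal_area(area):
    """Minimum glottal area relative to the maximum area (assumed definition)."""
    peak = max(area)
    return min(area) / peak if peak > 0 else 0.0


# Hypothetical waveforms in pixels: one with full closure, one that never closes.
healthy_like = [0.0, 0.0, 0.0, 10.0, 40.0, 60.0, 40.0, 10.0, 0.0, 0.0]
insufficient = [20.0, 30.0, 45.0, 60.0, 45.0, 30.0, 20.0, 30.0, 45.0, 60.0]

for name, wave in (("healthy_like", healthy_like), ("insufficient", insufficient)):
    print(name, open_quotient(wave), minimal_relative_glottal_area(wave))
```

Under these assumed definitions, a glottis that never closes completely yields an Open Quotient of 1 and a clearly non-zero Minimal Relative Glottal Area, which is the pattern the abstract reports for the dysphonic group.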

Highlights

  • Regular assessment of the health of the human voice is important for the accurate detection of voice disorders with varied etiology

  • We demonstrate that our method is effective in segmenting images of the vibrating vocal folds; phoniatricians evaluated our results positively

  • During the laryngeal high-speed videoendoscopy (LHSV) recordings, the voice signal had to be recorded simultaneously during phonation of the vowel /i:/. Both the video and the acoustic recordings were pre-processed according to the procedures described in Section 4 to make them suitable for computing the Fourier spectra: a pool of glottovibrograms was computed for a set of candidate segmentation parameters (α, β), and the acoustic recordings were down-sampled to match the frame rate of the LHSVs
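The parameter search described in the last highlight can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each candidate (α, β) is assumed to have already been reduced to a one-dimensional signal per frame (for example, a glottal area waveform derived from its glottovibrogram), the audio is down-sampled by simple block averaging, and candidates are ranked by the squared distance between normalized magnitude spectra. All function names and the synthetic signals are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Normalized one-sided magnitude spectrum via a naive DFT (mean removed)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(centered))
        mags.append(abs(acc))
    total = sum(mags) or 1.0  # avoid division by zero for a constant signal
    return [m / total for m in mags]


def downsample(signal, factor):
    """Average consecutive blocks of `factor` samples (assumed, simple scheme)."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def spectral_distance(a, b):
    """Sum of squared differences between two spectra (assumed criterion)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_parameters(candidates, audio, factor):
    """Return the (alpha, beta) whose video signal best matches the audio spectrum."""
    target = magnitude_spectrum(downsample(audio, factor))
    return min(candidates,
               key=lambda p: spectral_distance(magnitude_spectrum(candidates[p]), target))


# Synthetic data: 128 video frames, audio sampled 4x faster than the frame rate.
frames, factor = 128, 4
audio = [math.sin(2 * math.pi * 8 * t / (frames * factor))
         for t in range(frames * factor)]
candidates = {
    (0.5, 0.2): [1 + math.cos(2 * math.pi * 8 * t / frames) for t in range(frames)],
    (0.9, 0.7): [1 + math.cos(2 * math.pi * 13 * t / frames) for t in range(frames)],
}
best = best_parameters(candidates, audio, factor)
print(best)
```

In this synthetic case the first candidate oscillates at the same rate as the audio, so its spectrum is the closer match. With real recordings, a library FFT and a proper anti-aliasing filter would be the more practical choice than the naive DFT and block averaging used here.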

Introduction

Regular assessment of the health of the human voice is important for the accurate detection of voice disorders with varied etiology. Exposure to the risk factors for voice disorders is increasing in the contemporary world. It is estimated that about a third of workers in industrialized societies use their voice as their main work tool. UK figures indicate that over five million workers are routinely affected by voice impairment, at an annual cost of around £200 million [1]. Constant advances in technology and the virtualization of life have made voice crucial for communication, particularly for individuals for whom it is a primary tool of trade and who are exposed to excessive vocal loading, e.g., actors, singers, coaches, teachers, and call-center workers. Professional voice users report to otolaryngological and phoniatric outpatient clinics with common problems
