A Deep Learning Approach for Quantifying Vocal Fold Dynamics During Connected Speech Using Laryngeal High-Speed Videoendoscopy.

Ahmed M Yousef, Dimitar D Deliyski, Stephanie R C Zacharias, Alessandro De Alarcon, Robert F Orlikoff, Maryam Naghibolhosseini

doi:10.1044/2022_jslhr-21-00540

Abstract

Voice disorders are best assessed by examining vocal fold dynamics in connected speech. This can be achieved using flexible laryngeal high-speed videoendoscopy (HSV), which enables the study of vocal fold mechanics in high temporal detail. Analysis of vocal fold vibration using HSV requires accurate segmentation of the vocal fold edges. This article presents an automated deep-learning scheme to segment the glottal area in HSV, from which the glottal edges are derived during connected speech. Using a custom-built HSV system, data were obtained from a vocally healthy participant reciting the "Rainbow Passage." A deep neural network was designed for glottal area segmentation in the HSV data. A hybrid approach recently introduced by the authors was used as an automated labeling tool to train the network on a set of HSV frames in which the glottis region was automatically annotated during vocal fold vibration. The network was then tested against manually segmented frames using two metrics, intersection over union (IoU) and Boundary F1 (BF) score, and its performance was assessed on various phonatory events in the HSV sequence. The designed network was successfully trained using the hybrid approach, without the need for manual labeling, and tested on the manually labeled data. The performance metrics showed a mean IoU of 0.82 and a mean BF score of 0.96. In addition, the evaluation of the network's performance demonstrated accurate segmentation of the glottal edges/area even during complex nonstationary phonatory events and when the vocal folds were not vibrating, thus overcoming the limitation of the previous hybrid approach, which could only be applied to vibrating vocal folds. The introduced automated scheme guarantees accurate glottis representation in challenging color HSV data with lower image quality and excessive laryngeal maneuvers during all instances of connected speech. This facilitates the future development of HSV-based measures to assess the running vibratory characteristics of the vocal folds in speakers with and without voice disorders. https://doi.org/10.23641/asha.19798864.
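The two evaluation metrics named in the abstract, IoU and BF score, can be illustrated with a minimal pure-Python sketch on binary glottis masks. This is not the authors' implementation: the function names, the 4-neighbour boundary definition, the handling of empty masks, and the distance tolerance `tol` are all assumptions made for illustration, since the abstract does not state them.

```python
from math import hypot

Mask = list[list[int]]  # binary mask: 1 = glottal area, 0 = background


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks (closed glottis in both) count as a perfect match.
    return inter / union if union else 1.0


def boundary(mask: Mask) -> list[tuple[int, int]]:
    """Foreground pixels with a 4-neighbour that is background or outside the image."""
    h, w = len(mask), len(mask[0])
    points = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    points.append((r, c))
                    break
    return points


def _matched_fraction(src, dst, tol: float) -> float:
    """Fraction of points in src lying within tol pixels of some point in dst."""
    if not src:
        # Assumption: an empty boundary matches only another empty boundary.
        return 1.0 if not dst else 0.0
    hits = sum(
        1 for (r, c) in src
        if any(hypot(r - r2, c - c2) <= tol for (r2, c2) in dst)
    )
    return hits / len(src)


def boundary_f1(pred: Mask, truth: Mask, tol: float = 1.0) -> float:
    """Harmonic mean of boundary precision and recall at distance tolerance tol."""
    bp, bt = boundary(pred), boundary(truth)
    precision = _matched_fraction(bp, bt, tol)  # predicted edge pixels near a true edge
    recall = _matched_fraction(bt, bp, tol)     # true edge pixels near a predicted edge
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction misses one column of a 3x3 "glottis".
truth = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
pred = [[0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0]]
print(iou(pred, truth), boundary_f1(pred, truth, tol=1.0))
```

In this toy case the area overlap is penalized (IoU of 2/3) while every edge pixel lies within one pixel of the other contour, so the boundary score at `tol=1.0` stays at 1.0. The same effect may help explain why a boundary-based score can sit above an area-based one, as with the reported means of 0.96 and 0.82.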

Journal: Journal of Speech, Language, and Hearing Research (JSLHR)
Publication Date: May 23, 2022
Citations: 16

Similar Papers

Deep-Learning-Based Representation of Vocal Fold Dynamics in Adductor Spasmodic Dysphonia during Connected Speech in High-Speed Videoendoscopy
Ahmed M Yousef ... Maryam Naghibolhosseini
01 Sep 2022
Journal of Voice | VOL. -

Detection of Vocal Fold Image Obstructions in High-Speed Videoendoscopy During Connected Speech in Adductor Spasmodic Dysphonia: A Convolutional Neural Networks Approach
Ahmed M Yousef ... Maryam Naghibolhosseini
16 Mar 2022
Journal of Voice | VOL. 38

Spatial Segmentation for Laryngeal High-Speed Videoendoscopy in Connected Speech
Ahmed M Yousef ... Maryam Naghibolhosseini
27 Nov 2020
Journal of Voice | VOL. 37

Studying the glottal vibration onset and offset using laryngeal high-speed videoendoscopy in connected speech
Maryam Naghibolhosseini ... Stephanie R Zacharias
01 Mar 2023
The Journal of the Acoustical Society of America | VOL. 153
