Purpose
Traditional vocal fold pathology recognition typically requires the expertise of laryngologists and advanced instruments, primarily through direct visualization. This study aims to augment this conventional paradigm by introducing a parallel diagnostic procedure. Our objective is to harness a machine-learning algorithm designed to discern intricate patterns within patients' voice recordings, distinguishing not only between healthy and hoarse voices but also among various specific disorders.

Materials and methods
We employed a machine-learning algorithm, utilizing transfer learning on the HuBERT model with Saarbruecken Voice Database samples. The study was conducted in two stages: a binary classifier distinguishes healthy from hoarse voices, and a subsequent multi-class classifier identifies specific voice disorders. Data from 2103 sessions, including over 25,000 components and representing diverse pathologies and healthy individuals, were analyzed. The models were trained, validated, and tested with a focus on robustness and accuracy in diagnosis.

Results
The binary classifier achieved 82 % accuracy in distinguishing healthy from pathological voices. The multi-class algorithm, which aims to identify specific laryngeal disorders, obtained its highest accuracy (>93 %) for Laryngeal Dystonia. This is noteworthy given the persistent challenge posed by Laryngeal Dystonia, a condition lacking a definitive diagnostic modality.

Conclusions
Our findings demonstrate the feasibility of utilizing machine-learning algorithms to process voice samples and categorize them into distinct pathologies. This approach has the potential to enhance patient triage, streamline diagnostics, and elevate overall patient care. Particularly valuable for challenging diagnoses, such as Laryngeal Dystonia, this method underscores the transformative role of machine learning in optimizing healthcare practices.
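
The two-stage procedure described in the methods can be illustrated with a minimal sketch. It assumes the stages are chained as a cascade, so that the multi-class model is consulted only for recordings the binary model flags as pathological; the abstract does not state this explicitly. The function name `two_stage_diagnosis`, the 0.5 decision threshold, and the two model callables are hypothetical stand-ins for the fine-tuned HuBERT classifiers, not the authors' implementation.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interfaces: each "model" is any callable that takes a waveform.
# In the study both stages are fine-tuned HuBERT models; here they are stand-ins.
BinaryModel = Callable[[Sequence[float]], float]                 # returns P(pathological)
MultiClassModel = Callable[[Sequence[float]], Dict[str, float]]  # disorder -> probability


def two_stage_diagnosis(
    waveform: Sequence[float],
    binary_model: BinaryModel,
    multiclass_model: MultiClassModel,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Return 'healthy' or the most probable disorder label."""
    # Stage 1: healthy vs. hoarse (pathological) voice.
    if binary_model(waveform) < threshold:
        return "healthy"
    # Stage 2: pick the disorder with the highest predicted probability.
    scores = multiclass_model(waveform)
    return max(scores, key=scores.get)
```

In this arrangement, an error in the first stage propagates to the second, so the binary classifier's accuracy bounds the performance of the whole cascade.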