Abstract

This study proposes two multimodal frameworks for classifying pathological voice samples by combining acoustic signals and medical records. In the first framework, acoustic signals are transformed into static supervectors via Gaussian mixture models; a deep neural network (DNN) then combines the supervectors with the medical records and classifies the voice signals. In the second framework, the acoustic features and the medical data are first processed by separate first-stage DNNs; a second-stage DNN then combines the outputs of the first-stage DNNs and performs the classification. Voice samples were recorded at the voice clinic of a tertiary teaching hospital and cover three common categories of vocal disease, i.e., glottic neoplasm, phonotraumatic lesions, and vocal paralysis. Experimental results demonstrate that the proposed frameworks yield significant improvements in accuracy and unweighted average recall (UAR) of 2.02–10.32% and 2.48–17.31%, respectively, compared with systems that use only acoustic signals or medical records. The proposed algorithm also achieves higher accuracy and UAR than traditional feature-based and model-based combination methods.
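The two frameworks differ mainly in where the two modalities are fused: the first concatenates features before a single classifier, while the second fuses the outputs of modality-specific networks. The following is a minimal sketch in PyTorch; the class names (`EarlyFusionDNN`, `TwoStageDNN`), layer sizes, and activation choices are illustrative assumptions rather than the authors' exact architecture:

```python
import torch
import torch.nn as nn

class EarlyFusionDNN(nn.Module):
    """Framework 1: a GMM supervector is concatenated with the
    medical-record vector, then classified by a single DNN."""
    def __init__(self, supervector_dim: int, record_dim: int, n_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(supervector_dim + record_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, supervector, record):
        # Feature-level fusion: concatenate the two modalities first.
        return self.net(torch.cat([supervector, record], dim=-1))

class TwoStageDNN(nn.Module):
    """Framework 2: each modality passes through its own first-stage
    DNN; a second-stage DNN fuses their outputs for classification."""
    def __init__(self, acoustic_dim: int, record_dim: int, hidden: int = 64, n_classes: int = 3):
        super().__init__()
        self.acoustic_net = nn.Sequential(
            nn.Linear(acoustic_dim, 128), nn.ReLU(),
            nn.Linear(128, hidden), nn.ReLU(),
        )
        self.record_net = nn.Sequential(
            nn.Linear(record_dim, 32), nn.ReLU(),
            nn.Linear(32, hidden), nn.ReLU(),
        )
        self.fusion_net = nn.Sequential(
            nn.Linear(2 * hidden, 32), nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, acoustic, record):
        # First-stage DNNs process each modality separately; the
        # second-stage DNN fuses their hidden representations.
        h = torch.cat([self.acoustic_net(acoustic), self.record_net(record)], dim=-1)
        return self.fusion_net(h)

# Example forward pass with random inputs (all dimensions are assumptions).
model = TwoStageDNN(acoustic_dim=208, record_dim=10)
logits = model(torch.randn(4, 208), torch.randn(4, 10))  # (4, 3) class logits
```

For reference, the UAR cited above is the unweighted (macro) average of per-class recall, which weights the three disease categories equally regardless of their sample counts; with scikit-learn it corresponds to `recall_score(y_true, y_pred, average="macro")`.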

Highlights

  • Deep learning technology has shown excellent performance in a wide variety of practical applications

  • In the hybrid GMM and DNN framework (HGD), the acoustic signals are first modeled by a Gaussian mixture model (GMM), and the means of the GMM are concatenated to form a supervector for feature combination, instead of feeding MFCC+delta features directly into the one-stage DNN (OSD); a sketch of this supervector construction follows the list below

  • This study focuses on three typical voice disorders: phonotraumatic lesions, glottic neoplasm, and unilateral vocal paralysis
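The supervector construction named in the HGD highlight can be illustrated in a few lines. This is a minimal sketch, assuming MFCC+delta frames have already been extracted for one recording; the random frame matrix, the 8 mixture components, and the 26-dimensional features are illustrative stand-ins, not the authors' exact settings:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for the MFCC+delta frames of one recording: (n_frames, n_features).
rng = np.random.default_rng(0)
frames = rng.standard_normal((500, 26))  # e.g., 13 MFCCs + 13 deltas (assumed)

# Model the frame distribution of the utterance with a GMM.
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(frames)

# Concatenate the component means into one fixed-length supervector, turning a
# variable-length frame sequence into a static input vector for the DNN.
supervector = gmm.means_.flatten()
print(supervector.shape)  # (8 * 26,) = (208,)
```

Because every recording maps to a vector of the same length, the supervector can be concatenated directly with the medical-record features regardless of recording duration.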


Summary

INTRODUCTION

Deep learning technology has shown excellent performance in a wide variety of practical applications (e.g., energy [1, 2], aviation [3, 4], software [5, 6], and traffic [7,8,9,10]). Previous studies had already detected diseases or abnormal conditions using one of the above-mentioned categories of biomedical features, but using two or more categories of features together had rarely been attempted. We therefore integrate a more comprehensive dataset, including demographics, medical history, clinical symptoms, and acoustic signals from dysphonic patients, to examine whether multimodal learning can be applied to classify common voice disorders. To the best of our knowledge, this is the first study to combine acoustic signals and patient-provided medical information in the computerized classification of voice disorders, an approach that advances both modeling techniques and clinical practicability.
