Abstract

Objective measurement of speech intelligibility is a challenge when working with speech-impaired patients. Speech intelligibility scores (the average transcription accuracy across a set of words or sentences by a listener) are a common way of assessing disordered speech. Human-based measurements are less than ideal due to individual differences in listening ability, the time it takes to collect the measures, and other challenges. The present study investigates deep neural networks for fast, automatic, and objective speech intelligibility scoring of head-and-neck cancer patients. We assessed models using the raw acoustic signal as the input to a network with multiple convolutional layers. It is believed that when the raw acoustic signal is used as the input, a convolutional network acts as a filter bank optimized for intelligibility scoring. We report the model accuracy results of repeated training, testing, and comparison of different model structures. Further, we compare the results using a 10-fold cross-validation approach and report the average correlation between the predicted and actual values.
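
The evaluation described above (10-fold cross-validation, reporting the average correlation between predicted and actual intelligibility scores) could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic placeholders, the least-squares regressor is a stand-in and not the convolutional network from the study, Pearson correlation is assumed because the abstract does not name the correlation measure, and all function names (`pearson_r`, `kfold_indices`, `cross_validated_correlation`) are hypothetical.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays (assumed correlation measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c)))


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def fit_linear(X, y):
    """Stand-in model: ordinary least squares with a bias term (NOT the study's CNN)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w


def predict_linear(w, X):
    """Apply the stand-in linear model to new feature rows."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def cross_validated_correlation(X, y, k=10, seed=0):
    """Return the mean and per-fold Pearson r between predicted and actual scores."""
    folds = kfold_indices(len(y), k, seed)
    fold_rs = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = fit_linear(X[train_idx], y[train_idx])
        fold_rs.append(pearson_r(predict_linear(w, X[test_idx]), y[test_idx]))
    return float(np.mean(fold_rs)), fold_rs


# Synthetic placeholder data: 200 "recordings" with 5 features each and a
# made-up intelligibility score. None of this comes from the study.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + rng.normal(scale=0.1, size=200)

mean_r, fold_rs = cross_validated_correlation(X, y, k=10)
print(f"mean r over {len(fold_rs)} folds: {mean_r:.3f}")
```

Each recording is held out exactly once, so every prediction entering a correlation comes from a model that never saw that recording during training. Averaging the per-fold correlations, rather than pooling all predictions first, is one of several reasonable conventions and is assumed here.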
