Abstract

Background: Patients with cleft palate cannot achieve adequate velopharyngeal closure, which results in hypernasal speech. In clinical practice, hypernasal speech is evaluated through subjective assessment by speech-language pathologists. Automatic hypernasal speech detection can provide diagnostic support for speech-language pathologists and clinicians.

Objectives: This study aims to develop a Long Short-Term Memory (LSTM) based Deep Recurrent Neural Network (DRNN) system that detects hypernasal speech in cleft palate patients, and thereby to provide diagnostic support for clinical operation and speech therapy. The feature mining and classification abilities of the LSTM-DRNN system are also explored.

Methods: The speech recordings comprise 14,544 Mandarin vowels collected from 144 children aged 5–12 years (72 children with hypernasality and 72 controls). This work proposes an LSTM-based DRNN system for automatic hypernasal speech detection, since an LSTM-DRNN can learn the short-time dependences of hypernasal speech. Vocal tract based features are fed into the LSTM-DRNN for deep feature mining. To verify the feature mining ability of the LSTM-DRNN, the features it projects are fed into shallow classifiers in place of the two fully connected layers and the softmax layer that follow the LSTM layers. For comparison, features that have not been projected by the LSTM-DRNN are fed directly into shallow classifiers. Hypernasality-sensitive vowels (/a/, /i/, and /u/) are analyzed for the first time.

Results: The LSTM-DRNN based detection method reaches higher detection accuracy than shallow classifiers, since the LSTM-DRNN mines features along the time axis and through network depth simultaneously. The proposed hypernasality detection system reaches a highest accuracy of 93.35%. The analysis of hypernasality-sensitive vowels indicates that /i/ and /u/ are the vowels most sensitive to hypernasal speech.

Conclusions: The results show that the LSTM-DRNN has robust feature mining and classification abilities. This is the first work to apply the LSTM-DRNN technique to the automatic detection of hypernasality in cleft palate speech. The experimental results demonstrate the potential of deep learning for pathological speech detection.
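
As an illustration only, the architecture outlined in the abstract (stacked LSTM layers followed by two fully connected layers and a softmax layer) can be sketched as a forward pass in plain Python. Everything specific here is an assumption rather than the authors' configuration: the class names `LSTMLayer` and `HypernasalityNet`, the 13-dimensional input frames, the layer sizes, the tanh activation, the use of the last time step as the projected feature, and the random untrained weights. The `project` method corresponds to the experiment in which LSTM-projected features are handed to a shallow classifier instead of the dense layers.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMLayer:
    """One LSTM layer; gate order: input, forget, candidate, output."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate, acting on the concatenation [x_t ; h_{t-1}].
        self.W = [rand_matrix(n_hidden, n_in + n_hidden, rng) for _ in range(4)]
        self.b = [[0.0] * n_hidden for _ in range(4)]

    def forward(self, sequence):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in sequence:
            z = x + h  # list concatenation of input frame and previous hidden state
            pre = [[p + q for p, q in zip(matvec(W, z), b)]
                   for W, b in zip(self.W, self.b)]
            i = [sigmoid(v) for v in pre[0]]
            f = [sigmoid(v) for v in pre[1]]
            g = [math.tanh(v) for v in pre[2]]
            o = [sigmoid(v) for v in pre[3]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs


def dense(W, b, v, activation=None):
    out = [s + bi for s, bi in zip(matvec(W, v), b)]
    return [activation(s) for s in out] if activation else out


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]


class HypernasalityNet:
    """Stacked LSTM -> dense -> dense -> softmax (all sizes are assumptions)."""

    def __init__(self, n_features=13, n_hidden=8, n_lstm_layers=2, n_dense=8, seed=0):
        rng = random.Random(seed)
        sizes = [n_features] + [n_hidden] * n_lstm_layers
        self.lstm = [LSTMLayer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.W1, self.b1 = rand_matrix(n_dense, n_hidden, rng), [0.0] * n_dense
        self.W2, self.b2 = rand_matrix(2, n_dense, rng), [0.0] * 2

    def project(self, frames):
        """LSTM-projected feature: hidden state of the top layer at the last frame."""
        seq = frames
        for layer in self.lstm:
            seq = layer.forward(seq)
        return seq[-1]

    def predict_proba(self, frames):
        """Probabilities over two classes (hypernasal, control)."""
        hidden = dense(self.W1, self.b1, self.project(frames), math.tanh)
        return softmax(dense(self.W2, self.b2, hidden))


# Usage with synthetic frames standing in for vocal tract based features.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
net = HypernasalityNet()
probs = net.predict_proba(frames)
```

Because the weights are random, `probs` carries no diagnostic meaning; the sketch only shows how a vowel's frame sequence flows through the recurrent layers, and how `project` exposes an intermediate feature vector that could be passed to a separate shallow classifier.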

