Abstract

We explore the feasibility of estimating a speaker's age from ultrasound tongue images. Motivated by the success of deep learning, we train a deep convolutional neural network on the UltraSuite dataset. The model achieves a mean absolute error (MAE) of 2.03 years on data from typically developing children and 4.87 years on data from children with speech sound disorders, which suggests that age estimation from ultrasound is more challenging for children with speech sound disorders. We also visualize what the deep model learns for the age estimation task, starting with the convolutional layers of the trained network. We observe that the model not only focuses on the tongue contour in the ultrasound image, but also pays more attention to the regions corresponding to the tendon and the tongue root, which may provide guidance for future ultrasound tongue image interpretation tasks. The developed method can be used as a tool to evaluate the performance of speech therapy sessions.
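
For readers unfamiliar with the metric, MAE is the average of the absolute differences between true and predicted ages. The sketch below is only an illustration of that definition: the function name and the age values are hypothetical and are not taken from the paper or from the UltraSuite dataset.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages (in years)."""
    if len(true_ages) != len(predicted_ages) or len(true_ages) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Hypothetical ages in years, made up for illustration only.
true_ages = [6.0, 8.5, 10.0, 12.0]
predicted_ages = [7.0, 8.0, 13.0, 11.5]

# Absolute errors are 1.0, 0.5, 3.0 and 0.5, so the mean is 5.0 / 4.
mae = mean_absolute_error(true_ages, predicted_ages)
print(f"MAE: {mae:.2f} years")
```

Under this definition, an MAE of 2.03 years means that predictions are off by about two years on average, with over- and under-estimates counted equally.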
