Tongue model construction based on ultrasound images with image processing and deep learning method.

Nobuhiko Mukai, Kimie Mori, Yoshiko Takei

doi:10.1007/s10396-022-01193-8

Abstract

The purpose of this paper is to construct a 3D tongue model and to generate an animation of tongue movement for speech therapy in patients with lateral articulation (LA). The 3D tongue model is generated from ultrasound (US) images, which are widely used in clinics. The model is constructed by extracting the tongue surfaces from US images using image processing techniques and a deep learning method. A reference tongue model is first generated from US images of a normal speaker, and a model of an LA patient is then constructed by modifying the reference model. An animation of the tongue movement is generated by deforming the model according to a time sequence. The accuracies of the tongue surfaces estimated by the deep learning method were 22/45 = 49% and 29/45 = 64% for US images of a normal speaker and an LA patient, respectively. In addition, the maximum vertical errors between the ground truth and the estimated spline curves were 1.01 mm and 1.03 mm for US images of a normal speaker and an LA patient, respectively. We have thus constructed a tongue model and generated a tongue movement animation of an LA patient using US images. Because the maximum vertical error between the ground truth and the estimated spline curves was only 1.03 mm, we conclude that the generated tongue model is useful for speech therapy in LA patients.
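The two measures reported above can be illustrated with a minimal sketch. It assumes that accuracy is the number of correctly estimated images divided by the number of images tested, and that the maximum vertical error is the largest absolute height difference between the ground-truth and estimated curves sampled at the same horizontal positions. The function names and the sample heights are hypothetical and are not taken from the paper; only the counts 22/45 and 29/45 come from the results.

```python
def accuracy_percent(n_correct, n_total):
    """Return the share of correctly estimated images, in percent."""
    return 100.0 * n_correct / n_total


def max_vertical_error(truth_y, estimated_y):
    """Return the largest absolute height difference between two curves.

    Both sequences are assumed to hold curve heights (in mm) sampled at
    the same horizontal positions.
    """
    if len(truth_y) != len(estimated_y):
        raise ValueError("both curves must have the same number of samples")
    return max(abs(t - e) for t, e in zip(truth_y, estimated_y))


# Counts reported in the abstract.
print(round(accuracy_percent(22, 45)))  # normal speaker: 49
print(round(accuracy_percent(29, 45)))  # LA patient: 64

# Hypothetical curve heights in mm, for illustration only.
truth = [10.0, 12.5, 14.0, 13.2]
estimate = [10.4, 12.1, 15.0, 13.0]
print(max_vertical_error(truth, estimate))  # 1.0
```

Sampling both curves at shared horizontal positions keeps the comparison strictly vertical, which matches the wording "vertical error"; a different sampling scheme would change the measured value.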

Full Text