Abstract

Ultrasound imaging is becoming a practical tool for silent speech recognition. Accurately extracting tongue contours is challenging because the tongue is soft tissue and ultrasound images contain a high level of speckle noise. Building on the U-Net network, we propose an improved network, wUnet, to extract tongue contours from ultrasound images. First, upward interlevel jump connections are added to the encoding network so that sufficient tongue contour features are extracted at different levels and more coarse-grained information is learned. Second, downward interlevel jump connections are added to the decoding network, which provides richer fine-grained semantics and a clearer image reconstruction in the information fusion stage. Finally, VGG16 convolutional blocks with different numbers of layers are added between the two sets of interlevel connections, and an end-to-end multilevel context encoder is jointly trained. The network loss is a composite function that fuses the binary cross-entropy, IoU, and Dice coefficients, which accelerates the process of determining the contour boundary. Extensive experiments on the NS, TJU, and TIMIT datasets show that the proposed approach extracts tongue contours from ultrasound images more accurately than the baseline models.
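The composite loss described above can be illustrated with a minimal sketch. The abstract does not say how the three terms are fused, so the unweighted sum, the soft (probability-based) forms of IoU and Dice, the smoothing constant, and the name `composite_loss` are all assumptions made for illustration, not the authors' formulation.

```python
import math


def composite_loss(pred, target, w_bce=1.0, w_iou=1.0, w_dice=1.0, eps=1e-7):
    """Hypothetical fusion of BCE, soft IoU, and soft Dice losses.

    pred   : flat list of predicted contour probabilities in [0, 1]
    target : flat list of ground-truth labels (0 or 1), same length
    The weights w_* are assumed; the abstract gives no values.
    """
    n = len(pred)

    # Binary cross-entropy, with probabilities clamped to avoid log(0).
    clamped = [min(max(p, eps), 1.0 - eps) for p in pred]
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1.0 - p)
        for p, t in zip(clamped, target)
    ) / n

    # Soft overlap statistics shared by the IoU and Dice terms.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)

    iou = (intersection + eps) / (total - intersection + eps)
    dice = (2.0 * intersection + eps) / (total + eps)

    # Each overlap coefficient is turned into a loss as (1 - coefficient).
    return w_bce * bce + w_iou * (1.0 - iou) + w_dice * (1.0 - dice)


if __name__ == "__main__":
    good = composite_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])
    bad = composite_loss([0.1, 0.9, 0.2, 0.8], [1, 0, 1, 0])
    print(f"good prediction loss: {good:.4f}")
    print(f"bad prediction loss:  {bad:.4f}")
```

In this sketch the cross-entropy term penalizes per-pixel errors, while the IoU and Dice terms penalize poor overlap with the labeled region, which matters when the contour occupies only a small fraction of the image.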
