Abstract
Feature extraction is of great importance to ultrasound tongue image analysis. Inspired by the recent success of deep learning, we explore a novel approach to feature extraction from ultrasound tongue images using pre-trained convolutional neural networks (CNNs). The bottleneck features from different pre-trained CNNs, including VGGNet and ResNet, are used as representations of the ultrasound tongue images. An image classification task is then conducted to assess the effectiveness of the CNN-based features. Our dataset consists of 20,000 ultrasound tongue images collected from a female speaker of Mandarin Chinese, each manually labeled as containing one of the consonants /p, t, k, l/. Experimental results show that Gradient Boosting Machine (GBM) classifiers trained on the CNN-based features achieve the best performance, with a classification accuracy of 92.4% for ResNet and 91.6% for VGGNet, outperforming the benchmark GBM classifier trained on features extracted using Principal Component Analysis (PCA), which achieves an accuracy of only 87.5%. On this preliminary dataset, our feature extraction method is thus superior to the PCA-based method. This work demonstrates the potential of applying pre-trained convolutional neural networks to the ultrasound tongue image analysis task.
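The following is a minimal sketch of the kind of pipeline the abstract describes: bottleneck features taken from a pre-trained ResNet, followed by a gradient boosting classifier. It assumes PyTorch/torchvision and scikit-learn; the file handling, grayscale-to-RGB replication, image size, and variable names are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the CNN-feature + GBM pipeline described above.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Pre-trained ResNet-50 with its final classification layer removed, so a
# forward pass yields the 2048-dimensional pooled "bottleneck" representation.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

# ImageNet-style preprocessing; ultrasound frames are grayscale, so they are
# replicated to three channels here (an assumption about the input format).
preprocess = T.Compose([
    T.Grayscale(num_output_channels=3),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(image_paths):
    """Return an (N, 2048) array of bottleneck features for the given images."""
    feats = []
    with torch.no_grad():
        for path in image_paths:
            x = preprocess(Image.open(path)).unsqueeze(0)  # (1, 3, 224, 224)
            feats.append(feature_extractor(x).flatten().numpy())
    return np.stack(feats)

# `paths` and `labels` (consonant classes /p, t, k, l/) would come from the
# manually annotated dataset; both names are placeholders.
# X = extract_features(paths)
# X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, stratify=labels)
# gbm = GradientBoostingClassifier().fit(X_tr, y_tr)
# print("accuracy:", gbm.score(X_te, y_te))
```

A VGGNet variant or a PCA baseline would slot into the same pipeline by swapping out the feature extractor while keeping the GBM classifier unchanged.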