Abstract
Human infants can discover words directly from unsegmented speech produced by their mothers and other people, without any explicitly labeled data. Developing a computational model and a machine learning method that enable an artificial system to acquire words and phonemes automatically from speech signals is therefore an important challenge. Such a model also provides a hypothesis that can explain the dynamic process infants carry out, i.e., word discovery and phoneme acquisition from daily experience. The nonparametric Bayesian double articulation analyzer (NPB-DAA) is an unsupervised machine learning method that can automatically discover word-like and phoneme-like units directly from speech signals. However, its performance has not yet been evaluated on natural spoken language that includes consonants. To handle such speech, a comparative study of methods for extracting features from speech signals is crucially important. This paper provides a comparative study of feature extraction methods for direct word discovery with NPB-DAA from natural speech signals. We examined six feature extraction methods that combine mel-frequency cepstral coefficients (MFCCs) and a deep sparse autoencoder (DSAE) with several ways of incorporating dynamic features, using the TIDIGITS corpus, which contains utterances of connected digit sequences. The results showed that 1) NPB-DAA, with or without DSAE, can extract words and phonemes from natural speech signals containing consonants to a certain extent, 2) naive introduction of dynamic features can even harm word discovery performance, and 3) DSAE can consistently increase the correlation between the log-likelihood and the performance measure of word discovery.
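To make the notion of dynamic features concrete, the sketch below computes first- and second-order delta features from a sequence of static feature vectors (for example, MFCC frames) and appends them to each frame. It is an illustration only: the regression-style delta formula, the window `width`, the edge handling, and the function names `delta_features` and `with_dynamics` are assumptions made here, not the configuration used in the paper.

```python
def delta_features(frames, width=2):
    """Regression-style deltas over a list of equal-length feature vectors.

    Assumed formula (not taken from the paper):
        d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    """
    num_frames = len(frames)
    if num_frames == 0:
        return []
    dim = len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, width + 1))

    def frame_at(t):
        # Repeat the first/last frame beyond the utterance boundaries.
        return frames[min(max(t, 0), num_frames - 1)]

    deltas = []
    for t in range(num_frames):
        row = [0.0] * dim
        for n in range(1, width + 1):
            ahead, behind = frame_at(t + n), frame_at(t - n)
            for d in range(dim):
                row[d] += n * (ahead[d] - behind[d])
        deltas.append([value / denom for value in row])
    return deltas


def with_dynamics(frames, width=2):
    """Append delta and delta-delta features to each static frame."""
    first = delta_features(frames, width)
    second = delta_features(first, width)
    return [list(s) + d1 + d2 for s, d1, d2 in zip(frames, first, second)]
```

Calling `with_dynamics` on frames of dimension D yields frames of dimension 3D. Whether such augmented vectors are passed to the word discovery model directly or first compressed (for instance by an autoencoder) is the kind of design choice the comparison above concerns.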