Abstract
Spoken language identification is the automatic recognition of the language in a spoken utterance. It is commonly used in speech translation systems, multilingual speech recognition, and speaker diarization. This paper presents spoken language identification based on deep learning (DL) and the i-vector paradigm. Specifically, it reports a comparative study of language identification using deep neural networks (DNN) and convolutional neural networks (CNN), and investigates the integration of the two methods into a complete system. Previous studies demonstrated the effectiveness of DNN in spoken language identification. To date, however, the integration of CNN and i-vectors in language identification has not been investigated. The main advantage of CNN is that it requires fewer parameters than DNN and is therefore cheaper in terms of memory and computational power. The proposed methods are evaluated on the NIST 2015 i-vector Machine Learning Challenge task for the recognition of 50 in-set languages. The DNN achieved a 3.55% equal error rate (EER) and the CNN achieved 3.48%; fusing the DNN and CNN systems gave an EER of 3.3%. These results show the effectiveness of using CNN and i-vectors in spoken language identification. The proposed methods also performed significantly better than a baseline based on support vector machines (SVM).
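To make the reported metric concrete, the following is a minimal sketch of how an equal error rate can be computed from detection scores, and of one simple way two systems' scores might be fused. It is illustrative only: the abstract does not state how the EER was computed or how the DNN and CNN systems were fused, so the threshold sweep, the weighted-sum fusion, and the names `equal_error_rate`, `fuse_scores`, and `weight` are all assumptions.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER, the point where false-accept and false-reject rates meet.

    Assumes higher scores mean "more likely the claimed language".
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False-reject rate: fraction of target trials scoring below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False-accept rate: fraction of non-target trials scoring at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Hypothetical score-level fusion: a weighted sum of two systems' scores."""
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    return weight * scores_a + (1.0 - weight) * scores_b
```

In this sketch, each trial pairs an utterance with a hypothesized language; target trials are those where the hypothesis is correct. Fused scores from `fuse_scores` can be passed to `equal_error_rate` in the same way as the scores of a single system.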