Abstract

Spoken language recognition (SLR) is the process of identifying the language spoken in an audio file. Traditional SLR systems are mainly based on acoustic or phonetic approaches. These approaches have complementary characteristics, so fusing them may improve the accuracy of the final SLR system. In phonetic approaches, phone recognition, the initial step, is time-consuming and may therefore result in a high computational cost for the overall SLR system. In this paper, a new structure, named conditional cascade, is proposed to speed up the combined system of the phonetic and acoustic approaches. In the proposed structure, the phonetic approach is used only when the confidence score of the acoustic approach is not satisfactory. Hence, the accuracy of the SLR system can be improved without a significant loss of speed. To calculate the final confidence scores, a heuristic method based on the distribution of acoustic scores for each language is used. The experimental results show that the proposed conditional cascade system can decrease the classification error for the target languages with an acceptable runtime.
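
The conditional cascade idea can be illustrated with a minimal sketch. The abstract does not give the confidence threshold, the scoring functions, or the heuristic used to compute the final scores, so everything named here (`conditional_cascade`, `acoustic_scorer`, `phonetic_scorer`, `threshold`, `weight`, and the weighted-sum fusion) is a hypothetical stand-in rather than the authors' method.

```python
from typing import Any, Callable, Dict, Tuple

Scores = Dict[str, float]  # language label -> confidence score


def conditional_cascade(
    audio: Any,
    acoustic_scorer: Callable[[Any], Scores],
    phonetic_scorer: Callable[[Any], Scores],
    threshold: float = 0.8,  # assumed confidence cut-off, not from the paper
    weight: float = 0.5,     # assumed fusion weight, not from the paper
) -> Tuple[str, bool]:
    """Return (predicted language, whether the phonetic stage was run)."""
    # Stage 1: the fast acoustic system always runs.
    acoustic = acoustic_scorer(audio)
    best = max(acoustic, key=acoustic.get)

    # If the acoustic system is confident enough, skip the slow stage.
    if acoustic[best] >= threshold:
        return best, False

    # Stage 2: run the slow phonetic system only for uncertain inputs,
    # then fuse both score sets (a plain weighted sum is assumed here).
    phonetic = phonetic_scorer(audio)
    fused = {
        lang: weight * acoustic[lang] + (1.0 - weight) * phonetic.get(lang, 0.0)
        for lang in acoustic
    }
    return max(fused, key=fused.get), True
```

Because the phonetic scorer is called only in the low-confidence branch, its cost is paid only for the fraction of utterances that the acoustic system cannot classify confidently, which is the source of the speed-up described above.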
