Spoken language recognition using a new conditional cascade method to combine acoustic and phonetic results

Shabnam Gholamdokht Firooz, Yasser Shekofteh, Shaghayegh Reza

doi:10.1007/s10772-018-9526-5

Abstract

Spoken language recognition (SLR) is the task of identifying the language spoken in an audio recording. Traditional SLR systems are mainly based on either acoustic or phonetic approaches. Because these approaches have complementary characteristics, fusing them can improve the accuracy of the final SLR system. In phonetic-based approaches, the initial step, phone recognition, is time-consuming and can therefore impose a high computational cost on the overall SLR system. In this paper, a new structure, named conditional cascade, is proposed to speed up the combined phonetic and acoustic system. In the proposed structure, the phonetic approach is used only when the confidence score of the acoustic approach is insufficient. Hence, the performance of the SLR system can be improved without a significant loss of speed. The final confidence scores are calculated with a heuristic method based on the distribution of the acoustic scores for each language. The experimental results showed that the proposed conditional cascade system reduced the classification error for the target languages with an acceptable runtime.
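The control flow of a conditional cascade can be illustrated with a minimal Python sketch. It assumes that each subsystem returns one score per language (higher is better) and that the expensive phonetic system is wrapped in a callable, so it only runs when needed. The confidence measure (the margin between the two best acoustic scores), the threshold value, and the linear score fusion are illustrative assumptions, and the names `top_two_margin` and `conditional_cascade` are hypothetical. The paper's own heuristic, which derives confidence from the per-language distribution of acoustic scores, is not reproduced here.

```python
from typing import Callable, Dict, Tuple

Scores = Dict[str, float]  # language label -> score (higher means more likely)


def top_two_margin(scores: Scores) -> float:
    """Assumed confidence measure: gap between the best and second-best score."""
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        return float("inf")  # a single candidate is trivially confident
    return ranked[0] - ranked[1]


def conditional_cascade(
    acoustic_scores: Scores,
    run_phonetic: Callable[[], Scores],
    threshold: float = 1.0,       # hypothetical confidence threshold
    acoustic_weight: float = 0.5,  # hypothetical fusion weight
) -> Tuple[str, bool]:
    """Return (predicted language, whether the phonetic stage was run)."""
    if top_two_margin(acoustic_scores) >= threshold:
        # Acoustic decision is confident enough: skip the costly phone recognizer.
        return max(acoustic_scores, key=acoustic_scores.get), False
    # Low confidence: run the phonetic system and fuse the scores linearly
    # (a simple assumed fusion rule, not the paper's method).
    phonetic_scores = run_phonetic()
    fused = {
        lang: acoustic_weight * acoustic_scores[lang]
        + (1.0 - acoustic_weight) * phonetic_scores[lang]
        for lang in acoustic_scores
    }
    return max(fused, key=fused.get), True


calls = []


def fake_phonetic() -> Scores:
    """Stand-in for a phone-recognition-based scorer; records each invocation."""
    calls.append(1)
    return {"en": 0.1, "fa": 2.0, "de": 0.3}


print(conditional_cascade({"en": 3.0, "fa": 0.5, "de": 0.2}, fake_phonetic))  # ('en', False)
print(conditional_cascade({"en": 1.2, "fa": 1.0, "de": 0.1}, fake_phonetic))  # ('fa', True)
print(len(calls))  # 1
```

In the first call the acoustic margin (2.5) exceeds the threshold, so the phonetic scorer is never invoked; in the second the margin is only about 0.2, so the phonetic scores are computed and fused, which changes the decision. The runtime saving of the cascade comes from how often the first branch is taken, which in practice depends on how the threshold is tuned.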
