Automatic identification of foreign accents can play a crucial role in speech systems such as speaker identification, e-learning, and telephone banking, and it can greatly improve the robustness of Automatic Speech Recognition (ASR) systems. Non-native accents in speech signals are characterized by the speaker's distinct pronunciation, prosody, and voice characteristics. Identifying them automatically, however, is challenging, particularly in multi-class modeling. Multi-class models struggle to reach high performance and incur heavy computational cost on multi-dimensional, imbalanced datasets that cover more than two accents. The choice of features also remains a bottleneck for Foreign Accent Identification (FAID) and further limits performance. Consequently, the accuracy of current systems is typically low. To address these challenges, this paper proposes a framework based on the Multi-Kernel Extreme Learning Machine (MKELM) model for multi-class FAID. The MKELM model uses a novel weighting scheme to classify several non-native English accents, including Arabic, Chinese, Korean, French, and Spanish. The model first combines Mel-frequency cepstral coefficients (MFCCs) and prosodic features as input, then trains pairwise binary classifiers independently, and finally applies a weighting scheme to their outputs to distinguish between classes and identify the accent. In experiments, the proposed model achieves an accuracy of 84.72% with the pairwise weighting scheme, whereas accuracy drops to 66.5% with the traditional non-weighted multi-classification scheme.
A comparison with other models shows that the proposed model offers clear advantages for multi-class FAID: higher accuracy, lower computational complexity (fewer computations, faster learning, and shorter training time), and greater stability than state-of-the-art classification methods.
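
The pipeline summarized in this abstract (combined feature vectors, independently trained pairwise binary classifiers, and a weighted combination of their decisions) can be illustrated with a minimal sketch. It is not the paper's implementation, and several parts are assumptions made only for illustration: the abstract does not specify the weighting scheme, so each pairwise classifier is weighted here by its accuracy on a held-out validation split; the multi-kernel is taken to be a fixed convex combination of an RBF and a polynomial kernel; the values of `C`, `gamma`, `degree`, and `mix` are arbitrary; and random Gaussian vectors stand in for the concatenated MFCC and prosodic features. The classifier itself uses the standard kernel ELM closed form, in which the output weights solve (K + I/C) beta = t.

```python
import itertools
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    # Squared Euclidean distances between all rows of A and B.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def poly_kernel(A, B, degree=2):
    return (A @ B.T + 1.0) ** degree


def multi_kernel(A, B, mix=0.7):
    # Assumed multi-kernel: fixed convex combination of two base kernels.
    return mix * rbf_kernel(A, B) + (1.0 - mix) * poly_kernel(A, B)


class KernelELM:
    """Binary kernel ELM: beta = (K + I/C)^-1 t, f(x) = k(x, X) @ beta."""

    def __init__(self, C=10.0):
        self.C = C

    def fit(self, X, t):
        self.X = X
        K = multi_kernel(X, X)
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, t)
        return self

    def decision(self, X):
        return multi_kernel(X, self.X) @ self.beta


def fit_pairwise(X, y, X_val, y_val, classes):
    """Train one binary classifier per class pair, independently of the others."""
    models = {}
    for a, b in itertools.combinations(classes, 2):
        m = (y == a) | (y == b)
        t = np.where(y[m] == a, 1.0, -1.0)  # +1 for class a, -1 for class b
        clf = KernelELM().fit(X[m], t)
        # Assumed weighting: accuracy of this pair on held-out data.
        mv = (y_val == a) | (y_val == b)
        pred = np.where(clf.decision(X_val[mv]) >= 0, a, b)
        models[(a, b)] = (clf, float((pred == y_val[mv]).mean()))
    return models


def predict(models, X, classes, weighted=True):
    """Combine pairwise decisions by (optionally weighted) voting."""
    index = {c: i for i, c in enumerate(classes)}
    scores = np.zeros((len(X), len(classes)))
    for (a, b), (clf, w) in models.items():
        win_a = clf.decision(X) >= 0
        vote = w if weighted else 1.0
        scores[win_a, index[a]] += vote
        scores[~win_a, index[b]] += vote
    return np.asarray(classes)[scores.argmax(1)]


def make_split(rng, n_per_class):
    # Synthetic stand-in for concatenated MFCC + prosodic feature vectors.
    centers = np.array([[0, 0, 0, 0], [3, 3, 0, 0], [0, 3, 3, 0]], dtype=float)
    X = np.vstack([c + 0.5 * rng.standard_normal((n_per_class, 4)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


rng = np.random.default_rng(0)
classes = [0, 1, 2]
X_train, y_train = make_split(rng, 40)
X_val, y_val = make_split(rng, 20)
X_test, y_test = make_split(rng, 20)

models = fit_pairwise(X_train, y_train, X_val, y_val, classes)
acc_weighted = float((predict(models, X_test, classes) == y_test).mean())
acc_plain = float((predict(models, X_test, classes, weighted=False) == y_test).mean())
print(f"weighted: {acc_weighted:.3f}  unweighted: {acc_plain:.3f}")
```

On this well-separated toy data the weighted and unweighted votes are expected to behave almost identically; the sketch shows only the structure of the approach and says nothing about the accuracy figures reported above. Setting `weighted=False` reduces the combination step to plain one-vs-one majority voting, which corresponds to a non-weighted baseline.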