Abstract

The recognition and classification of English accents have high practical value in areas such as security management and information retrieval. This study examined two acoustic features for English accents, the filter bank (FBank) and the Mel-frequency cepstral coefficient (MFCC), within a deep learning framework. It then combined a convolutional neural network (CNN), a bidirectional gated recurrent unit (BiGRU), and an attention mechanism to design a 1D CNN-BiGRU-Attention model for English accent recognition and classification. Experiments were conducted on the VoxForge dataset. The results showed that FBank outperformed MFCC in English accent recognition and classification, with 70FBank achieving the highest F1 value. Among the compared models, including the recurrent neural network and long short-term memory, the BiGRU model performed best. The 1D CNN-BiGRU-Attention model achieved the highest average F1 value, 85.52%, and its F1 values were above 80% for every accent, indicating that adding the attention mechanism effectively improved recognition and classification. The results support the reliability of the proposed method for English accent recognition and classification and suggest that it is suitable for practical application.
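
The abstract does not specify the form of the attention mechanism, so the following is only a minimal sketch of one common variant, attention pooling over time, written in plain Python. The function name `attention_pool`, the scoring vector `w`, and the toy frame values are all hypothetical and are not taken from the paper. The sketch shows how per-frame outputs (standing in for BiGRU hidden states) could be weighted and summed into a single utterance-level vector before classification.

```python
import math

def attention_pool(frames, w):
    """Collapse a sequence of frame vectors into one utterance vector.

    frames: list of T vectors (each a list of D floats), standing in
            for BiGRU outputs. Hypothetical toy input, not real data.
    w:      hypothetical scoring vector of length D; in a trained
            model this would be learned.
    """
    # Score each frame with a dot product against the scoring vector.
    scores = [sum(h_d * w_d for h_d, w_d in zip(h, w)) for h in frames]
    # Softmax over time, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum of frames gives a fixed-size summary vector.
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Three toy frames with two features each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A zero scoring vector scores every frame equally, so pooling reduces to a plain mean.
uniform_pooled, uniform_weights = attention_pool(frames, w=[0.0, 0.0])

# A scoring vector that favours the first feature shifts weight away from the second frame.
focused_pooled, focused_weights = attention_pool(frames, w=[10.0, 0.0])
```

With a zero scoring vector the weights are uniform and the result equals mean pooling; a non-zero vector lets some frames dominate, which is the behaviour an attention layer is meant to learn.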
