Abstract

An accent is the distinctive way in which words are pronounced. Every speaker has an accent, which varies with gender, age, formality, social class, geographical region, and native language. Accent recognition is an important task because it can improve Automatic Speech Recognition: the accent is identified first, and the audio is then passed to a speech recognizer trained for that particular accent group. Accent recognition is a complex problem because of the many characteristics that set accents apart. Accents differ in voice quality, phoneme pronunciation, and prosody. Since these exact characteristics are difficult to extract, existing work uses alternative features such as spectral features, which capture the frequency content of speech. Such features include Mel-Frequency Cepstral Coefficients (MFCC), the Spectrogram, the Chromagram, the Spectral Centroid, and the Spectral Roll-off, all of which are extracted from raw audio samples. Previous work has not made clear which of these features yields the highest accuracy on an accent classification task. In this work, each of the five features was used to train a 2-layer Convolutional Neural Network on a dataset of five distinct language-accents, namely Arabic, English, French, Mandarin, and Spanish. The accuracy obtained with each feature was evaluated and compared. MFCC yielded the highest accuracy.
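To make two of the spectral features named above concrete, here is a minimal sketch of the Spectral Centroid and Spectral Roll-off computed for a single audio frame, using only the Python standard library. It is an illustration under stated assumptions, not the extraction pipeline used in this work: the function names, the 85% roll-off fraction, the frame length, and the synthetic 1 kHz test tone are all hypothetical choices that the abstract does not specify.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def spectral_rolloff(mags, sample_rate, n, fraction=0.85):
    """Lowest frequency (Hz) below which `fraction` of the total magnitude lies.

    The 0.85 default is an assumed, commonly used value, not one from the source.
    """
    threshold = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= threshold:
            return k * sample_rate / n
    return (len(mags) - 1) * sample_rate / n


# Hypothetical test frame: a pure 1 kHz tone sampled at 8 kHz, 64 samples long,
# so the tone falls exactly on DFT bin 8 (bin width = 8000 / 64 = 125 Hz).
SAMPLE_RATE = 8000
FRAME_LEN = 64
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(FRAME_LEN)]

mags = magnitude_spectrum(frame)
centroid = spectral_centroid(mags, SAMPLE_RATE, FRAME_LEN)
rolloff = spectral_rolloff(mags, SAMPLE_RATE, FRAME_LEN)
print(f"centroid = {centroid:.1f} Hz, roll-off = {rolloff:.1f} Hz")
```

For a pure tone, both values should land at the tone's frequency, since essentially all spectral magnitude sits in one bin. Real speech spreads energy across many bins, so the two features diverge and are typically computed frame by frame to form a time series that a classifier can consume.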
