Abstract

We describe a new approach to automatic dialect and accent recognition which exceeds state-of-the-art performance in three recognition tasks. This approach improves the accuracy and substantially lower the time complexity of our earlier phoneticbased kernel approach for dialect recognition. In contrast to state-of-the-art acoustic-based systems, our approach employs phone labels and segmentation to constrain the acoustic models. Given a speaker’s utterance, we first obtain phone hypotheses using a phone recognizer and then extract GMM-supervectors for each phone type, effectively summarizing the speaker’s phonetic characteristics in a single vector of phone-type supervectors. Using these vectors, we design a kernel function that computes the phonetic similarities between pairs of utterances to train SVM classifiers to identify dialects. Comparing this approach to the state-of-the-art, we obtain a 12.9% relative improvement in EER on Arabic dialects, and a 17.9% relative improvement for American vs. Indian English dialects. We also see a 53.5% relative improvement over a GMM-UBM on American Southern vs. Non-Southern English.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.