Abstract

Speech accent recognition (SAR) plays a crucial role in enhancing communication between customers and service providers, enabling personalized interactions based on the geographic, birthplace, and cultural cues that accents carry. However, current approaches predominantly train SAR models from scratch, overlooking the potential of transfer learning from other speech processing tasks even though accent datasets are relatively small. This paper presents the first comprehensive investigation into the effectiveness of transfer learning from a diverse array of data-rich speech processing tasks for SAR. In experiments on a practical Vietnamese telephone dataset provided by Viettel, the largest telecommunications provider in Southeast Asia, our best-performing model outperforms previous state-of-the-art SAR models by 46.7% in relative accuracy.
