Abstract

Speech accent recognition (SAR) plays a crucial role in communication between customers and service providers, enabling interactions personalized to the cues about geography, birthplace, and culture that accents carry. However, current approaches predominantly train SAR models from scratch and overlook the potential of transfer learning from other speech processing tasks, even though accent datasets are relatively small. This paper presents the first comprehensive investigation into the effectiveness of transfer learning for SAR from a diverse array of data-rich speech processing tasks. In experiments on a practical Vietnamese telephone dataset provided by Viettel, the largest telecommunications provider in Southeast Asia, our best-performing model outperforms previous state-of-the-art SAR models by 46.7% in relative accuracy.
