Abstract
A speaker's accent is the most important factor affecting the performance of automatic speech recognition (ASR) systems, because accents vary widely even within the same country or community. This variation arises because non-native speakers learning a second language commonly substitute phoneme pronunciations from their native language. Such substitution blurs the boundaries between phonemes and between phoneme classes. This fuzziness reduces out-of-class variation and increases the similarity between different sets of phonemes. In this paper, a new method is proposed that uses side information from dissimilar pairs of accent groups to transfer data points to a new space in which the Euclidean distances between similar points are minimized and those between dissimilar points are maximized.
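The idea of mapping points into a space where similar pairs lie close together and dissimilar pairs lie far apart can be illustrated with a small sketch. The abstract does not specify the paper's algorithm, so this is not the authors' method: it swaps in a generic linear approach that solves a generalized eigenproblem on pairwise scatter matrices. The function names (`pair_scatter`, `learn_transform`, `mean_sq_dist`), the use of both similar and dissimilar pairs, the regularization constant, and the synthetic two-group data standing in for accent features are all assumptions made for illustration.

```python
import itertools
import numpy as np

def pair_scatter(X, pairs):
    """Average outer product of difference vectors over the given index pairs."""
    diffs = np.array([X[i] - X[j] for i, j in pairs])
    return diffs.T @ diffs / len(pairs)

def learn_transform(X, similar, dissimilar, n_components=1, reg=1e-6):
    """Hypothetical helper: find a linear map W that maximizes dissimilar-pair
    scatter relative to similar-pair scatter (a generalized eigenproblem)."""
    dim = X.shape[1]
    # Small ridge term (an assumption) keeps the similar-pair scatter invertible.
    S_sim = pair_scatter(X, similar) + reg * np.eye(dim)
    S_dis = pair_scatter(X, dissimilar)
    # Whiten with respect to S_sim: whiten.T @ S_sim @ whiten is the identity.
    evals, evecs = np.linalg.eigh(S_sim)
    whiten = evecs / np.sqrt(evals)
    # In the whitened space, the top eigenvectors of S_dis give the directions
    # along which dissimilar pairs are most spread out.
    _, mvecs = np.linalg.eigh(whiten.T @ S_dis @ whiten)
    top = mvecs[:, ::-1][:, :n_components]  # eigh sorts ascending, so reverse
    return whiten @ top

def mean_sq_dist(Z, pairs):
    """Mean squared Euclidean distance over the given index pairs."""
    return float(np.mean([np.sum((Z[i] - Z[j]) ** 2) for i, j in pairs]))

# Synthetic stand-in for two accent groups (not real speech data): large
# within-group spread on axis 0, group separation on axis 1.
rng = np.random.default_rng(0)
scale = np.array([5.0, 0.5, 1.0])
group_a = rng.normal(size=(40, 3)) * scale
group_b = rng.normal(size=(40, 3)) * scale + np.array([0.0, 3.0, 0.0])
X = np.vstack([group_a, group_b])
labels = np.array([0] * 40 + [1] * 40)

# Side information: pairs from the same group are "similar", others "dissimilar".
all_pairs = list(itertools.combinations(range(len(X)), 2))
similar = [(i, j) for i, j in all_pairs if labels[i] == labels[j]]
dissimilar = [(i, j) for i, j in all_pairs if labels[i] != labels[j]]

W = learn_transform(X, similar, dissimilar, n_components=1)
Z = X @ W  # data points transferred to the new space

ratio_before = mean_sq_dist(X, dissimilar) / mean_sq_dist(X, similar)
ratio_after = mean_sq_dist(Z, dissimilar) / mean_sq_dist(Z, similar)
print(f"dissimilar/similar distance ratio before: {ratio_before:.2f}")
print(f"dissimilar/similar distance ratio after:  {ratio_after:.2f}")
```

In this toy setting the ratio of dissimilar-pair to similar-pair squared distance should rise after the transform, which is the qualitative behavior the abstract describes. A real system would apply such a transform to acoustic features and might well require a nonlinear mapping.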