Abstract

Research has shown that automatic speech recognition (ASR) performance typically degrades when a recognizer is evaluated on a dialect of the same language that was not used to train its models. Similarly, models trained simultaneously on a group of dialects tend to underperform dialect-specific models. When deciding which dialect-specific model (recognizer) to use to decode an utterance (e.g., a voice search query), possible strategies include automatically detecting the spoken dialect or following the user’s language preferences as set on their cell phone. In this paper, we observe that users’ voice search queries are usually directed to a dialect-specific recognizer that does not match the user’s current location, and we present a study showing that automatically selecting the recognizer based on the user’s geographical location helps improve the user experience.
