Abstract
On-device virtual assistants (VAs) powered by automatic speech recognition (ASR) require effective knowledge integration to recognize challenging entity-rich queries. In this paper, we conduct an empirical study of modeling strategies for server-side rescoring of spoken information-domain queries using various categories of language models (LMs), including N-gram word LMs and sub-word neural LMs. We investigate the combination of on-device and server-side signals, and demonstrate significant relative word error rate improvements of 23%-35% on various entity-centric query subpopulations by integrating various server-side LMs, compared to performing ASR on-device only. We also compare LMs trained on domain data against a generative pre-trained transformer (GPT) model (a variant of GPT-3) offered by OpenAI as a baseline. Furthermore, we show that model fusion of multiple server-side LMs trained from scratch most effectively combines the complementary strengths of each model and integrates knowledge learned from domain-specific data into a VA ASR system.
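To make the rescoring idea concrete, below is a minimal sketch of how an n-best list from an on-device ASR system could be re-ranked by fusing its first-pass score with scores from multiple server-side LMs via weighted log-linear interpolation. This is an illustrative assumption about the general technique, not the paper's actual implementation; the class and field names (`Hypothesis`, `first_pass_score`, `lm_scores`) and the weight values are hypothetical, and in practice the interpolation weights would be tuned on held-out data.

```python
# Sketch of server-side n-best rescoring via log-linear score fusion.
# Each hypothesis carries its on-device first-pass log-score plus
# log-probabilities from several server-side LMs; the fused score is a
# weighted sum, and the n-best list is re-ranked by that fused score.
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # on-device ASR log-score (hypothetical field)
    lm_scores: dict[str, float] = field(default_factory=dict)  # server-side LM log-probs

def fused_score(hyp: Hypothesis, weights: dict[str, float]) -> float:
    """Weighted log-linear interpolation of on-device and server-side scores."""
    score = weights.get("first_pass", 1.0) * hyp.first_pass_score
    for lm_name, lm_score in hyp.lm_scores.items():
        score += weights.get(lm_name, 0.0) * lm_score
    return score

def rescore_nbest(nbest: list[Hypothesis], weights: dict[str, float]) -> list[Hypothesis]:
    """Re-rank the n-best list by fused score, highest first."""
    return sorted(nbest, key=lambda h: fused_score(h, weights), reverse=True)

# Illustrative example: for an entity-rich query, domain-trained server-side
# LMs prefer the correct entity spelling over the acoustically similar one.
nbest = [
    Hypothesis("play songs by kendrick lamar",
               first_pass_score=-12.1,
               lm_scores={"ngram_word_lm": -9.5, "subword_neural_lm": -8.2}),
    Hypothesis("play songs by kendra clamor",
               first_pass_score=-11.8,
               lm_scores={"ngram_word_lm": -14.0, "subword_neural_lm": -13.1}),
]
weights = {"first_pass": 1.0, "ngram_word_lm": 0.4, "subword_neural_lm": 0.6}
print(rescore_nbest(nbest, weights)[0].text)  # -> "play songs by kendrick lamar"
```

In this sketch, model fusion falls out naturally: each additional server-side LM contributes one more weighted term to the fused score, so complementary models can jointly override a first-pass error.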