Abstract

As voice interfaces to devices and digital assistants have increased in popularity, so too have the challenging environments in which they are expected to perform. In this talk, we will present an overview of the signal processing and speech recognition AI modeling techniques we have developed at Facebook to enable robust voice interaction on Portal video calling devices and Oculus VR headsets. We will also describe progress in captioning and understanding the wide variety of video content shared on Facebook apps, where the acoustic conditions are diverse and challenging and the audio is typically captured on commodity mobile phones. While such systems have historically been developed to run on powerful servers in the cloud, there is increasing interest in speech models that can run locally on the client device. We will describe the challenges of on-device processing and our recent progress in creating efficient, low-footprint speech models. Finally, we will present the challenges and future directions we are exploring to enable rich voice interactions on the next generation of computing devices, including augmented reality glasses.

Full Text

Paper version not known
