Spoken content retrieval and understanding using deep learning

Lin-Shan Lee

doi:10.21820/23987073.2021.1.9

Abstract

Spoken content refers to all content on the Internet that includes human voice, essentially multimedia such as YouTube videos and online courses. Today such content is retrieved via Google primarily on the basis of human-generated text labels, because Google can only retrieve text over the Internet. The goal of this project is to develop technologies that retrieve such spoken content accurately and efficiently, directly from the audio it contains rather than from text labels, because machines today can listen to human voice just as they can read text. The long-term goal is to create a spoken version of Google, which may revolutionize the ways in which humans access information and improve their knowledge. Professor Lin-Shan Lee at National Taiwan University is leading this project. For many years he has been a distinguished leader in the global scientific community in the area of teaching machines to speak and to listen to human voice.

Full Text