BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

Yu Zhang, Zhifeng Chen, Yuanzhong Xu, Jiahui Yu, Bhuvana Ramabhadran, James Qin, Zongwei Zhou, Yanping Huang, Ruoming Pang, Joel Shor, Wei Han, Yonghui Wu, Tara N Sainath, Liangliang Cao, Quoc V Le, Anmol Gulati, William Chan, Bo Li, Min Ma, Shibo Wang, Yongqiang Wang, Khe Chai Sim, Aren Jansen, Chung‐Cheng Chiu, Françoise Beaufays, Daniel Park

doi:10.1109/jstsp.2022.3182537

Abstract

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained on large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8-billion-parameter pre-trained Conformer model we can match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve SoTA with the full training set. We also report on the universal benefits gained from using big pre-trained and self-trained models for a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitude of dataset sizes, including obtaining SoTA performance on many public benchmarks. In addition, we utilize the learned representation of pre-trained networks to achieve SoTA results on non-ASR tasks.

Journal: IEEE Journal of Selected Topics in Signal Processing
Publication Date: Oct 1, 2022
Citations: 76

Similar Papers

The use of discrete distributions with a very large codebook for automatic speech recognition and speaker verification
Guoli Ye
23 Dec 2014

Exploring recurrent neural network based acoustic and linguistic modeling for children's speech recognition
Sreeram Ganji ... Rohit Sinha
01 Nov 2017

Adapting Pre-Trained Self-Supervised Learning Model for Speech Recognition with Light-Weight Adapters
Xianghu Yue ... Haizhou Li
Electronics | VOL. 13
01 Jan 2024

Using Auxiliary Sources of Knowledge for Automatic Speech Recognition

01 Jan 2004
