Self-Supervised Speech Representation Learning: A Review

Abdelrahman Mohamed, Katrin Kirchhoff, Hung-Yi Lee, Karen Livescu, Tara N Sainath, Shinji Watanabe, Jakob D Havtorn, Joakim Edin, Christian Igel, Lasse Borgholt, Lars Maaloe, Shang-Wen Li

doi:10.1109/jstsp.2022.3207050

Abstract

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision domains, achieving new levels of performance while reducing the number of labels required for many downstream scenarios. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech. Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years. This review presents approaches for self-supervised speech representation learning and their connection to other research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we review recent efforts on benchmarking learned representations to extend the application beyond speech recognition.


Journal: IEEE Journal of Selected Topics in Signal Processing	Publication Date: Oct 1, 2022
Citations: 120	License type: other-oa

Similar Papers

T2 Self-supervised Representation Learning for Speech Processing
05 Jul 2022

Benchmarking Self-Supervised Contrastive Learning Methods for Image-Based Plant Phenotyping
Franklin C Ogidi ... Ian Stavness
Plant phenomics (Washington, D.C.) | VOL. 5
01 Jan 2023

A Novel Multi-Task Self-Supervised Representation Learning Paradigm
Yinggang Li ... Qi Zhang
Control theory & applications
28 May 2021

LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
Solène Evain ... Alexandre Allauzen
30 Aug 2021
