A Survey on Text-Dependent and Text-Independent Speaker Verification

Youzhi Tu, Man-Wai Mak, Weiwei Lin

doi:10.1109/access.2022.3206541

Open Access

Abstract

Speaker verification (SV) aims to verify an individual’s identity from his/her voice. SV has been successfully applied in areas such as access control, remote service customization, and financial transactions. Depending on whether the text content is predefined, SV can be text-dependent or text-independent. This paper reviews recent research on text-dependent SV (TD-SV) and text-independent SV (TI-SV). Because most modern SV systems apply deep learning methods to boost performance, we focus on studies that use deep speaker embedding, a technique that represents a person’s identity as a fixed-dimensional vector encoded from a variable-length utterance. Rather than detailing every existing SV system, we give an overview of representative SV systems that have attracted wide attention. Furthermore, an increasing number of SV systems address real-world challenges such as reverberation and noise, which has driven a large number of studies on practical SV. Therefore, the survey compares the existing SV systems in the Far-Field Speaker Verification Challenge 2020 (FFSVC 2020) to illustrate the most effective techniques for both TD-SV and TI-SV.
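To make the idea of a fixed-dimensional embedding from a variable-length utterance concrete, the following is a minimal sketch, not a system described in the survey. It assumes the frame-level features have already been computed (in a real system, by a neural network), pools them with per-dimension mean and standard deviation, and scores two utterances with cosine similarity. The function names and the threshold value of 0.5 are hypothetical choices made for illustration only.

```python
import math


def stats_pooling(frames):
    """Map a variable-length list of frame vectors to one fixed-length vector.

    The output concatenates the per-dimension mean and standard deviation,
    so its length is 2 * dim regardless of how many frames are given.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def cosine_score(a, b):
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(enroll_frames, test_frames, threshold=0.5):
    """Accept the test utterance if its score reaches the (hypothetical) threshold."""
    enroll_emb = stats_pooling(enroll_frames)
    test_emb = stats_pooling(test_frames)
    return cosine_score(enroll_emb, test_emb) >= threshold


# Toy frame-level features: two utterances of different lengths, 2 dims each.
short_utt = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
long_utt = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]

# Both embeddings have the same length even though the utterances differ.
print(len(stats_pooling(short_utt)), len(stats_pooling(long_utt)))
print(verify(short_utt, long_utt))
```

In practice the threshold would be tuned on development data, and the pooled statistics would typically pass through further trained layers before scoring; both steps are omitted here to keep the sketch short.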
