Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification Using CTC-Based Soft VAD and Global Query Attention

Myunghun Jung, Hoirin Kim, Jahyun Goo, Youngmoon Jung

doi:10.21437/interspeech.2020-1420

Abstract

Keyword spotting (KWS) and speaker verification (SV) have been studied independently, although the acoustic and speaker domains are known to be complementary. In this paper, we propose a multi-task network that performs KWS and SV simultaneously to fully utilize the interrelated domain information. The network tightly couples its sub-networks to improve performance in challenging conditions such as noisy environments, open-vocabulary KWS, and short-duration SV, by introducing two novel techniques: connectionist temporal classification (CTC)-based soft voice activity detection (VAD) and global query attention. Frame-level acoustic and speaker information is integrated using phonetically derived weights to form a word-level global representation, which is then used to aggregate feature vectors into discriminative embeddings. The proposed approach yields relative improvements of 4.06% and 26.71% in equal error rate (EER) over the baselines for the two tasks. We also present a visualization example and the results of ablation experiments.
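
The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes that the soft VAD weight of a frame is one minus its CTC blank posterior, that the global query is the weighted mean of concatenated acoustic and speaker frame features, and that aggregation uses scaled dot-product attention. All function names, the projection matrix `proj`, and the dimensions are hypothetical.

```python
import numpy as np

def soft_vad_weights(ctc_posteriors, blank_index=0):
    """Per-frame speech weight; ASSUMED here to be 1 - P(blank)."""
    return 1.0 - ctc_posteriors[:, blank_index]

def global_query(acoustic, speaker, weights, eps=1e-8):
    """ASSUMED form: soft-VAD-weighted mean of concatenated frame features."""
    frames = np.concatenate([acoustic, speaker], axis=1)  # (T, Da + Ds)
    return (weights[:, None] * frames).sum(axis=0) / (weights.sum() + eps)

def attentive_pool(features, query, proj):
    """Aggregate frame features with scaled dot-product attention.

    features: (T, D), query: (Dq,), proj: (Dq, D) hypothetical projection.
    """
    q = query @ proj                                  # project query to (D,)
    scores = features @ q / np.sqrt(features.shape[1])
    scores = scores - scores.max()                    # numerical stability
    alphas = np.exp(scores)
    alphas = alphas / alphas.sum()                    # softmax over frames
    return alphas @ features, alphas

# Toy inputs: random stand-ins for network outputs (illustrative only).
rng = np.random.default_rng(0)
T, n_labels, d_a, d_s = 50, 40, 16, 8
logits = rng.normal(size=(T, n_labels))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
acoustic = rng.normal(size=(T, d_a))
speaker = rng.normal(size=(T, d_s))

w = soft_vad_weights(posteriors)                      # (T,)
g = global_query(acoustic, speaker, w)                # (d_a + d_s,)
proj = rng.normal(size=(d_a + d_s, d_s))
embedding, alphas = attentive_pool(speaker, g, proj)  # (d_s,), (T,)
```

Under these assumptions, frames that the CTC output treats as blank contribute little to the global query, so the attention weights used to build the embedding are driven mainly by speech frames.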
