Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation

Gakuto Kurata, Kartik Audhkhasi

doi:10.21437/interspeech.2019-1952

Abstract

Conventional automatic speech recognition (ASR) systems trained from frame-level alignments can easily leverage posterior fusion to improve ASR accuracy and knowledge distillation to build a better single model. End-to-end ASR systems trained using the Connectionist Temporal Classification (CTC) loss do not require frame-level alignments and hence simplify model training. However, the sparse and arbitrary posterior spike timings of CTC models pose a new set of challenges for posterior fusion across multiple models and for knowledge distillation between CTC models. We propose a method to train a CTC model so that its spike timings are guided to align with those of a pre-trained guiding CTC model. As a result, all models that share the same guiding model have aligned spike timings. We demonstrate the advantages of our method in several scenarios, including posterior fusion of CTC models and knowledge distillation between CTC models with different architectures. With the 300-hour Switchboard training data, a single word CTC model distilled from multiple models reduced the word error rates from 14.9%/24.1% to 13.7%/23.1% on the Hub5 2000 Switchboard/CallHome test sets, without using any data augmentation, language model, or complex decoder.
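The argument above turns on how frame-level posteriors are combined across models. The pure-Python sketch below is illustrative only and is not the authors' implementation. It assumes toy per-frame posteriors with index 0 as the blank, averages posteriors frame by frame for fusion, and uses frame-level cross-entropy against teacher posteriors for distillation. It also includes a hypothetical `guiding_term` (the negative student posterior mass at the guiding model's non-blank argmax frames) as one plausible way to pull a model's spikes toward a guide's; the paper's exact guiding loss may differ. All function names are illustrative.

```python
import math
from typing import List, Optional

# Frame-level posteriors: posts[t][k] = P(token k | frame t); each row sums to 1.
Posteriors = List[List[float]]

BLANK = 0  # assumption: vocabulary index 0 is the CTC blank symbol


def argmax(row: List[float]) -> int:
    return max(range(len(row)), key=row.__getitem__)


def spike_mask(guide: Posteriors, blank: int = BLANK) -> List[List[float]]:
    """One-hot mask of the guiding model's non-blank argmax ("spike") per frame."""
    mask = []
    for row in guide:
        m = [0.0] * len(row)
        k = argmax(row)
        if k != blank:
            m[k] = 1.0
        mask.append(m)
    return mask


def guiding_term(student: Posteriors, mask: List[List[float]]) -> float:
    """Hypothetical auxiliary loss: minus the student's posterior mass at the
    guide's spikes. Adding it to a CTC loss would reward spikes on the guide's frames."""
    return -sum(p * m for srow, mrow in zip(student, mask) for p, m in zip(srow, mrow))


def fuse(models: List[Posteriors]) -> Posteriors:
    """Frame-wise average of several models' posteriors (same frames, same vocab)."""
    n = len(models)
    return [[sum(col) / n for col in zip(*frames)] for frames in zip(*models)]


def distill_loss(teacher: Posteriors, student: Posteriors, eps: float = 1e-12) -> float:
    """Frame-level cross-entropy with the teacher's posteriors as soft targets."""
    return -sum(q * math.log(p + eps)
                for trow, srow in zip(teacher, student)
                for q, p in zip(trow, srow))


def greedy_decode(post: Posteriors, blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: per-frame argmax, collapse repeats, drop blanks."""
    out: List[int] = []
    prev: Optional[int] = None
    for row in post:
        k = argmax(row)
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


if __name__ == "__main__":
    # Toy posteriors over {blank, token 1, token 2}. b spikes on the same frames
    # as a; c emits token 1 one frame later than a.
    a = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.2, 0.1, 0.7]]
    b = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.85, 0.1, 0.05], [0.1, 0.1, 0.8]]
    c = [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
    print(greedy_decode(fuse([a, b])))  # aligned spikes: token 1 then token 2
    print(greedy_decode(fuse([a, c])))  # misaligned: token 1 averaged below blank
    mask = spike_mask(a)
    print(guiding_term(b, mask), guiding_term(c, mask))  # lower means closer to guide a
```

In this toy run, each of `a`, `b`, and `c` decodes to tokens 1 and 2 on its own, but averaging `a` with the one-frame-shifted `c` leaves blank as the argmax on both frames where token 1 appears, so the fused output drops it. This is the kind of failure that arbitrary CTC spike timings can cause in posterior fusion. The hypothetical guiding term scores `b` (spikes aligned with `a`) at -1.5 and `c` at -0.75, so minimizing it favors the aligned model.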

Similar Papers

Distilling Attention Weights for CTC-Based ASR Systems
Takafumi Moriya ... Takanori Ashihara
11 Apr 2020

Confidence measures for CTC-based phone synchronous decoding
Zhehuai Chen ... Yimeng Zhuang
01 Mar 2017

Comparable Study Of Modeling Units For End-To-End Mandarin Speech Recognition
Wei Zou ... Shuaijiang Zhao
01 Nov 2018

Speaker Adaptation for End-to-End CTC Models
Ke Li ... Yifan Gong
01 Dec 2018
