Abstract

While larger acoustic models provide better speech recognition performance, smaller models are appropriate when computational resources are limited. Knowledge distillation is used to train small models on the basis of soft labels obtained from larger models instead of hard labels obtained from reference transcriptions. In this work, we investigated two methods for using both types of labels: sequence-level distillation (SD), in which a loss function based on either the hard or the soft labels is selected, and sequence-level interpolation (SI), in which the two loss functions are interpolated. Experiments showed that SI was consistently better than SD, and that SI with annealing performed the best.
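
The sketch below illustrates the general idea of interpolating a hard-label loss with a soft-label distillation loss, together with an annealed interpolation weight. It is not the authors' implementation: it uses a simple frame-level cross-entropy and KL-divergence formulation rather than the paper's sequence-level losses, and the weight schedule, function names, and tensor shapes are illustrative assumptions.

```python
# Minimal sketch (assumed formulation, not the paper's exact method):
# combine a hard-label loss (cross-entropy against reference labels) with a
# soft-label loss (KL divergence against teacher outputs), weighted by alpha.
import torch
import torch.nn.functional as F

def interpolated_loss(student_logits, teacher_logits, hard_labels, alpha):
    """Return alpha * hard-label loss + (1 - alpha) * soft-label loss."""
    # Hard-label term: cross-entropy with the reference transcription labels.
    hard_loss = F.cross_entropy(student_logits, hard_labels)
    # Soft-label term: KL divergence between student and teacher distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return alpha * hard_loss + (1.0 - alpha) * soft_loss

def annealed_alpha(step, total_steps, start=1.0, end=0.0):
    """Linearly anneal the interpolation weight over training (assumed schedule)."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * frac

# Toy usage: a batch of 8 frames with 40 output classes.
student_logits = torch.randn(8, 40, requires_grad=True)
teacher_logits = torch.randn(8, 40)
hard_labels = torch.randint(0, 40, (8,))
loss = interpolated_loss(student_logits, teacher_logits, hard_labels,
                         alpha=annealed_alpha(step=100, total_steps=1000))
loss.backward()
```

In this sketch, annealing gradually shifts the weight from the hard-label term toward the soft-label term as training proceeds; the linear schedule is only one possible choice.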
