Abstract

This article introduces and evaluates Sampled Connectionist Temporal Classification (CTC), which connects the CTC criterion to the Cross Entropy (CE) objective through sampling. Instead of computing the logarithm of the sum of the alignment path likelihoods, sampled CTC computes, at each training step, only the CE loss between a sampled alignment path and the model posteriors. The sampled CTC objective is shown to be an unbiased estimator of an upper bound on the CTC loss, so minimizing the sampled CTC objective is equivalent to minimizing that upper bound. Defined this way, the objective scales computationally to massive datasets on accelerated computing hardware. Sampled CTC is compared with CTC on two large-scale speech recognition tasks, where it achieves a word error rate (WER) similar to that of the best CTC baseline in about one fourth of the baseline's training time.
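The relationship between the two objectives can be illustrated with a toy sketch. The code below is not the authors' implementation: it enumerates alignment paths by brute force (feasible only for tiny inputs, whereas practical CTC uses dynamic programming), and it assumes that a path is sampled in proportion to its likelihood under the current model, which the abstract does not specify. All function names are hypothetical. It computes the exact CTC loss as the negative log of the summed path likelihoods, and the sampled loss as the CE between one sampled path and the frame-level posteriors.

```python
import itertools
import math
import random

BLANK = 0  # assumption: symbol 0 is the CTC blank


def collapse(path):
    """CTC collapse: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return tuple(out)


def valid_paths(labels, num_frames, vocab_size):
    """All length-T paths that collapse to `labels` (exponential; toy sizes only)."""
    target = tuple(labels)
    return [p for p in itertools.product(range(vocab_size), repeat=num_frames)
            if collapse(p) == target]


def path_log_prob(path, log_post):
    """Log-likelihood of one path: sum of frame-level log posteriors."""
    return sum(log_post[t][s] for t, s in enumerate(path))


def ctc_loss(labels, log_post):
    """Exact CTC loss: -log of the sum of all valid path likelihoods."""
    paths = valid_paths(labels, len(log_post), len(log_post[0]))
    return -math.log(sum(math.exp(path_log_prob(p, log_post)) for p in paths))


def sampled_ctc_loss(labels, log_post, rng):
    """CE between one sampled alignment path and the model posteriors.

    Assumption: the path is drawn proportionally to its likelihood.
    """
    paths = valid_paths(labels, len(log_post), len(log_post[0]))
    weights = [math.exp(path_log_prob(p, log_post)) for p in paths]
    path = rng.choices(paths, weights=weights, k=1)[0]
    return -path_log_prob(path, log_post)


def random_log_posteriors(num_frames, vocab_size, rng):
    """Random per-frame log-softmax outputs standing in for a model."""
    rows = []
    for _ in range(num_frames):
        logits = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
        log_z = math.log(sum(math.exp(x) for x in logits))
        rows.append([x - log_z for x in logits])
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    log_post = random_log_posteriors(num_frames=4, vocab_size=3, rng=rng)
    labels = (1, 2)
    exact = ctc_loss(labels, log_post)
    draws = [sampled_ctc_loss(labels, log_post, rng) for _ in range(2000)]
    print("exact CTC loss:       ", exact)
    print("mean sampled CTC loss:", sum(draws) / len(draws))
```

Each valid path's likelihood is at most the sum over all valid paths, so every sampled loss is at least the exact CTC loss, and so is its average. This is the upper-bound property in its simplest form. The sampled loss needs only one path per step, which is what makes it a plain CE computation.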
