Abstract

Connectionist temporal classification (CTC) has been successful both as an end-to-end ASR model and as an auxiliary task for attention-based sequence-to-sequence (S2S) systems. However, CTC has a special topology in which a redundant blank symbol may optionally be inserted between modeling units. This biases CTC toward modeling blank symbols, so its alignments are worse than expected and frames are often aligned to redundant symbols. In this paper, we design a new, simple topology and propose a novel smooth alignment optimization method, the soft bidirectional alignment cost (soft-BAC), as an alternative to CTC. Our scheme inserts identifiers only between consecutive repeated labels. It solves the alignment problem between the two time series of a speech-transcription pair by minimizing the total cost of the left-to-right and right-to-left alignment processes. Experiments on the LibriSpeech corpus show that soft-BAC achieves significant improvements in word error rate and alignment quality over the CTC-based baseline model.
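To make the topological difference concrete, here is a minimal Python sketch. It assumes character-level labels and uses hypothetical token names `<b>` (the CTC blank) and `<s>` (the identifier). It compares the extended label sequence a CTC alignment runs over with one that inserts an identifier only between consecutive repeated labels. The `collapse` rule is an assumption about how frame-level paths could map back to a transcript under this topology; it is not taken from the paper. The soft-BAC cost itself is not shown.

```python
from typing import List, Optional, Sequence

BLANK = "<b>"  # hypothetical name for the CTC blank symbol
SEP = "<s>"    # hypothetical name for the identifier between repeated labels


def ctc_extended(labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Standard CTC topology: a blank may occur before, between, and after labels."""
    out = [blank]
    for lab in labels:
        out.extend([lab, blank])
    return out  # length is 2 * len(labels) + 1


def repeat_only_extended(labels: Sequence[str], sep: str = SEP) -> List[str]:
    """Sketch of the proposed idea: add an identifier only where a label
    repeats, so that the two copies stay distinguishable after merging."""
    out: List[str] = []
    for i, lab in enumerate(labels):
        if i > 0 and labels[i - 1] == lab:
            out.append(sep)
        out.append(lab)
    return out


def collapse(frame_labels: Sequence[str], sep: str = SEP) -> List[str]:
    """Assumed decoding rule: merge runs of identical frame labels,
    then drop identifiers (the identifier breaks a run of repeats)."""
    out: List[str] = []
    prev: Optional[str] = None
    for lab in frame_labels:
        if lab != prev and lab != sep:
            out.append(lab)
        prev = lab
    return out


if __name__ == "__main__":
    word = list("hello")
    print(ctc_extended(word))          # 11 symbols, 6 of them blanks
    print(repeat_only_extended(word))  # 6 symbols, one identifier between the two l's
    print(collapse(["h", "h", "e", "l", "l", "<s>", "l", "o", "o"]))
```

For "hello", the CTC sequence has 11 positions, 6 of them blanks, while the repeat-only sequence has 6 positions with a single identifier. This illustrates why a topology with far fewer extra symbols leaves less room for frames to be absorbed by redundant tokens.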
