Abstract

Encoder-decoder based automatic speech recognition (ASR) methods are increasingly popular owing to their simplified processing stages and low reliance on prior knowledge. Conventional encoder-decoder approaches usually learn a sequence-to-sequence mapping function from the source speech to target units (e.g., subwords, characters) in an end-to-end manner. However, it remains unclear how to choose the optimal target unit, or how to combine units of multiple granularities. Since increasing the information available for learning the sequence-to-sequence mapping can improve modeling effectiveness, we propose a multi-granularity sequence alignment (MGSA) approach that enhances cross-sequence interactions between units of different granularity in both the modeling and inference stages of encoder-decoder based ASR. Specifically, a decoder module is designed to generate multi-granularity sequence predictions. We then exploit the latent alignment among units at different levels of granularity by feeding the decoded multi-level sequences back as input for model prediction. The cross-sequence interaction is also employed to re-calibrate the output probabilities in the proposed post-inference algorithm. Experimental results on both the WSJ (80 hrs) and Switchboard (300 hrs) datasets show the superiority of the proposed method over traditional multi-task methods as well as single-granularity baseline systems.

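To make the idea of cross-granularity interaction concrete, the sketch below illustrates one possible way a decoder could emit predictions at two granularities (characters and subwords) and let the coarser sequence attend to the finer one. This is a minimal PyTorch illustration under our own assumptions, not the paper's MGSA implementation; the class and tensor names (`MultiGranularityDecoder`, `char_states`, `subword_states`) are hypothetical, and the actual architecture and post-inference re-calibration may differ.

```python
# A minimal sketch (not the paper's implementation) of a decoder with two
# granularity-specific output heads and a cross-sequence attention step.
# All names below are illustrative assumptions.
import torch
import torch.nn as nn


class MultiGranularityDecoder(nn.Module):
    def __init__(self, d_model=256, char_vocab=32, subword_vocab=1000):
        super().__init__()
        # One output head per granularity level.
        self.char_head = nn.Linear(d_model, char_vocab)
        self.subword_head = nn.Linear(d_model, subword_vocab)
        # Cross-sequence interaction: subword states attend to char states.
        self.cross_attn = nn.MultiheadAttention(
            d_model, num_heads=4, batch_first=True
        )

    def forward(self, char_states, subword_states):
        # char_states:    (B, T_char, d_model) decoder states, char level
        # subword_states: (B, T_sub,  d_model) decoder states, subword level
        char_logits = self.char_head(char_states)
        # Re-use the fine-grained (character) sequence as extra context
        # when predicting the coarser (subword) sequence.
        fused, _ = self.cross_attn(
            query=subword_states, key=char_states, value=char_states
        )
        subword_logits = self.subword_head(subword_states + fused)
        return char_logits, subword_logits


if __name__ == "__main__":
    # Random tensors stand in for real encoder/decoder states.
    dec = MultiGranularityDecoder()
    chars = torch.randn(2, 40, 256)
    subwords = torch.randn(2, 12, 256)
    c_logits, s_logits = dec(chars, subwords)
    print(c_logits.shape, s_logits.shape)  # (2, 40, 32) (2, 12, 1000)
```

In this toy setup, the subword-level probabilities are conditioned on the character-level sequence, which is one plausible reading of the cross-sequence interaction the abstract describes; a symmetric interaction (characters attending to subwords) or a rescoring-time combination of the two output distributions would be equally valid sketches.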