Abstract
A module using a sliding window with deformability, abbreviated as SWD, is proposed for local feature enhancement. In particular, the proposed SWD module adopts windows whose size varies with the depth of the network layer in which it is embedded. The SWD module is inserted into the Transformer network, and the resulting model, referred to as LFEformer, is applied to automatic speech recognition. Such a network is particularly good at capturing both local and global features, which benefits model performance: the local features are extracted by the SWD module, while the global features are captured by the attention mechanism of the Transformer. The effectiveness of LFEformer has been validated on three widely used datasets: Aishell-1, HKUST, and WSJ (dev93/eval92). The experimental results demonstrate improvements of 0.5% CER, 0.8% CER, and 0.7%/0.3% WER on the corresponding datasets.