Abstract
In this paper, a temporal feature extraction method based on a convolutional neural network with bidirectional long short-term memory (CNN-BLSTM) and temporal pooling (TMPOOL) is proposed for language identification. First, the CNN-BLSTM model is employed as a front-end local feature extractor, which learns temporal representations from acoustic features in both the forward and backward directions. Then the temporal pooling unit, a non-linear support vector regression (SVR) machine, learns the ordering relationship between the hidden states of the BLSTM and their time indexes. Finally, this ordering relationship is used as an utterance-level representation. We conducted experiments on three tasks of the Oriental Language Recognition (OLR-2019) challenge. Compared with other CNN (BLSTM) methods, the proposed method achieves comparable error reductions.
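The temporal pooling step can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a linear epsilon-insensitive SVR trained by subgradient descent stands in for the non-linear SVR, the function name `temporal_pool` and all hyperparameters (`epochs`, `lr`, `c`, `eps`) are hypothetical, and the hidden states are toy values rather than real BLSTM outputs. The idea shown is only that a regressor is fitted from each hidden state to its time index, and the learned parameters serve as the utterance-level representation.

```python
def temporal_pool(hidden_states, epochs=200, lr=0.01, c=1.0, eps=0.1):
    """Fit a linear epsilon-insensitive SVR from hidden states to time indexes.

    Returns the learned weight vector, used here as the pooled representation.
    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    n_frames = len(hidden_states)
    dim = len(hidden_states[0])
    w = [0.0] * dim
    b = 0.0
    # Time indexes scaled to (0, 1] so the targets do not grow with length.
    targets = [(t + 1) / n_frames for t in range(n_frames)]
    for _ in range(epochs):
        for h, y in zip(hidden_states, targets):
            pred = sum(wi * hi for wi, hi in zip(w, h)) + b
            err = pred - y
            # Subgradient of the epsilon-insensitive loss: zero inside the tube.
            if err > eps:
                g = 1.0
            elif err < -eps:
                g = -1.0
            else:
                g = 0.0
            # Step on 0.5 * ||w||^2 / n_frames + c * loss for this frame.
            w = [wi - lr * (wi / n_frames + c * g * hi) for wi, hi in zip(w, h)]
            b -= lr * c * g
    return w


# Toy stand-in for BLSTM hidden states: the first dimension drifts upward
# over time, the second stays constant.
toy_states = [[t / 10.0, 0.5] for t in range(10)]
utterance_vector = temporal_pool(toy_states)
print(utterance_vector)
```

In this sketch the pooled vector has the same dimensionality as a single hidden state regardless of utterance length, which is what lets it act as a fixed-size utterance-level feature for a downstream classifier.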