Abstract
The convolution module in the Conformer provides translation-invariant convolution in time and frequency, and is often applied to Mandarin recognition tasks by treating the time-frequency maps of speech signals as images to address the diversity of speech signals. However, convolutional networks excel at modeling local features, whereas dialect recognition requires extracting contextual information over long sequences; therefore, this paper proposes SE-Conformer-TCN. Embedding a squeeze-and-excitation block into the Conformer explicitly models the interdependence between channel features, strengthening the model's ability to select interrelated channels: the weights of effective speech spectrogram features are increased while those of ineffective or less effective feature maps are decreased. The multi-head self-attention and a temporal convolutional network are built in parallel; the dilated causal convolution module covers the input time series by increasing the dilation factor and convolution kernel size, capturing the positional information implied between sequence elements and improving the model's access to positional information. Experiments on four public datasets demonstrate that the proposed model achieves higher performance on accented Mandarin recognition, reducing the sentence error rate by 2.1% compared to the Conformer, with a character error rate of only 4.9%.
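The two mechanisms named in the abstract can be illustrated briefly. The following is a minimal, dependency-free sketch, not the paper's implementation: `se_recalibrate` shows the squeeze-excitation idea (global average pooling per channel, a bottleneck of two fully connected layers, then sigmoid gates that reweight channels), and `tcn_receptive_field` shows how stacking dilated causal convolutions with growing dilation factors expands coverage of the input time series. All function and parameter names here are hypothetical.

```python
import math

def se_recalibrate(feats, w1, w2):
    """Squeeze-and-excitation over a (channels x time) feature map.

    feats: list of C channel rows, each a list of T values.
    w1: (C/r x C) weight matrix of the squeeze FC layer (ReLU).
    w2: (C x C/r) weight matrix of the excitation FC layer (sigmoid).
    Returns the channel-reweighted feature map.
    (Illustrative sketch; biases omitted for brevity.)
    """
    # Squeeze: global average pooling per channel -> C-dim descriptor.
    z = [sum(row) / len(row) for row in feats]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid -> channel gates.
    h = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, h))))
         for row in w2]
    # Scale: increase weights of effective channels, decrease the rest.
    return [[gate * v for v in row] for gate, row in zip(s, feats)]

def tcn_receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal convolutions.

    Each layer with dilation d adds (kernel_size - 1) * d time steps,
    so doubling dilations (1, 2, 4, ...) covers long sequences cheaply.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, four layers with kernel size 3 and dilations 1, 2, 4, 8 give `tcn_receptive_field(3, [1, 2, 4, 8])` = 31 time steps, which is why increasing the dilation factor lets the TCN branch capture long-range positional context that a plain local convolution misses.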