Improving phoneme recognition of throat microphone speech recordings using transfer learning

M.A. Tuğtekin Turan, Engin Erzin

doi:10.1016/j.specom.2021.02.004

Abstract

Throat microphones (TM) are skin-attached non-acoustic sensors that are robust to environmental noise but capture a narrower signal bandwidth than traditional close-talk microphones (CM). Attaining high-performance phoneme recognition is challenging when the training data from a degraded channel, such as TM, is limited. In this paper, we address this challenge for TM speech recordings using a transfer learning approach based on stacked denoising auto-encoders (SDA). The proposed approach defines an SDA-based domain adaptation framework that maps the source-domain CM representations and the target-domain TM representations into a common latent space, where the mismatch between TM and CM is eliminated, to better train an acoustic model and improve TM phoneme recognition. For the phoneme recognition task, we use a hybrid CNN/HMM system that combines a convolutional neural network (CNN) with a hidden Markov model (HMM) and delivers better acoustic modeling performance than conventional Gaussian mixture model (GMM) based models. In the experimental evaluations, we observed a relative phoneme error rate (PER) improvement of more than 12% for the TM recordings with the proposed transfer learning approach compared to the baseline.
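The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind a denoising auto-encoder with a shared encoder, not the authors' framework. It assumes a single layer (an SDA stacks several), masking noise, a tanh encoder, a linear decoder, and plain gradient descent. The function names, hyperparameters, and the synthetic "CM" and "TM" features are all hypothetical and stand in for real acoustic features.

```python
import numpy as np


def train_denoising_autoencoder(features, hidden_dim=16, noise_prob=0.2,
                                learning_rate=0.01, epochs=300, seed=0):
    """Fit one denoising auto-encoder layer by full-batch gradient descent.

    Returns the learned parameters and the per-epoch reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n_samples, input_dim = features.shape
    w_enc = rng.normal(0.0, 0.1, size=(input_dim, hidden_dim))
    b_enc = np.zeros(hidden_dim)
    w_dec = rng.normal(0.0, 0.1, size=(hidden_dim, input_dim))
    b_dec = np.zeros(input_dim)
    losses = []
    for _ in range(epochs):
        # Masking noise: zero out a random subset of input entries.
        keep_mask = rng.random(features.shape) >= noise_prob
        corrupted = features * keep_mask
        hidden = np.tanh(corrupted @ w_enc + b_enc)
        reconstruction = hidden @ w_dec + b_dec
        # The target is the clean input, not the corrupted one.
        error = reconstruction - features
        losses.append(float(0.5 * np.mean(np.sum(error ** 2, axis=1))))
        # Gradients of the loss above with respect to each parameter.
        grad_out = error / n_samples
        grad_w_dec = hidden.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_hidden = (grad_out @ w_dec.T) * (1.0 - hidden ** 2)
        grad_w_enc = corrupted.T @ grad_hidden
        grad_b_enc = grad_hidden.sum(axis=0)
        w_enc -= learning_rate * grad_w_enc
        b_enc -= learning_rate * grad_b_enc
        w_dec -= learning_rate * grad_w_dec
        b_dec -= learning_rate * grad_b_dec
    params = {"w_enc": w_enc, "b_enc": b_enc, "w_dec": w_dec, "b_dec": b_dec}
    return params, losses


def encode(features, params):
    """Map clean features into the latent space of the trained layer."""
    return np.tanh(features @ params["w_enc"] + params["b_enc"])


# Synthetic stand-ins for acoustic features (NOT real CM or TM data).
rng = np.random.default_rng(1)
cm_features = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 20))
cm_features = (cm_features - cm_features.mean(axis=0)) / cm_features.std(axis=0)
# Crude assumption: the TM channel attenuates the upper half of the dimensions.
tm_features = cm_features.copy()
tm_features[:, 10:] *= 0.2

# Train one encoder on the pooled data so both domains share a latent space.
pooled = np.vstack([cm_features, tm_features])
params, losses = train_denoising_autoencoder(pooled)
latent_cm = encode(cm_features, params)
latent_tm = encode(tm_features, params)
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In a pipeline like the one the abstract describes, latent representations such as `latent_cm` and `latent_tm` would feed the acoustic model in place of the raw features. How the paper actually stacks layers, pairs CM and TM data, and trains the mapping is not stated in the abstract, so this sketch makes no claim about those choices.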

Journal: Speech Communication
Publication Date: Mar 10, 2021
Citations: 11

Similar Papers

Bottleneck feature-mediated DNN-based feature mapping for throat microphone speech recognition
Takahito Suzuki ... Jun Ogata
01 Nov 2018

A comparison of Gaussian Mixture Modeling (GMM) and Hidden Markov Modeling (HMM) based approaches for Automatic Phoneme Recognition in Kannada
Prashanth Kannadaguli ... Vidya Bhat
01 Mar 2015

Effects of Mounting Position on Throat Microphone Speech Recognition
Takahito Suzuki ... Jun Ogata
01 Oct 2019

Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR
Andros Tjandra ... Graham Neubig
01 Apr 2015
