Abstract

In this paper, we propose a hybrid system based on a modified statistical GMM voice conversion algorithm for improving the recognition of esophageal speech. This hybrid system aims to compensate for the distorted information present in the esophageal acoustic features by using a voice conversion method. The esophageal speech is converted into a “target” laryngeal speech using an iterative statistical estimation of a transformation function. We did not apply a speech synthesizer for reconstructing the converted speech signal, given that the converted Mel cepstral vectors are used directly as input of our speech recognition system. Furthermore the feature vectors are linearly transformed by the HLDA (heteroscedastic linear discriminant analysis) method to reduce their size in a smaller space having good discriminative properties. The experimental results demonstrate that our proposed system provides an improvement of the phone recognition accuracy with an absolute increase of 3.40 % when compared with the phone recognition accuracy obtained with neither HLDA nor voice conversion.

Highlights

  • A total laryngectomy is a surgical procedure which consists in a complete removal of the larynx for the treatment of a cancer for example

  • Conclusion and future works In this paper, we present our hybrid system for improving the recognition of esophageal speech

  • This system is based on a simplified statistical GMM voice conversion that projects the esophageal frames into a clean laryngeal speech space

Read more

Summary

Background

A total laryngectomy is a surgical procedure which consists in a complete removal of the larynx for the treatment of a cancer for example. In (Tanaka et al 2014), a new hybrid method for alaryngeal speech enhancement based on noise reduction by spectral subtraction (Boll 1979) and using statistical voice conversion for predicting the excitation parameters was developed These two recent approaches aim to improve the estimation of acoustic features in order to reconstruct an enhanced signal with best intelligibility. In practice it is difficult for them to compensate for the differences existing in the alaryngeal acoustic parameters when compared with those of the laryngeal speech To overcome this drawback, we propose a new hybrid system for improving the recognition of esophageal speech based on a simple voice conversion algorithm. The Werghi’s algorithm has been used in this study as our basic voice conversion procedure

Training process
Conversion process
Findings
Conclusion and future works

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.