A preliminary study on improving the recognition of esophageal speech using a hybrid system based on statistical voice conversion.

Othman Lachhab, Joseph Di Martino, Ahmed Hammouch, Elhassane Ibn Elhaj

doi:10.1186/s40064-015-1428-2

Abstract

In this paper, we propose a hybrid system based on a modified statistical GMM voice conversion algorithm for improving the recognition of esophageal speech. The system aims to compensate for the distorted information present in the esophageal acoustic features by using a voice conversion method. The esophageal speech is converted into a “target” laryngeal speech using an iterative statistical estimation of a transformation function. We did not apply a speech synthesizer to reconstruct the converted speech signal, because the converted Mel cepstral vectors are used directly as input to our speech recognition system. Furthermore, the feature vectors are linearly transformed by the HLDA (heteroscedastic linear discriminant analysis) method, which projects them into a lower-dimensional space with good discriminative properties. The experimental results demonstrate that the proposed system improves phone recognition accuracy by an absolute 3.40% compared with the accuracy obtained with neither HLDA nor voice conversion.

Highlights

A total laryngectomy is a surgical procedure that consists of the complete removal of the larynx, for example to treat cancer.
In this paper, we present our hybrid system for improving the recognition of esophageal speech.
This system is based on a simplified statistical GMM voice conversion that projects the esophageal frames into a clean laryngeal speech space.

Summary

Background

A total laryngectomy is a surgical procedure that consists of the complete removal of the larynx, for example to treat cancer. In Tanaka et al. (2014), a new hybrid method for alaryngeal speech enhancement was developed, based on noise reduction by spectral subtraction (Boll 1979) and on statistical voice conversion for predicting the excitation parameters. These two recent approaches aim to improve the estimation of acoustic features in order to reconstruct an enhanced signal with better intelligibility. In practice, it is difficult for them to compensate for the differences between the alaryngeal acoustic parameters and those of laryngeal speech. To overcome this drawback, we propose a new hybrid system for improving the recognition of esophageal speech based on a simple voice conversion algorithm. Werghi’s algorithm is used in this study as our basic voice conversion procedure.
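
To make the idea of a statistical GMM conversion function concrete, here is a minimal sketch of the commonly used joint-density GMM mapping, in which each source frame is mapped to a posterior-weighted sum of per-component linear regressions. This is not Werghi’s algorithm or the authors’ modified procedure, whose details are not given in this summary. The function names, the diagonal-covariance simplification, and all numbers in the toy model are assumptions made purely for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means_x, vars_x):
    """Posterior probability of each mixture component given source frame x."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means_x, vars_x)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def convert_frame(x, weights, means_x, vars_x, means_y, cov_yx):
    """Map one source (esophageal) frame into the target (laryngeal) space.

    F(x) = sum_i p(i|x) * (mu_y_i + cov_yx_i / var_x_i * (x - mu_x_i)),
    with every covariance assumed diagonal (a simplification).
    """
    post = posteriors(x, weights, means_x, vars_x)
    y = [0.0] * len(x)
    for p, mx, vx, my, cyx in zip(post, means_x, vars_x, means_y, cov_yx):
        for d in range(len(x)):
            y[d] += p * (my[d] + cyx[d] / vx[d] * (x[d] - mx[d]))
    return y

# Hypothetical 2-component, 2-dimensional model (made-up values).
weights = [0.5, 0.5]
means_x = [[0.0, 0.0], [4.0, 4.0]]   # source-side component means
vars_x = [[1.0, 1.0], [1.0, 1.0]]    # source-side diagonal variances
means_y = [[1.0, -1.0], [6.0, 2.0]]  # target-side component means
cov_yx = [[0.5, 0.5], [0.5, 0.5]]    # diagonal target/source cross-covariances

converted = convert_frame([2.0, 2.0], weights, means_x, vars_x, means_y, cov_yx)
```

In a recognition pipeline such as the one described here, the converted vectors would be passed directly to the recognizer rather than to a synthesizer; in practice the model parameters would be estimated from time-aligned esophageal and laryngeal frames instead of being set by hand.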

Training process

Conversion process

Findings

Conclusion and future works


Journal: SpringerPlus	Publication Date: Oct 26, 2015
Citations: 12	License type: CC BY 4.0


