GBNF-VAE: A Pathological Voice Enhancement Model Based on Gold Section for Bottleneck Feature With Variational Autoencoder

Ganjun Liu,Tao Zhang,Biyun Ding,Ying Lv,Xiaohui Hou,Haoyang Guo,Yaqin Wu,Dehui Fu

doi:10.1016/j.jvoice.2023.03.012

Ganjun Liu, Tao Zhang + Show 6 more

https://doi.org/10.1016/j.jvoice.2023.03.012

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

Speech enhancement has become a promising technique to accommodate demands of the improvement in quality of a degraded speech signal. The main works now focus on separating normal speech from noise, but have neglected the low quality of impaired speech influenced by anomalous glottis flow. In order to effectively enhance the pathological speech, it is essential to design a separation mechanism for extracting high-dimensional timbre features and speech features separately to suppress low-dimensional noises. In this paper, we propose an enhancement model GBNF-VAE to extract timbre efficiently by reducing anomalous airflow noise interference, and by combining the semantic features with timbre features to synthesize the enhanced speech. In particular, the bottleneck feature can characterize the timbre by the controlled number of nodes through the Golden Sectionmethod, which effectively improves computational efficiency. In addition, variational autoencoder is adopted to extract semantic features which are combined with the previous timbre features to synthesize the enhanced speech. Finally, spectrum observation, objective indicators and subjective evaluation all show the outstanding performance of GBNF-VAE in pathological speech quality enhancement.

Full Text