Loss of the larynx significantly alters natural voice production, requiring alternative communication modalities and rehabilitation methods to restore speech intelligibility and improve the quality of life of affected individuals. This paper explores advances in alaryngeal speech enhancement that improve signal quality and reduce background noise, focusing on individuals who have undergone laryngectomy. Speech samples were obtained from 23 Lithuanian males who had undergone laryngectomy with secondary implantation of a tracheoesophageal prosthesis (TEP). A Pareto-optimized gated LSTM was trained on tracheoesophageal speech data to capture complex temporal dependencies and contextual information in speech signals. The system was able to distinguish between actual speech and various forms of noise and artifacts, resulting in a 25% gain in the mean signal-to-noise ratio (SNR) compared to other approaches. According to acoustic analysis, the system significantly decreased the proportion of unvoiced frames (PVF) from 40% to 10% while maintaining stable proportions of voiced frames (PVS) and average voicing evidence (AVE), indicating that the approach selectively attenuates noise and undesired speech artifacts while preserving important speech information.