Speech Refinement Using Custom Filter for Developing Robust S2S Dataset

Olaniyan Julius, Ayodele A Adebiyi, Esiefarienrhe B Michael, Ibidun C Obagbuwa

doi:10.1109/seb-sdg57117.2023.10124474

Abstract

Neural network-based speech-to-speech (S2S) translators require a robust, well-refined dataset of audio signals from which to produce automatic translations. During recording, these signals are usually contaminated by noise, which alters the information conveyed by the original noiseless signal and reduces the accuracy of the translation system. Although researchers have proposed many de-noising techniques for removing noise from raw audio signals, this paper presents a novel approach to noise removal based on a custom filter built on the Short-Time Fourier Transform (STFT). Experiments on the LJ Speech Dataset, publicly available on the Kaggle website, show that the proposed technique improves the signal-to-noise ratio (SNR) by 1.001 dB on average, making it an effective method for removing noise from speech and other useful acoustic signals.
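To make the reported figure concrete, the following is a minimal sketch of how an SNR gain in decibels could be measured when a clean reference signal is available. The abstract does not say how the SNR was computed, so the reference-based definition used here, the function names (`snr_db`, `snr_gain_db`, `db_to_power_ratio`), and the toy signals are all assumptions for illustration, not the authors' procedure.

```python
import math


def snr_db(reference, estimate):
    """SNR of `estimate` measured against a clean `reference`, in dB.

    Assumption: noise is defined as the sample-wise difference
    between the estimate and the clean reference.
    """
    signal_power = sum(r * r for r in reference)
    noise_power = sum((e - r) ** 2 for r, e in zip(reference, estimate))
    if signal_power == 0:
        raise ValueError("reference signal has zero power")
    if noise_power == 0:
        return math.inf  # estimate matches the reference exactly
    return 10.0 * math.log10(signal_power / noise_power)


def snr_gain_db(reference, noisy, denoised):
    """SNR improvement (dB) of the denoised signal over the noisy one."""
    return snr_db(reference, denoised) - snr_db(reference, noisy)


def db_to_power_ratio(gain_db):
    """Convert a gain in dB to the equivalent linear power ratio."""
    return 10.0 ** (gain_db / 10.0)


# Toy data (hypothetical): a sine wave, a noisy copy, and a "denoised"
# copy whose residual noise has half the amplitude of the original noise.
clean = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
noisy = [c + 0.2 * (-1) ** n for n, c in enumerate(clean)]
denoised = [c + 0.1 * (-1) ** n for n, c in enumerate(clean)]

gain = snr_gain_db(clean, noisy, denoised)
ratio = db_to_power_ratio(1.001)
```

In this toy case, halving the noise amplitude yields a gain of about 6.02 dB. By the same conversion, an average gain of 1.001 dB corresponds to a signal-to-noise power ratio roughly 1.26 times higher after filtering.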

Full Text