Abstract

The accuracy of any speech translation system depends essentially on the quality of the audio signal fed into it. Many researchers have explored different approaches to reducing noise in audio signals, including wavelet transforms, the Fourier Transform (FT), and deep learning. These algorithms perform well on noisy speech to a certain degree, but their accuracy is not sufficient for speech-to-speech (S2S) translation, because even a small amount of noise in the signal can alter the semantic representation of the underlying language. Since it is nearly impossible for any one of these algorithms alone to produce a perfect (noiseless) signal, this paper presents a layered approach for total noise removal that stacks Principal Component Analysis (PCA) on the Short-Time Fourier Transform (STFT). In this approach, a band-pass channel is created using the STFT, which reduces the signal noise level to the barest minimum, while the residual noise is completely removed by performing PCA on the refined signals. Experimental results show that this approach almost doubles the signal-to-noise ratio (SNR) of the output signal for all 10 audio samples tested, making it relatively better than the aforementioned approaches in terms of output quality and suitable for accuracy-sensitive domains such as speech-to-speech translation system development.
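The two-stage idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the frame size, the pass band, the number of principal components, or how PCA is applied to a single-channel signal. The 512-sample Hann frames, the 300 to 3400 Hz band, the 8 components, and the choice to run PCA over the magnitude spectrogram (one frame per observation) while reusing the band-passed phase are all assumptions made for illustration, as are the function names.

```python
import numpy as np

N_FFT = 512   # frame length in samples (assumed, not from the paper)
HOP = 256     # 50% overlap, so periodic Hann windows sum to 1

def stft(x):
    """Hann-windowed STFT, shape (n_frames, N_FFT // 2 + 1)."""
    window = np.hanning(N_FFT + 1)[:-1]             # periodic Hann window
    x = np.pad(x, (HOP, HOP + (-len(x)) % HOP))      # zero-pad so each input sample lies in two frames
    n_frames = (len(x) - N_FFT) // HOP + 1
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)

def istft(spec, length):
    """Overlap-add inverse of stft(); returns `length` samples."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    out = np.zeros(HOP * (len(frames) - 1) + N_FFT)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out[HOP:HOP + length]                     # drop the leading padding

def pca_lowrank(mag, n_components):
    """Project frames onto the top principal components and reconstruct."""
    mean = mag.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(mag - mean, full_matrices=False)
    k = n_components
    approx = (u[:, :k] * s[:k]) @ vt[:k] + mean
    return np.clip(approx, 0.0, None)                # magnitudes cannot be negative

def denoise(x, fs, low_hz=300.0, high_hz=3400.0, n_components=8):
    """Stage 1: STFT band-pass. Stage 2: PCA on the magnitude spectrogram."""
    spec = stft(x)
    freqs = np.fft.rfftfreq(N_FFT, d=1.0 / fs)
    spec = spec * ((freqs >= low_hz) & (freqs <= high_hz))   # zero out-of-band bins
    mag = pca_lowrank(np.abs(spec), n_components)
    return istft(mag * np.exp(1j * np.angle(spec)), len(x))  # reuse band-passed phase

def snr_db(clean, estimate):
    """SNR in dB of `estimate` measured against a known clean reference."""
    noise = estimate - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

if __name__ == "__main__":
    # Synthetic check: a 1 kHz tone in white noise (not the paper's data).
    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(fs) / fs
    clean = np.sin(2 * np.pi * 1000 * t)
    noisy = clean + 0.3 * rng.standard_normal(fs)
    print("input SNR (dB): ", snr_db(clean, noisy))
    print("output SNR (dB):", snr_db(clean, denoise(noisy, fs)))
```

The synthetic tone only shows that the pipeline runs and that SNR can be measured against a known reference; it says nothing about the gains reported for the paper's 10 audio samples, and real speech would likely need a different pass band and component count.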
