Abstract

In a real-time environment, background noise frequently makes it harder for listeners to understand speech. Several supervised-learning speech enhancement algorithms have been applied to improve the clarity and quality of speech. However, neural network-based speech enhancement techniques focus on the magnitude spectrogram while ignoring the phase mismatch between noisy and clean speech samples. The primary goals of the proposed method are to improve speech intelligibility, eliminate echo, and enhance audio quality. In this paper, a Microphone sensor Array Source Time difference Echo canceller via Reconstructed Spiking neural network (MASTER) is proposed. First, the input signals are pre-processed using a Hanning window and a second-order Butterworth bandpass filter. The direction of the preprocessed signal is then estimated from the time delay of the source between microphone sensor pairs. The Direction of Arrival (DOA) of the input source is fed into the Reconstructed Spiking Convolutional Neural network (RSpiCN Net), which consists of three modules, namely spike generation, masking, and reconstruction, to classify the signals into desired and interference sources. The interference source is then fed into an Adaptive Least Mean Square (ALMS) filter to obtain an enhanced signal from the selected source. The proposed method yields an SNR of 7.2 dB, an SDR of 6.2 dB, an SIR of 8 dB, and a PESQ of 2.6, outperforming existing methods. The SDR of the proposed method is 3.5%, 4.56%, and 7.85% higher than that of the existing FLANN, R-CED, and CycleGAN methods, respectively.
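As a minimal sketch of the preprocessing stage, the snippet below applies a Hanning window followed by a second-order Butterworth bandpass filter, as the abstract describes. The 300-3400 Hz passband and the function name `preprocess` are illustrative assumptions; the paper does not state its cutoff frequencies or framing parameters.

```python
import numpy as np
from scipy.signal import butter, get_window, sosfiltfilt

def preprocess(frame, fs, low_hz=300.0, high_hz=3400.0):
    """Hanning window + second-order Butterworth bandpass (sketch)."""
    # Hanning window tapers the frame edges to reduce spectral leakage.
    windowed = frame * get_window("hann", len(frame))
    # Second-order Butterworth bandpass; the 300-3400 Hz telephone band
    # is an assumed passband, not taken from the paper.
    sos = butter(2, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering avoids adding extra phase distortion.
    return sosfiltfilt(sos, windowed)
```

Using second-order sections (`output="sos"`) keeps the filter numerically stable, which matters when the passband edges are far apart relative to the sampling rate.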
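The DOA step can be illustrated with a simple cross-correlation estimate of the time difference of arrival between one microphone pair, mapped to an angle under a far-field model. The microphone spacing, the speed of sound, and the plain cross-correlation estimator are assumptions for this sketch, not the paper's exact procedure.

```python
import numpy as np

def estimate_doa(x1, x2, fs, mic_spacing, c=343.0):
    """Estimate the source direction (degrees) from one mic pair (sketch)."""
    # Cross-correlate the two microphone signals; the lag of the peak
    # approximates the time difference of arrival (TDOA) in samples.
    corr = np.correlate(x1, x2, mode="full")
    lag = np.argmax(corr) - (len(x2) - 1)
    tau = lag / fs
    # Far-field model: sin(theta) = c * tau / d, clipped to a valid range.
    sin_theta = np.clip(c * tau / mic_spacing, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))
```

With more than two microphones, the same pairwise estimate can be repeated across sensor pairs and the resulting angles combined, which is consistent with the abstract's use of microphone sensor pairs.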
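The final ALMS stage can be sketched as a standard least-mean-squares adaptive canceller that takes the classified interference source as a reference and subtracts its estimate from the primary signal. The filter order and step size `mu` below are illustrative defaults; the paper's adaptive variant may differ (for example, a normalized step size).

```python
import numpy as np

def alms_enhance(primary, reference, order=32, mu=0.01):
    """LMS adaptive canceller: remove the interference estimate (sketch)."""
    w = np.zeros(order)                      # adaptive FIR weights
    enhanced = np.zeros(len(primary))
    for n in range(order, len(primary)):
        ref = reference[n - order:n][::-1]   # most recent reference samples
        y = w @ ref                          # current interference estimate
        e = primary[n] - y                   # error signal = enhanced output
        w += 2 * mu * e * ref                # LMS weight update
        enhanced[n] = e
    return enhanced
```

Here the error signal itself is the enhanced output: as the weights converge, the filter reproduces the interference component of the primary channel, leaving the desired source in the residual.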
