Abstract
This paper presents an auditory scene analyser comprising two jointly operating modules: binaural speech segregation and speaker recognition. Binaural speech segregation is realized by feeding interaural time and level differences, interaural phase difference, and interaural coherence, together with the direct-to-reverberant ratio, into a deep recurrent neural network. The segregation performance of the deep recurrent network is validated in terms of source-to-interference, source-to-distortion, and source-to-artifacts ratios and compared with existing architectures, including a deep neural network. It is observed that the performance of a conventional deep recurrent neural network can be further improved by combining discriminative training objectives with soft time–frequency masking implemented as a layer in the network structure. The paper also proposes a spectro-temporal feature extractor, referred to as Gabor–Hilbert envelope coefficients (GHEC). This monaural feature extracts discriminative acoustic information from the segregated speech sources. The performance of GHEC is validated under various noisy and reverberant conditions, and the results are compared with existing monaural features. The binaural speech segregation results show an average signal-to-noise-ratio gain of 0.7 dB over baseline algorithms, even at a high reverberation time of 0.89 s.
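As a rough illustration of the binaural front end summarized above, the sketch below computes ILD, IPD, interaural coherence, and a GCC-PHAT-based ITD estimate over time–frequency units of a stereo signal. The function name, STFT settings, and coherence smoothing window are illustrative assumptions, not the paper's exact front end.

```python
import numpy as np
from scipy.signal import stft
from scipy.ndimage import uniform_filter1d

def binaural_cues(left, right, fs, nperseg=512, smooth_frames=5):
    """Sketch: per-T-F-unit binaural cues from a stereo pair (assumed setup)."""
    _, _, L = stft(left, fs=fs, nperseg=nperseg)    # (freq, time) spectrogram
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    eps = 1e-12
    cross = L * np.conj(R)                           # interaural cross-spectrum
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))  # level diff, dB
    ipd = np.angle(cross)                            # phase diff, radians
    # Interaural coherence needs local averaging; a single-frame ratio is always 1.
    sm = lambda a: uniform_filter1d(a, smooth_frames, axis=-1)
    num = np.abs(sm(cross.real) + 1j * sm(cross.imag))
    ic = num / np.sqrt(sm(np.abs(L) ** 2) * sm(np.abs(R) ** 2) + eps)
    # Frame-level ITD from GCC-PHAT on the whitened cross-spectrum.
    phat = cross / (np.abs(cross) + eps)
    cc = np.fft.fftshift(np.fft.irfft(phat, n=nperseg, axis=0), axes=0)
    itd = (np.argmax(cc, axis=0) - nperseg // 2) / fs  # seconds, per frame
    return itd, ild, ipd, ic
```

In a pipeline like the one described, these cue maps would be stacked (with a direct-to-reverberant ratio estimate) as the input features to the recurrent segregation network.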
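The abstract's combination of a soft time–frequency masking layer with a discriminative objective is in the spirit of well-known joint mask-estimation DRNNs; the sketch below shows one plausible reading of that idea. Layer sizes, the module and loss names, and the weight `gamma` are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MaskingDRNN(nn.Module):
    """Sketch: recurrent net whose output layer forms soft ratio masks."""
    def __init__(self, n_feat, n_bins, hidden=256, layers=2):
        super().__init__()
        self.rnn = nn.LSTM(n_feat, hidden, layers, batch_first=True)
        self.out = nn.Linear(hidden, 2 * n_bins)     # two competing sources
        self.n_bins = n_bins

    def forward(self, feats, mix_mag):
        h, _ = self.rnn(feats)
        z = self.out(h)
        z1, z2 = z[..., :self.n_bins], z[..., self.n_bins:]
        # Soft masking as a network layer: each T-F unit gets a ratio in [0, 1].
        m1 = torch.abs(z1) / (torch.abs(z1) + torch.abs(z2) + 1e-8)
        return m1 * mix_mag, (1.0 - m1) * mix_mag

def discriminative_loss(s1_hat, s2_hat, s1, s2, gamma=0.05):
    # Fit each estimate to its target while penalizing similarity to the
    # competing source (the "discriminative" term).
    mse = nn.functional.mse_loss
    return (mse(s1_hat, s1) + mse(s2_hat, s2)
            - gamma * (mse(s1_hat, s2) + mse(s2_hat, s1)))
```

Placing the mask inside the network lets the reconstruction error backpropagate through the masking operation itself, which is what distinguishes this design from estimating masks as a separate post-processing step.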
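For the proposed GHEC feature, only the name suggests the processing chain (a Gabor filterbank followed by Hilbert envelope extraction). The sketch below is a generic realization of that chain under stated assumptions: ERB-spaced center frequencies, a per-filter bandwidth rule, and a log-DCT compression stage borrowed from standard cepstral pipelines. None of these parameter choices are taken from the paper.

```python
import numpy as np
from scipy.signal import hilbert, fftconvolve
from scipy.fft import dct

def ghec_like_features(x, fs, n_filters=32, fmin=100.0, fmax=None,
                       frame_len=0.025, hop=0.010, n_ceps=13):
    """Sketch: Gabor subbands -> Hilbert envelopes -> log energies -> DCT."""
    fmax = fmax or 0.45 * fs
    # ERB-rate spaced center frequencies (a common auditory-filterbank choice).
    erb = lambda f: 21.4 * np.log10(1 + 0.00437 * f)
    inv_erb = lambda e: (10 ** (e / 21.4) - 1) / 0.00437
    centers = inv_erb(np.linspace(erb(fmin), erb(fmax), n_filters))
    envs = []
    for fc in centers:
        sigma = 1.5 / fc                            # a few cycles per kernel (assumed)
        t = np.arange(-3 * sigma, 3 * sigma, 1.0 / fs)
        g = np.exp(-0.5 * (t / sigma) ** 2) * np.cos(2 * np.pi * fc * t)
        g /= np.sqrt(np.sum(g ** 2)) + 1e-12        # unit-energy Gabor kernel
        sub = fftconvolve(x, g, mode="same")        # bandpass subband signal
        envs.append(np.abs(hilbert(sub)))           # Hilbert envelope
    E = np.array(envs)                              # (n_filters, n_samples)
    win, step = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + (E.shape[1] - win) // step
    frames = np.stack([E[:, i * step:i * step + win].mean(axis=1)
                       for i in range(n_frames)], axis=1)
    return dct(np.log(frames + 1e-12), axis=0, norm="ortho")[:n_ceps].T
```

In the analyser described above, features of this kind would be computed on each segregated source and passed to the speaker-recognition back end.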