Abstract
The correlogram is an important representation for periodic signals and is widely used in pitch estimation and source separation. For these applications, the major problems of the correlogram are its low resolution and redundant information. This paper proposes a voiced speech segregation system based on a newly introduced concept called dynamic harmonic function (DHF). In the proposed system, conventional correlograms are further processed by replacing the autocorrelation function (ACF) with DHF. The advantages of DHF are: 1) the peak width is adjustable by controlling the variance of the Gaussian function, and 2) the invalid peaks of the ACF, those not at the pitch period, tend to be suppressed. Based on DHF, pitch detection and effective source segregation algorithms are proposed. Our system is systematically evaluated and compared with the correlogram-based system. Both the signal-to-noise ratio results and the perceptual evaluation of speech quality scores show that the proposed system yields substantially better performance.
Highlights
In realistic environments, speech is often corrupted by acoustic interference
This paper proposes a voiced speech segregation system based on a newly introduced concept called dynamic harmonic function (DHF)
The other reason is that the dataset has been widely used to evaluate computational auditory scene analysis (CASA)-based separation systems [8, 9, 15], which facilitates the comparison
Summary
Speech is often corrupted by acoustic interference, and many applications perform poorly on noisy speech. Numerous speech enhancement algorithms have been proposed in the literature [1]. Some methods, such as independent component analysis [2] or beamforming [3], require multiple sensors. Spectral subtraction [4] and subspace analysis [5], proposed for monaural speech enhancement, usually make strong assumptions about the acoustic interference, so these methods are limited to particular environments. Meddis and Hewitt [14] implemented a similar computer model for harmonic perception. Their model first simulated the mechanical filtering of the basilar membrane to decompose the signal, and then the mechanism of neural transduction at the hair cells. Their key innovation was to use autocorrelation to model the analysis of neural firing rates in humans. These banks of autocorrelation functions (ACFs) were called correlograms, which provide a simple route to pitch estimation and source separation.
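To illustrate why an ACF is useful for pitch estimation, the following is a minimal sketch that computes the autocorrelation of a single frame and picks its largest peak inside a plausible pitch range. It is a simplification and not the system described in the paper: there is no auditory filterbank or hair-cell model, so it produces one ACF rather than a full correlogram (which would stack one such ACF per filterbank channel), and it does not implement DHF. The function names, the 80 to 400 Hz search range, the sampling rate, and the synthetic test tone are all assumptions made for illustration.

```python
import numpy as np


def autocorrelation(frame, num_lags):
    """Normalized ACF of one frame for lags 0 .. num_lags - 1."""
    frame = frame - frame.mean()
    n = len(frame)
    acf = np.array(
        [np.dot(frame[: n - lag], frame[lag:]) for lag in range(num_lags)]
    )
    # Normalize so that the zero-lag value equals 1 (skip for a silent frame).
    return acf / acf[0] if acf[0] > 0 else acf


def estimate_pitch(frame, fs, fmin=80.0, fmax=400.0):
    """Pick the strongest ACF peak between the lags for fmax and fmin."""
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    acf = autocorrelation(frame, max_lag + 1)
    best_lag = min_lag + int(np.argmax(acf[min_lag : max_lag + 1]))
    return fs / best_lag, acf


# Synthetic "voiced" frame: three harmonics of an assumed 200 Hz fundamental.
fs = 16000
f0 = 200.0
t = np.arange(int(0.04 * fs)) / fs
signal = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in (1, 2, 3))

pitch, acf = estimate_pitch(signal, fs)
print(f"estimated pitch: {pitch:.1f} Hz")
```

Besides the peak at the pitch period, the ACF of a harmonic signal also has peaks at multiples of that period, and its peaks are relatively broad. These are the kinds of resolution and redundancy problems that the abstract says DHF is intended to address.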
More From: EURASIP Journal on Audio, Speech, and Music Processing