Abstract
Computational auditory scene analysis (CASA) plays a significant role in segregating speech from monaural audio mixtures and generally serves as a measure of performance for speech recognition systems. Pitch estimation, in turn, has a substantial influence on the performance of CASA systems. This study presents a novel pitch estimation framework for speech segregation from monaural audio mixtures using cochleagram morphing. The proposed framework first obtains a rough estimate of the target pitch from a given audio mixture containing speech and background interference. A discrete set of morphed versions of the cochleagram is then obtained using k-means clustering. The estimated pitch values are refined by validating and smoothing them against the morphed cochleagram. The refined pitch contours, together with harmonicity and temporal continuity, are used to segregate the target speech. The proposed framework achieved 83.13% accuracy on the MIR-1k dataset, which is considerably higher than that of existing methods.
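The abstract does not say how the rough pitch estimate is obtained. As an illustration only, the sketch below uses plain normalized autocorrelation on a single frame, a common generic technique that is not necessarily the one used in this study. The function name `rough_pitch_autocorr`, the search range of 80 to 500 Hz, and the voicing threshold of 0.3 are all assumptions chosen for the example.

```python
import math


def rough_pitch_autocorr(frame, sample_rate, f_min=80.0, f_max=500.0,
                         voicing_threshold=0.3):
    """Return a rough pitch estimate in Hz for one frame, or None.

    Generic autocorrelation sketch; all parameter defaults are assumptions.
    """
    lag_min = int(sample_rate / f_max)                        # shortest period searched
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)   # longest period searched
    energy = sum(x * x for x in frame)
    if energy == 0.0 or lag_min < 1 or lag_min >= lag_max:
        return None  # silent frame or frame too short to search

    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        score = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag is None or best_score < voicing_threshold:
        return None  # no clear periodicity: treat the frame as unvoiced
    return sample_rate / best_lag


# Usage: a 50 ms frame of a pure 200 Hz tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
estimate = rough_pitch_autocorr(tone, sr)
```

In a pipeline like the one described, per-frame estimates of this kind would only be a starting point; the validation and smoothing against the morphed cochleagram is the step the study contributes, and it is not reproduced here.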