Abstract

Overlapped speech refers to a monophonic audio signal in which at least two speakers are active at the same time. This study focuses on distinguishing overlapped from single-speaker speech, i.e., overlapped speech detection. We develop an overlap detection algorithm using an enhanced time-frequency representation, called a Pyknogram, estimated directly from the input audio signal. Pyknograms use the Teager–Kaiser energy operator to detect resonant time-frequency units and thereby suppress nonharmonic structures. We show that the resulting Pyknograms provide high separability for detecting the presence of interfering speech. Our proposed unsupervised Pyknogram-based detection yields over $30\%$ relative improvement in overlap detection error rates across different signal-to-interference ratios (SIR) compared to baseline systems. In addition, we present a case study that evaluates speaker verification performance under different overlap conditions using the GRID database, and observe that speaker verification equal error rates (EER) vary from $2\%$ to $30\%$, depending on the average SIR values used in the training and test sets. To estimate the reliability of speaker verification scores across different trials, overlap detection results are interpreted as low-level information and stacked alongside verification outputs. The resulting high-dimensional space is passed to a support vector machine classifier to find the separating hyperplane between target and impostor scores. On average, combining overlap detection scores with speaker verification yields a $20\%$ relative decrease in EER. We also provide an upper bound for this approach using existing overlap labels, which yields a $23\%$ relative improvement.
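To make the role of the Teager–Kaiser energy operator concrete, the following minimal sketch implements its standard discrete form, $\Psi[x(n)] = x(n)^2 - x(n-1)\,x(n+1)$, and applies it to a pure tone. It covers only the operator itself, not the Pyknogram pipeline described above; the function name `teager_kaiser_energy` and the test-tone parameters are assumptions chosen for illustration.

```python
import math


def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, one per interior sample.
    """
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone (illustrative values, not taken from the paper):
# a 200 Hz cosine sampled at 8 kHz with amplitude 0.5.
amplitude = 0.5
omega = 2 * math.pi * 200 / 8000  # digital frequency in radians per sample
tone = [amplitude * math.cos(omega * n) for n in range(64)]

energy = teager_kaiser_energy(tone)

# For a pure sinusoid the operator is constant: A^2 * sin^2(omega).
expected = (amplitude * math.sin(omega)) ** 2
```

For a single sinusoid the output is constant and depends on both amplitude and frequency, which is why the operator is commonly used to track resonances. How the paper combines it with a filterbank to form Pyknograms is not specified in this abstract.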
