Abstract

Automatic segmentation of audio streams according to speaker identities and environmental and channel conditions has become an important preprocessing step for speech recognition, speaker recognition, and audio data mining. In most previous approaches, the automatic segmentation was evaluated in terms of the performance of the final system, like the word error rate for speech recognition systems. In many applications, like online audio indexing, and information retrieval systems, the actual boundaries of the segments are required. We present an approach based on the cumulative sum (CuSum) algorithm for automatic segmentation which minimizes the missing probability for a given false alarm rate. We compare the CuSum algorithm to the Bayesian information criterion (BIC) algorithm, and a generalization of the Kolmogorov-Smirnov test for automatic segmentation of audio streams. We present a two-step variation of the three algorithms which improves the performance significantly. We present also a novel approach that combines hypothesized boundaries from the three algorithms to achieve the final segmentation of the audio stream. Our experiments, on the 1998 Hub4 broadcast news, show that a variation of the CuSum algorithm significantly outperforms the other two approaches and that combining the three approaches using a voting scheme improves the performance slightly compared to using the a two-step variation of the CuSum algorithm alone.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.