Efficient audio stream segmentation via the combined T² statistic and Bayesian information criterion

Bowen Zhou, J.H.L. Hansen

doi:10.1109/tsa.2005.845790

Abstract

In many speech and audio applications, acoustic events must first be partitioned and classified before voice coding for communication or speech recognition for spoken document retrieval. In this paper, we propose an efficient approach for unsupervised audio stream segmentation and clustering via the Bayesian Information Criterion (BIC). The proposed method extends an earlier formulation by Chen and Gopalakrishnan. In our formulation, Hotelling's T²-statistic is used to pre-select candidate segmentation boundaries, and BIC is then used to make the segmentation decision. The proposed algorithm also incorporates a variable-size increasing window scheme and a skip-frame test. Our experiments on DARPA Hub4 1997 evaluation data show that the final algorithm runs 100 times faster than Chen and Gopalakrishnan's formulation while achieving a 6.7% reduction in the acoustic boundary miss rate, at the expense of a 5.7% increase in the false alarm rate. The approach is particularly successful for short segment turns of less than 2 s in duration. The results suggest that the proposed algorithm is sufficiently effective and efficient for audio stream segmentation applications.
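The two-stage idea in the abstract (a cheap T² scan to pick a candidate boundary, then a single BIC test to accept or reject it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it handles one fixed analysis window, omits the variable-size increasing window and the skip-frame test, and assumes full-covariance Gaussian models with the standard ΔBIC form. The function names, the `margin` parameter, and the penalty weight `lam` are hypothetical choices made for this example.

```python
import numpy as np


def hotelling_t2(x, i):
    """Hotelling's T^2 for splitting the frames x (N x d) at index i."""
    a, b = x[:i], x[i:]
    n1, n2 = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled (unbiased) covariance of the two sub-windows.
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    pooled = ((n1 - 1) * cov_a + (n2 - 1) * cov_b) / (n1 + n2 - 2)
    return float((n1 * n2 / (n1 + n2)) * (diff @ np.linalg.solve(pooled, diff)))


def delta_bic(x, i, lam=1.0):
    """Delta-BIC for a boundary at index i; a positive value favors a split."""
    n, d = x.shape

    def logdet(seg):
        # Log-determinant of the maximum-likelihood covariance estimate.
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        return np.linalg.slogdet(cov)[1]

    # Penalty for the extra mean vector and covariance matrix of a second model.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    gain = 0.5 * (n * logdet(x) - i * logdet(x[:i]) - (n - i) * logdet(x[i:]))
    return float(gain - penalty)


def find_change(x, margin=10, lam=1.0):
    """Pick the T^2-maximizing frame, then confirm it with one BIC test.

    margin (assumed here) keeps enough frames on each side for a stable
    covariance estimate; it should exceed the feature dimension.
    """
    candidates = range(margin, len(x) - margin)
    best = max(candidates, key=lambda i: hotelling_t2(x, i))
    return best if delta_bic(x, best, lam) > 0 else None


if __name__ == "__main__":
    # Synthetic 2-D "features": the mean shifts at frame 100.
    rng = np.random.default_rng(0)
    frames = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
                        rng.normal(5.0, 1.0, (100, 2))])
    print(find_change(frames))
```

The design point this illustrates is that the T² scan needs only means and a pooled covariance per candidate frame, while the more expensive BIC evaluation, with its three log-determinants, runs once per window rather than once per frame.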
