Abstract

The conventional speaker change detection (SCD) method based on the Bayesian Information Criterion (BIC) has been widely used. However, its performance depends on the choice of penalty factor, and it is computationally expensive. The two-step SCD is less time-consuming but produces more detection errors. The performance limitation of the conventional method originates from its use of two adjacent data windows. We propose a strategy that inserts an interval between the two fixed-size data windows in each analysis window. The dissimilarity value between the data windows is regarded as the probability of a speaker identity change within the interval area. The analysis window is then slid along the audio with a large step to locate the areas where speaker change points may appear. Afterwards, we examine only these areas to locate the change points precisely; areas where a speaker change point is unlikely to appear are discarded. The proposed method is computationally efficient and more robust to noise and to the choice of penalty factor than the conventional method. Evaluated on a corpus of China Central Television (CCTV) news, the proposed method achieves a 74.18% reduction in computation time and a 22.24% improvement in F1-measure compared with the conventional approach.
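The coarse search stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes full-covariance Gaussian models, the standard ΔBIC dissimilarity, and a simple "ΔBIC > 0" decision rule. The function names (`delta_bic`, `coarse_candidates`) and the parameter values (`win`, `gap`, `step`, `penalty`) are hypothetical and chosen only for illustration. The fine localization stage inside each candidate interval is not shown.

```python
import numpy as np


def _logdet_cov(x):
    """Log-determinant of the sample covariance of x, shape (n_frames, n_dims)."""
    d = x.shape[1]
    # A small diagonal term keeps the matrix well conditioned (an assumption).
    cov = np.atleast_2d(np.cov(x, rowvar=False)) + 1e-6 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, y, penalty=1.0):
    """Standard delta-BIC between two windows; positive values suggest a change."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    d = x.shape[1]
    z = np.vstack([x, y])
    # Likelihood-ratio term: one Gaussian for both windows vs. one per window.
    r = 0.5 * (n * _logdet_cov(z) - n1 * _logdet_cov(x) - n2 * _logdet_cov(y))
    # Model-complexity term, scaled by the penalty factor.
    p = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return r - penalty * p


def coarse_candidates(features, win=100, gap=50, step=50, penalty=1.0):
    """Slide [left window | gap | right window] along the features with a large
    step and return the (start, end) frame indices of each gap whose flanking
    windows are dissimilar."""
    span = 2 * win + gap
    candidates = []
    for start in range(0, len(features) - span + 1, step):
        left = features[start:start + win]
        right = features[start + win + gap:start + span]
        if delta_bic(left, right, penalty) > 0:
            candidates.append((start + win, start + win + gap))
    return candidates
```

With `features` as an array of per-frame feature vectors (for example MFCCs, which is also an assumption), `coarse_candidates(features)` returns the interval areas that would be passed to a finer search, while all other frames are skipped.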
