Abstract

This paper studies the segmentation and clustering of speaker speech. In order to improve the accuracy of speech endpoint detection, the traditional double-threshold short-time average zero-crossing rate is replaced by a better spectrum centroid feature, and the local maxima of the statistical feature sequence histogram are used to select the threshold, and a new speech endpoint detection algorithm is proposed. Compared with the traditional double-threshold algorithm, it effectively improves the detection accuracy and antinoise in low SNR. The k-means algorithm of conventional clustering needs to give the number of clusters in advance and is greatly affected by the choice of initial cluster centers. At the same time, the self-organizing neural network algorithm converges slowly and cannot provide accurate clustering information. An improved k-means speaker clustering algorithm based on self-organizing neural network is proposed. The number of clusters is predicted by the winning situation of the competitive neurons in the trained network, and the weights of the neurons are used as the initial cluster centers of the k-means algorithm. The experimental results of multiperson mixed speech segmentation show that the proposed algorithm can effectively improve the accuracy of speech clustering and make up for the shortcomings of the k-means algorithm and self-organizing neural network algorithm.

Highlights

  • Speech segmentation is an essential basic work in speech recognition and speech synthesis, and its quality has a huge impact on the follow-up speech recognition

  • In order to improve the accuracy of speech endpoint detection, this paper proposes a new speech endpoint detection algorithm, which replaces the traditional double-threshold short-time average zero-crossing rate with a better spectral centroid feature, smoothes the feature curve by median filter, and selects the threshold value by counting the local maxima of the feature sequence histogram

  • E k-means algorithm has the advantages of convenient, fast calculation, and accurate results, but it needs to give the number of clusters in advance, and the results are greatly affected by the choice of the initial cluster center, so it is easy to fall into local optimum. e self-organizing neural network (SOM) has the advantages of strong explanatory, strong learning ability, and visualization, but the convergence speed is slow, and it cannot provide accurate clustering information, clustering accuracy for nonlarge volume of samples is poor

Read more

Summary

Research Article

An Improved Speech Segmentation and Clustering Algorithm Based on SOM and K-Means. is paper studies the segmentation and clustering of speaker speech. An Improved Speech Segmentation and Clustering Algorithm Based on SOM and K-Means. E k-means algorithm of conventional clustering needs to give the number of clusters in advance and is greatly affected by the choice of initial cluster centers. The self-organizing neural network algorithm converges slowly and cannot provide accurate clustering information. An improved k-means speaker clustering algorithm based on self-organizing neural network is proposed. E number of clusters is predicted by the winning situation of the competitive neurons in the trained network, and the weights of the neurons are used as the initial cluster centers of the k-means algorithm. E experimental results of multiperson mixed speech segmentation show that the proposed algorithm can effectively improve the accuracy of speech clustering and make up for the shortcomings of the k-means algorithm and self-organizing neural network algorithm An improved k-means speaker clustering algorithm based on self-organizing neural network is proposed. e number of clusters is predicted by the winning situation of the competitive neurons in the trained network, and the weights of the neurons are used as the initial cluster centers of the k-means algorithm. e experimental results of multiperson mixed speech segmentation show that the proposed algorithm can effectively improve the accuracy of speech clustering and make up for the shortcomings of the k-means algorithm and self-organizing neural network algorithm

Introduction
Unmark voice start
Postprocessing stage
Primitive Algorithm Improved Algorithm
Error analysis
Competitive layer
Input layer
Comparative analysis and draw the conclusion
Findings
Categories c b a a
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call