Abstract
This paper studies the segmentation and clustering of speaker speech. In order to improve the accuracy of speech endpoint detection, the traditional double-threshold short-time average zero-crossing rate is replaced by a better spectrum centroid feature, and the local maxima of the statistical feature sequence histogram are used to select the threshold, and a new speech endpoint detection algorithm is proposed. Compared with the traditional double-threshold algorithm, it effectively improves the detection accuracy and antinoise in low SNR. The k-means algorithm of conventional clustering needs to give the number of clusters in advance and is greatly affected by the choice of initial cluster centers. At the same time, the self-organizing neural network algorithm converges slowly and cannot provide accurate clustering information. An improved k-means speaker clustering algorithm based on self-organizing neural network is proposed. The number of clusters is predicted by the winning situation of the competitive neurons in the trained network, and the weights of the neurons are used as the initial cluster centers of the k-means algorithm. The experimental results of multiperson mixed speech segmentation show that the proposed algorithm can effectively improve the accuracy of speech clustering and make up for the shortcomings of the k-means algorithm and self-organizing neural network algorithm.
Highlights
Speech segmentation is an essential basic work in speech recognition and speech synthesis, and its quality has a huge impact on the follow-up speech recognition
In order to improve the accuracy of speech endpoint detection, this paper proposes a new speech endpoint detection algorithm, which replaces the traditional double-threshold short-time average zero-crossing rate with a better spectral centroid feature, smoothes the feature curve by median filter, and selects the threshold value by counting the local maxima of the feature sequence histogram
E k-means algorithm has the advantages of convenient, fast calculation, and accurate results, but it needs to give the number of clusters in advance, and the results are greatly affected by the choice of the initial cluster center, so it is easy to fall into local optimum. e self-organizing neural network (SOM) has the advantages of strong explanatory, strong learning ability, and visualization, but the convergence speed is slow, and it cannot provide accurate clustering information, clustering accuracy for nonlarge volume of samples is poor
Summary
An Improved Speech Segmentation and Clustering Algorithm Based on SOM and K-Means. is paper studies the segmentation and clustering of speaker speech. An Improved Speech Segmentation and Clustering Algorithm Based on SOM and K-Means. E k-means algorithm of conventional clustering needs to give the number of clusters in advance and is greatly affected by the choice of initial cluster centers. The self-organizing neural network algorithm converges slowly and cannot provide accurate clustering information. An improved k-means speaker clustering algorithm based on self-organizing neural network is proposed. E number of clusters is predicted by the winning situation of the competitive neurons in the trained network, and the weights of the neurons are used as the initial cluster centers of the k-means algorithm. E experimental results of multiperson mixed speech segmentation show that the proposed algorithm can effectively improve the accuracy of speech clustering and make up for the shortcomings of the k-means algorithm and self-organizing neural network algorithm An improved k-means speaker clustering algorithm based on self-organizing neural network is proposed. e number of clusters is predicted by the winning situation of the competitive neurons in the trained network, and the weights of the neurons are used as the initial cluster centers of the k-means algorithm. e experimental results of multiperson mixed speech segmentation show that the proposed algorithm can effectively improve the accuracy of speech clustering and make up for the shortcomings of the k-means algorithm and self-organizing neural network algorithm
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.