A Method of Combining Gaussian Mixture Model and K-Means for Automatic Audio Segmentation of Popular Music

Ing-Jr Ding

doi:10.1007/978-94-007-5860-5_94

Abstract

In this study, a hybrid scheme that combines the Gaussian mixture model (GMM) with the k-means approach, called GMM-kmeans, is proposed for automatic audio segmentation (AAS) of popular music. In general, a popular song is composed of verse, chorus and non-repetitive (such as intro, bridge and outro) segments. The GMM-kmeans scheme, which consists mainly of two proposed algorithms, GMMAAS and SFS, divides a song into these three parts. In GMM-kmeans, the GMM classifier first recognizes the vocal segments and then determines the section boundaries between them and the non-repetitive sections. The vocal segments extracted by the GMM, which contain only the verse and chorus sections, are then analyzed by the k-means clustering algorithm, which further discriminates the verse sections from the chorus sections. In the k-means classification of verse and chorus, the proposed switching frame search (SFS) algorithm, which relies on a verse group-of-frames (Verse-GoF) and a chorus group-of-frames (Chorus-GoF), estimates the boundary that separates verse and chorus sections. Experimental results obtained from a data set of numerous Chinese popular songs show the superiority of both the proposed GMMAAS and SFS algorithms.
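The second stage described above (clustering vocal frames into two groups, then locating the frame where the label switches) can be illustrated with a minimal sketch. This is not the paper's SFS algorithm, whose details are not given in the abstract: the majority vote over fixed-size groups of frames, the function names (`kmeans_two_clusters`, `gof_majority`, `switch_frames`), the group size, and the synthetic two-dimensional features are all assumptions made for illustration. The first stage (GMM-based vocal detection) is assumed to have already removed the non-repetitive sections.

```python
import random


def kmeans_two_clusters(frames, iters=20):
    """Plain 2-means over a list of feature vectors; returns a 0/1 label per frame."""
    # Assumed deterministic initialization: first and last frame as centroids.
    centroids = [list(frames[0]), list(frames[-1])]
    labels = [0] * len(frames)
    for _ in range(iters):
        # Assignment step: each frame takes the label of its nearest centroid.
        labels = [
            min((0, 1), key=lambda k: sum((x - c) ** 2 for x, c in zip(f, centroids[k])))
            for f in frames
        ]
        # Update step: each centroid becomes the mean of its member frames.
        for k in (0, 1):
            members = [f for f, lab in zip(frames, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def gof_majority(labels, gof_size):
    """Majority label within each non-overlapping group of frames (GoF)."""
    groups = []
    for start in range(0, len(labels), gof_size):
        chunk = labels[start:start + gof_size]
        groups.append(1 if 2 * sum(chunk) > len(chunk) else 0)
    return groups


def switch_frames(labels, gof_size):
    """Frame indices at which the group-level label changes."""
    groups = gof_majority(labels, gof_size)
    return [i * gof_size for i in range(1, len(groups)) if groups[i] != groups[i - 1]]


# Synthetic stand-in for per-frame features: 40 "verse-like" frames near (0, 0)
# followed by 40 "chorus-like" frames near (5, 5). Real features would come
# from the audio signal.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(40)]
frames += [[rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)] for _ in range(40)]

labels = kmeans_two_clusters(frames)
boundaries = switch_frames(labels, gof_size=10)
print(boundaries)
```

Voting over groups of frames rather than single frames keeps an isolated misclassified frame from producing a spurious boundary, at the cost of limiting boundary resolution to the group size.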
