The paper describes the use of an unsupervised learning method based on self-organizing incremental neural networks for speaker clustering. It uses a set of mel-frequency cepstral coefficients (MFCCs) as the user model. These coefficients are obtained by applying a filter bank to the frequency spectrum of the sound signal after it has been mapped onto the mel-frequency scale (the name “mel” is derived from “melody”). The main distinguishing feature of this work is that it also considers the dynamics of the mel-frequency cepstral coefficients, that is, how they change over time, because these changes also carry information about the user. Since new, previously unseen users may appear while the system is operating, most classes of neural networks cannot be used: training them on a new data set causes them to malfunction by “forgetting” what they learned before. Neural networks designed for online learning, in turn, impose a limit on the maximum number of clusters, which is unknown for this problem, and they generally require a priori knowledge of the input data (to set thresholds, etc.) that is difficult to obtain in practice. Self-organizing incremental neural networks allow lifelong learning, that is, learning during operation, and require no a priori knowledge about the users or their number. Their dynamic structure makes it possible to create an unlimited number of new clusters for new, previously unregistered users. As a result, the method yields a flexible speaker clustering system that adapts to changing input data.
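The abstract does not say how the MFCC dynamics are computed. The following is a minimal sketch, assuming the widely used regression-based delta coefficients and the common 2595·log10(1 + f/700) mel formula. It shows how a static MFCC matrix can be extended with its dynamics before clustering. The function names `hz_to_mel`, `delta`, and `speaker_features` are hypothetical, and the MFCC matrix itself is assumed to come from an existing front end.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (assumed variant: 2595 * log10(1 + f / 700))."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def delta(features, n=2):
    """Regression-based delta (dynamic) coefficients along the time axis.

    features: array of shape (frames, coeffs), e.g. an MFCC matrix.
    n: half-width of the regression window (assumed value, not from the paper).
    The first and last frames are repeated so the output keeps the input shape.
    """
    features = np.asarray(features, dtype=float)
    frames = features.shape[0]
    padded = np.pad(features, ((n, n), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    out = np.zeros_like(features)
    for k in range(1, n + 1):
        # k * (c[t + k] - c[t - k]) for every frame t at once
        out += k * (padded[n + k:n + k + frames] - padded[n - k:n - k + frames])
    return out / denom


def speaker_features(mfcc, n=2):
    """Stack static MFCCs with their deltas: shape (frames, 2 * coeffs)."""
    mfcc = np.asarray(mfcc, dtype=float)
    return np.hstack([mfcc, delta(mfcc, n)])
```

For a toy matrix whose coefficients grow by 2 per frame, `delta(np.arange(10.0).reshape(5, 2))[:, 0]` gives `[1.0, 1.6, 2.0, 1.6, 1.0]`: the interior frame recovers the true slope of 2, while the edge frames are damped by the frame repetition. The stacked vectors could then be fed to an incremental clusterer frame by frame.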