The two-layer mixtures of Gaussian process functional regressions (TMGPFR) model is a statistical learning model for curve clustering analysis, developed to fit sample curves drawn from a number of independent information sources or stochastic processes. Since the sample curves from a given stochastic process naturally form a curve cluster, model selection for TMGPFRs, i.e., selecting the number of mixtures of Gaussian process functional regressions (MGPFRs) in the upper layer, corresponds to discovering the number and structure of the clusters in the curve data. This is challenging because conventional model selection criteria, such as BIC and cross-validation, do not yield stable results in practice, even with a heavy burden of repeated computation. In this paper, we improve the original TMGPFR model and propose a Bayesian Ying-Yang (BYY) annealing learning algorithm that learns the parameters of the improved model with automated model selection. Experimental results on both synthetic and real-world datasets demonstrate that the proposed algorithm can select the correct model automatically during parameter learning.