Discriminant analysis with Gaussian graphical tree models

Gonzalo Perez-De-La-Cruz, Guillermina Eslava-Gomez

doi:10.1007/s10182-015-0256-6

Abstract

We consider Gaussian graphical tree models in discriminant analysis for two populations. Both the parameters and the structure of the graph are assumed to be unknown. The parameters are estimated by maximum likelihood, and for the structure of the tree graph we propose three methods: one optimizes the J-divergence and the other two optimize the empirical log-likelihood ratio. The main contribution of this paper is the introduction of these three computationally efficient methods. We show that the optimization problem of each proposed method is equivalent to one of finding a minimum weight spanning tree, which can be solved efficiently even if the number of variables is large. This property, together with the existence of the maximum likelihood estimators for small group sample sizes, is the main advantage of the proposed methods. A numerical comparison of the classification performance of discriminant analysis using these methods, as well as three other existing ones, is presented. This comparison is based on the estimated error rates of the corresponding plug-in allocation rules obtained from real and simulated data. Diagonal discriminant analysis is considered as a benchmark, as well as quadratic and linear discriminant analysis whenever the sample size is sufficient. The results show that discriminant analysis with Gaussian tree models, using these methods for selecting the graph structure, is competitive with diagonal discriminant analysis in high-dimensional settings.
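The reduction to a minimum weight spanning tree can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the edge weights derived from the J-divergence or the empirical log-likelihood ratio, so the function `gaussian_mi_weights` below is a hypothetical stand-in that uses the negative Gaussian mutual information, 0.5 log(1 - rho^2), of each pair of variables. The names `minimum_spanning_tree`, `gaussian_mi_weights`, `corr` and `tree` are likewise assumptions made for illustration. Any symmetric weight matrix could be passed to the spanning tree step in place of the stand-in.

```python
import math


def minimum_spanning_tree(weights):
    """Kruskal's algorithm on a dense symmetric weight matrix.

    `weights` is a list of lists; weights[i][j] is the cost of edge (i, j).
    Returns the p - 1 selected edges as (i, j) pairs with i < j.
    """
    p = len(weights)
    parent = list(range(p))  # union-find forest, one root per variable

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All candidate edges, cheapest first.
    edges = sorted(
        (weights[i][j], i, j) for i in range(p) for j in range(i + 1, p)
    )
    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding the edge creates no cycle
            parent[root_i] = root_j
            tree.append((i, j))
            if len(tree) == p - 1:
                break
    return tree


def gaussian_mi_weights(corr):
    """Stand-in edge weights (an assumption, not the paper's criteria).

    Uses 0.5 * log(1 - rho_ij^2), the negative mutual information of a
    bivariate Gaussian, so strongly correlated pairs get the lowest cost.
    """
    p = len(corr)
    return [
        [0.0 if i == j else 0.5 * math.log(1.0 - corr[i][j] ** 2)
         for j in range(p)]
        for i in range(p)
    ]


# Illustrative correlation matrix for three variables (made-up values).
corr = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]
tree = minimum_spanning_tree(gaussian_mi_weights(corr))
print(tree)
```

For p variables the sketch sorts all p(p - 1)/2 candidate edges, which is why the structure search stays cheap even when p is large relative to the group sample sizes.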
