Abstract

In the last decade, renewed interest in discriminant analysis has been motivated primarily by possible applications to tumor classification using high-dimensional microarray-based data. In this thesis, we do three things:

1. First, we introduce a new regularizing covariance estimation procedure that we refer to as SHIP: SHrinking and Incorporating Prior knowledge. The resulting covariance estimator is based on the shrinkage estimator by Ledoit and Wolf [31, 33, 32], but additionally incorporates prior knowledge on gene functional groups extracted from the KEGG database. We develop multiple options for integrating this knowledge into the shrinkage estimator. Instead of using a standard cross-validation procedure to determine the optimal shrinkage intensity, we determine it analytically, as introduced by Ledoit and Wolf.
2. Second, we propose a variant of regularized linear discriminant analysis. This method carries the idea of the shrinkage estimator above over to linear discriminant analysis (LDA).
3. Third, we apply our method to public gene expression data sets and examine the classification performance in both the binary and the c-nary case, where c > 2. We choose diagonal linear discriminant analysis and the nearest shrunken centroids method [15] as competitors.

We show that rlda.TG, one of our variants of LDA 'via the SHIP', performs well in all classification problems and even outperforms the competitors, albeit marginally, in some situations. Unexpectedly, we find that another variant of LDA, which is based on the shrinkage estimator by Ledoit and Wolf and does not incorporate any biological knowledge, is as competitive as rlda.TG.
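To make the general idea concrete, the following is a minimal Python sketch of a shrinkage covariance estimate plugged into LDA. It is not the thesis's implementation. The function names `shrink_covariance` and `lda_predict`, the simulated data, the diagonal target `T`, and the fixed intensity `lam = 0.5` are all assumptions made for illustration. The thesis instead builds its targets from KEGG gene functional groups and determines the shrinkage intensity analytically, and neither of those steps is reproduced here.

```python
import numpy as np


def shrink_covariance(S, T, lam):
    """Return the convex combination lam * T + (1 - lam) * S.

    S is the sample covariance, T a structured target, lam in [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * T + (1.0 - lam) * S


def lda_predict(X, means, cov, priors):
    """Assign each row of X to the class with the largest linear discriminant score.

    Score for class k: x' C^{-1} m_k - 0.5 * m_k' C^{-1} m_k + log(prior_k).
    """
    # Solve cov @ W = means.T rather than forming the inverse explicitly.
    W = np.linalg.solve(cov, means.T)                            # shape (p, c)
    const = -0.5 * np.sum(means.T * W, axis=0) + np.log(priors)  # shape (c,)
    return np.argmax(X @ W + const, axis=1)


# Simulated "small n, large p" data: 10 samples per class, 50 features (assumption).
rng = np.random.default_rng(0)
n, p = 10, 50
X0 = rng.normal(0.0, 1.0, size=(n, p))
X1 = rng.normal(1.0, 1.0, size=(n, p))
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

# Class means and pooled sample covariance (singular here, because p > n).
means = np.vstack([X0.mean(axis=0), X1.mean(axis=0)])
centered = X - means[y]
S = centered.T @ centered / (len(X) - 2)

# Hypothetical choices: diagonal target and a fixed shrinkage intensity.
T = np.diag(np.diag(S))
lam = 0.5
Sigma = shrink_covariance(S, T, lam)

pred = lda_predict(X, means, Sigma, priors=np.array([0.5, 0.5]))
```

The pooled sample covariance `S` cannot be inverted when there are more features than samples, whereas the shrunken estimate `Sigma` is positive definite for any `lam > 0` as long as the target is, which is what makes the LDA step well defined in this setting.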

