Supervised Discriminative Sparse PCA for Com-Characteristic Gene Selection and Tumor Classification on Multiview Biological Data.

Chun-Mei Feng,Chun-Hou Zheng,Jin-Xing Liu,Ying-Lian Gao,Yong Xu

doi:10.1109/tnnls.2019.2893190

Abstract

Principal component analysis (PCA) has been used to study the pathogenesis of diseases. To enhance the interpretability of classical PCA, various improved PCA methods have been proposed to date. Among these, a typical method is the so-called sparse PCA, which focuses on seeking sparse loadings. However, the performance of these methods is still far from satisfactory due to their limitation of using unsupervised learning methods; moreover, the class ambiguity within the sample is high. To overcome this problem, this paper developed a new PCA method, which is named the supervised discriminative sparse PCA (SDSPCA). The main innovation of this method is the incorporation of discriminative information and sparsity into the PCA model. Specifically, in contrast to the traditional sparse PCA, which imposes sparsity on the loadings, here, sparse components are obtained to represent the data. Furthermore, via the linear transformation, the sparse components approximate the given label information. On the one hand, sparse components improve interpretability over the traditional PCA, while on the other hand, they are have discriminative abilities suitable for classification purposes. A simple algorithm is developed, and its convergence proof is provided. SDSPCA has been applied to the common-characteristic gene selection and tumor classification on multiview biological data. The sparsity and classification performance of SDSPCA are empirically verified via abundant, reasonable, and effective experiments, and the obtained results demonstrate that SDSPCA outperforms other state-of-the-art methods.

Full Text