Abstract
In the context of cancer, internal “checkerboard” structures are commonly found in gene expression data matrices; they correspond to genes that are significantly up- or down-regulated in patients with specific types of tumors. In this paper, we propose a novel method called dual graph-regularization principal component analysis (DGPCA). Its main innovation is that it simultaneously considers the internal geometric structures of the condition manifold and the gene manifold. Specifically, we obtain principal components (PCs) to represent the data and approximate the cluster membership indicators through Laplacian embedding. Because the method incorporates the geometric structure of both the condition manifold and the gene manifold, it is well suited to bi-clustering. A closed-form solution is provided for DGPCA. We apply the method to simultaneously cluster genes and conditions (e.g., different samples), with the aim of finding internal “checkerboard” structures in gene expression data, if they exist. We then use it to identify regulatory genes under particular conditions and compare the results with those of other state-of-the-art PCA-based methods. Extensive experiments on gene expression data show promising results.
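The abstract does not spell out the DGPCA objective, so the sketch below does not implement it. It illustrates a simpler, related idea instead: PCA regularized by a single graph Laplacian built over the samples, which also admits a closed-form solution through one eigendecomposition. The objective, the k-nearest-neighbour graph, the function names, and the values of `alpha` and `k` are all assumptions made for illustration.

```python
import numpy as np

def knn_graph_laplacian(points, k=3):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph over rows of `points`."""
    n = points.shape[0]
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T  # squared distances
    np.fill_diagonal(d2, np.inf)                               # exclude self-neighbours
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)                                     # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def graph_laplacian_pca(X, L, n_components, alpha):
    """Assumed objective (single graph, not DGPCA):
        min_{U,V} ||X - U V^T||_F^2 + alpha * tr(V^T L V)   s.t.  V^T V = I
    X is genes x samples, L is a samples x samples Laplacian.
    Closed form: V = eigenvectors of (-X^T X + alpha * L) with the smallest
    eigenvalues, and U = X V.
    """
    G = -X.T @ X + alpha * L
    _, eigvecs = np.linalg.eigh(G)      # eigenvalues returned in ascending order
    V = eigvecs[:, :n_components]       # low-dimensional sample embedding
    U = X @ V                           # principal directions in gene space
    return U, V

# Synthetic stand-in for an expression matrix: 50 genes x 12 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 12))
L = knn_graph_laplacian(X.T, k=3)       # graph over samples (columns of X)
U, V = graph_laplacian_pca(X, L, n_components=2, alpha=0.5)
```

With `alpha = 0` the Laplacian term vanishes and the sketch reduces to an ordinary rank-k approximation of `X`. A dual variant would add a second Laplacian over the genes, but its exact form is not given in the abstract.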
Highlights
1.1 Biological analysis of principal component analysis (PCA)
With the development of molecular biology, the gene chip has become one of the most important technologies of gene functional annotation in the post-genomic era [1].
On the left is the checkerboard structure of the leukemia data, where each column corresponds to a sample; in the center are the principal directions; and on the right are the projected samples in the new subspace.
In heat map (a), the 38 samples are arranged generally according to three types of labels: acute myelogenous leukemia (AML), T- and B-cells.
Summary
With the development of molecular biology, the gene chip has become one of the most important technologies of gene functional annotation in the post-genomic era [1]. Without losing the original data, principal component analysis (PCA) transforms the data into a low-dimensional linear or nearly linear subspace spanned by principal components (PCs) [3]. This method overcomes the limitations of bioinformatics methods in gene chip analysis and provides new. The selected information reduces the complexity of the gene chip variables and clusters the obtained data. This method provides the basis for early diagnosis and subtyping of cancer.
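The projection described above can be sketched with plain PCA computed through a singular value decomposition. The example is a minimal illustration on synthetic data, not the paper's pipeline: the function name `pca_project`, the matrix sizes, and the noise level are assumptions. The matrix is built with a two-by-two "checkerboard" of gene groups and sample groups, so that the first principal component separates the two sample groups.

```python
import numpy as np

def pca_project(X, n_components):
    """Plain PCA via SVD. X is genes x samples.
    Returns the principal directions (genes x k), the projected samples
    (k x samples), and the fraction of variance each component explains.
    """
    Xc = X - X.mean(axis=1, keepdims=True)            # center each gene across samples
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    directions = U[:, :n_components]
    projected = directions.T @ Xc
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return directions, projected, explained

# Synthetic checkerboard: genes 0-19 are high in samples 0-4,
# genes 20-39 are high in samples 5-9, plus small Gaussian noise.
rng = np.random.default_rng(0)
X = np.zeros((40, 10))
X[:20, :5] = 2.0
X[20:, 5:] = 2.0
X += rng.normal(scale=0.1, size=X.shape)

directions, projected, explained = pca_project(X, n_components=2)
groups = np.sign(projected[0])   # sign of the PC1 score assigns each sample to a group
```

Here `directions` plays the role of the principal directions and `projected` the role of the samples in the new subspace. The overall sign of a principal component is arbitrary, so only the split between the two groups is meaningful, not which group is positive.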